How to learn Joining Tables for SQL Basics in Data Visualization Engineer for free

Why this matters

As a Data Visualization Engineer, you rarely plot a single table. You blend facts and dimensions: orders with customers, sessions with campaigns, tickets with agents. Joining tables is how you shape the dataset your charts depend on.

Build a customer 360 view by combining demographics, transactions, and support data.
Create dashboard-ready datasets with the right grain (e.g., daily revenue by product).
Avoid silent data errors like double-counting from many-to-many joins.

Who this is for

Beginners who know basic SELECTs and filters.
Analysts/BI and data viz practitioners moving beyond single-table queries.
Engineers standardizing datasets for dashboards.

Prerequisites

Basic SQL: SELECT, WHERE, GROUP BY, ORDER BY.
Know what a primary key and foreign key are.
Comfort reading small sample tables.

Concept explained simply

A JOIN combines rows from two tables when a condition matches (usually a shared key like customer_id). Think of it as lining up two spreadsheets by a matching column and gluing matching rows together.

Mental model

INNER JOIN: Keep only pairs that match on both sides. Intersection.
LEFT JOIN: Keep all rows from the left table; bring matches from the right; unmatched become NULL. Preserve the left.
RIGHT JOIN: Mirror of LEFT JOIN. Less common; prefer LEFT for readability.
FULL OUTER JOIN: Keep all rows from both sides; unmatched become NULL on the other side.
CROSS JOIN: All combinations (cartesian product). Rare; use with caution.

Tip: USING vs ON

USING (key) is shorthand when both tables have the same column name for the join key. It removes duplicate key columns in the result. ON gives you full control and is often clearer in complex joins.

Worked examples

1) INNER JOIN customers to orders

Goal: show each order with the customer name.

-- Tables
customers(customer_id, customer_name)
orders(order_id, customer_id, order_total)

SELECT o.order_id, c.customer_name, o.order_total
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

Result: Only orders with a valid customer appear.

2) LEFT JOIN to preserve all customers

Goal: list all customers and their last order total, even if they never ordered.

SELECT c.customer_id, c.customer_name, o.order_total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
ORDER BY c.customer_id;

Customers with no orders will have NULL order_total.

3) Many-to-many join trap (double counting) and fix

Goal: revenue by product. Facts are at order_item grain; products is at product grain. If you join orders to order_items and then to products, summing totals from orders will overcount. Fix: aggregate first at the desired grain.

-- Wrong: multiplies order_total by number of items in the order
SELECT p.product_category, SUM(o.order_total) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.product_category;

-- Correct: sum at item grain (or pre-aggregate), not order_total
SELECT p.product_category, SUM(oi.item_total) AS revenue
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.product_category;

Why this works

The correct query sums item_total which matches the grain of order_items. Alternatively, pre-aggregate order_items by order_id first, then join to orders (if you need order-level fields) without duplicating totals.

4) Self JOIN (bonus)

Goal: find pairs of customers in the same city.

SELECT a.customer_name AS customer_a, b.customer_name AS customer_b, a.city
FROM customers a
JOIN customers b ON a.city = b.city AND a.customer_id < b.customer_id;

Using a < prevents duplicates and self-pairs.

How to do it in practice

Decide the target grain: one row per what? (customer, order, day, session)
Choose the preserving table: which side must keep all rows? (LEFT JOIN that table)
Identify join keys: sometimes composite keys like (order_id, product_id)
Pre-aggregate detail tables to the target grain before joining
Validate counts and sums before/after

Quick self-check checklist

Did I accidentally multiply rows? Check row counts after each join.
Do my sums match a trusted single-table source?
Are NULLs expected? If not, investigate unmatched keys.
Am I joining on the correct keys and data types?

Exercises

These mirror the graded exercises below. Do them here first, then open the collapsible solutions to compare.

Exercise 1 — INNER vs LEFT JOIN basics (matches ex1)

Use the sample data, write two queries: one INNER JOIN to list only customers with orders; one LEFT JOIN to list all customers with their order totals (NULL if none).

Sample data

customers

customer_id	customer_name
1	Ada
2	Ben
3	Chen

orders

order_id	customer_id	order_total
10	1	25.00
11	1	30.00
12	3	50.00

Show suggestion

Use INNER JOIN for the first query.
Use LEFT JOIN for the second. Consider SUM(order_total) GROUP BY customer.

Exercise 2 — Avoid double counting with pre-aggregation (matches ex2)

Compute total revenue per customer from order_items without double counting. Then show the top result.

Sample data

customers

customer_id	customer_name
1	Ada
2	Ben

orders

order_id	customer_id
100	1
101	1
102	2

order_items

order_id	product_id	item_total
100	p1	20.00
100	p2	5.00
101	p3	15.00
102	p2	12.00

Show suggestion

Aggregate order_items by order_id first.
Join aggregated totals to orders, then to customers.

Checklist before submitting:
- Target grain is customer.
- No multiplication of order totals.
- Sums match manual calculation.

Common mistakes and self-checks

Joining on names instead of IDs: text can vary; use stable keys.
SUM of order_total after joining to order_items: leads to double counting. Sum item_total or pre-aggregate first.
Filters in WHERE that nullify a LEFT JOIN: move filter to the join condition or explicitly allow NULL.
Forgetting composite keys: join on all necessary columns (e.g., store_id and product_id).

Self-check queries

-- Row growth check
SELECT 'orders' AS t, COUNT(*) FROM orders
UNION ALL
SELECT 'orders_join_items', COUNT(*) FROM orders o JOIN order_items oi ON oi.order_id = o.order_id;

-- Duped key check (should be unique)
SELECT order_id, COUNT(*) c
FROM orders
GROUP BY 1
HAVING COUNT(*) > 1;

-- Anti-join to find unmatched keys
SELECT o.order_id
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL;

Practical projects

Sales dashboard dataset: Join customers, orders, order_items, products. Produce one row per day per product_category with revenue and order_count.
Marketing attribution view: Join sessions, campaigns, and conversions. Preserve sessions with LEFT JOIN; compute conversion rate by channel.
Support performance: Join tickets to agents and SLA targets. Flag breaches with a derived column.

Learning path

Start: INNER and LEFT joins on single keys.
Move to composite keys and self joins.
Handle many-to-many with pre-aggregation.
Add CASE expressions and derived metrics post-join.
Optimize: indexes on join keys; limit columns.

Next steps

Finish the exercises below and check solutions.
Take the Quick Test to confirm understanding. Progress saving is available for logged-in users; the test itself is available to everyone.
Apply joins to one of the Practical projects.

Mini challenge

You have tables: users(user_id, signup_date), events(user_id, event_date, event_type). Build a query that returns one row per user with: first_event_date, total_events, and a flag has_event_in_first_7_days (relative to signup). Use LEFT JOIN to preserve all users and be careful with aggregation.

Menu

Joining Tables

Table of Contents