Why this matters
Business Analysts frequently need to combine data across tables, like linking customers to their orders or tickets to agents. INNER JOIN and LEFT JOIN are the core tools to produce accurate metrics, funnel analyses, and customer lists without accidentally losing or duplicating rows.
- Create clean datasets for dashboards (orders by customer, tickets by channel).
- Answer stakeholder questions (Which customers have no orders? Which orders have missing customer data?).
- Avoid data pitfalls (false drops from filtering or inflated counts from duplicate joins).
Progress note: The quick test is available to everyone. Only logged-in learners have their test progress saved.
Who this is for
- Business Analysts starting with SQL who need reliable, joined datasets.
- Analysts moving from spreadsheets to relational databases.
- Anyone who must combine multiple sources without data loss.
Prerequisites
- Basic SELECT, WHERE, ORDER BY.
- Comfort with column references like table.column.
- Understanding of primary key (unique id) and foreign key (references another table).
Concept explained simply
Imagine two lists:
- List A: Customers.
- List B: Orders (each order points to a customer).
INNER JOIN = pairs only items that match in both lists. LEFT JOIN = all items from the left list (Customers), plus matching items from the right (Orders). If no match, you still keep the customer with NULLs for the order columns.
Mental model
Think of JOIN as aligning rows by a common key (e.g., customer_id). INNER JOIN keeps the overlapping area (like the middle of a Venn diagram). LEFT JOIN keeps the entire left circle, filling missing right-side values with NULL.
Core syntax
-- INNER JOIN: keep only matches a SELECT columns FROM tableA a INNER JOIN tableB b ON a.key = b.key [WHERE conditions] -- LEFT JOIN: keep all from left, match right when possible SELECT columns FROM tableA a LEFT JOIN tableB b ON a.key = b.key [WHERE conditions that do NOT unintentionally filter b to NULL]
ON vs WHERE (critical difference)
- Put join conditions between tables in the ON clause.
- Filtering a right-table column in the WHERE clause after a LEFT JOIN turns it into an INNER JOIN (because NULLs fail WHERE conditions). To filter rows from the right table while preserving unmatched left rows, move that filter into the ON clause.
Sample data used in examples
-- customers (c) +----+----------+---------+ | id | name | country | +----+----------+---------+ | 1 | Ava | US | | 2 | Ben | UK | | 3 | Chen | CN | | 4 | Dina | US | +----+----------+---------+ -- orders (o) +---------+-------------+------------+--------+ | order_id| customer_id | order_date | amount | +---------+-------------+------------+--------+ | 101 | 1 | 2023-01-15 | 120.00 | | 102 | 2 | 2023-03-02 | 80.00 | | 103 | 1 | 2024-02-01 | 55.00 | | 104 | 3 | 2023-06-20 | 230.00 | | 105 | 1 | 2023-11-05 | 42.00 | +---------+-------------+------------+--------+
Worked examples
Example 1: Basic INNER JOIN (only matching rows)
SELECT o.order_id, c.name AS customer_name FROM orders o INNER JOIN customers c ON c.id = o.customer_id ORDER BY o.order_id;
What it returns
order_id | customer_name ---------+-------------- 101 | Ava 102 | Ben 103 | Ava 104 | Chen 105 | Ava
No customers without orders appear.
Example 2: LEFT JOIN to include customers with no orders
SELECT c.name AS customer_name, o.order_id FROM customers c LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.name, o.order_id;
What it returns
customer_name | order_id --------------+--------- Ava | 101 Ava | 102? -- wait, that's Ben's order (we will not see this, just an example of why keys matter) Ben | 102 Chen | 104 Dina | NULL
Note how Dina appears with NULL because she has no orders.
Example 3: Preserve LEFT JOIN while filtering right table (filter in ON)
Goal: Show each customer with their 2023 orders only, but keep customers who had zero orders in 2023.
SELECT c.name AS customer_name, o.order_id FROM customers c LEFT JOIN orders o ON o.customer_id = c.id AND o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01' ORDER BY c.name, o.order_id;
Why not filter in WHERE?
Putting o.order_date filters in WHERE would drop rows where o.* is NULL, turning the LEFT JOIN into an INNER JOIN unintentionally.
How to approach JOIN tasks (step-by-step)
Common mistakes and self-checks
- Forgetting the join key leads to a Cartesian product (row explosion). Self-check: large row counts or duplicates? Ensure ON a.key = b.key exists.
- Wrong place for filters on right table after LEFT JOIN turns it into INNER JOIN. Fix: move those filters into ON.
- Ambiguous column names (e.g., id). Always prefix with table alias (c.id, o.id).
- Double counting when joining a one-to-many table and then aggregating. Fix: aggregate the many-side first or use DISTINCT carefully.
- Mismatched data types (text vs int) causing no matches. Ensure compatible types or cast consistently.
Quick self-check list
- Do I have exactly one ON clause per join?
- Is the join condition using the correct keys?
- For LEFT JOIN, are right-table filters inside ON?
- Did the number of rows unexpectedly shrink or explode?
- Do sample rows match expectations (spot-check 5–10 rows)?
Exercises
Use the sample data above unless stated otherwise. Results may be shown in any logically sorted order unless specified.
- Exercise ex1 (INNER JOIN): List order_id and customer_name for orders placed in 2023. Sort by order_id ascending.
- Exercise ex2 (LEFT JOIN + aggregation): Show each customer with count of 2023 orders (including zero). Return columns: customer_name, orders_2023. Sort by customer_name.
Checklist before you run
- Did you write the ON condition first?
- For ex2, did you keep the LEFT JOIN and put the date filter in ON or inside an aggregate condition?
- Are columns properly prefixed with table aliases?
Practical projects
- Customer activation report: LEFT JOIN customers to orders; produce a table with first_order_date per customer. Highlight customers with NULL (not yet activated).
- Order quality check: INNER JOIN orders to customers; find orders with missing or invalid customer references (should be zero if referential integrity is enforced; otherwise detect NULLs when using RIGHT JOIN or by anti-join pattern with LEFT JOIN ... WHERE c.id IS NULL).
- Regional sales snapshot: LEFT JOIN customers to orders; compute 2023 revenue per country, including countries with no revenue (0).
Learning path
- Step 1: Practice INNER JOIN (simple one-to-one joins).
- Step 2: Practice LEFT JOIN (retaining all left rows) with careful ON filters.
- Step 3: Join then aggregate (GROUP BY), avoiding double counting.
- Step 4: Multi-table joins (customers → orders → products).
- Step 5: Self-check patterns and row-count validation.
Mini challenge
Goal: For each customer, show total 2023 amount and a flag has_2023_orders (Y/N). Include customers with zero total.
Show one possible solution
SELECT c.name AS customer_name, COALESCE(SUM(CASE WHEN o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01' THEN o.amount END), 0) AS total_2023, CASE WHEN SUM(CASE WHEN o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01' THEN 1 ELSE 0 END) > 0 THEN 'Y' ELSE 'N' END AS has_2023_orders FROM customers c LEFT JOIN orders o ON o.customer_id = c.id GROUP BY c.name ORDER BY c.name;
Next steps
- Try joins on your real datasets (tickets → agents, visits → users, products → categories).
- Add a third table to your practice queries (e.g., join orders → customers → products).
- Move on to filtering joined data safely and to aggregate joins.
Quick Test
Take the test to check your understanding. Available to everyone; only logged-in users have their progress saved.