luvv to helpDiscover the Best Free Online Tools
Topic 2 of 8

Joins Across Fact And Dimension Tables

Learn Joins Across Fact And Dimension Tables for free with explanations, exercises, and a quick test (for BI Analyst).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

As a BI Analyst, most of your queries combine a central fact table (transactions, events) with descriptive dimensions (customer, product, date). Getting these joins right ensures your metrics are correct, repeatable, and fast to compute.

  • Build KPI dashboards: revenue by product category and customer segment.
  • Analyze cohorts: first order date vs. ship date to measure fulfillment speed.
  • Explain performance: identify which regions or segments drive growth.

Concept explained simply

Star schemas place a big fact table in the middle, surrounded by small dimension tables. The fact table stores numeric measures (quantity, price) and foreign keys to each dimension. Dimensions store human-readable attributes (names, categories, dates).

Mental model

  • Fact = events at a grain (one row per order line, per day, etc.).
  • Dimension = lookup details for each key (product, customer, date).
  • Join = attach readable attributes to facts using the keys.

Picture a hub-and-spoke wheel: the hub is your fact, the spokes are joins to dimensions.

Core patterns and join types

  • Inner join: keeps only fact rows with matching dimension keys. Good for clean, conformed data.
  • Left join: keeps all fact rows; unmatched dimensions show as NULL or an "Unknown" record. Safer for totals.
  • Role-playing date dimensions: join the same dim_date multiple times (e.g., order_date_key and ship_date_key) using aliases.
  • Surrogate keys: integers in facts that reference dimension primary keys. Avoid joining on names.
When to choose INNER vs LEFT
  • Use INNER when you are sure every fact has a dimension match and you want to exclude invalid data.
  • Use LEFT when you must preserve totals or investigate data quality gaps (NULLs reveal missing lookups).

Sample schema (for examples and exercises)

We will use a simplified star schema:

fact_sales
- order_id (int)
- order_line_id (int)
- order_date_key (int)        -- e.g., 20230115
- ship_date_key (int)
- customer_key (int)
- product_key (int)
- quantity (int)
- unit_price (decimal)
- status (text)               -- e.g., 'shipped', 'cancelled'

dim_customer
- customer_key (int PK)
- customer_id (text)
- segment (text)              -- e.g., 'Consumer', 'Corporate', 'SMB'
- region (text)

dim_product
- product_key (int PK)
- product_id (text)
- category (text)
- subcategory (text)

dim_date
- date_key (int PK)           -- yyyymmdd
- date (date)
- year (int)
- month (int)
- day (int)

Minimal example rows (conceptual):

dim_customer
(1,'C-001','Consumer','East')
(2,'C-002','Corporate','West')

dim_product
(10,'P-10','Furniture','Chairs')
(11,'P-11','Technology','Phones')

dim_date
(20230101,'2023-01-01',2023,1,1)
(20230102,'2023-01-02',2023,1,2)
(20230103,'2023-01-03',2023,1,3)

fact_sales
(100,1,20230101,20230102,1,10,2,100.00,'shipped')
(100,2,20230101,20230103,1,11,1,500.00,'shipped')
(101,1,20230102,20230103,2,10,1,120.00,'shipped')

Worked examples

Example 1 — Revenue by product category and customer segment (INNER JOIN)

SELECT
  p.category,
  c.segment,
  SUM(f.quantity * f.unit_price) AS revenue
FROM fact_sales f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_customer c ON f.customer_key = c.customer_key
GROUP BY p.category, c.segment
ORDER BY revenue DESC;

Why it works: fact_sales holds measures; dimensions add labels. Grain stays at order line, but we aggregate safely by dimension attributes.

Example 2 — Preserve totals with LEFT JOIN and Unknowns

SELECT
  COALESCE(p.category,'Unknown') AS category,
  SUM(f.quantity) AS units,
  SUM(f.quantity * f.unit_price) AS revenue
FROM fact_sales f
LEFT JOIN dim_product p ON f.product_key = p.product_key
GROUP BY COALESCE(p.category,'Unknown');

Why it matters: Using LEFT JOIN ensures you do not lose fact rows when a product_key is missing in the dimension. Unknown bucket exposes data quality issues.

Example 3 — Role-playing date dimension to compute days to ship

SELECT
  o.year AS order_year,
  AVG(DATEDIFF(day, o.date, s.date)) AS avg_days_to_ship
FROM fact_sales f
JOIN dim_date o ON f.order_date_key = o.date_key
JOIN dim_date s ON f.ship_date_key  = s.date_key
WHERE f.status = 'shipped'
GROUP BY o.year
ORDER BY o.year;

Key idea: join dim_date twice with different aliases (o and s) because the fact references dates for different roles.

Common mistakes and self-checks

  • Mixing grains: joining a fact at order line grain to a dimension that has multiple rows per key (e.g., Type 2 history) without filtering the current row leads to duplicates. Self-check: count rows before and after joins.
  • Using names instead of keys: product_id names can change; always use surrogate keys. Self-check: review ON clauses.
  • Accidentally using INNER join: dropping rows and undercounting totals. Self-check: compare LEFT vs INNER row counts.
  • Forgetting role-playing aliases: joining dim_date once and reusing columns for multiple date roles yields nonsense. Self-check: each date_key → its own dim_date alias.

Exercises

Use the sample schema above. Aim to write queries, verify row counts, and sanity-check outputs.

  • Exercise 1: Revenue by category and segment for shipped orders in 2023.
  • Exercise 2: On-time shipping rate (orders shipped within 2 days of order date).
Checklist before you run your queries
  • Are you joining on surrogate keys (e.g., product_key) and not names?
  • Does the join type match your intent (LEFT to preserve totals)?
  • Are date dimensions correctly aliased per role?
  • Did you confirm row counts before/after joins?

Practical projects

  • Build a weekly sales dashboard: revenue, units, average order value by region and category. Use LEFT joins and a date spine from dim_date.
  • Create a fulfillment report: average and percentile days-to-ship by segment and category. Use role-playing date dimension joins.
  • Data quality notebook: counts of unknown keys by dimension and week; alert when unknowns exceed a threshold.

Learning path

  1. Review star schema basics: facts, dimensions, surrogate keys.
  2. Practice INNER vs LEFT joins and confirm row counts.
  3. Master role-playing date joins and date differences.
  4. Add aggregations safely (GROUP BY at the correct grain).
  5. Harden queries with Unknown buckets and data quality checks.

Who this is for and prerequisites

Who this is for

  • Aspiring and junior BI Analysts working with star schemas.
  • Data-savvy PMs and analysts who need trustworthy metrics.

Prerequisites

  • Basic SQL: SELECT, WHERE, GROUP BY.
  • Understanding of primary/foreign keys.

Next steps

  • Do the exercises below, then take the quick test. Note: anyone can take the test for free; only logged-in users will have their progress saved.
  • Apply the patterns to your real warehouse tables.

Mini challenge

Find the top 3 product subcategories by revenue for Corporate segment in 2023, and include the average days to ship for those subcategories.

Hint

Join fact_sales to dim_customer, dim_product, and two aliases of dim_date. Filter year = 2023 on the order date alias. Use DATEDIFF and GROUP BY subcategory.

One possible approach
WITH joined AS (
  SELECT
    p.subcategory,
    SUM(f.quantity * f.unit_price) AS revenue,
    AVG(DATEDIFF(day, o.date, s.date)) AS avg_days_to_ship
  FROM fact_sales f
  JOIN dim_customer c ON f.customer_key = c.customer_key
  JOIN dim_product  p ON f.product_key  = p.product_key
  JOIN dim_date     o ON f.order_date_key = o.date_key
  JOIN dim_date     s ON f.ship_date_key  = s.date_key
  WHERE c.segment = 'Corporate' AND o.year = 2023 AND f.status = 'shipped'
  GROUP BY p.subcategory
)
SELECT *
FROM joined
ORDER BY revenue DESC
FETCH FIRST 3 ROWS ONLY; -- or LIMIT 3

Practice Exercises

2 exercises to complete

Instructions

Using the sample schema, write a query that returns revenue by product category and customer segment for shipped orders in 2023. Use:

  • Joins: fact_sales to dim_product and dim_customer.
  • Role-playing date: join dim_date to order_date_key and filter year = 2023.
  • Aggregate: SUM(quantity * unit_price).
Expected Output
Columns: category, segment, revenue. Example rows: (Furniture, Consumer, 320.00), (Technology, Consumer, 500.00), (Furniture, Corporate, 120.00). Values will vary with your data.

Joins Across Fact And Dimension Tables — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Joins Across Fact And Dimension Tables?

AI Assistant

Ask questions about this tool