Why this matters
As a BI Analyst, most of your queries combine a central fact table (transactions, events) with descriptive dimensions (customer, product, date). Getting these joins right ensures your metrics are correct, repeatable, and fast to compute.
- Build KPI dashboards: revenue by product category and customer segment.
- Analyze cohorts: first order date vs. ship date to measure fulfillment speed.
- Explain performance: identify which regions or segments drive growth.
Concept explained simply
Star schemas place a big fact table in the middle, surrounded by small dimension tables. The fact table stores numeric measures (quantity, price) and foreign keys to each dimension. Dimensions store human-readable attributes (names, categories, dates).
Mental model
- Fact = events at a grain (one row per order line, per day, etc.).
- Dimension = lookup details for each key (product, customer, date).
- Join = attach readable attributes to facts using the keys.
Picture a hub-and-spoke wheel: the hub is your fact, the spokes are joins to dimensions.
Core patterns and join types
- Inner join: keeps only fact rows with matching dimension keys. Good for clean, conformed data.
- Left join: keeps all fact rows; unmatched dimension columns come back as NULL (or roll up into an "Unknown" record if you map them). Safer for totals.
- Role-playing date dimensions: join the same dim_date multiple times (e.g., order_date_key and ship_date_key) using aliases.
- Surrogate keys: integers in facts that reference dimension primary keys. Avoid joining on names.
When to choose INNER vs LEFT
- Use INNER when you are sure every fact has a dimension match and you want to exclude invalid data.
- Use LEFT when you must preserve totals or investigate data quality gaps (NULLs reveal missing lookups); the sketch after this list shows how to compare the two.
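A quick way to see the difference in practice is to count the rows each join type keeps, using fact_sales and dim_product from the sample schema in the next section. This is a minimal sketch, assuming dim_product has at most one row per product_key; a gap between the counts means some fact rows carry product keys with no dimension match.

-- Compare how many fact rows survive each join type.
SELECT
  (SELECT COUNT(*) FROM fact_sales) AS fact_rows,
  (SELECT COUNT(*)
     FROM fact_sales f
     JOIN dim_product p ON f.product_key = p.product_key) AS inner_join_rows,
  (SELECT COUNT(*)
     FROM fact_sales f
     LEFT JOIN dim_product p ON f.product_key = p.product_key) AS left_join_rows;
-- Expect left_join_rows = fact_rows; inner_join_rows < fact_rows signals missing keys.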
Sample schema (for examples and exercises)
We will use a simplified star schema:
fact_sales
- order_id (int)
- order_line_id (int)
- order_date_key (int) -- e.g., 20230115
- ship_date_key (int)
- customer_key (int)
- product_key (int)
- quantity (int)
- unit_price (decimal)
- status (text) -- e.g., 'shipped', 'cancelled'

dim_customer
- customer_key (int PK)
- customer_id (text)
- segment (text) -- e.g., 'Consumer', 'Corporate', 'SMB'
- region (text)

dim_product
- product_key (int PK)
- product_id (text)
- category (text)
- subcategory (text)

dim_date
- date_key (int PK) -- yyyymmdd
- date (date)
- year (int)
- month (int)
- day (int)
Minimal example rows (conceptual):
dim_customer
(1,'C-001','Consumer','East')
(2,'C-002','Corporate','West')

dim_product
(10,'P-10','Furniture','Chairs')
(11,'P-11','Technology','Phones')

dim_date
(20230101,'2023-01-01',2023,1,1)
(20230102,'2023-01-02',2023,1,2)
(20230103,'2023-01-03',2023,1,3)

fact_sales
(100,1,20230101,20230102,1,10,2,100.00,'shipped')
(100,2,20230101,20230103,1,11,1,500.00,'shipped')
(101,1,20230102,20230103,2,10,1,120.00,'shipped')
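If you want to run the examples yourself, the following setup script is one possible way to create and populate these tables. The table and column names match the schema above; the exact types and literal syntax are assumptions (e.g., DECIMAL(10,2) for unit_price) and may need small adjustments for your warehouse.

-- Dimensions: one row per surrogate key.
CREATE TABLE dim_customer (customer_key INT PRIMARY KEY, customer_id TEXT, segment TEXT, region TEXT);
CREATE TABLE dim_product  (product_key INT PRIMARY KEY, product_id TEXT, category TEXT, subcategory TEXT);
CREATE TABLE dim_date     (date_key INT PRIMARY KEY, date DATE, year INT, month INT, day INT);

-- Fact: one row per order line, with foreign keys to each dimension.
CREATE TABLE fact_sales (
  order_id INT, order_line_id INT,
  order_date_key INT, ship_date_key INT,
  customer_key INT, product_key INT,
  quantity INT, unit_price DECIMAL(10,2), status TEXT
);

INSERT INTO dim_customer VALUES (1, 'C-001', 'Consumer', 'East'), (2, 'C-002', 'Corporate', 'West');
INSERT INTO dim_product  VALUES (10, 'P-10', 'Furniture', 'Chairs'), (11, 'P-11', 'Technology', 'Phones');
INSERT INTO dim_date VALUES
  (20230101, '2023-01-01', 2023, 1, 1),
  (20230102, '2023-01-02', 2023, 1, 2),
  (20230103, '2023-01-03', 2023, 1, 3);
INSERT INTO fact_sales VALUES
  (100, 1, 20230101, 20230102, 1, 10, 2, 100.00, 'shipped'),
  (100, 2, 20230101, 20230103, 1, 11, 1, 500.00, 'shipped'),
  (101, 1, 20230102, 20230103, 2, 10, 1, 120.00, 'shipped');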
Worked examples
Example 1 — Revenue by product category and customer segment (INNER JOIN)
SELECT
  p.category,
  c.segment,
  SUM(f.quantity * f.unit_price) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_customer c ON f.customer_key = c.customer_key
GROUP BY p.category, c.segment
ORDER BY revenue DESC;
Why it works: fact_sales holds measures; dimensions add labels. Grain stays at order line, but we aggregate safely by dimension attributes.
Example 2 — Preserve totals with LEFT JOIN and Unknowns
SELECT
  COALESCE(p.category, 'Unknown') AS category,
  SUM(f.quantity) AS units,
  SUM(f.quantity * f.unit_price) AS revenue
FROM fact_sales f
LEFT JOIN dim_product p ON f.product_key = p.product_key
GROUP BY COALESCE(p.category, 'Unknown');
Why it matters: the LEFT JOIN ensures you do not lose fact rows when a product_key is missing in the dimension, and the "Unknown" bucket exposes data quality issues.
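A natural follow-up, sketched here rather than taken from the lesson text, is an anti-join that lists the orphaned keys behind the Unknown bucket so the data team can fix the lookup:

SELECT
  f.product_key,
  COUNT(*) AS orphan_rows,
  SUM(f.quantity * f.unit_price) AS orphan_revenue
FROM fact_sales f
LEFT JOIN dim_product p ON f.product_key = p.product_key
WHERE p.product_key IS NULL   -- no matching dimension row
GROUP BY f.product_key
ORDER BY orphan_rows DESC;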
Example 3 — Role-playing date dimension to compute days to ship
SELECT
  o.year AS order_year,
  AVG(DATEDIFF(day, o.date, s.date)) AS avg_days_to_ship
FROM fact_sales f
JOIN dim_date o ON f.order_date_key = o.date_key
JOIN dim_date s ON f.ship_date_key = s.date_key
WHERE f.status = 'shipped'
GROUP BY o.year
ORDER BY o.year;
Key idea: join dim_date twice under different aliases (o and s) because the fact references dates in different roles. Note that DATEDIFF(day, start_date, end_date) is SQL Server-style syntax; in MySQL it is DATEDIFF(end_date, start_date), and in PostgreSQL you would subtract the dates directly (s.date - o.date).
Common mistakes and self-checks
- Mixing grains: joining a fact at order line grain to a dimension that has multiple rows per key (e.g., Type 2 history) without filtering to the current row leads to duplicates. Self-check: count rows before and after joins (see the sketch after this list).
- Using names instead of keys: product_id names can change; always use surrogate keys. Self-check: review ON clauses.
- Accidentally using an INNER join: unmatched fact rows get dropped, so totals are undercounted. Self-check: compare LEFT vs INNER row counts.
- Forgetting role-playing aliases: joining dim_date once and reusing columns for multiple date roles yields nonsense. Self-check: each date_key → its own dim_date alias.
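For the grain check in particular, a before/after row count makes fan-out from a many-rows-per-key dimension visible. A minimal sketch against the sample schema (swap in your own fact and dimension names):

SELECT
  (SELECT COUNT(*) FROM fact_sales) AS rows_before_join,
  (SELECT COUNT(*)
     FROM fact_sales f
     LEFT JOIN dim_customer c ON f.customer_key = c.customer_key) AS rows_after_join;
-- rows_after_join > rows_before_join means dim_customer has multiple rows
-- per customer_key (e.g., Type 2 history) and the join duplicates fact rows.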
Exercises
Use the sample schema above. Aim to write queries, verify row counts, and sanity-check outputs.
- Exercise 1: Revenue by category and segment for shipped orders in 2023.
- Exercise 2: On-time shipping rate (orders shipped within 2 days of order date).
Checklist before you run your queries
- Are you joining on surrogate keys (e.g., product_key) and not names?
- Does the join type match your intent (LEFT to preserve totals)?
- Are date dimensions correctly aliased per role?
- Did you confirm row counts before/after joins?
Practical projects
- Build a weekly sales dashboard: revenue, units, average order value by region and category. Use LEFT joins and a date spine from dim_date (see the sketch after this list).
- Create a fulfillment report: average and percentile days-to-ship by segment and category. Use role-playing date dimension joins.
- Data quality notebook: counts of unknown keys by dimension and week; alert when unknowns exceed a threshold.
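For the date-spine idea in the first project, one possible shape is to drive the query from dim_date so periods with no sales still appear with zero revenue. The sample dim_date has no week column, so this sketch groups by year and month; in a real warehouse you would group by your week attribute and add the region and category joins.

SELECT
  d.year,
  d.month,
  COALESCE(SUM(f.quantity), 0) AS units,
  COALESCE(SUM(f.quantity * f.unit_price), 0) AS revenue
FROM dim_date d
LEFT JOIN fact_sales f ON f.order_date_key = d.date_key
GROUP BY d.year, d.month
ORDER BY d.year, d.month;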
Learning path
- Review star schema basics: facts, dimensions, surrogate keys.
- Practice INNER vs LEFT joins and confirm row counts.
- Master role-playing date joins and date differences.
- Add aggregations safely (GROUP BY at the correct grain).
- Harden queries with Unknown buckets and data quality checks.
Who this is for and prerequisites
Who this is for
- Aspiring and junior BI Analysts working with star schemas.
- Data-savvy PMs and analysts who need trustworthy metrics.
Prerequisites
- Basic SQL: SELECT, WHERE, GROUP BY.
- Understanding of primary/foreign keys.
Next steps
- Do the exercises above, then take the quick test. Note: anyone can take the test for free; only logged-in users will have their progress saved.
- Apply the patterns to your real warehouse tables.
Mini challenge
Find the top 3 product subcategories by revenue for Corporate segment in 2023, and include the average days to ship for those subcategories.
Hint
Join fact_sales to dim_customer, dim_product, and two aliases of dim_date. Filter year = 2023 on the order date alias. Use DATEDIFF and GROUP BY subcategory.
One possible approach
WITH joined AS (
SELECT
p.subcategory,
SUM(f.quantity * f.unit_price) AS revenue,
AVG(DATEDIFF(day, o.date, s.date)) AS avg_days_to_ship
FROM fact_sales f
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_date o ON f.order_date_key = o.date_key
JOIN dim_date s ON f.ship_date_key = s.date_key
WHERE c.segment = 'Corporate' AND o.year = 2023 AND f.status = 'shipped'
GROUP BY p.subcategory
)
SELECT *
FROM joined
ORDER BY revenue DESC
FETCH FIRST 3 ROWS ONLY; -- or LIMIT 3