luvv to helpDiscover the Best Free Online Tools
Topic 2 of 8

Joins Across Facts And Dimensions

Learn Joins Across Facts And Dimensions for free with explanations, exercises, and a quick test (for BI Developer).

Published: December 24, 2025 | Updated: December 24, 2025

Who this is for

BI Developers and Analytics Engineers who need to write reliable SQL across star/snowflake schemas, combine multiple fact tables, and power dashboards without double counting.

Prerequisites

  • Comfort with SELECT, GROUP BY, WHERE, and basic JOINs
  • Familiarity with star schema ideas: fact tables, dimension tables, surrogate keys
  • Basic understanding of data types and NULL handling

Why this matters

In BI, most real questions span multiple tables: sales by customer segment, marketing spend vs. sales uplift, returns rate by product category and month. Getting joins across facts and dimensions right prevents silent errors like double counting, missing rows, or mismatched time grains that break KPIs and erode trust.

Real tasks you will handle
  • Build a revenue dashboard by day, product, customer segment
  • Attribute promo uplift by joining Sales and Promotions facts via Date and Product dimensions
  • Calculate conversion rate by joining Web Sessions and Orders facts at aligned grain
  • Handle role-playing dates (order_date vs ship_date) with a single Date dimension

Concept explained simply

Think of facts as events or numeric measurements (sales, sessions, ad spend). Dimensions describe those events (date, product, customer). You usually join facts to dimensions on surrogate keys (e.g., product_key).

Mental model

  • Start from a fact at a chosen grain (e.g., one row per order line)
  • Join to dimensions with INNER JOIN if you only want fully described facts, or LEFT JOIN if you want to keep all facts (including unknown/late dimensions)
  • When combining two facts, never naively join fact-to-fact on an ID unless it truly matches the same grain. Prefer: aggregate each fact to a shared dimensional grain (e.g., by date_key and product_key), then join the aggregates
  • Watch out for many-to-many relationships and SCD Type 2 dimensions (potentially multiple rows per key): filter to the correct time slice or bridge table

Core patterns and rules

  • Fact → Dimension: join on surrogate keys (e.g., fact_sales.product_key = dim_product.product_key)
  • Two facts together: aggregate both to a common grain via the same conformed dimensions, then join on those dimension keys
  • Role-playing dimensions: alias the same dimension twice (e.g., order_date vs ship_date)
  • Unknown members: use the dimension’s Unknown row (often key=0 or -1) and LEFT JOIN to keep facts
  • Many-to-many: use a bridge table, sometimes with a weighting column, to avoid exploding counts
  • SCD Type 2: pick the correct version with date filters (e.g., fact.transaction_date BETWEEN effective_from AND effective_to)

Worked examples

Example 1: Sales by day and customer segment (single fact → dimensions)

-- Grain: order line (fact_sales). Keep all facts even if some dims are missing.
SELECT
  d.calendar_date,
  c.segment,
  SUM(s.gross_amount - s.discount_amount) AS net_revenue
FROM fact_sales s
LEFT JOIN dim_date d       ON s.date_key = d.date_key
LEFT JOIN dim_customer c   ON s.customer_key = c.customer_key
LEFT JOIN dim_product p    ON s.product_key = p.product_key
WHERE d.calendar_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY d.calendar_date, c.segment
ORDER BY d.calendar_date, c.segment;

Why this works: We join at fact grain to descriptive attributes and aggregate once. LEFT JOIN keeps sales even with unknown customers.

Example 2: Promo uplift (joining two facts via dimensions)

-- Goal: revenue and promo impressions by day and product category.
WITH sales AS (
  SELECT s.date_key, p.category_key, SUM(s.gross_amount - s.discount_amount) AS net_revenue
  FROM fact_sales s
  JOIN dim_product p ON s.product_key = p.product_key
  GROUP BY s.date_key, p.category_key
), promo AS (
  SELECT pr.date_key, p.category_key, SUM(pr.impressions) AS promo_impressions
  FROM fact_promotions pr
  JOIN dim_product p ON pr.product_key = p.product_key
  GROUP BY pr.date_key, p.category_key
)
SELECT d.calendar_date, c.category_name, s.net_revenue, pr.promo_impressions
FROM sales s
FULL JOIN promo pr
  ON s.date_key = pr.date_key AND s.category_key = pr.category_key
JOIN dim_date d ON COALESCE(s.date_key, pr.date_key) = d.date_key
JOIN dim_category c ON COALESCE(s.category_key, pr.category_key) = c.category_key
ORDER BY d.calendar_date, c.category_name;

Why this works: We aggregate each fact to the same dimensional grain (date_key, category_key) before joining. FULL JOIN preserves dates/categories where only one fact has data.

Example 3: Many-to-many with a bridge (products ↔ tags)

-- fact_sales joins dim_product which connects to tag bridge (product_tag_bridge).
-- We compute revenue by tag using weights to avoid double counting.
WITH sales AS (
  SELECT s.product_key, s.date_key, SUM(s.gross_amount - s.discount_amount) AS net_revenue
  FROM fact_sales s
  GROUP BY s.product_key, s.date_key
)
SELECT d.calendar_month, t.tag_name,
       SUM(s.net_revenue * b.weight) AS weighted_revenue
FROM sales s
JOIN product_tag_bridge b ON s.product_key = b.product_key
JOIN dim_tag t            ON b.tag_key = t.tag_key
JOIN dim_date d           ON s.date_key = d.date_key
GROUP BY d.calendar_month, t.tag_name
ORDER BY d.calendar_month, t.tag_name;

Why this works: The bridge table prevents revenue from being counted multiple times per product that has multiple tags. The weight (e.g., 1/N tags) allocates revenue proportionally.

Practice exercises

Solve these in your SQL environment. Then compare with the solutions.

  1. Exercise 1 (ex1): Daily net revenue by customer segment
    Tables: fact_sales(sale_id, date_key, product_key, customer_key, quantity, gross_amount, discount_amount), dim_date(date_key, calendar_date), dim_customer(customer_key, segment).
    Task: Return calendar_date, segment, net_revenue for January 2024. Sort by date, segment.
  2. Exercise 2 (ex2): Sessions vs orders by day and device
    Tables: fact_web_sessions(date_key, device_key, sessions), fact_orders(date_key, device_key, order_id), dim_date(date_key, calendar_date), dim_device(device_key, device_type).
    Task: Compute daily sessions, orders, and conversion_rate = orders/sessions by device_type.
Checklist before you run your query
  • Did you pick a clear grain (e.g., by day and segment or by day and device_type)?
  • Are joins from fact to dimensions on surrogate keys?
  • For two-fact queries, did you aggregate each fact first to the same dimensional grain?
  • Are you avoiding duplicate counting from many-to-many joins?
  • Did you choose INNER vs LEFT/FULL JOIN intentionally?

Common mistakes and self-check

  • Naive fact-to-fact joins on product_key and date without pre-aggregation → duplicates. Fix: aggregate each fact to the same grain first.
  • Joining to SCD Type 2 dimensions without time filter → multiple dimension rows per key. Fix: filter to active row at the fact date.
  • Exploded counts from many-to-many attributes (e.g., multiple tags per product). Fix: bridge table and, if needed, weighting.
  • Inner joins dropping facts with late/unknown dimensions. Fix: LEFT JOIN and map to the Unknown member.
  • Mixing grains (day vs month). Fix: derive and use one target grain, group only once at the end.
  • Counting rows after multiple joins. Fix: COUNT(DISTINCT fact_id) at the fact grain or aggregate before joining other dimensions.
Self-check prompts
  • If you removed all dimension joins, would your fact totals match after re-adding them?
  • Does any dimension join multiply fact rows? Where and why?
  • Can you explain your chosen grain in one sentence?

Practical projects

  • Revenue dashboard: Build a daily revenue and margin view by product category and customer segment. Include Unknown segments.
  • Marketing attribution mini-mart: Join ad_spend and sales facts via date and channel dimensions to compute simple ROAS by week.
  • Operations KPI: Combine shipments and orders facts via date and product, ensuring grains match, to compute on-time rate and backlog.

Learning path

  1. Revisit dimensional modeling basics: facts, dimensions, surrogate keys, grain.
  2. Master single-fact joins: consistent use of INNER/LEFT, handling Unknowns.
  3. Two-fact patterns: pre-aggregate to conformed dimensions, then join.
  4. Advanced cases: role-playing dates, SCD Type 2 filters, bridge tables.
  5. Performance: only necessary columns, filter early, aggregate sensibly.

Next steps

  • Refactor one of your existing dashboards to ensure all joins follow a clear grain
  • Add tests or simple row-count checks to catch duplication when dimensions change
  • Tackle the mini challenge below, then take the Quick Test

Mini challenge

You have fact_orders (order_id, order_date_key, customer_key, product_key, revenue), fact_returns (order_id, return_date_key, product_key, return_amount), dim_date, dim_product, dim_customer.

Build a query that reports, by calendar_month and product category, revenue, return_amount, and return_rate = return_amount/revenue. Ensure you do not double count orders with multiple return lines. Hint: aggregate each fact first to month + category, then join.

Take the Quick Test

Answer short questions to check your understanding. Everyone can take the test; only logged-in users have their progress saved.

Practice Exercises

2 exercises to complete

Instructions

Use fact_sales(sale_id, date_key, product_key, customer_key, quantity, gross_amount, discount_amount), dim_date(date_key, calendar_date), dim_customer(customer_key, segment).

Return: calendar_date, segment, net_revenue for 2024-01-01 to 2024-01-31. Sort by date, segment.

Expected Output
calendar_date | segment | net_revenue 2024-01-01 | Enterprise | 15234.50 2024-01-01 | SMB | 9344.10 2024-01-01 | Consumer | 4820.00 ... 2024-01-31 | Consumer | 5012.75

Joins Across Facts And Dimensions — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Joins Across Facts And Dimensions?

AI Assistant

Ask questions about this tool