Why this matters
Fact and dimension tables are the backbone of star schemas. As a BI Analyst, you will define what gets measured, how it can be sliced, and how reports aggregate correctly. Real tasks include:
- Designing a Sales star schema for dashboards
- Choosing the right grain for events vs. monthly snapshots
- Handling changing product or customer attributes over time
- Writing queries that join facts to dimensions without duplicates
Quick note about progress: the quick test on this page is available to everyone; if you are logged in, your progress will be saved automatically.
Concept explained simply
• Fact table: a collection of business events or numeric observations. Examples: sales line items, pageviews, monthly account balances.
• Dimension table: descriptive context used to slice and filter facts. Examples: date, customer, product, store, campaign.
Mental model
Think of a fact as a row in a diary of events. Dimensions are labeled sticky notes you attach to each event so you can later group by them. The most important decision is the grain: the exact definition of one row in the fact table.
Grain — the irreversible decision
State the grain in one precise sentence before adding columns. Examples:
- Sales fact: one row per order line item
- Pageview fact: one row per pageview event
- Subscription snapshot: one row per subscription per month
Change the grain later and all downstream metrics change. Document it in the table comment.
Keys that make joins safe
- Use surrogate keys (integer IDs) in dimensions to decouple from source IDs and to manage history (SCD).
- Fact tables store foreign keys to dimensions (product_key, customer_key, date_key, etc.).
- Degenerate dimensions: identifiers like order_number can live in the fact when there is no separate dimension.
Types of facts and measures
- Transactional fact: one row per event. Measures are typically fully additive.
- Periodic snapshot: regular intervals (e.g., daily inventory, monthly MRR). Some measures are semi-additive (add across stores but not across time).
- Accumulating snapshot: tracks a workflow with milestones (e.g., order from placed to delivered).
- Measures: additive (quantity), semi-additive (balance), non-additive (ratios like conversion rate).
Slowly Changing Dimensions (SCD)
- Type 1: overwrite value; no history. Good for correcting errors.
- Type 2: add a new row per change; keeps history with effective dates and current flag.
- Type 0: never change (e.g., date dimension).
Designing facts and dimensions (step-by-step)
- Pick the business process (sales, billing, support tickets, web events).
- Declare the grain in one sentence.
- Choose measures consistent with the grain (e.g., sales_amount, quantity).
- Identify dimensions to analyze by (date, product, customer, store, campaign).
- Assign keys: surrogate keys in dimensions; FKs in facts. Include degenerate identifiers when needed.
- Plan SCD strategy per dimension attribute.
- Handle unknowns: provide default dimension rows like “Unknown Product”.
- Conform dimensions so multiple fact tables can share them (same product_key across sales and returns).
Worked examples
Example 1 — Retail sales (transactional fact)
Grain: one row per order line item.
Fact_Sales columns: date_key, product_key, customer_key, store_key, promo_key, order_number (degenerate), line_number, quantity, sales_amount, discount_amount, cost_amount.
Dimensions: Dim_Date, Dim_Product (SCD2 for category/brand changes), Dim_Customer (SCD2 for address tier), Dim_Store, Dim_Promo.
-- Query: sales by category for last month
SELECT d.month_name,
p.category,
SUM(f.sales_amount) AS revenue,
SUM(f.quantity) AS units
FROM Fact_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
JOIN Dim_Product p ON f.product_key = p.product_key
WHERE d.is_last_month = 1
GROUP BY d.month_name, p.category;
Example 2 — Subscription MRR (periodic snapshot)
Grain: one row per subscription per month.
Fact_Subscription_Snapshot: snapshot_month_key, customer_key, plan_key, mrr, is_active_flag, seats, churn_risk_score.
Notes: mrr is semi-additive (add across customers, not across months without care). Dim_Customer and Dim_Plan are SCD2 so historical MRR aligns to historical attributes.
Example 3 — Web analytics pageviews (transactional fact)
Grain: one row per pageview.
Fact_Pageview: date_key, time_key, user_key, session_key, page_key, referrer_key, device_key, time_on_page_seconds, scroll_depth_pct.
Notes: Bounce rate is a derived ratio, not stored directly in the fact. Store raw counts so ratios can be recomputed correctly.
Pre-flight checklists
Use these before building or publishing.
Grain and measures
- I can state the fact table grain in one sentence.
- All measures make sense at the chosen grain.
- No derived ratios are stored as measures unless justified.
Keys and joins
- Every fact FK points to a conformed dimension key.
- Degenerate identifiers (like order_number) are in the fact when no dimension is needed.
- Unknown dimension rows exist to prevent null FKs.
History strategy
- Each dimension has a defined SCD policy (Type 1/2/0).
- Surrogate keys support SCD2 where required.
- Measures that are semi-additive are clearly labeled in docs.
Exercises
Do these after reading the examples. Suggested time: 20–30 minutes.
Exercise 1 — Define grain and columns
Scenario: You need a returns fact for retail. Analysts want to see returned amount by reason, by product, by month. A single order can have multiple returned items and multiple reasons.
- Write the grain sentence.
- List the fact table columns: FKs, measures, degenerate IDs.
- List the required dimensions and SCD needs.
Exercise 2 — Choose SCD types
For each attribute, pick SCD Type 1 or Type 2 and state why:
- Product brand
- Customer email
- Store manager name
- Plan monthly price
Self-check (quick)
- My grain statement is unambiguous and testable.
- Measures are additive/semi-additive classifications are noted.
- Each dimension’s SCD choice is justified.
Need a hint?
- If multiple returned items per order can exist, your grain likely includes the line item.
- If analysts need historical context (e.g., price at the time), lean SCD2.
Common mistakes and how to self-check
- Vague grain: If you cannot say the grain in one sentence, stop and define it. Test by asking: would two events ever share the same row?
- Ratios as measures: Storing conversion_rate invites double-counting. Store counts and compute ratios in queries.
- Missing unknown dimensions: Null FKs break joins. Add default rows like product_key = 0 for Unknown.
- Overusing Type 2: Track history only when business logic requires it; Type 2 increases table size and join complexity.
- Mixing grains in one fact: Do not blend daily snapshots with transactions in the same table.
Self-check query idea
-- Look for orphaned keys
SELECT COUNT(*) AS orphan_rows
FROM Fact_Sales f
LEFT JOIN Dim_Product p ON f.product_key = p.product_key
WHERE p.product_key IS NULL;
Practical projects
- Design a Sales and Returns star schema. Deliver: grain statements, column lists, SCD choices, and 3 example queries.
- Build a Monthly Subscription Snapshot. Deliver: snapshot job spec, schema DDL, and a report showing churn and expansion by cohort.
- Create a Web Events star. Deliver: event dictionary, Fact_Pageview schema, Dim_User privacy-safe attributes, and a dashboard mockup.
Quick test
Take the quick test to check understanding. Everyone can take it. If you are logged in, your answers and progress will be saved.
Mini challenge
Design a Promotions effectiveness model:
- Pick a grain for Fact_Promo_Performance.
- List 3 measures and 4 dimensions.
- Decide which dimension attributes need SCD2.
Example approach (hidden)
Grain: one row per store per product per day with promo status. Measures: sales_lift_amount, units, promo_days_active. Dimensions: Date, Store, Product, Promo. Product category uses SCD2.
Who this is for
- BI Analysts and Analytics Engineers building dashboards and models
- Data Analysts translating business metrics into warehouse schemas
Prerequisites
- Comfort with SQL joins and aggregations
- Basic understanding of data types and primary/foreign keys
Learning path
- Grain and measures fundamentals
- Fact types: transactional, snapshot, accumulating
- Dimensions and SCD strategies
- Conformed dimensions across multiple fact tables
Next steps
- Apply these steps to a real dataset and document your grain statements
- Review with stakeholders to confirm dimensions match how they analyze the business
- Take the quick test and revisit weak areas identified