Why this matters
Show solution
SELECT d.year, d.month, p.category, SUM(f.net_revenue) AS revenue FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key JOIN dim_product p ON f.product_key = p.product_key WHERE d.year = EXTRACT(YEAR FROM CURRENT_DATE) GROUP BY d.year, d.month, p.category ORDER BY d.year, d.month, p.category;
Self-check: Only two joins (date and product), filters on date dimension, group by categorical attributes, aggregate a numeric measure.
Exercise checklist
- Grain is unambiguous and written as a sentence.
- Dimensions cover who/what/when/where/how.
- Measures are additive at the chosen grain.
- Queries join on surrogate keys and filter on dimension attributes.
Common mistakes and self-check
- Unclear grain: If you cannot describe the fact grain in one sentence, pause and rewrite it.
- Mixing levels: Putting order-level and line-level measures in the same fact. Keep separate facts or allocate.
- Over-normalizing: Splitting dimensions into many tables, creating a snowflake. Start denormalized.
- No surrogate keys: Depending on source IDs that change. Always add dimension surrogate keys.
- Missing Date dimension: Relying on raw timestamps makes grouping messy.
- Attribute drift: Changing category definitions without versioning. Use SCD Type 2 when history matters.
Self-check prompts
- Can two rows in the fact point to the same dimension row? Yes, by design.
- Does every measure make sense at the fact grain? If not, move or allocate.
- Can you run top 5 by category last month with two joins? If it needs many joins, you may be snowflaking.
Practical projects
- Build a mini sales mart: fact_sales + dim_date + dim_product + dim_customer. Load 1–2 months of sample data and verify KPIs.
- Add refunds: Create fact_refund and confirm net sales = sales − refunds by month and by category.
- Conform dimensions: Reuse dim_date and dim_product across both facts; prove consistency with combined queries.
Learning path
- Star schema basics (this lesson)
- Choosing the right grain and fact types (transactional, snapshot, accumulating)
- Dimension design: surrogate keys, hierarchies, SCD Type 1 vs Type 2
- Date/time dimensions and fiscal calendars
- Conformed dimensions and data marts
- Performance tips: indexing, partitions, pre-aggregations
Who this is for
- BI Analysts, Data Analysts, and junior Data Engineers building dashboards and reports.
Prerequisites
- Basic SQL (SELECT, JOIN, GROUP BY)
- Understanding of business KPIs (revenue, margin, churn)
Next steps
- Model a small star schema from a dataset you already have access to.
- Run 3–5 realistic stakeholder queries and compare results to your current reports.
- Expand with one more dimension and validate performance.
Mini challenge
Given a streaming service scenario, draft a star:
- Fact grain: one row per user watch event
- List 4 dimensions and 4 measures
- Write one query to get hours watched by genre per week
Show an example answer
Dimensions: dim_date, dim_user, dim_title (genre, series), dim_device
Measures: seconds_watched, completed_flag, ad_impressions, buffering_seconds
SELECT d.week_start, t.genre, SUM(f.seconds_watched)/3600.0 AS hours FROM fact_watch f JOIN dim_date d ON f.date_key = d.date_key JOIN dim_title t ON f.title_key = t.title_key GROUP BY d.week_start, t.genre ORDER BY d.week_start, t.genre;
Quick test info
The quick test below is available to everyone. If you are logged in, your progress will be saved automatically.