Star Schema Basics

Learn Star Schema Basics for free with explanations, exercises, and a quick test (for BI Analyst).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

Show solution

SELECT d.year, d.month, p.category, SUM(f.net_revenue) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = EXTRACT(YEAR FROM CURRENT_DATE)
GROUP BY d.year, d.month, p.category
ORDER BY d.year, d.month, p.category;

Self-check: Only two joins (date and product), filters on date dimension, group by categorical attributes, aggregate a numeric measure.

Exercise checklist

Grain is unambiguous and written as a sentence.
Dimensions cover who/what/when/where/how.
Measures are additive at the chosen grain.
Queries join on surrogate keys and filter on dimension attributes.

Common mistakes and self-check

Unclear grain: If you cannot describe the fact grain in one sentence, pause and rewrite it.
Mixing levels: Putting order-level and line-level measures in the same fact. Keep separate facts or allocate.
Over-normalizing: Splitting dimensions into many tables, creating a snowflake. Start denormalized.
No surrogate keys: Depending on source IDs that change. Always add dimension surrogate keys.
Missing Date dimension: Relying on raw timestamps makes grouping messy.
Attribute drift: Changing category definitions without versioning. Use SCD Type 2 when history matters.

Self-check prompts

Can two rows in the fact point to the same dimension row? Yes, by design.
Does every measure make sense at the fact grain? If not, move or allocate.
Can you run top 5 by category last month with two joins? If it needs many joins, you may be snowflaking.

Practical projects

Build a mini sales mart: fact_sales + dim_date + dim_product + dim_customer. Load 1–2 months of sample data and verify KPIs.
Add refunds: Create fact_refund and confirm net sales = sales − refunds by month and by category.
Conform dimensions: Reuse dim_date and dim_product across both facts; prove consistency with combined queries.

Learning path

Star schema basics (this lesson)
Choosing the right grain and fact types (transactional, snapshot, accumulating)
Dimension design: surrogate keys, hierarchies, SCD Type 1 vs Type 2
Date/time dimensions and fiscal calendars
Conformed dimensions and data marts
Performance tips: indexing, partitions, pre-aggregations

Who this is for

BI Analysts, Data Analysts, and junior Data Engineers building dashboards and reports.

Prerequisites

Basic SQL (SELECT, JOIN, GROUP BY)
Understanding of business KPIs (revenue, margin, churn)

Next steps

Model a small star schema from a dataset you already have access to.
Run 3–5 realistic stakeholder queries and compare results to your current reports.
Expand with one more dimension and validate performance.

Mini challenge

Given a streaming service scenario, draft a star:

Fact grain: one row per user watch event
List 4 dimensions and 4 measures
Write one query to get hours watched by genre per week

Show an example answer

Dimensions: dim_date, dim_user, dim_title (genre, series), dim_device

Measures: seconds_watched, completed_flag, ad_impressions, buffering_seconds

SELECT d.week_start, t.genre, SUM(f.seconds_watched)/3600.0 AS hours
FROM fact_watch f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_title t ON f.title_key = t.title_key
GROUP BY d.week_start, t.genre
ORDER BY d.week_start, t.genre;

Quick test info

The quick test below is available to everyone. If you are logged in, your progress will be saved automatically.

Practice Exercises

3 exercises to complete

Instructions

Scenario: A food delivery app wants a star schema for post-checkout analytics. Available fields: order_id, order_line_id, customer_id, restaurant_id, city, cuisine, device_type, order_datetime, item_id, item_name, item_category, quantity, item_price, delivery_fee, discount_amount, payment_method.

Define the fact table grain
List measures
List dimensions with key attributes

Expected Output

A clear grain sentence, a fact table with measures at that grain, and 4–6 dimensions with surrogate key intent.