Why this matters
Star schemas make BI reports fast, simple, and consistent. As a BI Developer, you will design models that analysts and dashboards rely on daily. Typical tasks include:
- Designing a sales or revenue model that refreshes reliably and drives executive dashboards.
- Creating a customer 360 view that joins orders, interactions, and support data.
- Optimizing slow reports by simplifying snowflaked models into clean star schemas.
- Aligning KPIs across teams using conformed dimensions (e.g., one shared Date or Customer dimension).
Who this is for
BI Developers, Analytics Engineers, and Data Analysts who need to produce trustworthy, scalable reports without complex joins.
Prerequisites
- Comfort with basic SQL (SELECT, JOIN, GROUP BY).
- Know basic database concepts (tables, primary keys, foreign keys).
- Familiar with BI tools concepts (measures, filters, hierarchies).
Concept explained simply
A star schema has one central fact table with numeric measures (like sales amount) and several surrounding dimension tables (like Date, Customer, Product) connected by foreign keys. Dimensions are denormalized and easy to filter by; facts are optimized for aggregations.
Mental model: a hub with spokes
Picture the fact as the hub of a wheel (events, quantities, amounts). Each spoke is a dimension that describes the event (who, what, when, where, how). Because the spokes are wide (denormalized), it’s easy to slice and dice data without many joins.
Core components of a star schema
- Grain: The single level of detail of the fact table (e.g., one row per order line). Define this first.
- Fact table: Contains foreign keys to dimensions and additive/semi-additive measures (e.g., revenue, quantity).
- Dimension tables: Denormalized descriptors with surrogate keys (e.g., product_name, category, brand).
- Surrogate keys (SK): Stable integer keys that avoid issues with changing natural keys.
- Conformed dimensions: Shared dimensions used by multiple fact tables for consistent slicing.
- Degenerate dimension: A business identifier stored directly on the fact (e.g., order_number) when there’s no separate descriptive table.
- Slowly Changing Dimensions (SCD): Type 1 overwrite vs. Type 2 history-tracking with effective dates; choose per business need.
Design process (step-by-step)
- List business questions: What decisions will this model support?
- Choose the grain: One clear event level (e.g., order line, subscription day, ticket interaction).
- Define the fact: Measures and foreign keys that match the grain.
- Select dimensions: Date, Customer, Product, Channel, Geography, etc. Denormalize where practical.
- Keying: Use surrogate keys in dimensions; store their FKs in the fact. Consider a degenerate dimension for business IDs.
- Handle changes: Decide SCD strategy (Type 1 vs Type 2). Keep dates to bound history if Type 2.
- Validate with sample queries: Can you answer the key questions with simple SELECT + GROUP BY?
- Performance checks: Index keys, precompute necessary aggregates if needed, ensure date dimension is comprehensive.
Worked examples
Example 1: E-commerce sales
Questions: Daily revenue by product category, by country, and by device?
- Grain: One row per order line.
- Fact: fact_order_line(order_line_sk, order_sk, date_sk, customer_sk, product_sk, channel_sk, quantity, unit_price, discount_amount, gross_revenue, net_revenue). Include degenerate order_number on fact for convenience.
- Dimensions: dim_date(date_sk,...), dim_customer(customer_sk, country, segment,...), dim_product(product_sk, name, category, brand,...), dim_channel(channel_sk, device, source,...)
- Why it works: Slicing by category or country is a single join from fact to dimension; aggregations are additive at the chosen grain.
Sample query
SELECT d.calendar_date, p.category, c.country, SUM(f.net_revenue) AS revenue FROM fact_order_line f JOIN dim_date d ON f.date_sk = d.date_sk JOIN dim_product p ON f.product_sk = p.product_sk JOIN dim_customer c ON f.customer_sk = c.customer_sk GROUP BY 1,2,3;
Example 2: Subscription analytics
Questions: MRR by plan and region, churn by cohort, upgrades vs downgrades?
- Grain: One row per subscription-day snapshot (periodic snapshot fact) or one row per subscription event (transaction fact). Choose one.
- Fact (event-based): fact_sub_event(date_sk, customer_sk, subscription_sk, plan_sk, event_type, delta_mrr, seats_change).
- Dimensions: dim_date, dim_customer, dim_plan(plan_sk, plan_name, price_tier), dim_geography.
- Tip: Event-based facts let you net MRR changes over time; periodic snapshots simplify point-in-time reporting.
Sample query (MRR change by plan)
SELECT d.calendar_month, pl.plan_name, SUM(f.delta_mrr) AS mrr_change FROM fact_sub_event f JOIN dim_date d ON f.date_sk = d.date_sk JOIN dim_plan pl ON f.plan_sk = pl.plan_sk GROUP BY 1,2;
Example 3: Support tickets
Questions: Average resolution time by priority and agent team; backlog trend?
- Grain: One row per ticket status change (transaction) or ticket-day (snapshot). For average resolution time, transaction grain works well.
- Fact: fact_ticket_event(date_sk, ticket_sk, agent_sk, priority_sk, event_type, seconds_in_status).
- Dimensions: dim_date, dim_ticket(ticket_sk, category, channel), dim_agent(agent_sk, team), dim_priority(priority_sk, level).
- Note: For backlog trend, add a periodic snapshot fact_ticket_day with open_count.
Sample query (avg resolution time)
SELECT p.level AS priority, a.team, AVG(f.seconds_in_status) AS avg_secs FROM fact_ticket_event f JOIN dim_priority p ON f.priority_sk = p.priority_sk JOIN dim_agent a ON f.agent_sk = a.agent_sk WHERE f.event_type = 'resolved' GROUP BY 1,2;
Modeling tips for BI tools
- Prefer star over snowflake for reporting models. Denormalize dimensions to reduce joins and cardinality issues.
- Create a rich Date dimension with day, week, month, quarter, year, fiscal periods, holiday flags.
- Use integer surrogate keys for stable relationships and better performance.
- Define clear hierarchies (e.g., Category > Subcategory > Product) within a single dimension.
- Conform dimensions so multiple facts filter consistently by the same Date/Customer/Product tables.
- Track SCDs thoughtfully: Use Type 2 when historical analysis needs past attributes; filter by effective dates in your joins if needed.
Common mistakes and how to self-check
- Ambiguous grain: If a query double-counts, your grain isn’t clear. Self-check: Can you describe one and only one event per fact row?
- Too many snowflake joins: Slow dashboards. Self-check: Are most descriptors kept in a single wide dimension?
- Measures not additive: Ratios in facts cause errors. Self-check: Store base numerators/denominators; compute ratios in BI.
- Missing conformed dimensions: Inconsistent KPIs across facts. Self-check: Do sales and subscriptions share the same Date and Customer dimensions?
- No degenerate dimension for IDs: Hard to drill into orders/tickets. Self-check: Is the business ID readily filterable from the fact?
Exercises
Do the exercise below, then compare with the solution. Keep it simple and consistent with the design process.
- State the business questions (at least 3).
- Pick the fact grain.
- List the fact columns (keys + measures).
- List 3–5 dimensions with sample attributes.
- Write one sample SQL query you expect analysts to run.
Checklist before you review
- Grain is explicit and singular.
- All measures are additive at the chosen grain.
- Dimensions are denormalized and practical.
- Includes a degenerate business ID if useful.
- Sample query uses simple joins and a GROUP BY.
Practical projects
- Sales reporting mart: Build a star with fact_order_line and dimensions for Date, Customer, Product, Channel. Deliver a revenue by category dashboard.
- Subscription MRR hub: Event-based fact for plan changes with Date, Customer, Plan dims. Validate MRR rollforward from events.
- Support ops insights: Ticket event fact + Agent, Priority, Date dims. Show SLA breaches by team and priority trend.
Learning path
- Master star schema fundamentals (this lesson).
- Practice with one domain (e-commerce) until queries are effortless.
- Add SCD Type 2 to one dimension and learn history-aware joins.
- Introduce a second fact sharing conformed dimensions.
- Tune performance: indexes/partitioning and pre-aggregations where needed.
Next steps
- Refactor an existing snowflake into a star to speed up a slow report.
- Add a Date dimension with fiscal calendar and holidays.
- Model a second domain (marketing, finance, or support) with the same conformed Date and Customer dimensions.
Mini challenge
Design a star schema for marketing campaign performance. Questions to answer: cost per acquisition by channel and month, ROI by campaign, and lead-to-customer conversion by segment. Choose your grain, list fact/dim tables, and write one sample query.
Quick test: how it works
The quick test below is available to everyone for free. Only logged-in users will have their progress and results saved.