
Data Modeling Dimensional Star

Learn Data Modeling Dimensional Star for Analytics Engineers for free: roadmap, examples, subskills, and a skill exam.

Published: December 23, 2025 | Updated: December 23, 2025

Why this skill matters for Analytics Engineers

Dimensional star modeling is the core technique Analytics Engineers use to transform raw data into clear, queryable structures for BI and self-serve analytics. Good star schemas are fast to query, easy to understand, and stable as the business evolves. Mastering this skill lets you deliver reliable dashboards, consistent metrics, and scalable data models that downstream analysts trust.

Quick recap: Star vs. Snowflake (when to use which)

Star schema keeps dimensions denormalized (one hop from fact to each dimension). It’s simpler and typically faster for BI tools. Snowflake normalizes some dimensions into multiple related tables. Prefer star for most analytics; snowflake sparingly to reduce extreme duplication or to handle very large, reusable hierarchies.
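
As a minimal sketch (assuming the staging table also carries a category_id; table and column names are illustrative), the same product data looks like this in each style:

-- Star: one denormalized product dimension, one join away from the fact
create table dim_product_star as
select product_id, product_name, brand_name, category_name
from stg_products;

-- Snowflake: the category is split into its own table (an extra join at query time)
create table dim_product_snowflake as
select product_id, product_name, brand_name, category_id
from stg_products;

create table dim_category as
select distinct category_id, category_name
from stg_products;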

Who this is for

  • Analytics Engineers and BI Developers building semantic layers and marts.
  • Data Engineers who need business-friendly models on top of staging layers.
  • Data Analysts who want consistent, well-modeled sources for dashboards and ad-hoc queries.

Prerequisites

  • Comfortable with SQL SELECTs, JOINs, GROUP BY, and window functions.
  • Basic data warehousing concepts (staging, marts, ETL/ELT).
  • Familiarity with core business processes (e.g., orders, subscriptions, support tickets).

Learning path

Step 1 — Define grain and business process

Choose one clear business process (e.g., orders) and write a one-sentence grain statement: “One row per order line.” The grain drives every other decision.
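
A lightweight habit that pays off: state the grain as a comment at the top of the model and test it. A minimal sketch, using the table names from the worked examples below:

-- Grain: one row per order line
-- Grain test: this should return zero rows; any result means the grain is violated
select order_line_id, count(*) as row_count
from fact_order_lines
group by order_line_id
having count(*) > 1;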

Step 2 — Design the fact table

Select only additive or semi-additive measures and the foreign keys that link to each dimension. Keep descriptive attributes out of the fact table; they belong in dimensions.
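
For instance, the column shape of a lean order-line fact is just keys plus measures (a sketch only; the full build appears in Example 1 below):

-- Keys plus measures, nothing descriptive
create table fact_order_lines (
  order_line_id  bigint,   -- identifies the grain
  order_id       bigint,   -- degenerate dimension (transaction identifier kept in the fact)
  product_sk     bigint,   -- foreign key to dim_product
  customer_sk    bigint,   -- foreign key to dim_customer
  date_sk        int,      -- foreign key to dim_date
  quantity       numeric,  -- additive measure
  unit_price     numeric,  -- not additive on its own; used to derive revenue
  gross_revenue  numeric   -- additive measure
);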

Step 3 — Design dimensions

Group descriptive attributes by business entities (customer, product, date). Add surrogate keys to decouple from source keys.
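
A conformed date dimension is a common first build. Here is a minimal sketch in Postgres-style SQL (generate_series and to_char are Postgres functions; column names match the later examples):

-- Minimal date dimension: one row per calendar day
create table dim_date as
select
  cast(to_char(d, 'YYYYMMDD') as int)  as date_key,       -- surrogate key in yyyymmdd form
  d::date                              as date_actual,
  to_char(d, 'YYYY-MM')                as calendar_month,
  (d::date = (date_trunc('month', d) + interval '1 month' - interval '1 day')::date) as is_month_end
from generate_series(date '2020-01-01', date '2030-12-31', interval '1 day') as t(d);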

Step 4 — Handle change (SCDs)

Type 1 for corrections; Type 2 for history. Use effective_from/effective_to and is_current flags for Type 2.
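
As a rough sketch of the mechanics, using the column names from the customer dimension in Example 3 below (the key values are illustrative, and many teams generate this via MERGE statements or dbt snapshots rather than hand-written DML):

-- Type 1: overwrite in place, no history kept (e.g., fixing a misspelled name)
update dim_customer
set customer_name = 'Acme Corp.'
where customer_nk = 42;

-- Type 2: expire the current version, then insert a new one with fresh effective dates
update dim_customer
set effective_to = current_date, is_current = false
where customer_nk = 42 and is_current = true;

insert into dim_customer
  (surrogate_key, customer_nk, customer_name, segment, effective_from, effective_to, is_current)
values
  (1043, 42, 'Acme Corp.', 'Enterprise', current_date, date '9999-12-31', true);  -- 1043: next surrogate key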

Step 5 — Conform across marts

Re-use shared dimensions (e.g., dim_date, dim_customer) so metrics remain consistent across business areas.
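
A small illustration of the payoff (fact_inventory_daily is built in Example 4 further down; the month value is just for the sketch): because both facts reference the same dim_date keys, the same filter produces comparable time windows in either mart.

-- Same shared dim_date, same filter, two different facts
select d.calendar_month, sum(f.gross_revenue) as revenue
from fact_order_lines f
join dim_date d on f.date_sk = d.date_key
where d.calendar_month = '2024-06'
group by 1;

select d.calendar_month, sum(f.inventory_value) as month_end_inventory_value
from fact_inventory_daily f
join dim_date d on f.date_sk = d.date_key
where d.calendar_month = '2024-06' and d.is_month_end = true
group by 1;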

Step 6 — Snapshots and periodic facts

Model point-in-time states (e.g., daily inventory levels) with snapshot facts; define clear snapshot frequency and measures.
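
Keep in mind that snapshot measures like on-hand stock are semi-additive: they sum across products and warehouses on a single day, but not across days. A sketch against the inventory snapshot built in Example 4 below:

-- Semi-additive handling: summing on_hand_qty across days would overstate stock,
-- so aggregate to a daily total first, then average (or take month end) within the month
select calendar_month, avg(daily_on_hand_qty) as avg_daily_on_hand_qty
from (
  select d.calendar_month, d.date_key, sum(f.on_hand_qty) as daily_on_hand_qty
  from fact_inventory_daily f
  join dim_date d on f.date_sk = d.date_key
  group by 1, 2
) daily_totals
group by 1;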

Step 7 — Naming & maintainability

Adopt consistent names, column casing, and tested patterns. Document grain and business rules in the model.
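
A small illustration of what that looks like in practice (the “before” column names are invented for the sketch):

-- Before: mixed casing and ambiguous names        After: snake_case with clear suffixes
alter table dim_product rename column "ProductID"  to product_nk;   -- natural key from source
alter table dim_product rename column prod_key     to product_sk;   -- surrogate key
alter table fact_order_lines rename column created to created_at;   -- timestamp suffix
-- Document the grain and key business rules in a header comment in the model file, e.g.:
-- Grain: one row per order line. gross_revenue = quantity * unit_price.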

Mini tasks to practice each step
  • Write the grain statement for a refunds process (1–2 lines).
  • List three additive measures for order lines.
  • Draft 8–12 attributes for a product dimension.
  • Mark which product attributes should be Type 2 vs. Type 1.
  • Identify two conformed dimensions you can share between Sales and Marketing.
  • Define the snapshot frequency and measures for inventory (daily count, value).

Worked examples

Example 1: Sales star schema (order line grain)

-- Dimension example (surrogate key pattern), built first so the fact can look up surrogate keys
create table dim_product as
select
  row_number() over (order by product_id) as product_sk,
  product_id as product_nk,  -- natural key from source
  product_name,
  category_name,
  brand_name
from stg_products;

-- Fact: one row per order_line_id; foreign keys and measures only
-- (dim_customer and dim_date follow the same surrogate-key pattern; see Examples 3 and 4)
create table fact_order_lines as
select
  ol.order_line_id,
  ol.order_id,
  p.product_sk,                    -- surrogate key looked up from dim_product
  c.surrogate_key as customer_sk,  -- surrogate key looked up from dim_customer
  d.date_key      as date_sk,      -- surrogate key looked up from dim_date
  ol.quantity,
  ol.unit_price,
  ol.quantity * ol.unit_price as gross_revenue
from stg_order_lines ol
join stg_orders o   on o.order_id    = ol.order_id
join dim_product p  on ol.product_id = p.product_nk
join dim_customer c on o.customer_id = c.customer_nk
join dim_date d     on o.order_date  = d.date_actual;

-- Query: total revenue by category
select
  p.category_name,
  sum(f.gross_revenue) as revenue
from fact_order_lines f
join dim_product p on f.product_sk = p.product_sk
group by 1
order by 2 desc;
Why this works

The fact table contains measures and foreign keys only. Descriptions live in dimensions. Analysts can slice quickly by any dimension attribute.

Example 2: Surrogate vs natural keys

-- Natural key from ERP: product_code (may be reused or edited)
-- Surrogate key (product_sk) is stable and internal.

-- Joining via surrogate keys protects facts if product_code changes.
select f.order_line_id, p.product_name
from fact_order_lines f
join dim_product p on f.product_sk = p.product_sk;

-- If we joined on natural keys and those keys changed or were reused, historical facts
-- could end up attached to the wrong product. Surrogate keys prevent this.

Example 3: SCD Type 2 customer dimension

-- Column layout for an SCD Type 2 customer dimension
create table dim_customer as
select
  surrogate_key,
  customer_id as customer_nk,
  customer_name,
  segment,
  effective_from,
  effective_to,
  is_current
from scd2_customer_changes;  -- placeholder: built by change-detection logic in your ELT (sketched below)
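
The change-detection step elided above can be sketched roughly as follows, assuming a staging table stg_customers and comparing tracked attributes against the current dimension rows (names are illustrative):

-- Customers whose tracked attributes differ from their current dimension row;
-- each one needs its current row expired and a new version inserted (see Step 4)
select s.customer_id, s.customer_name, s.segment
from stg_customers s
join dim_customer c
  on s.customer_id = c.customer_nk
 and c.is_current = true
where s.segment is distinct from c.segment
   or s.customer_name is distinct from c.customer_name;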

-- Current-state reporting: join on the surrogate key and keep only the current version
select f.order_line_id, c.segment
from fact_order_lines f
join dim_customer c
  on f.customer_sk = c.surrogate_key
 and c.is_current = true;
-- For the version active at event time, either have the ELT write that version's surrogate key
-- into the fact (then the plain key join above, without the is_current filter, is already correct),
-- or join on the natural key with the event date between effective_from and effective_to.
Tip: Choosing Type 1 vs Type 2

Type 1 for corrections and non-analytic attributes; Type 2 for attributes you analyze over time (e.g., customer segment, region).

Example 4: Periodic snapshot fact for inventory

-- Grain: one row per product per warehouse per day
create table fact_inventory_daily as
select
  p.product_sk,
  w.warehouse_sk,
  d.date_key as date_sk,
  inv.on_hand_qty,
  inv.on_hand_qty * p.standard_cost as inventory_value
from stg_inventory_snapshot inv
join dim_product p on inv.product_id = p.product_nk
join dim_warehouse w on inv.warehouse_id = w.warehouse_nk
join dim_date d on inv.snapshot_date = d.date_actual;

-- Query: month-end inventory value
select
  d.calendar_month,
  sum(f.inventory_value) as month_end_value
from fact_inventory_daily f
join dim_date d on f.date_sk = d.date_key
where d.is_month_end = true
group by 1;

Example 5: Conformed customer across Sales and Support

-- Both marts use the same dim_customer (same surrogate keys)
-- Aggregate each fact by segment first, then join, so the two facts don't fan out against each other
with sales as (
  select c.segment, sum(f.gross_revenue) as revenue
  from fact_order_lines f
  join dim_customer c on f.customer_sk = c.surrogate_key
  group by 1
), support as (
  select c.segment, count(t.ticket_id) as tickets
  from fact_support_tickets t
  join dim_customer c on t.customer_sk = c.surrogate_key
  group by 1
)
select s.segment, s.revenue, coalesce(su.tickets, 0) as tickets
from sales s
left join support su on su.segment = s.segment;
Why conformance matters

Shared dimensions guarantee that a segment, region, or product means the same thing across departments. This preserves metric consistency.

Drills and exercises

  • Write a one-sentence grain for a subscription billing fact (renewal events).
  • List five attributes for dim_date that enable common time filters (e.g., week number, is_month_end).
  • Mark three product attributes as Type 2 candidates and justify why.
  • Sketch a dim_campaign used by both Sales and Marketing (conformed).
  • Define snapshot grain and frequency for daily active users (DAU) by app and country.
  • Rename three columns to meet your naming conventions (snake_case, consistent suffixes like _sk and _nk).

Common mistakes and debugging tips

  • Unclear grain: If queries double-count, check whether the fact mixes grains (e.g., order vs. order line) or contains duplicate rows. Write the grain at the top of the model file.
  • Attributes in facts: Move descriptive fields to dimensions; keep facts lean with measures and foreign keys.
  • Joining on natural keys: Switch to surrogate keys to preserve history and stability.
  • Overusing Type 2: Only track attributes that matter for historical analysis; otherwise Type 1.
  • Missing conformance: Standardize and reuse dim_date, dim_customer, and other shared dimensions.
  • Ambiguous naming: Adopt a convention (snake_case, _sk for surrogate keys, _nk for natural keys, _at/_date for timestamps).
  • Snapshot confusion: Ensure snapshot frequency and effective dating align with reporting cadence (e.g., daily vs. month-end).
Debugging joins checklist
  • Confirm each foreign key in fact matches a surrogate in dimension (row counts after join).
  • Check for orphan facts (left join dimensions and filter where the dimension surrogate key is null; a concrete query follows this checklist).
  • Validate SCD dating: for Type 2, ensure the event date falls between effective_from and effective_to.
  • Test aggregation logic with a small, known subset first.
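
A concrete version of the orphan-fact check from the list above, using the table names from the worked examples:

-- Orphan facts: order lines whose product key has no match in dim_product (expect zero)
select count(*) as orphan_rows
from fact_order_lines f
left join dim_product p on f.product_sk = p.product_sk
where p.product_sk is null;
-- Repeat the same pattern for dim_customer and dim_date.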

Mini project: Sales and Inventory marts

Goal: Build a Sales star (order line grain) and an Inventory periodic snapshot, both using conformed dim_date and dim_product.

  1. Define grains: order line for Sales, product-warehouse-day for Inventory.
  2. Create dim_date (date_key, date_actual, calendar_month, is_month_end).
  3. Create dim_product with surrogate key and a few attributes (brand, category). Choose Type 1 or 2 for category changes.
  4. Build fact_order_lines with measures (quantity, revenue) and foreign keys.
  5. Build fact_inventory_daily with snapshot measures (on_hand_qty, inventory_value).
  6. Write two queries: revenue by category month-over-month; month-end inventory value (a sketch of the first appears after the acceptance criteria).
  7. Document grain and SCD choices in comments at the top of each model.
Acceptance criteria
  • Both facts join cleanly to dimensions with zero orphan rows.
  • Time filters via dim_date work in both marts the same way.
  • Category rollups match between Sales and Inventory queries.
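
If you get stuck on step 6, the month-over-month revenue query might look roughly like this (assuming the dim_date and fact tables built above; lag is a standard window function):

-- Revenue by category, month over month
with monthly as (
  select d.calendar_month, p.category_name, sum(f.gross_revenue) as revenue
  from fact_order_lines f
  join dim_product p on f.product_sk = p.product_sk
  join dim_date d on f.date_sk = d.date_key
  group by 1, 2
)
select
  calendar_month,
  category_name,
  revenue,
  revenue - lag(revenue) over (partition by category_name order by calendar_month) as mom_change
from monthly;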

Subskills

  • Defining Grain And Business Processes — State the exact row-level definition and the event/process modeled. Everything else depends on this.
  • Designing Fact Tables — Keep facts to measures and foreign keys. Ensure measures are additive or semi-additive as appropriate.
  • Designing Dimension Tables — Organize descriptive attributes; include surrogate keys; ensure clean hierarchies.
  • Surrogate Keys And Natural Keys — Use surrogate keys for stable joins; retain natural keys for lineage and deduplication.
  • Slowly Changing Dimensions Type 1 Type 2 Basics — Decide which attributes need historical tracking; implement effective dating for Type 2.
  • Conformed Dimensions Across Marts — Reuse shared dims to keep metrics consistent across domains.
  • Snapshot Facts And Periodic Snapshots — Model point-in-time tables with clear frequencies and semi-additive measures.
  • Naming Conventions And Model Maintainability — Consistent names, casing, suffixes, and documentation keep models scalable.

Next steps

  • Refactor one existing report source into a proper star schema with a documented grain.
  • Add SCD Type 2 to a high-impact dimension (e.g., customer or product) and validate historical queries.
  • Introduce a conformed dimension to align two departments’ metrics.
  • Take the exam below to check readiness. Everyone can take it; log in to save progress.

Data Modeling Dimensional Star — Skill Exam

This exam checks practical understanding: grain, facts/dimensions, SCDs, snapshots, conformance, and maintainability. You can take it for free, and results display immediately. Only logged-in users have their progress and attempts saved. Rules: a closed-book mindset is recommended, but you may use your own notes if needed. Aim for 70% to pass. You can retry.

15 questions · 70% to pass
