Why this matters
Dimension tables turn raw identifiers into business-friendly context. As an Analytics Engineer, you use them to power dashboards, self-serve analytics, and reliable metrics. Real tasks you will do include:
- Model customer, product, and date dimensions for star schemas.
- Capture history (Slowly Changing Dimensions) so analysts can see "what we knew when".
- Standardize attributes into conformed dimensions used across multiple facts.
- Set safe defaults like unknown members to avoid broken joins.
Concept explained simply
A dimension table describes the who/what/where/when of your business. Facts store measurable events (orders, clicks). Dimensions provide the descriptive context (customer name, product category, order date).
Key ideas:
- Grain: one row per entity at a specific level (for SCD2, per entity per time range).
- Keys: use a surrogate key (integer) as the primary key for joins; store the natural key from source for lineage.
- Conformed dimensions: shared, standardized dimensions used by multiple facts (e.g., Customer, Product, Date).
- Role-playing dimensions: the same dimension used multiple times in a fact (e.g., Order Date and Ship Date both join to Date).
- Slowly Changing Dimensions (SCD):
- Type 0: never change.
- Type 1: overwrite to latest value (no history).
- Type 2: add a new row and mark old as not current to keep history.
- Junk dimension: bundle low-cardinality flags/indicators into one small dimension.
- Unknown/not applicable members: pre-seeded rows to keep joins from dropping facts.
Mental model
Think of a dimension as a biography of an entity. For SCD2, each chapter (row) covers a time range when certain details were true. Facts reference the chapter that was valid on the event date.
Design checklist (use before you build)
- ☐ Declare the grain (e.g., one row per customer per history period).
- ☐ Choose a surrogate key (integer) and retain the natural key.
- ☐ Decide SCD behavior per attribute (Type 1 vs Type 2 vs static).
- ☐ Add history columns: effective_from, effective_to, is_current, version.
- ☐ Seed unknown/not applicable rows with stable surrogate keys.
- ☐ Denormalize hierarchies into the dimension (avoid snowflaking unless truly needed).
- ☐ Standardize names, data types, and default values.
- ☐ Ensure conformance if the dimension will be shared across facts.
Worked examples
Example 1 — Customer dimension with SCD2 and Type 1 mix
Goal: Keep history for location changes; overwrite email corrections.
- Grain: one row per customer per location period.
- Keys: customer_sk surrogate, customer_nk from source CRM.
- SCD: city/state/country = Type 2; email = Type 1.
Columns: customer_sk, customer_nk, full_name, email, city, state, country, effective_from, effective_to, is_current, scd_version, inserted_at, updated_at.
Example 2 — Product dimension with hierarchy
Goal: Expose category hierarchy and keep history of re-categorization.
- Hierarchy flattened into columns: category, subcategory, brand.
- Type 2 for category/subcategory; Type 1 for descriptive text fixes.
- Seed unknown product row product_sk=0.
Example 3 — Date dimension and role-playing
Goal: Use the same dim_date for order_date and ship_date.
- dim_date has date_sk (YYYYMMDD), date, day_name, week, month, quarter, year, is_weekend.
- In queries, alias dim_date twice: dim_order_date and dim_ship_date.
Step-by-step method
- Declare the business grain and which attributes must track history.
- List attributes and assign SCD type per attribute.
- Design keys and history columns; plan unknown/default members.
- Flatten hierarchies; avoid snowflaking unless performance/maintenance requires it.
- Define DDL (types, constraints, defaults).
- Implement upsert logic (MERGE or staged INSERT/UPDATE) for SCD types.
- Validate with sample facts and date-effective joins.
Exercises
Do these to apply the concepts. Hints and full solutions are expandable below. Your progress in the quick test is available to everyone; only logged-in users will have progress saved.
Exercise 1 — Design a Customer SCD2 Dimension
You receive staging data stg_customers(customer_id, full_name, email, city, state, country, created_at, updated_at). Requirements:
- Create a Customer dimension with a surrogate key and store the natural key.
- Track history (Type 2) for city, state, country.
- Overwrite (Type 1) for email fixes.
- Add effective_from, effective_to, is_current, scd_version, inserted_at, updated_at.
- Seed an unknown customer row.
Deliverables: DDL for the dimension and an example upsert process (MERGE or two-step) that handles Type 1 and Type 2 as specified.
Exercise 2 — Build a Date Dimension and Use Role-Playing Aliases
Design dim_date with date_sk (int YYYYMMDD) and helpful attributes (day_name, is_weekend, week_of_year, month, quarter, year). Then show how fact_orders(order_date, ship_date, total_amount) joins to dim_date twice: once as order date and once as ship date.
Deliverables: DDL for dim_date and a sample SELECT that demonstrates the two joins with aliases.
Self-check after exercises
- Did you state the grain explicitly for each dimension?
- Do your SCD2 columns make it easy to find the current row?
- Did you include an unknown member and stable surrogate key?
- Are Type 1 attributes overwritten without duplicating history rows?
Common mistakes and how to self-check
- Mixing grains: Ensure one row per entity per history period. If your dimension allows duplicates for the same time range, fix the SCD logic.
- No surrogate key: Relying only on natural keys makes joins brittle across systems.
- Missing unknown rows: Without them, facts may drop in joins when data arrives late or dirty.
- Snowflaking too early: Keep star-shaped, denormalized dimensions unless there’s a strong reason (extreme cardinality, governance constraints).
- Unbounded current rows: Always set effective_to for closed rows and a clear is_current flag.
Quick self-audit checklist
- ☐ Surrogate key present and integer.
- ☐ Natural key captured for lineage.
- ☐ SCD types clearly defined per attribute.
- ☐ effective_from/effective_to/is_current/scd_version implemented.
- ☐ Unknown and not-applicable members exist.
Practical projects
- E-commerce model: Build dim_customer (SCD2), dim_product (SCD2 for category), dim_date, and connect to fact_orders. Validate a cohort analysis (by customer city at order time).
- Subscription analytics: dim_account (SCD2 for plan), dim_date, fact_invoices, fact_usage. Verify revenue by plan over time.
- Support analytics: Junk dimension for ticket flags (priority, has_attachment, is_escalated) plus dim_agent and fact_tickets. Confirm SLA compliance by priority.
Who this is for
Analytics Engineers, BI Developers, and Data Analysts who design or maintain star schemas and want stable, trustworthy dimensional models.
Prerequisites
- Comfort with SQL DDL/DML (CREATE TABLE, INSERT/UPDATE/MERGE).
- Basic understanding of star schemas (facts vs dimensions).
- Familiarity with data ingestion and staging layers.
Learning path
- Design reliable dimension tables (this lesson).
- Model fact tables with correct grain and foreign keys.
- Implement SCD pipelines and backfills safely.
- Optimize for query performance and documentation/semantic layers.
Next steps
Try the quick test to confirm you got the essentials. Remember: the test is available to everyone; only logged-in users have saved progress.
Mini challenge
You have order-level flags: is_gift, is_expedited, has_coupon (each Y/N), plus fraud_risk_level with values {low, medium, high}. Propose a junk dimension design (columns and sample rows). Note how the fact table will reference it.
Quick Test
When you’re ready, take the quick test below.