luvv to helpDiscover the Best Free Online Tools
Topic 4 of 8

Designing Dimension Tables

Learn Designing Dimension Tables for free with explanations, exercises, and a quick test (for Analytics Engineer).

Published: December 23, 2025 | Updated: December 23, 2025

Why this matters

Dimension tables turn raw identifiers into business-friendly context. As an Analytics Engineer, you use them to power dashboards, self-serve analytics, and reliable metrics. Real tasks you will do include:

  • Model customer, product, and date dimensions for star schemas.
  • Capture history (Slowly Changing Dimensions) so analysts can see "what we knew when".
  • Standardize attributes into conformed dimensions used across multiple facts.
  • Set safe defaults like unknown members to avoid broken joins.

Concept explained simply

A dimension table describes the who/what/where/when of your business. Facts store measurable events (orders, clicks). Dimensions provide the descriptive context (customer name, product category, order date).

Key ideas:

  • Grain: one row per entity at a specific level (for SCD2, per entity per time range).
  • Keys: use a surrogate key (integer) as the primary key for joins; store the natural key from source for lineage.
  • Conformed dimensions: shared, standardized dimensions used by multiple facts (e.g., Customer, Product, Date).
  • Role-playing dimensions: the same dimension used multiple times in a fact (e.g., Order Date and Ship Date both join to Date).
  • Slowly Changing Dimensions (SCD):
    • Type 0: never change.
    • Type 1: overwrite to latest value (no history).
    • Type 2: add a new row and mark old as not current to keep history.
  • Junk dimension: bundle low-cardinality flags/indicators into one small dimension.
  • Unknown/not applicable members: pre-seeded rows to keep joins from dropping facts.

Mental model

Think of a dimension as a biography of an entity. For SCD2, each chapter (row) covers a time range when certain details were true. Facts reference the chapter that was valid on the event date.

Design checklist (use before you build)

  • ☐ Declare the grain (e.g., one row per customer per history period).
  • ☐ Choose a surrogate key (integer) and retain the natural key.
  • ☐ Decide SCD behavior per attribute (Type 1 vs Type 2 vs static).
  • ☐ Add history columns: effective_from, effective_to, is_current, version.
  • ☐ Seed unknown/not applicable rows with stable surrogate keys.
  • ☐ Denormalize hierarchies into the dimension (avoid snowflaking unless truly needed).
  • ☐ Standardize names, data types, and default values.
  • ☐ Ensure conformance if the dimension will be shared across facts.

Worked examples

Example 1 — Customer dimension with SCD2 and Type 1 mix

Goal: Keep history for location changes; overwrite email corrections.

  • Grain: one row per customer per location period.
  • Keys: customer_sk surrogate, customer_nk from source CRM.
  • SCD: city/state/country = Type 2; email = Type 1.

Columns: customer_sk, customer_nk, full_name, email, city, state, country, effective_from, effective_to, is_current, scd_version, inserted_at, updated_at.

Example 2 — Product dimension with hierarchy

Goal: Expose category hierarchy and keep history of re-categorization.

  • Hierarchy flattened into columns: category, subcategory, brand.
  • Type 2 for category/subcategory; Type 1 for descriptive text fixes.
  • Seed unknown product row product_sk=0.
Example 3 — Date dimension and role-playing

Goal: Use the same dim_date for order_date and ship_date.

  • dim_date has date_sk (YYYYMMDD), date, day_name, week, month, quarter, year, is_weekend.
  • In queries, alias dim_date twice: dim_order_date and dim_ship_date.

Step-by-step method

  1. Declare the business grain and which attributes must track history.
  2. List attributes and assign SCD type per attribute.
  3. Design keys and history columns; plan unknown/default members.
  4. Flatten hierarchies; avoid snowflaking unless performance/maintenance requires it.
  5. Define DDL (types, constraints, defaults).
  6. Implement upsert logic (MERGE or staged INSERT/UPDATE) for SCD types.
  7. Validate with sample facts and date-effective joins.

Exercises

Do these to apply the concepts. Hints and full solutions are expandable below. Your progress in the quick test is available to everyone; only logged-in users will have progress saved.

Exercise 1 — Design a Customer SCD2 Dimension

You receive staging data stg_customers(customer_id, full_name, email, city, state, country, created_at, updated_at). Requirements:

  • Create a Customer dimension with a surrogate key and store the natural key.
  • Track history (Type 2) for city, state, country.
  • Overwrite (Type 1) for email fixes.
  • Add effective_from, effective_to, is_current, scd_version, inserted_at, updated_at.
  • Seed an unknown customer row.

Deliverables: DDL for the dimension and an example upsert process (MERGE or two-step) that handles Type 1 and Type 2 as specified.

Exercise 2 — Build a Date Dimension and Use Role-Playing Aliases

Design dim_date with date_sk (int YYYYMMDD) and helpful attributes (day_name, is_weekend, week_of_year, month, quarter, year). Then show how fact_orders(order_date, ship_date, total_amount) joins to dim_date twice: once as order date and once as ship date.

Deliverables: DDL for dim_date and a sample SELECT that demonstrates the two joins with aliases.

Self-check after exercises
  • Did you state the grain explicitly for each dimension?
  • Do your SCD2 columns make it easy to find the current row?
  • Did you include an unknown member and stable surrogate key?
  • Are Type 1 attributes overwritten without duplicating history rows?

Common mistakes and how to self-check

  • Mixing grains: Ensure one row per entity per history period. If your dimension allows duplicates for the same time range, fix the SCD logic.
  • No surrogate key: Relying only on natural keys makes joins brittle across systems.
  • Missing unknown rows: Without them, facts may drop in joins when data arrives late or dirty.
  • Snowflaking too early: Keep star-shaped, denormalized dimensions unless there’s a strong reason (extreme cardinality, governance constraints).
  • Unbounded current rows: Always set effective_to for closed rows and a clear is_current flag.
Quick self-audit checklist
  • ☐ Surrogate key present and integer.
  • ☐ Natural key captured for lineage.
  • ☐ SCD types clearly defined per attribute.
  • ☐ effective_from/effective_to/is_current/scd_version implemented.
  • ☐ Unknown and not-applicable members exist.

Practical projects

  • E-commerce model: Build dim_customer (SCD2), dim_product (SCD2 for category), dim_date, and connect to fact_orders. Validate a cohort analysis (by customer city at order time).
  • Subscription analytics: dim_account (SCD2 for plan), dim_date, fact_invoices, fact_usage. Verify revenue by plan over time.
  • Support analytics: Junk dimension for ticket flags (priority, has_attachment, is_escalated) plus dim_agent and fact_tickets. Confirm SLA compliance by priority.

Who this is for

Analytics Engineers, BI Developers, and Data Analysts who design or maintain star schemas and want stable, trustworthy dimensional models.

Prerequisites

  • Comfort with SQL DDL/DML (CREATE TABLE, INSERT/UPDATE/MERGE).
  • Basic understanding of star schemas (facts vs dimensions).
  • Familiarity with data ingestion and staging layers.

Learning path

  1. Design reliable dimension tables (this lesson).
  2. Model fact tables with correct grain and foreign keys.
  3. Implement SCD pipelines and backfills safely.
  4. Optimize for query performance and documentation/semantic layers.

Next steps

Try the quick test to confirm you got the essentials. Remember: the test is available to everyone; only logged-in users have saved progress.

Mini challenge

You have order-level flags: is_gift, is_expedited, has_coupon (each Y/N), plus fraud_risk_level with values {low, medium, high}. Propose a junk dimension design (columns and sample rows). Note how the fact table will reference it.

Quick Test

When you’re ready, take the quick test below.

Practice Exercises

2 exercises to complete

Instructions

You receive staging data stg_customers(customer_id, full_name, email, city, state, country, created_at, updated_at). Requirements:

  • Create a Customer dimension with a surrogate key and store the natural key.
  • Track history (Type 2) for city, state, country.
  • Overwrite (Type 1) for email fixes.
  • Add effective_from, effective_to, is_current, scd_version, inserted_at, updated_at.
  • Seed an unknown customer row.

Deliverables: DDL for the dimension and an example upsert process (MERGE or two-step) that handles Type 1 and Type 2 as specified.

Expected Output
A CREATE TABLE statement with: customer_sk (surrogate, integer), customer_nk (natural key), full_name, email, city, state, country, effective_from, effective_to, is_current, scd_version, inserted_at, updated_at. Plus an upsert example that overwrites email (Type 1) and closes/inserts rows for location changes (Type 2). Unknown customer row should exist with customer_sk=0.

Designing Dimension Tables — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Designing Dimension Tables?

AI Assistant

Ask questions about this tool