Menu

Topic 7 of 8

Slowly Changing Dimensions Patterns

Learn Slowly Changing Dimensions Patterns for free with explanations, exercises, and a quick test (for Data Architect).

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

As a Data Architect, you decide how analytical systems remember changes to business entities like customers, products, and employees. The right Slowly Changing Dimension (SCD) pattern ensures accurate historical reporting, point-in-time analysis, and trustworthy dashboards.

  • Regulatory and audit needs: prove what was true at a past date.
  • Attribution: tie revenue to the customer segment valid at the time of purchase.
  • Operations: show a sales rep’s current book vs. historical assignments.

Concept explained simply

A Slowly Changing Dimension is a way to record how descriptive attributes change over time. You choose whether to overwrite, add versioned rows, or keep snapshots elsewhere.

  • Type 0 (Fixed): never change after initial load. Good for immutable facts like birthdate.
  • Type 1 (Overwrite): update in place; no history. Good for corrections and non-historical KPIs.
  • Type 2 (Row versioning): add a new row with date ranges and a current flag. Good for time-travel analysis.
  • Type 3 (Previous value): keep limited history in extra columns (e.g., previous_department).
  • Type 4 (History table): current table + separate history table for full changes.
  • Type 6 (Hybrid 1+2+3): versioned rows plus current and previous columns for convenience.
  • Type 7 (Dual key): expose both surrogate key (SK) and business key (BK) to support flexible joins with either current or historical context.

Mental model

Think of a dimension as a passport:

  • Type 1: you edit the passport—only the latest details exist.
  • Type 2: you get a new passport each change, with validity dates.
  • Type 3: you keep the current passport and note the previous detail on a sticky note.
  • Type 4: the archive office stores all old passports; you carry only the current one.
Pattern selection quick helper
  • Need accurate point-in-time reporting? Choose Type 2 or 6.
  • Only need the latest value and corrections? Choose Type 1.
  • Need current plus “what it used to be” for one attribute? Choose Type 3.
  • Worried about dimension size but need history? Consider Type 4.
  • Need both BK-based and SK-based joins? Consider Type 7.

Schema building blocks

  • Surrogate key (e.g., customer_sk): integer identity for each versioned row.
  • Business key (e.g., customer_id): stable natural identifier from source.
  • Effective dating: valid_from_date, valid_to_date (often use high date like 9999-12-31 for open-ended).
  • Current flag: is_current = 1 for the active row.
  • Audit columns: inserted_at, updated_at, source_system, change_reason.

Worked examples

Example 1: Customer address changes

Need: Report historical sales by the region that was true at the time of purchase.

  • Choose: Type 2.
  • Why: Region is derived from address and affects historical attribution.
  • Design: customer_sk (PK), customer_id (BK), address, city, region, valid_from_date, valid_to_date, is_current.
  • Impact: Facts join to customer_sk from the transaction date via point-in-time lookup.
Example 2: Product name correction (typo)

Need: Fix display names; history not required.

  • Choose: Type 1.
  • Why: It’s a correction; no business need to analyze past misspellings.
  • Design: Update product_name in place; no new rows.
Example 3: Employee department reassignment

Need: Show historical department for audits and also quickly filter current and previous department.

  • Choose: Type 6 (1+2+3 hybrid).
  • Why: Combines full history (Type 2) with convenience columns (current/previous dept).
  • Design: employee_sk, employee_id, department, prev_department, valid_from_date, valid_to_date, is_current; overwrite convenience columns (Type 1) while adding new row (Type 2).
Example 4: Customer marketing preferences history offloaded

Need: Keep the dimension slim, but store full opt-in/opt-out history.

  • Choose: Type 4.
  • Why: Current table stays lean; history table stores changes.
  • Design: dim_customer_current + dim_customer_history with effective dating.

Implementation steps (Type 2 core)

  1. Stage changes: Land source deltas with a change hash for tracked attributes.
  2. Detect new vs. changed: Compare by business key; if hash differs, it’s a change.
  3. Close old row: Set valid_to_date = change_date - 1 day, is_current = 0.
  4. Insert new row: Copy attributes, new surrogate key, valid_from_date = change_date, valid_to_date = 9999-12-31, is_current = 1.
  5. Maintain indexes: BK + is_current; BK + valid_from_date for PIT joins.
Sample SQL skeleton (ANSI-like)
-- Assume staging_customer has the latest snapshot for the day, one row per customer_id.
-- dim_customer is Type 2.

-- 1) Close existing rows where change detected
UPDATE dim_customer d
JOIN staging_customer s ON d.customer_id = s.customer_id AND d.is_current = 1
SET d.valid_to_date = DATE_SUB(s.extract_date, INTERVAL 1 DAY),
    d.is_current = 0
WHERE MD5(CONCAT_WS('|', d.name, d.address, d.city, d.region))
    <> MD5(CONCAT_WS('|', s.name, s.address, s.city, s.region));

-- 2) Insert new versions for changed or new customers
INSERT INTO dim_customer (
  customer_id, name, address, city, region, valid_from_date, valid_to_date, is_current, source_system
)
SELECT s.customer_id, s.name, s.address, s.city, s.region,
       s.extract_date, DATE('9999-12-31'), 1, s.source_system
FROM staging_customer s
LEFT JOIN dim_customer d
  ON d.customer_id = s.customer_id AND d.is_current = 1
WHERE d.customer_id IS NULL -- new
   OR MD5(CONCAT_WS('|', d.name, d.address, d.city, d.region))
      <> MD5(CONCAT_WS('|', s.name, s.address, s.city, s.region));

Common mistakes and self-check

  • Forgetting to close prior row (leads to overlapping date ranges). Self-check: For a given business key, there must be at most one row with is_current = 1 and no overlapping valid_from/to ranges.
  • Using natural keys as surrogate keys. Self-check: Facts should join to SK; BK remains for change detection.
  • Versioning non-analytic attributes (e.g., phone formatting). Self-check: Only version attributes that drive analysis or compliance.
  • Not handling late-arriving data. Self-check: Ensure point-in-time logic can backfill effective dates correctly.
  • Ignoring null semantics. Self-check: Normalize null vs. empty strings before hashing/comparison.

Who this is for

  • Data Architects designing marts and governance for historical accuracy.
  • Data Engineers implementing CDC and dimension upserts.
  • Analytics Engineers modeling semantic layers for BI.

Prerequisites

  • Relational modeling basics and star schemas.
  • Intermediate SQL (JOIN, MERGE/UPSERT, window functions).
  • High-level understanding of CDC or batch snapshots.

Exercises you can do now

Do these before the quick test. Everyone can take the test; only logged-in users will have their progress saved.

  • Exercise 1: Choose SCD types and design columns for multiple attributes.
  • Exercise 2: Write a Type 2 merge for a customer dimension.

Exercise 1: Pattern selection and schema sketch

You own dim_customer. For each attribute, pick an SCD type and list the necessary columns or handling:

  • customer_status (Active, Suspended): analysts need point-in-time churn reporting.
  • email_address: corrections only; no need to analyze past emails.
  • loyalty_tier: need current tier and the previous tier for retention dashboards; also need time-travel sales attribution.

Deliverable: Short justification per attribute and a column list for the dimension (include keys, dates, flags).

Exercise 2: Implement a Type 2 upsert

Given staging_customer_daily (one row per customer_id per run) with columns: customer_id, name, city, region, extract_date. Implement SQL to:

  • Detect changes in name, city, or region.
  • Close old versions and insert new versions with valid_from/to and is_current.
  • Ensure no overlapping date ranges.

Self-checklist

  • I can explain when to use Types 1, 2, 3, 4, 6, and 7.
  • I can design a dimension with surrogate keys and effective dating.
  • I can implement a Type 2 merge without overlapping validity.
  • I can justify pattern choice with business questions.

Mini challenge

Your product dimension has attributes: product_name, brand, category, subcategory. Marketing wants only the latest names, but Finance needs historical category/subcategory for margin analysis. Propose a modeling approach (single dimension vs. hybrid), the SCD types per attribute, and how facts will point to the correct version. Keep it to 5-7 sentences.

Learning path

  1. Review star schema fundamentals: dimensions vs. facts and surrogate keys.
  2. Learn SCD types 0–7 and typical use cases.
  3. Practice Type 2 with effective dates, current flags, and hashing for change detection.
  4. Handle late-arriving and early-arriving dimensions; backfill dates safely.
  5. Integrate with CDC pipelines; reconcile deletes vs. soft-deletes.
  6. Add governance: data quality checks for overlaps, duplicate BKs, and null handling.

Practical projects

  • Build a Type 2 customer dimension and join a sales fact table for point-in-time region reporting.
  • Implement a Type 6 employee dimension and create a dashboard showing current and previous department headcount.
  • Migrate a Type 1 product dimension to Type 2 for category changes; validate metrics before and after.
  • Add automated checks: no overlapping validity, one current row per BK, and deterministic hashing.

Next steps

  • Finish the exercises above.
  • Take the quick test to confirm understanding.
  • Apply one pattern to a real dataset in your environment and measure impact on a key KPI.

Practice Exercises

2 exercises to complete

Instructions

Decide SCD types and design columns for dim_customer:

  • customer_status requires time-travel analysis.
  • email_address only needs latest value.
  • loyalty_tier needs both time-travel attribution and easy access to previous tier.

Write: chosen type per attribute, justification, and a column list including keys, effective dates, flags, and any convenience columns.

Expected Output
A short design note listing SCD type per attribute with reasoning, plus a column schema for dim_customer.

Slowly Changing Dimensions Patterns — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

7 questions70% to pass

Have questions about Slowly Changing Dimensions Patterns?

AI Assistant

Ask questions about this tool