Topic Not Found

Why this matters

Business data changes: customers move, products get rebranded, employees change departments. Slowly Changing Dimensions (SCD) let you choose whether to keep history or overwrite attributes in your dimension tables. As a BI Analyst, this directly affects trend accuracy, retention analyses, regulatory reports, and stakeholder trust.

Marketing: track how a customer’s region changed over time for cohort analyses.
Finance: report sales by the product category that was valid at the time of sale.
Operations: measure performance before/after organizational changes.

Who this is for

BI Analysts and Analytics Engineers building star schemas.
Data Analysts who need trustworthy time-aware reporting.
Newcomers to dimensional modeling who want a practical foundation.

Prerequisites

Basic SQL (JOINs, CASE, aggregates).
Understanding of star schemas (facts and dimensions).
Familiarity with unique identifiers and data quality concepts.

Concept explained simply

An SCD is a rule for handling changes in dimension attributes over time. The core decision: Should we overwrite the value, or keep history? Different SCD types encode different answers.

Mental model

Think of a dimension record as a card in a file box. When a change happens, you can either edit the card (overwrite) or add a new card behind it (new version) and date-stamp it. Facts then point to the right card based on the transaction date.

The SCD types (practical guide)

Type 0 — Retain original (no changes)

Use when an attribute should never change historically (e.g., date of birth).

Behavior: Ignore incoming changes.
Pros: Simple, stable.
Cons: Risk of stale data if source is wrong.

Type 1 — Overwrite

Use for corrections or non-analytical attributes (e.g., email typo fix).

Behavior: Update existing row; no history.
Pros: Simple, storage-efficient.
Cons: Past reports reflect new value; can break as-of analysis.

Type 2 — Add new row (track full history)

Use for attributes you must analyze historically (e.g., region, customer segment, product category).

Behavior: Insert a new dimension row with new surrogate key; date ranges mark validity.
Typical fields: surrogate_key, natural_key, effective_start_date, effective_end_date, is_current, version.
Pros: Accurate history; time-travel queries.
Cons: More storage and complexity; careful joins required.

Type 3 — Limited history with previous value

Use when you only need the immediate prior value (e.g., Previous Region for simple before/after comparisons).

Behavior: Add columns like previous_region, change_date; keep one row.
Pros: Simple before/after reporting.
Cons: Only one step of history; loses older changes.

Type 4 — History in a mini-dimension

Use for rapidly changing attributes (e.g., customer preferences) stored in a separate mini-dimension linked from the big dimension or fact.

Pros: Keeps main dimension slim; flexible changes.
Cons: More joins; modeling effort.

Type 6 — Hybrid (1+2+3)

Use when you want full history (Type 2) plus convenience overwrites (Type 1) and a previous value (Type 3) in the current row.

Pros: Powerful, supports many queries easily.
Cons: Most complex to implement and maintain.

Modeling patterns and key fields

Surrogate key (e.g., customer_key): internal unique id per version.
Natural/business key (e.g., customer_id): stable real-world id.
effective_start_date / effective_end_date: validity window.
is_current flag: 1 for current row, 0 for historical.
version: 1, 2, 3… per natural key.
Audit columns: created_at, updated_at, source_system.

Worked examples

Example 1: Customer address change

Scenario: Customer 42 moves from "NY" to "CA" on 2023-04-10.

Type 1 outcome

customer_key  customer_id  region  is_current  start    end      version
101           42           CA      1          null     null     1
-- NY is lost; all past reports show CA

Type 2 outcome

customer_key  customer_id  region  is_current  start        end          version
101           42           NY      0          2021-09-01   2023-04-09   1
202           42           CA      1          2023-04-10   9999-12-31   2
-- Facts dated before 2023-04-10 join to NY; after join to CA

Example 2: Product category rename

Product A moves from "Snacks" to "Healthy Snacks" on 2022-11-01.

If analysis must reflect category at sale time: Type 2.
If it’s a pure naming correction and historical recoding is desired: Type 1.

Type 3 alternative

product_key product_id category          previous_category  change_date
5001        A          Healthy Snacks    Snacks            2022-11-01
-- Supports before/after comparisons only

Example 3: Employee department change

Employee 77 moves from Support to Sales on 2024-02-15.

Performance over time by department: Type 2.
Only need current department: Type 1.
Org churn high and detailed history needed: consider Type 2 or a mini-dimension (Type 4) if many fast-changing attributes.

Implementation steps (safe defaults)

Define attribute policies: list each attribute and assign SCD type.
Add fields: surrogate key, natural key, start/end dates, is_current, version.
Build change detection: compare incoming record to current dimension row by natural key and tracked attributes.
Apply actions:
- Type 1: UPDATE the current row.
- Type 2: UPDATE old row’s end_date and is_current=0; INSERT a new row with start_date today, is_current=1, version+1.
Audit and dedupe: guard against out-of-order events; handle multiple changes per day if needed.
Test with edge cases: same-day multiple updates, null-to-value transitions, late-arriving facts.

SQL pattern cheat-sheet (conceptual)

Detect changes

-- Compare tracked attributes
WHERE COALESCE(src.region,'') <> COALESCE(dim.region,'')

Type 1 update

UPDATE dim_customer d
SET region = s.region, updated_at = CURRENT_DATE
FROM staging_customer s
WHERE d.customer_id = s.customer_id
  AND COALESCE(d.region,'') <> COALESCE(s.region,'');

Type 2 upsert

-- Close current row
UPDATE dim_customer d
SET effective_end_date = CURRENT_DATE - INTERVAL '1 day', is_current = 0
FROM staging_customer s
WHERE d.customer_id = s.customer_id
  AND d.is_current = 1
  AND (COALESCE(d.region,'') <> COALESCE(s.region,''));

-- Insert new version
INSERT INTO dim_customer (
  customer_key, customer_id, region,
  effective_start_date, effective_end_date, is_current, version
)
SELECT nextval('seq_customer_key'), s.customer_id, s.region,
       CURRENT_DATE, DATE '9999-12-31', 1,
       COALESCE(d.version,0) + 1
FROM staging_customer s
LEFT JOIN dim_customer d
  ON d.customer_id = s.customer_id AND d.is_current = 1
WHERE COALESCE(d.region,'') <> COALESCE(s.region,'') OR d.customer_id IS NULL;

Choosing the right type (checklist)

Will past reports need the value that was true at the time? Choose Type 2.
Is it a correction that should fix history? Choose Type 1.
Only need one prior value for comparison? Consider Type 3.
Do attributes change very frequently? Consider Type 4 mini-dimension.
Need both convenience and full history? Consider Type 6.

Exercises

Do Exercise 1 (below) and use this checklist to self-validate:

You identified which attributes are Type 1 vs Type 2.
You produced versioned rows with correct start/end dates.
You flagged current rows correctly and kept surrogate keys stable per version.
You verified fact-to-dimension joins return expected historical values.

Common mistakes

Missing date bounds: Forgetting to close old rows (end_date) leads to two current rows. Self-check: Ensure only one is_current=1 per natural key.
Using natural key as primary key: Breaks Type 2 history. Self-check: Confirm separate surrogate key exists.
Comparing all columns: Causes false changes from audit fields. Self-check: Compare only tracked attributes.
Not handling nulls: Null vs empty string looks like a change. Self-check: Use COALESCE in comparisons.
Late-arriving facts: Facts join to wrong version. Self-check: Use transaction date in joins, not load date.

Practical projects

Customer dimension v1: Implement Type 2 for region and loyalty_tier; Type 1 for email. Acceptance: Queries can show sales by historical region and current email.
Product dimension with rename: Track historical category (Type 2) and maintain a previous_category column (Type 3). Acceptance: Before/after reports show both views.
Employee dimension with mini-dimension: Move fast-changing contact preferences to a mini-dimension (Type 4). Acceptance: Main dimension remains slim; joins still support history.

Learning path

Before: Star schema basics, surrogate vs natural keys, basic ELT.
Now: SCD Types 0/1/2/3/4/6 and when to use them.
Next: Late-arriving dimensions, fact versioning, snapshot facts, and slowly changing hierarchies.

Next steps

Pick one real dimension and classify attributes by SCD type.
Implement a pilot Type 2 pipeline for 1–2 attributes.
Validate with sample facts and reconcile with stakeholders.

Mini challenge

Retail scenario: Attributes for Customer — email, region, loyalty_tier, marketing_opt_in.

Decide SCD type for each attribute and justify.
List the fields you would add to the dimension for Type 2.
Describe how a fact on 2024-03-05 should join for a region change that happened 2024-03-01.

When you’re ready, take the quick test at the end of this page. The test is available to everyone; sign in if you want your progress saved.

Instructions

You have an existing dim_customer and a staging_customer daily snapshot. Track historical changes to region as Type 2. Treat email as Type 1. Use today as the load date.

Create/confirm fields in dim_customer: customer_key (surrogate), customer_id (natural), email, region, effective_start_date, effective_end_date (default 9999-12-31), is_current (1/0), version.
Detect changes where region differs between staging and the current dim row.
For Type 1 (email): overwrite the current row.
For Type 2 (region): close current row (end_date = today - 1, is_current=0) and insert a new row with start_date = today, is_current=1, version+1.
Test with this data:

dim_customer (before)
customer_key customer_id email              region start        end          is_current version
101          C42        a@x.com            NY     2023-01-01  9999-12-31   1         1

staging_customer (today)
customer_id email              region
C42         a_new@x.com        CA

Expected: email overwrites; region triggers a new version row.

Menu

Slowly Changing Dimensions Basics

Table of Contents