Why this matters
Business users ask questions like "What was revenue by customer segment at the time of purchase?" or "How did product category change over time?" Slowly Changing Dimensions (SCDs) give correct answers by preserving or updating history in your dimension tables.
- Customer 360: Track address or status changes without breaking historical reports.
- Product analytics: See how category reclassifications affect past and future sales.
- Regulatory reporting: Reproduce what the business knew at a point in time.
Who this is for
- BI Developers building star schemas and dashboards.
- Analytics Engineers owning ELT/ETL pipelines.
- Data Analysts who need trustworthy historical reporting.
Prerequisites
- Basic SQL (JOINs, INSERT/UPDATE or MERGE).
- Understanding of star schemas: facts vs. dimensions.
- Familiarity with primary keys and surrogate keys.
Concept explained simply
An SCD controls how a dimension row changes when a source attribute changes.
- Type 0 (Fixed): Never change the row after it’s loaded.
- Type 1 (Overwrite): Update the row in place. No history kept.
- Type 2 (Add row): Insert a new row with dates/version. Full history kept.
- Type 3 (Previous value): Keep current and previous value in the same row. Limited history.
When to use each type
- Type 0: Immutable IDs, birth dates, first activation date.
- Type 1: Corrections or attributes where only the latest value matters (e.g., email spelling fix, product name typo).
- Type 2: Attributes where historical reporting matters (e.g., address, segment, price band, category).
- Type 3: Simple “before/after” analysis, when you only need the previous state.
Mental model
Think of a dimension as a timeline for each business entity (customer/product). For Type 2, each change creates a new segment on the timeline with its own surrogate key. Facts link to the surrogate key valid at the time of the event, so historical reports remain correct.
Core patterns and columns
- surrogate_key: Technical primary key (e.g., integer).
- natural_key: Business key (e.g., customer_id from source).
- effective_from, effective_to: Validity window (dates or timestamps).
- is_current: 1 for current row, 0 for historical rows.
- version: Increments with each change (optional but helpful).
- attribute_hash (optional): Hash of tracked attributes to detect changes efficiently.
Detecting changes reliably
- Normalize source values (trim, case rules) before hashing.
- Hash only the tracked SCD attributes to avoid false positives.
- Compare current hash to last stored hash per natural_key.
Worked examples
Example 1 — Type 1 (overwrite) product name correction
Scenario: Product P-100 name changed from "Red Mug" to "Red Ceramic Mug" due to a typo fix. We don’t need historical name reporting.
-- Pseudocode SQL for Type 1 overwrite
UPDATE dim_product
SET product_name = src.product_name,
updated_at = CURRENT_TIMESTAMP
FROM staging_product src
WHERE dim_product.natural_key = src.product_id
AND dim_product.product_name <> src.product_name;
Result: Single row updated; facts always show the latest name.
Example 2 — Type 2 (add row) customer address change
Scenario: Customer C-001 moves from City A to City B. Historical orders should still report City A for past purchases.
-- Step 1: Find current active row
SELECT * FROM dim_customer dc
WHERE dc.natural_key = 'C-001' AND dc.is_current = 1;
-- Step 2: If address changed, close current row and insert new one
-- Close old row
UPDATE dim_customer
SET effective_to = CURRENT_DATE - INTERVAL '1 day',
is_current = 0
WHERE natural_key = 'C-001' AND is_current = 1;
-- Insert new row
INSERT INTO dim_customer (
surrogate_key, natural_key, address, city, effective_from, effective_to, is_current, version
) VALUES (
nextval('dim_customer_seq'), 'C-001', '123 New St', 'City B', CURRENT_DATE, DATE '9999-12-31', 1, previous_version + 1
);
Result: Two rows for C-001: historical (City A) and current (City B). Facts dated before the change link to the historical row; future facts link to the new row.
Example 3 — Type 3 (limited history) employee department
Scenario: Track employee’s current and previous department only.
-- On department change
UPDATE dim_employee
SET previous_department = current_department,
current_department = src.department,
updated_at = CURRENT_TIMESTAMP
FROM staging_employee src
WHERE dim_employee.natural_key = src.employee_id
AND dim_employee.current_department <> src.department;
Result: Only two states are kept (current and previous). Anything older is lost.
MERGE pattern (vendor-neutral pseudocode)
MERGE INTO dim_customer d
USING staged_changes s
ON (d.natural_key = s.natural_key AND d.is_current = 1)
WHEN MATCHED AND d.attribute_hash = s.attribute_hash THEN
-- No change
DO NOTHING
WHEN MATCHED AND d.attribute_hash <> s.attribute_hash THEN
-- Close current row and insert new one (Type 2)
UPDATE SET d.effective_to = s.change_date - INTERVAL '1 day', d.is_current = 0
INSERT (surrogate_key, natural_key, attrs..., attribute_hash, effective_from, effective_to, is_current, version)
VALUES (nextval('seq'), s.natural_key, s.attrs..., s.attribute_hash, s.change_date, DATE '9999-12-31', 1, d.version + 1)
WHEN NOT MATCHED THEN
-- New natural_key
INSERT (...)
VALUES (...);
Implementation steps
- Choose SCD type per attribute: Decide which attributes are Type 1 vs. Type 2 (or 3). Document your choices.
- Add core columns: surrogate_key, natural_key, effective_from/to, is_current, version, attribute_hash.
- Build staging: Cleanse, normalize, and compute attribute_hash per natural_key.
- Change detection: Compare staged hash to current active row’s hash.
- Apply writes: Type 1 = UPDATE; Type 2 = close current row + INSERT new row.
- Backfill facts (if needed): Ensure fact foreign keys point to the correct surrogate key for the event date.
- Schedule and monitor: Run daily/hourly; validate counts and change rates.
Exercises
These mirror the exercises below. Try them first; then reveal the solutions.
- Exercise 1: Design a Type 2 customer dimension for address changes.
- Exercise 2: Implement a Type 1 correction for product names.
- [ ] I can explain when to use Type 0/1/2/3.
- [ ] I can design Type 2 columns and logic.
- [ ] I can write an UPDATE for Type 1 and a close+insert for Type 2.
- [ ] I can validate historical correctness with a fact table sample.
Common mistakes and self-check
- Forgetting to close the old row in Type 2: Always set effective_to and is_current=0.
- Using event time incorrectly: effective_from should match the business effective time, not just load time (when available).
- Changing surrogate keys on Type 1 updates: Surrogate key must remain stable for Type 1.
- Hashing unnormalized data: Trimming/casing differences cause false changes.
- Not indexing natural_key + is_current: Leads to slow merges.
- Fact misalignment: Make sure fact foreign keys resolve using event date within [effective_from, effective_to].
Self-check procedure
- Pick 5 natural_keys with known changes. Ensure exactly one current row per key.
- Confirm that effective_from of the new row = change date.
- Ensure no overlapping windows for the same natural_key.
- Join facts by event date to dimension windows; eyeball 10 samples.
Practical projects
- Retail mini-warehouse: Build dim_customer (Type 2 for address, segment), dim_product (Type 1 name, Type 2 category), and a fact_sales table. Validate historical category reporting.
- People analytics: dim_employee with Type 2 job_level and Type 3 department. Show a dashboard of headcount by department over time.
- SaaS subscriptions: dim_account with Type 2 plan_tier and region. Compare MRR by tier before/after migrations.
Learning path
- Before: Star schema fundamentals, surrogate keys, and basic ELT.
- This lesson: SCD Types 0/1/2/3 and core columns and logic.
- Next: Advanced SCDs (Type 4/6), snapshot facts, change-data-capture (CDC) integration, late-arriving dimensions.
Mini challenge
You inherit a dim_product that uses Type 1 for category. Analysts complain historical category reporting is wrong after reclassifications. Propose a change plan in 5 bullets.
See a sample answer
- Switch category to Type 2 (define tracked attributes and hash).
- Add effective_from/effective_to/is_current/version columns.
- Backfill history from change logs or infer from past snapshots.
- Update fact resolution to join by event date to the correct surrogate_key.
- Communicate dashboard impacts and provide a validation report.
Progress & test
The quick test below is available to everyone. If you are logged in, your progress is saved automatically.
Ready? Scroll to the Quick Test section.