Why this matters
Business data changes: customers move, products get rebranded, employees change departments. Slowly Changing Dimensions (SCD) let you choose whether to keep history or overwrite attributes in your dimension tables. As a BI Analyst, this directly affects trend accuracy, retention analyses, regulatory reports, and stakeholder trust.
- Marketing: track how a customer’s region changed over time for cohort analyses.
- Finance: report sales by the product category that was valid at the time of sale.
- Operations: measure performance before/after organizational changes.
Who this is for
- BI Analysts and Analytics Engineers building star schemas.
- Data Analysts who need trustworthy time-aware reporting.
- Newcomers to dimensional modeling who want a practical foundation.
Prerequisites
- Basic SQL (JOINs, CASE, aggregates).
- Understanding of star schemas (facts and dimensions).
- Familiarity with unique identifiers and data quality concepts.
Concept explained simply
An SCD is a rule for handling changes in dimension attributes over time. The core decision: Should we overwrite the value, or keep history? Different SCD types encode different answers.
Mental model
Think of a dimension record as a card in a file box. When a change happens, you can either edit the card (overwrite) or add a new card behind it (new version) and date-stamp it. Facts then point to the right card based on the transaction date.
The SCD types (practical guide)
Type 0 — Retain original (no changes)
Use when an attribute should never change historically (e.g., date of birth).
- Behavior: Ignore incoming changes.
- Pros: Simple, stable.
- Cons: Risk of stale data if source is wrong.
Type 1 — Overwrite
Use for corrections or non-analytical attributes (e.g., email typo fix).
- Behavior: Update existing row; no history.
- Pros: Simple, storage-efficient.
- Cons: Past reports reflect new value; can break as-of analysis.
Type 2 — Add new row (track full history)
Use for attributes you must analyze historically (e.g., region, customer segment, product category).
- Behavior: Insert a new dimension row with new surrogate key; date ranges mark validity.
- Typical fields: surrogate_key, natural_key, effective_start_date, effective_end_date, is_current, version.
- Pros: Accurate history; time-travel queries.
- Cons: More storage and complexity; careful joins required.
Type 3 — Limited history with previous value
Use when you only need the immediate prior value (e.g., Previous Region for simple before/after comparisons).
- Behavior: Add columns like previous_region, change_date; keep one row.
- Pros: Simple before/after reporting.
- Cons: Only one step of history; loses older changes.
Type 4 — History in a mini-dimension
Use for rapidly changing attributes (e.g., customer preferences) stored in a separate mini-dimension linked from the big dimension or fact.
- Pros: Keeps main dimension slim; flexible changes.
- Cons: More joins; modeling effort.
Type 6 — Hybrid (1+2+3)
Use when you want full history (Type 2) plus convenience overwrites (Type 1) and a previous value (Type 3) in the current row.
- Pros: Powerful, supports many queries easily.
- Cons: Most complex to implement and maintain.
Modeling patterns and key fields
- Surrogate key (e.g., customer_key): internal unique id per version.
- Natural/business key (e.g., customer_id): stable real-world id.
- effective_start_date / effective_end_date: validity window.
- is_current flag: 1 for current row, 0 for historical.
- version: 1, 2, 3… per natural key.
- Audit columns: created_at, updated_at, source_system.
Worked examples
Example 1: Customer address change
Scenario: Customer 42 moves from "NY" to "CA" on 2023-04-10.
Type 1 outcome
customer_key customer_id region is_current start end version 101 42 CA 1 null null 1 -- NY is lost; all past reports show CA
Type 2 outcome
customer_key customer_id region is_current start end version 101 42 NY 0 2021-09-01 2023-04-09 1 202 42 CA 1 2023-04-10 9999-12-31 2 -- Facts dated before 2023-04-10 join to NY; after join to CA
Example 2: Product category rename
Product A moves from "Snacks" to "Healthy Snacks" on 2022-11-01.
- If analysis must reflect category at sale time: Type 2.
- If it’s a pure naming correction and historical recoding is desired: Type 1.
Type 3 alternative
product_key product_id category previous_category change_date 5001 A Healthy Snacks Snacks 2022-11-01 -- Supports before/after comparisons only
Example 3: Employee department change
Employee 77 moves from Support to Sales on 2024-02-15.
- Performance over time by department: Type 2.
- Only need current department: Type 1.
- Org churn high and detailed history needed: consider Type 2 or a mini-dimension (Type 4) if many fast-changing attributes.
Implementation steps (safe defaults)
- Define attribute policies: list each attribute and assign SCD type.
- Add fields: surrogate key, natural key, start/end dates, is_current, version.
- Build change detection: compare incoming record to current dimension row by natural key and tracked attributes.
- Apply actions:
- Type 1: UPDATE the current row.
- Type 2: UPDATE old row’s end_date and is_current=0; INSERT a new row with start_date today, is_current=1, version+1.
- Audit and dedupe: guard against out-of-order events; handle multiple changes per day if needed.
- Test with edge cases: same-day multiple updates, null-to-value transitions, late-arriving facts.
SQL pattern cheat-sheet (conceptual)
Detect changes
-- Compare tracked attributes WHERE COALESCE(src.region,'') <> COALESCE(dim.region,'')
Type 1 update
UPDATE dim_customer d SET region = s.region, updated_at = CURRENT_DATE FROM staging_customer s WHERE d.customer_id = s.customer_id AND COALESCE(d.region,'') <> COALESCE(s.region,'');
Type 2 upsert
-- Close current row
UPDATE dim_customer d
SET effective_end_date = CURRENT_DATE - INTERVAL '1 day', is_current = 0
FROM staging_customer s
WHERE d.customer_id = s.customer_id
AND d.is_current = 1
AND (COALESCE(d.region,'') <> COALESCE(s.region,''));
-- Insert new version
INSERT INTO dim_customer (
customer_key, customer_id, region,
effective_start_date, effective_end_date, is_current, version
)
SELECT nextval('seq_customer_key'), s.customer_id, s.region,
CURRENT_DATE, DATE '9999-12-31', 1,
COALESCE(d.version,0) + 1
FROM staging_customer s
LEFT JOIN dim_customer d
ON d.customer_id = s.customer_id AND d.is_current = 1
WHERE COALESCE(d.region,'') <> COALESCE(s.region,'') OR d.customer_id IS NULL;Choosing the right type (checklist)
- Will past reports need the value that was true at the time? Choose Type 2.
- Is it a correction that should fix history? Choose Type 1.
- Only need one prior value for comparison? Consider Type 3.
- Do attributes change very frequently? Consider Type 4 mini-dimension.
- Need both convenience and full history? Consider Type 6.
Exercises
Do Exercise 1 (below) and use this checklist to self-validate:
- You identified which attributes are Type 1 vs Type 2.
- You produced versioned rows with correct start/end dates.
- You flagged current rows correctly and kept surrogate keys stable per version.
- You verified fact-to-dimension joins return expected historical values.
Common mistakes
- Missing date bounds: Forgetting to close old rows (end_date) leads to two current rows. Self-check: Ensure only one is_current=1 per natural key.
- Using natural key as primary key: Breaks Type 2 history. Self-check: Confirm separate surrogate key exists.
- Comparing all columns: Causes false changes from audit fields. Self-check: Compare only tracked attributes.
- Not handling nulls: Null vs empty string looks like a change. Self-check: Use COALESCE in comparisons.
- Late-arriving facts: Facts join to wrong version. Self-check: Use transaction date in joins, not load date.
Practical projects
- Customer dimension v1: Implement Type 2 for region and loyalty_tier; Type 1 for email. Acceptance: Queries can show sales by historical region and current email.
- Product dimension with rename: Track historical category (Type 2) and maintain a previous_category column (Type 3). Acceptance: Before/after reports show both views.
- Employee dimension with mini-dimension: Move fast-changing contact preferences to a mini-dimension (Type 4). Acceptance: Main dimension remains slim; joins still support history.
Learning path
- Before: Star schema basics, surrogate vs natural keys, basic ELT.
- Now: SCD Types 0/1/2/3/4/6 and when to use them.
- Next: Late-arriving dimensions, fact versioning, snapshot facts, and slowly changing hierarchies.
Next steps
- Pick one real dimension and classify attributes by SCD type.
- Implement a pilot Type 2 pipeline for 1–2 attributes.
- Validate with sample facts and reconcile with stakeholders.
Mini challenge
Retail scenario: Attributes for Customer — email, region, loyalty_tier, marketing_opt_in.
- Decide SCD type for each attribute and justify.
- List the fields you would add to the dimension for Type 2.
- Describe how a fact on 2024-03-05 should join for a region change that happened 2024-03-01.
When you’re ready, take the quick test at the end of this page. The test is available to everyone; sign in if you want your progress saved.