luvv to helpDiscover the Best Free Online Tools
Topic 4 of 8

Surrogate Keys And Natural Keys

Learn Surrogate Keys And Natural Keys for free with explanations, exercises, and a quick test (for BI Analyst).

Published: December 22, 2025 | Updated: December 22, 2025

Who this is for

You are a BI Analyst or aspiring analyst who designs reports, models data for dashboards, or validates warehouse outputs. You need reliable joins across facts and dimensions.

Prerequisites

  • Basic understanding of star schemas (facts and dimensions)
  • Comfort with SQL joins and primary/foreign keys
  • Awareness of Slowly Changing Dimensions (Type 1 and Type 2)

Why this matters

Real tasks you will face:

  • Joining sales facts to the correct customer version after an address change
  • Merging two CRMs where customer IDs overlap
  • Handling late-arriving facts that reference old business identifiers
  • Producing consistent, auditable KPIs even when source systems rename/reuse codes
Quick example of impact

If a product code is reused for a different item, reports joined by that code will mix two products. A surrogate key prevents this by assigning a new, unique warehouse key per product version.

Concept explained simply

Natural key (NK): A business identifier that already exists in source data, like customer_number, email, SKU, or ISO country code.

Surrogate key (SK): A warehouse-generated, meaningless identifier (often an integer) used to join facts to the right dimension row, regardless of how source identifiers change.

Mental model

  • Natural key = "name on the door" (useful to humans, can change)
  • Surrogate key = "internal badge ID" (used for secure, stable access)
  • Facts carry the badge ID (SK) so they always point to the exact dimension row/version you intended

When to use which

  • Use SK for most dimensions, especially with Type 2 history. This guarantees stable joins to the correct row over time.
  • Use NK for lookup/search and deduplication logic. Keep it as an attribute and often as a business key in staging.
  • Sometimes the NK is perfect as the key (e.g., Date dimension with yyyymmdd). In that case, you can use the natural date key directly.
Decision guide
  • If the business identifier can change, be reused, or vary across sources → Create an SK.
  • If you maintain history (Type 2) → Facts must reference SK.
  • If the identifier is global, stable, and tiny (e.g., ISO country code) → NK can work; SK still acceptable.

Worked examples

Example 1: Customer dimension (Type 2)

Source provides customer_number and attributes. When address changes, a new dimension row is created with a new SK. Facts for a given date join to the SK that was active on that date. The customer_number (NK) is kept as a descriptive attribute.

Example 2: Product code reuse

Supplier reuses PRODUCT_CODE "A100" for a new product. With NK-only joins, historical sales mix old and new items. With SK, the warehouse creates a new SK for the new product row. Historical facts still point to the old SK; new facts point to the new SK.

Example 3: Merging two CRMs

CRM1 and CRM2 both have customer_id = 123 but refer to different people. A conformed Customer dimension uses a matching process to identify unique persons, assigns one SK per person-version, and stores original IDs as attributes. Facts from both CRMs join via SK, avoiding collisions.

Example 4: Date dimension

Using a natural key like 20250131 is fine. It is stable, compact, and meaningful. This is a common exception to the SK rule.

Hands-on: Exercises

These mirror the exercises below. Try them now; solutions are provided in expandable sections.

Exercise 1: Choose keys and map facts

  1. Given the Customer source data:
    customer_number | email              | city      | effective_from | effective_to
    C001            | a@x.com           | Boston    | 2024-01-01     | 2024-06-30
    C001            | a@x.com           | Chicago   | 2024-07-01     | 9999-12-31
    C002            | b@example.org     | Miami     | 2024-02-10     | 9999-12-31
  2. Design the dimension keys and explain why SK is needed.
  3. Map these facts to SKs based on transaction_date:
    transaction_id | customer_number | amount | transaction_date
    T1             | C001            | 120    | 2024-03-15
    T2             | C001            | 95     | 2024-09-01
    T3             | C002            | 40     | 2024-03-01

Exercise 2: Late-arriving fact and NK change

  1. Suppose product_code P9 was renamed to PX9 on 2024-05-01. The fact arrives late, referencing product_code P9 for a sale on 2024-05-20.
  2. Decide how to resolve the foreign key so the fact points to the correct SK.
  3. Explain what to do if no matching dimension row exists at fact load time.
Self-check checklist
  • Facts always carry SK (except deliberate NK cases like Date)
  • Type 2 changes create new SKs
  • NKs are preserved as attributes for search and audit
  • Late-arriving facts use effective-dated NK lookups to pick the right SK
  • Unknown or missing NKs map to a standard unknown SK row

Common mistakes and how to self-check

  • Joining facts on NK instead of SK for Type 2 dimensions. Self-check: Does your fact table store the dimension SK? If not, fix ETL to resolve SK at load time.
  • Assuming NKs are globally unique across sources. Self-check: Do multiple systems use the same code space? If yes, include source-system qualifiers or use matching to a conformed SK.
  • Not handling NK changes or reuse. Self-check: Do you maintain effective_from/to in the dimension and create new SKs upon change?
  • No unknown or inferred members. Self-check: Do you have a default SK (e.g., -1) for unknown? Late facts should not fail the load.
  • Forgetting that Date dimension can use an NK. Self-check: Are you overcomplicating dates with unnecessary SKs?

Practical projects

  • Build a small Customer dimension with Type 2 history and load a Sales fact table. Verify joins return the correct city per sale date.
  • Conform two Product files with overlapping codes. Create SKs, track Type 2 changes, and show a report that separates old-vs-new reused codes.
  • Implement late-arriving fact handling with an inferred Product member that is later updated when full attributes arrive.

Learning path

  • Start: Surrogate vs Natural keys (this lesson)
  • Next: Slowly Changing Dimensions (Type 1/2/3) mechanics
  • Then: Late-arriving facts and inferred members
  • Finally: Conformed dimensions across multiple source systems

Next steps

  • Complete the exercises above, then take the Quick Test below.
  • Note: The quick test is available to everyone; only logged-in users get saved progress.
  • Apply the patterns in your next data model: pick SKs for change-prone dimensions, and use NKs for business search.

Mini challenge

Your supplier introduces a new color variant but keeps the same product_code for marketing continuity. Design how your Product dimension handles this. Specify when a new SK is created, how historical facts remain intact, and what attribute changes trigger Type 2 vs Type 1 updates.

Practice Exercises

2 exercises to complete

Instructions

You are modeling a Customer dimension with Type 2 history. Use the source and facts provided to decide on keys and perform the mapping.

  1. Choose the NK and explain why it is not safe for joins.
  2. Create SKs for each version of the customer.
  3. Resolve foreign keys for each fact row based on transaction_date.
Customer source:
customer_number | email              | city      | effective_from | effective_to
C001            | a@x.com           | Boston    | 2024-01-01     | 2024-06-30
C001            | a@x.com           | Chicago   | 2024-07-01     | 9999-12-31
C002            | b@example.org     | Miami     | 2024-02-10     | 9999-12-31

Facts:
transaction_id | customer_number | amount | transaction_date
T1             | C001            | 120    | 2024-03-15
T2             | C001            | 95     | 2024-09-01
T3             | C002            | 40     | 2024-03-01
Expected Output
A mapping table that shows each transaction_id with a resolved customer_sk pointing to the correct dimension row based on effective dates.

Surrogate Keys And Natural Keys — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

10 questions70% to pass

Have questions about Surrogate Keys And Natural Keys?

AI Assistant

Ask questions about this tool