Why this matters
Source to Target Mapping (STM) is the contract between business needs and the ETL you build. It tells you exactly which source fields feed which target columns, how to transform them, and how to handle edge cases. In real projects you will:
- Consolidate data from CRMs, ERPs, and logs into a clean warehouse model
- Define business rules (currency conversion, code standardization, SCD handling)
- Prevent data loss with explicit null handling, defaults, and data type decisions
- Enable reproducible, testable pipelines by documenting every rule upfront
Reality check: What goes wrong without an STM?
- Hidden assumptions cause mismatched counts or broken joins
- Disagreements with stakeholders on calculated fields
- Rework after deployment due to unclear rules
Concept explained simply
An STM is a structured specification that maps each target column to its source and the transformation logic. Think of it as a recipe: ingredients (source fields), steps (transformations), and serving rules (data types, null/default policies, keys, and load strategy).
Mental model
- Columns as questions: For each target column, ask: Where does it come from? How do I compute it? What if it is missing? What type and length must it be?
- Rows as commitments: Every row in your STM is a commitment you can test and automate.
- Traceability: Anyone should be able to trace a number in the report back to source fields and rules.
Minimal STM fields to include
- Target table and column (name, type, nullable, description)
- Source system, table, and column(s)
- Transformation rule (business logic, expressions, lookups)
- Keys (business/natural keys, surrogate key generation)
- Load strategy (full, incremental; SCD handling if dimension)
- Data quality rules (valid ranges, code sets, null/default policy)
- Dependency notes (joins, lookups, reference data)
Worked examples
Example 1: Customer dimension (SCD Type 2)
Goal: Map CRM customer to warehouse dim_customer with history tracking.
- Target: dim_customer(customer_sk, customer_id, full_name, email, country_code, is_active, effective_from, effective_to, is_current)
- Source: crm.customers(id, first_name, last_name, email, country, status, updated_at)
Key rules (sketched in SQL after this list):
- customer_sk: surrogate key generated by warehouse
- customer_id: crm.customers.id
- full_name: trim(first_name) || ' ' || trim(last_name)
- email: lower(email); reject if not valid format
- country_code: map country using reference table ref_country_map
- is_active: status in ('active','trial') => true else false
- SCD2: new row if any of (email, country_code, is_active) changes
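A minimal SQL sketch of these rules, assuming a Postgres-style warehouse, is shown below. Only the column transformations are expressed; the SCD2 merge (closing the old row and inserting a new one) is summarized in comments, and the join column source_country in ref_country_map is an illustrative assumption.

```sql
-- Sketch of the Example 1 column rules (Postgres-style SQL).
SELECT
    c.id                                            AS customer_id,   -- business key from CRM
    trim(c.first_name) || ' ' || trim(c.last_name)  AS full_name,
    lower(c.email)                                  AS email,         -- invalid formats rejected upstream
    m.country_code                                  AS country_code,  -- standardized via ref_country_map
    (c.status IN ('active', 'trial'))               AS is_active,
    c.updated_at                                    AS source_updated_at
FROM crm.customers AS c
LEFT JOIN ref_country_map AS m
       ON m.source_country = c.country;             -- assumed join column in the reference table
-- SCD2 handling: compare (email, country_code, is_active) against the current
-- dim_customer row for this customer_id; if any value differs, set effective_to
-- and is_current = false on the old row, then insert a new row with a fresh
-- customer_sk, effective_from = load date, and is_current = true.
```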
Example 2: Orders fact (incremental by watermark)
Goal: Map orders to fact_orders with daily incremental load.
- Target: fact_orders(order_id, order_ts, customer_sk, net_amount_usd, item_count)
- Sources: erp.orders(order_id, order_date, customer_id, total_amount, currency, items_count, updated_at), fx_rates(date, currency, usd_rate), dim_customer
Key rules (sketched in SQL after this list):
- Incremental filter: erp.orders.updated_at > last_watermark
- customer_sk: lookup in dim_customer by customer_id where is_current = true
- net_amount_usd: total_amount / usd_rate, joining fx_rates on order_date and currency
- item_count: items_count default 0 if null
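A minimal sketch of this mapping in Postgres-style SQL follows. The :last_watermark placeholder stands in for however your orchestrator supplies the last successful watermark; surrogate key generation and the actual insert into fact_orders are omitted.

```sql
-- Sketch of the Example 2 incremental mapping (Postgres-style SQL).
SELECT
    o.order_id,
    o.order_date                  AS order_ts,        -- grain: one row per order
    d.customer_sk,                                     -- lookup against the current dimension row
    o.total_amount / fx.usd_rate  AS net_amount_usd,   -- currency conversion rule from the STM
    COALESCE(o.items_count, 0)    AS item_count        -- default 0 when the source is null
FROM erp.orders AS o
JOIN dim_customer AS d
      ON d.customer_id = o.customer_id
     AND d.is_current = true
JOIN fx_rates AS fx
      ON fx.date = o.order_date
     AND fx.currency = o.currency
WHERE o.updated_at > :last_watermark;                  -- incremental filter by watermark
```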
Example 3: Web events (semi-structured JSON)
Goal: Map raw JSON events to a clean table.
- Target: stg_pageviews(event_id, user_id, page_url, referrer, ts, device_type)
- Source: logs.raw_events(json) with JSON paths 'eventId', 'user.id', 'url', 'ref', 'ts', 'device.type'
Key rules (sketched in SQL after this list):
- event_id: json->>'eventId' (must be unique; drop duplicates)
- user_id: cast json->'user'->>'id' to bigint
- page_url: normalize by removing trailing slashes
- ts: parse ISO-8601; default to ingestion_ts if missing
- device_type: map to enum set {desktop, mobile, tablet, other}; else 'other'
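Here are the same rules sketched with Postgres-style JSON operators (the json->>'...' notation used above). The ingestion_ts column on logs.raw_events is an assumption, and deduplication is shown with DISTINCT ON for brevity.

```sql
-- Sketch of the Example 3 extraction rules (Postgres-style SQL).
SELECT DISTINCT ON (e.json ->> 'eventId')                         -- drop duplicate event_ids
    e.json ->> 'eventId'                           AS event_id,
    (e.json -> 'user' ->> 'id')::bigint            AS user_id,
    rtrim(e.json ->> 'url', '/')                   AS page_url,   -- remove trailing slashes
    e.json ->> 'ref'                               AS referrer,
    COALESCE((e.json ->> 'ts')::timestamptz,
             e.ingestion_ts)                       AS ts,         -- default to ingestion time if missing
    CASE
        WHEN e.json -> 'device' ->> 'type' IN ('desktop', 'mobile', 'tablet')
            THEN e.json -> 'device' ->> 'type'
        ELSE 'other'                                              -- enum rule: everything else is 'other'
    END                                            AS device_type
FROM logs.raw_events AS e
ORDER BY e.json ->> 'eventId';
```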
How to create a solid STM (step-by-step)
- Understand targets: Confirm the target model and reporting requirements. Clarify metrics and grain.
- Profile sources: Check data types, value ranges, null rates, code sets, and potential keys.
- Define keys and grain: Pick natural/business keys and surrogate keys. Specify SCD if dimension.
- Map columns: For each target column, record source fields and transformation rules.
- Specify load strategy: Full vs incremental, watermarks, change detection fields.
- Set data quality rules: Validations, defaults, and rejection criteria with actions.
- Review with stakeholders: Walk through tricky transformations and edge cases.
- Version and test: Version the STM, then derive unit tests and data checks from it (see the validation sketch below)
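One way to make that last step concrete is to derive validation queries directly from STM rules. The checks below are a sketch based on Examples 1 and 2; each query should return zero rows when its rule holds.

```sql
-- Sketch: STM rules turned into automated data checks (Postgres-style SQL).

-- Rule: email is stored lowercase and never null in dim_customer.
SELECT customer_id
FROM dim_customer
WHERE email IS NULL OR email <> lower(email);

-- Rule: each customer_id has at most one current row (is_current = true).
SELECT customer_id
FROM dim_customer
WHERE is_current = true
GROUP BY customer_id
HAVING count(*) > 1;

-- Rule: net_amount_usd must be populated for every loaded order.
SELECT order_id
FROM fact_orders
WHERE net_amount_usd IS NULL;
```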
Copy-friendly STM row template
- Target Table:
- Target Column:
- Type/Length:
- Nullable:
- Business Meaning:
- Source System:
- Source Table(s):
- Source Column(s):
- Transformation Rule:
- Default/Null Handling:
- Keys/Joins:
- Load Strategy Notes:
- Data Quality Checks:
- Comments:
Exercises you can do now
Do these before you move on. They mirror the exercises below and help you build your own STM quickly.
- Exercise 1: Customer dimension mapping
- Exercise 2: Orders fact mapping with watermark and FX conversion
Checklist for your STM draft
- Every target column has a source and a rule
- Data types and lengths are specified
- Null/default policy is explicit per column
- Keys and join rules are clear
- Incremental/watermark defined (if applicable)
- Data quality checks listed and testable
Common mistakes and self-check
- Ambiguous grain: If you cannot answer “one row represents what?”, fix the grain statement.
- Hidden lookups: Any code mapping or reference data must be written as a rule with the lookup source.
- Loose data types: Define exact types and lengths; specify truncation vs reject policy.
- Forgotten null handling: Write a default or rejection for each nullable source field.
- Incremental gaps: Always define watermark field, data type, and how late-arriving data is handled.
Self-check prompts
- Can a new teammate implement your STM without asking you questions?
- Can QA derive test cases directly from your STM?
- Can you trace each report field back to a source column?
Mini challenge
Pick one of your own tables. In 15 minutes, write STM rows for three target columns: a direct map, a calculated field, and a field requiring a lookup. Include null handling and a test you would run.
Who this is for
- ETL/ELT Developers and Data Engineers who design pipelines
- Analytics Engineers documenting transformations
- Data Analysts formalizing business rules before automation
Prerequisites
- Basic SQL (SELECT, JOIN, CAST)
- Understanding of data types and null semantics
- Familiarity with dimensional modeling concepts (facts, dimensions, SCD basics)
Learning path
- Start: Requirements gathering and target model understanding
- Then: Source profiling and data quality exploration
- Now: Source to Target Mapping (this lesson)
- Next: Incremental strategies, SCD implementation, and testing
Practical projects
- Build a dim_customer STM and load with SCD Type 2
- Create a fact_orders STM using watermark and currency conversion
- Design an STM for JSON events with strict enum mapping
Next steps
- Refine your STM into reusable templates
- Generate validation SQL from STM rules
- Share your STM with stakeholders and iterate quickly
Quick test
Take the quick test below to check your understanding.