luvv to helpDiscover the Best Free Online Tools
Topic 3 of 8

Source To Target Mapping

Learn Source To Target Mapping for free with explanations, exercises, and a quick test (for ETL Developer).

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

Source to Target Mapping (STM) is the contract between business needs and the ETL you build. It tells you exactly which source fields feed which target columns, how to transform them, and how to handle edge cases. In real projects you will:

  • Consolidate data from CRMs, ERPs, and logs into a clean warehouse model
  • Define business rules (currency conversion, code standardization, SCD handling)
  • Prevent data loss with explicit null handling, defaults, and data type decisions
  • Enable reproducible, testable pipelines by documenting every rule upfront
Reality check: What goes wrong without an STM?
  • Hidden assumptions cause mismatched counts or broken joins
  • Disagreements with stakeholders on calculated fields
  • Rework after deployment due to unclear rules

Concept explained simply

An STM is a structured specification that maps each target column to its source and the transformation logic. Think of it as a recipe: ingredients (source fields), steps (transformations), and serving rules (data types, null/default policies, keys, and load strategy).

Mental model

  • Columns as questions: For each target column, ask: Where does it come from? How do I compute it? What if it is missing? What type and length must it be?
  • Rows as commitments: Every row in your STM is a commitment you can test and automate.
  • Traceability: Anyone should be able to trace a number in the report back to source fields and rules.
Minimal STM fields to include
  • Target table and column (name, type, nullable, description)
  • Source system, table, and column(s)
  • Transformation rule (business logic, expressions, lookups)
  • Keys (business/natural keys, surrogate key generation)
  • Load strategy (full, incremental; SCD handling if dimension)
  • Data quality rules (valid ranges, code sets, null/default policy)
  • Dependency notes (joins, lookups, reference data)

Worked examples

Example 1: Customer dimension (SCD Type 2)

Goal: Map CRM customer to warehouse dim_customer with history tracking.

  • Target: dim_customer(customer_sk, customer_id, full_name, email, country_code, is_active, effective_from, effective_to, is_current)
  • Source: crm.customers(id, first_name, last_name, email, country, status, updated_at)

Key rules:

  • customer_sk: surrogate key generated by warehouse
  • customer_id: crm.customers.id
  • full_name: trim(first_name) || ' ' || trim(last_name)
  • email: lower(email); reject if not valid format
  • country_code: map country using reference table ref_country_map
  • is_active: status in ('active','trial') => true else false
  • SCD2: new row if any of (email, country_code, is_active) changes
Example 2: Orders fact (incremental by watermark)

Goal: Map orders to fact_orders with daily incremental load.

  • Target: fact_orders(order_id, order_ts, customer_sk, net_amount_usd, item_count)
  • Sources: erp.orders(order_id, order_date, customer_id, total_amount, currency, items_count, updated_at), fx_rates(date, currency, usd_rate), dim_customer

Key rules:

  • Incremental filter: erp.orders.updated_at > last_watermark
  • customer_sk: lookup in dim_customer by customer_id where is_current = true
  • net_amount_usd: total_amount / usd_rate from fx_rates by order_date and currency
  • item_count: items_count default 0 if null
Example 3: Web events (semi-structured JSON)

Goal: Map raw JSON events to a clean table.

  • Target: stg_pageviews(event_id, user_id, page_url, referrer, ts, device_type)
  • Source: logs.raw_events(json) with fields path('eventId','user.id','url','ref','ts','device.type')

Key rules:

  • event_id: json->>'eventId' (must be unique; drop duplicates)
  • user_id: cast json->'user'->>'id' to bigint
  • page_url: normalize by removing trailing slashes
  • ts: parse ISO-8601; default to ingestion_ts if missing
  • device_type: map to enum set {desktop, mobile, tablet, other}; else 'other'

How to create a solid STM (step-by-step)

  1. Understand targets: Confirm the target model and reporting requirements. Clarify metrics and grain.
  2. Profile sources: Check data types, value ranges, null rates, code sets, and potential keys.
  3. Define keys and grain: Pick natural/business keys and surrogate keys. Specify SCD if dimension.
  4. Map columns: For each target column, record source fields and transformation rules.
  5. Specify load strategy: Full vs incremental, watermarks, change detection fields.
  6. Set data quality rules: Validations, defaults, and rejection criteria with actions.
  7. Review with stakeholders: Walk through tricky transformations and edge cases.
  8. Version and test: Version the STM, create unit tests and data checks from it.
Copy-friendly STM row template
Target Table: 
Target Column: 
Type/Length: 
Nullable: 
Business Meaning: 
Source System: 
Source Table(s): 
Source Column(s): 
Transformation Rule: 
Default/Null Handling: 
Keys/Joins: 
Load Strategy Notes: 
Data Quality Checks: 
Comments:

Exercises you can do now

Do these before you move on. They mirror the exercises below and help you build your own STM quickly.

  • Exercise 1: Customer dimension mapping
  • Exercise 2: Orders fact mapping with watermark and FX conversion
Checklist for your STM draft
  • Every target column has a source and a rule
  • Data types and lengths are specified
  • Null/default policy is explicit per column
  • Keys and join rules are clear
  • Incremental/watermark defined (if applicable)
  • Data quality checks listed and testable

Common mistakes and self-check

  • Ambiguous grain: If you cannot answer “one row represents what?”, fix the grain statement.
  • Hidden lookups: Any code mapping or reference data must be written as a rule with the lookup source.
  • Loose data types: Define exact types and lengths; specify truncation vs reject policy.
  • Forgotten null handling: Write a default or rejection for each nullable source field.
  • Incremental gaps: Always define watermark field, data type, and how late-arriving data is handled.
Self-check prompts
  • Can a new teammate implement your STM without asking you questions?
  • Can QA derive test cases directly from your STM?
  • Can you trace each report field back to a source column?

Mini challenge

Pick one of your own tables. In 15 minutes, write STM rows for three target columns: a direct map, a calculated field, and a field requiring a lookup. Include null handling and a test you would run.

Who this is for

  • ETL/ELT Developers and Data Engineers who design pipelines
  • Analytics Engineers documenting transformations
  • Data Analysts formalizing business rules before automation

Prerequisites

  • Basic SQL (SELECT, JOIN, CAST)
  • Understanding of data types and null semantics
  • Familiarity with dimensional modeling concepts (facts, dimensions, SCD basics)

Learning path

  • Start: Requirements gathering and target model understanding
  • Then: Source profiling and data quality exploration
  • Now: Source to Target Mapping (this lesson)
  • Next: Incremental strategies, SCD implementation, and testing

Practical projects

  • Build a dim_customer STM and load with SCD Type 2
  • Create a fact_orders STM using watermark and currency conversion
  • Design an STM for JSON events with strict enum mapping

Next steps

  • Refine your STM into reusable templates
  • Generate validation SQL from STM rules
  • Share your STM with stakeholders and iterate quickly

Quick test

Take the quick test below to check your understanding. Anyone can take it for free; only logged-in users have their progress saved.

Practice Exercises

2 exercises to complete

Instructions

You are mapping CRM customers into dim_customer with SCD Type 2. Use the schema and write STM rows for each target column.

  • Source: crm.customers(id, first_name, last_name, email, country, status, updated_at)
  • Ref: ref_country_map(src_value, iso_alpha2)
  • Target: dim_customer(customer_sk, customer_id, full_name, email, country_code, is_active, effective_from, effective_to, is_current)

Rules to enforce:

  • New row when email, country_code, or is_active changes
  • email must be lowercase and valid; invalid => reject row
  • country_code via ref_country_map; if not found => 'UN'
  • is_active: status in ('active','trial') => true, else false

Write STM rows covering: source, transformation, types, null/defaults, SCD notes.

Expected Output
A clear list of STM rows providing source fields, transformations, data types/lengths, null/default, key strategy, and SCD2 change conditions for each target column.

Source To Target Mapping — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

7 questions70% to pass

Have questions about Source To Target Mapping?

AI Assistant

Ask questions about this tool