Why this matters
Source to Target Mapping (STM) is the contract between business needs and the ETL you build. It tells you exactly which source fields feed which target columns, how to transform them, and how to handle edge cases. In real projects you will:
- Consolidate data from CRMs, ERPs, and logs into a clean warehouse model
- Define business rules (currency conversion, code standardization, SCD handling)
- Prevent data loss with explicit null handling, defaults, and data type decisions
- Enable reproducible, testable pipelines by documenting every rule upfront
Reality check: What goes wrong without an STM?
- Hidden assumptions cause mismatched counts or broken joins
- Disagreements with stakeholders on calculated fields
- Rework after deployment due to unclear rules
Concept explained simply
An STM is a structured specification that maps each target column to its source and the transformation logic. Think of it as a recipe: ingredients (source fields), steps (transformations), and serving rules (data types, null/default policies, keys, and load strategy).
Mental model
- Columns as questions: For each target column, ask: Where does it come from? How do I compute it? What if it is missing? What type and length must it be?
- Rows as commitments: Every row in your STM is a commitment you can test and automate.
- Traceability: Anyone should be able to trace a number in the report back to source fields and rules.
Minimal STM fields to include
- Target table and column (name, type, nullable, description)
- Source system, table, and column(s)
- Transformation rule (business logic, expressions, lookups)
- Keys (business/natural keys, surrogate key generation)
- Load strategy (full, incremental; SCD handling if dimension)
- Data quality rules (valid ranges, code sets, null/default policy)
- Dependency notes (joins, lookups, reference data)
Worked examples
Example 1: Customer dimension (SCD Type 2)
Goal: Map CRM customer to warehouse dim_customer with history tracking.
- Target: dim_customer(customer_sk, customer_id, full_name, email, country_code, is_active, effective_from, effective_to, is_current)
- Source: crm.customers(id, first_name, last_name, email, country, status, updated_at)
Key rules (sketched in SQL after this list):
- customer_sk: surrogate key generated by warehouse
- customer_id: crm.customers.id
- full_name: trim(first_name) || ' ' || trim(last_name)
- email: lower(email); reject if not valid format
- country_code: map country using reference table ref_country_map
- is_active: status in ('active','trial') => true else false
- SCD2: new row if any of (email, country_code, is_active) changes
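A minimal SQL sketch of these rules, assuming a Postgres-style warehouse, is shown below. Only the column transformations are expressed; the SCD2 merge (closing the old row and inserting a new one) is summarized in comments, and the join column source_country in ref_country_map is an illustrative assumption.

```sql
-- Sketch of the Example 1 column rules (Postgres-style SQL).
SELECT
    c.id                                            AS customer_id,   -- business key from CRM
    trim(c.first_name) || ' ' || trim(c.last_name)  AS full_name,
    lower(c.email)                                  AS email,         -- invalid formats rejected upstream
    m.country_code                                  AS country_code,  -- standardized via ref_country_map
    (c.status IN ('active', 'trial'))               AS is_active,
    c.updated_at                                    AS source_updated_at
FROM crm.customers AS c
LEFT JOIN ref_country_map AS m
       ON m.source_country = c.country;             -- assumed join column in the reference table
-- SCD2 handling: compare (email, country_code, is_active) against the current
-- dim_customer row for this customer_id; if any value differs, set effective_to
-- and is_current = false on the old row, then insert a new row with a fresh
-- customer_sk, effective_from = load date, and is_current = true.
```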
Example 2: Orders fact (incremental by watermark)
Goal: Map orders to fact_orders with daily incremental load.
- Target: fact_orders(order_id, order_ts, customer_sk, net_amount_usd, item_count)
- Sources: erp.orders(order_id, order_date, customer_id, total_amount, currency, items_count, updated_at), fx_rates(date, currency, usd_rate), dim_customer
Key rules (sketched in SQL after this list):
- Incremental filter: erp.orders.updated_at > last_watermark
- customer_sk: lookup in dim_customer by customer_id where is_current = true
- net_amount_usd: total_amount / usd_rate, joining fx_rates on order_date and currency
- item_count: items_count default 0 if null
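A minimal sketch of this mapping in Postgres-style SQL follows. The :last_watermark placeholder stands in for however your orchestrator supplies the last successful watermark; surrogate key generation and the actual insert into fact_orders are omitted.

```sql
-- Sketch of the Example 2 incremental mapping (Postgres-style SQL).
SELECT
    o.order_id,
    o.order_date                  AS order_ts,        -- grain: one row per order
    d.customer_sk,                                     -- lookup against the current dimension row
    o.total_amount / fx.usd_rate  AS net_amount_usd,   -- currency conversion rule from the STM
    COALESCE(o.items_count, 0)    AS item_count        -- default 0 when the source is null
FROM erp.orders AS o
JOIN dim_customer AS d
      ON d.customer_id = o.customer_id
     AND d.is_current = true
JOIN fx_rates AS fx
      ON fx.date = o.order_date
     AND fx.currency = o.currency
WHERE o.updated_at > :last_watermark;                  -- incremental filter by watermark
```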
Example 3: Web events (semi-structured JSON)
Goal: Map raw JSON events to a clean table.
- Target: stg_pageviews(event_id, user_id, page_url, referrer, ts, device_type)
- Source: logs.raw_events(json) with JSON paths 'eventId', 'user.id', 'url', 'ref', 'ts', 'device.type'
Key rules (sketched in SQL after this list):
- event_id: json->>'eventId' (must be unique; drop duplicates)
- user_id: cast json->'user'->>'id' to bigint
- page_url: normalize by removing trailing slashes
- ts: parse ISO-8601; default to ingestion_ts if missing
- device_type: map to enum set {desktop, mobile, tablet, other}; else 'other'
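Here are the same rules sketched with Postgres-style JSON operators (the json->>'...' notation used above). The ingestion_ts column on logs.raw_events is an assumption, and deduplication is shown with DISTINCT ON for brevity.

```sql
-- Sketch of the Example 3 extraction rules (Postgres-style SQL).
SELECT DISTINCT ON (e.json ->> 'eventId')                         -- drop duplicate event_ids
    e.json ->> 'eventId'                           AS event_id,
    (e.json -> 'user' ->> 'id')::bigint            AS user_id,
    rtrim(e.json ->> 'url', '/')                   AS page_url,   -- remove trailing slashes
    e.json ->> 'ref'                               AS referrer,
    COALESCE((e.json ->> 'ts')::timestamptz,
             e.ingestion_ts)                       AS ts,         -- default to ingestion time if missing
    CASE
        WHEN e.json -> 'device' ->> 'type' IN ('desktop', 'mobile', 'tablet')
            THEN e.json -> 'device' ->> 'type'
        ELSE 'other'                                              -- enum rule: everything else is 'other'
    END                                            AS device_type
FROM logs.raw_events AS e
ORDER BY e.json ->> 'eventId';
```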
How to create a solid STM (step-by-step)
- Understand targets: Confirm the target model and reporting requirements. Clarify metrics and grain.
- Profile sources: Check data types, value ranges, null rates, code sets, and potential keys.
- Define keys and grain: Pick natural/business keys and surrogate keys. Specify SCD if dimension.
- Map columns: For each target column, record source fields and transformation rules.
- Specify load strategy: Full vs incremental, watermarks, change detection fields.
- Set data quality rules: Validations, defaults, and rejection criteria with actions.
- Review with stakeholders: Walk through tricky transformations and edge cases.
- Version and test: Version the STM, then derive unit tests and data checks from it (see the validation sketch below)
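One way to make that last step concrete is to derive validation queries directly from STM rules. The checks below are a sketch based on Examples 1 and 2; each query should return zero rows when its rule holds.

```sql
-- Sketch: STM rules turned into automated data checks (Postgres-style SQL).

-- Rule: email is stored lowercase and never null in dim_customer.
SELECT customer_id
FROM dim_customer
WHERE email IS NULL OR email <> lower(email);

-- Rule: each customer_id has at most one current row (is_current = true).
SELECT customer_id
FROM dim_customer
WHERE is_current = true
GROUP BY customer_id
HAVING count(*) > 1;

-- Rule: net_amount_usd must be populated for every loaded order.
SELECT order_id
FROM fact_orders
WHERE net_amount_usd IS NULL;
```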
Copy-friendly STM row template
- Target Table:
- Target Column:
- Type/Length:
- Nullable:
- Business Meaning:
- Source System:
- Source Table(s):
- Source Column(s):
- Transformation Rule:
- Default/Null Handling:
- Keys/Joins:
- Load Strategy Notes:
- Data Quality Checks:
- Comments:
Exercises you can do now
Do these before you move on. They mirror the exercises below and help you build your own STM quickly.
- Exercise 1: Customer dimension mapping
- Exercise 2: Orders fact mapping with watermark and FX conversion
Checklist for your STM draft
- Every target column has a source and a rule
- Data types and lengths are specified
- Null/default policy is explicit per column
- Keys and join rules are clear
- Incremental/watermark defined (if applicable)
- Data quality checks listed and testable
Common mistakes and self-check
- Ambiguous grain: If you cannot answer “one row represents what?”, fix the grain statement.
- Hidden lookups: Any code mapping or reference data must be written as a rule with the lookup source.
- Loose data types: Define exact types and lengths; specify truncation vs reject policy.
- Forgotten null handling: Write a default or rejection for each nullable source field.
- Incremental gaps: Always define watermark field, data type, and how late-arriving data is handled.
Self-check prompts
- Can a new teammate implement your STM without asking you questions?
- Can QA derive test cases directly from your STM?
- Can you trace each report field back to a source column?
Mini challenge
Pick one of your own tables. In 15 minutes, write STM rows for three target columns: a direct map, a calculated field, and a field requiring a lookup. Include null handling and a test you would run.
Who this is for
- ETL/ELT Developers and Data Engineers who design pipelines
- Analytics Engineers documenting transformations
- Data Analysts formalizing business rules before automation
Prerequisites
- Basic SQL (SELECT, JOIN, CAST)
- Understanding of data types and null semantics
- Familiarity with dimensional modeling concepts (facts, dimensions, SCD basics)
Learning path
- Start: Requirements gathering and target model understanding
- Then: Source profiling and data quality exploration
- Now: Source to Target Mapping (this lesson)
- Next: Incremental strategies, SCD implementation, and testing
Practical projects
- Build a dim_customer STM and load with SCD Type 2
- Create a fact_orders STM using watermark and currency conversion
- Design an STM for JSON events with strict enum mapping
Next steps
- Refine your STM into reusable templates
- Generate validation SQL from STM rules
- Share your STM with stakeholders and iterate quickly
Quick test
Take the quick test below to check your understanding.