
Source To Target Mapping

Learn Source To Target Mapping for free with explanations, exercises, and a quick test (for Data Engineers).

Published: January 8, 2026 | Updated: January 8, 2026

Why this matters

Source-to-Target Mapping (STM) is the blueprint that explains how raw data from source systems becomes analytics-ready data in your warehouse, lakehouse, or marts. Teams rely on it to build pipelines consistently, review changes, audit lineage, and troubleshoot issues. Without an STM, you risk mismatched fields, broken transformations, and unreliable dashboards.

  • Real tasks you will do: define field mappings, document transformations, align business rules with data modeling, guide ETL/ELT code, and support QA/data governance.
  • Outcomes: fewer defects, faster onboarding for teammates, predictable releases, and clear lineage for audits.

Who this is for

  • Data engineers building or maintaining pipelines.
  • Analytics engineers and BI developers consuming curated data.
  • Data stewards and QA validating data quality and lineage.

Prerequisites

  • Basic SQL (SELECT, JOIN, CAST) and familiarity with data types.
  • Understanding of dimensional modeling (facts/dimensions) or data vault basics.
  • Comfort with profiling data (nulls, uniqueness, ranges).

Concept explained simply

An STM is a structured document that answers: From which source field does each target field come, and what transformation happens along the way? It also records data types, business rules, and data quality checks.

Mental model

Think of STM as a GPS route for data. The source is your starting point. The target is your destination. Transformations are the turns, speed limits, and detours. Quality checks are your safety checks. Versioning is the trip log.

A minimal STM row example:
Target Table: dim_customer
Target Column: customer_key (BIGINT, PK)
Source: crm.customers.id
Transform: generate surrogate key via hash(id) or sequence
DQ: must be unique, not null
Notes: used to join facts
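Such rows are easiest to review and automate when they are machine-readable. A minimal sketch in Python (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass, field

@dataclass
class STMRow:
    """One row of a source-to-target map (illustrative field names)."""
    target_table: str
    target_column: str
    data_type: str
    source: str          # e.g. "crm.customers.id"
    transform: str       # human-readable rule or SQL expression
    dq_rules: list = field(default_factory=list)
    notes: str = ""

row = STMRow(
    target_table="dim_customer",
    target_column="customer_key",
    data_type="BIGINT",
    source="crm.customers.id",
    transform="surrogate key via hash(id) or sequence",
    dq_rules=["unique", "not null"],
    notes="used to join facts",
)
```

A structure like this can later feed code generation or automated checks, instead of living only in a spreadsheet.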

Core components of a Source-to-Target Map

  • Identifiers: source system, schema, table, column; target schema, table, column.
  • Data types and constraints: type, length/precision, nullability, primary/surrogate keys.
  • Transformations: expressions, standardization (trimming, case), business rules.
  • Joins and lookups: keys, reference tables, SCD handling (Type 1/2).
  • Defaults and fallbacks: constants, null-handling, error routing.
  • Data quality checks: uniqueness, referential integrity, domain lists, ranges.
  • Lineage and versions: who changed what, when, and why.
  • Load details: frequency, mode (full/incremental/CDC), late-arriving handling.
Reusable STM checklist
  • Every target column mapped or intentionally left unmapped (with reason)
  • Data types compatible and sized safely
  • Clear transformation logic with examples
  • Join keys and lookup logic documented
  • DQ rules with pass/fail actions
  • Load mode and schedule noted
  • Owner/reviewer and version/date recorded
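The first checklist item (every target column mapped or intentionally unmapped, with a reason) is easy to automate once the STM is machine-readable. A minimal sketch, assuming the STM is a list of dicts with illustrative key names:

```python
def unmapped_without_reason(target_columns, stm_rows):
    """Return target columns that have neither a source mapping nor a documented reason."""
    mapped = {r["target_column"] for r in stm_rows if r.get("source")}
    excused = {r["target_column"] for r in stm_rows if r.get("unmapped_reason")}
    return sorted(set(target_columns) - mapped - excused)

stm = [
    {"target_column": "customer_key", "source": "crm.customers.id"},
    {"target_column": "load_audit_id", "unmapped_reason": "set by the loader"},
]
print(unmapped_without_reason(["customer_key", "load_audit_id", "email"], stm))
# -> ['email']
```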

Step-by-step: Build an STM

  1. Collect context: business purpose, target model (star, vault, or 3NF), SLAs.
  2. Profile sources: data types, null rates, distinct counts, min/max, example values.
  3. Define targets: table/field list with data types and constraints.
  4. Map fields: for each target column, pick a source and write transformations.
  5. Add quality rules: what to validate and how to handle failures.
  6. Plan loads: full vs incremental, CDC keys, watermark columns, time-zone handling.
  7. Document test cases: row-level examples and expected outputs.
  8. Review and version: peer review and change log.
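Step 2 (profile sources) usually starts with a few simple statistics per column. A sketch over an in-memory sample, assuming rows are dicts:

```python
def profile(rows, column):
    """Null rate, distinct count, and min/max for one column of a row list."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

sample = [{"country": "us"}, {"country": "DE"}, {"country": None}, {"country": "us"}]
print(profile(sample, "country"))
```

In practice you would run the equivalent SQL (COUNT, COUNT DISTINCT, MIN/MAX) against the source system, but the quantities you record in the STM are the same.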
Example transformation snippets
-- Standardize country
UPPER(TRIM(src.country)) AS country_name

-- Parse dates safely
TRY_CAST(src.signup_date AS DATE) AS signup_date

-- Currency conversion
ROUND(src.amount * fx.rate, 2) AS amount_usd

-- Surrogate key (hash example)
TO_HEX(SHA256(CONCAT(COALESCE(src.id,''), '|', COALESCE(src.system,'')))) AS customer_key
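The last snippet's hash-based surrogate key (SHA-256 over a null-safe, delimiter-joined concatenation of natural keys) looks like this outside SQL; a sketch with Python's hashlib:

```python
import hashlib

def surrogate_key(*parts, sep="|"):
    """Hex SHA-256 over null-safe, delimiter-joined key parts (mirrors the SQL snippet)."""
    joined = sep.join("" if p is None else str(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

print(surrogate_key(42, "crm"))    # stable 64-char hex string
print(surrogate_key(None, "crm"))  # NULLs collapse to '' so the key is still defined
```

The delimiter matters: without it, ("ab", "c") and ("a", "bc") would hash to the same key.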

Worked examples

Example 1: CRM to dim_customer

  • Target: dim_customer(customer_key, natural_key, full_name, email, country, signup_date)
  • Sources: crm.customers(id, first_name, last_name, email, country, created_at)
Mapping rows
customer_key  <- HASH(id)
natural_key   <- id
full_name     <- TRIM(first_name) || ' ' || TRIM(last_name)
email         <- LOWER(TRIM(email))
country       <- UPPER(TRIM(country))  -- if null, default 'UNKNOWN'
signup_date   <- CAST(created_at AS DATE)
DQ: email contains '@'; natural_key unique; signup_date not in future
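The mapping rows above translate directly into row-level code. A sketch applying Example 1's rules to one record (the date handling is a naive stand-in for CAST(created_at AS DATE)):

```python
def map_customer(src):
    """Apply Example 1's transformations to one crm.customers record."""
    country = (src.get("country") or "UNKNOWN").strip().upper()
    return {
        "natural_key": src["id"],
        "full_name": f"{src['first_name'].strip()} {src['last_name'].strip()}",
        "email": src["email"].strip().lower(),
        "country": country,
        "signup_date": src["created_at"][:10],  # naive CAST(created_at AS DATE)
    }

row = map_customer({
    "id": 7, "first_name": " Ada ", "last_name": "Lovelace",
    "email": "ADA@Example.com ", "country": None,
    "created_at": "2025-03-01T10:00:00",
})
print(row["email"], row["country"])  # ada@example.com UNKNOWN
```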

Example 2: Orders fact with currency

  • Target: fact_order(order_id, customer_key, order_ts_utc, amount_usd)
  • Sources: oms.orders(id, customer_id, order_ts_local, amount, currency), ref.fx_rates(currency, rate_to_usd, valid_date)
Mapping rows
order_id        <- id
customer_key    <- HASH(customer_id)
order_ts_utc    <- CONVERT_TZ(order_ts_local, src_tz, 'UTC')
amount_usd      <- ROUND(amount * fx.rate_to_usd, 2)  -- join on currency & date window
DQ: amount >= 0; currency in list; customer_key exists in dim_customer
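The CONVERT_TZ rule above normalizes order timestamps to UTC. The same rule in Python with the standard-library zoneinfo (the source zone here is an illustrative assumption; in a real STM it would be documented per source):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def to_utc(ts_local: str, src_tz: str) -> str:
    """Interpret a naive local timestamp in src_tz and express it in UTC."""
    local = datetime.fromisoformat(ts_local).replace(tzinfo=ZoneInfo(src_tz))
    return local.astimezone(ZoneInfo("UTC")).isoformat()

print(to_utc("2025-06-01T12:00:00", "America/New_York"))
# -> 2025-06-01T16:00:00+00:00 (EDT is UTC-4 in June)
```

Record the source zone in the STM itself; "order_ts_local" with no zone is exactly the ambiguity the mapping is meant to remove.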

Example 3: JSON events to parquet table

  • Target: events_flat(event_id, user_id, event_type, event_ts, device_os)
  • Source: raw.events(payload VARIANT/JSON)
Mapping rows
event_id   <- payload:id
user_id    <- payload:user.id
event_type <- UPPER(payload:type)
event_ts   <- TO_TIMESTAMP(payload:ts)
device_os  <- COALESCE(payload:device.os, 'UNKNOWN')
DQ: event_id unique; event_ts not null
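The path expressions above (payload:user.id and so on) correspond to nested dictionary access once the payload is parsed. A flattening sketch whose default mirrors the COALESCE rule:

```python
import json

def flatten_event(payload_json: str) -> dict:
    """Flatten one raw event payload into the events_flat columns."""
    p = json.loads(payload_json)
    return {
        "event_id": p["id"],
        "user_id": p["user"]["id"],
        "event_type": p["type"].upper(),
        "event_ts": p["ts"],
        "device_os": p.get("device", {}).get("os") or "UNKNOWN",
    }

raw = '{"id": "e1", "user": {"id": 9}, "type": "click", "ts": "2025-01-08T00:00:00Z"}'
print(flatten_event(raw))
```

Note the asymmetry: required paths (id, user.id) fail loudly if missing, while optional ones (device.os) fall back to 'UNKNOWN'; the STM should say which is which for every field.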

Common mistakes and self-check

  • Mistake: Unmapped target columns. Fix: mark as derived/constant or document N/A explicitly.
  • Mistake: Implicit type casts. Fix: write explicit CAST/TRY_CAST and note expected failures.
  • Mistake: Missing time-zone normalization. Fix: standardize to UTC and record the rule.
  • Mistake: Weak DQ rules. Fix: add at least uniqueness, null checks, domain checks, and RI checks.
  • Mistake: No examples. Fix: include at least one before/after example per complex rule.
Self-check mini audit
  • Do two readers produce the same SQL from your STM?
  • Can QA test each rule without guessing?
  • Can you trace any dashboard field back to a source field?

Exercises

Do these to solidify your understanding. The Quick Test at the end checks core concepts.

Exercise ex1 — Customer Dimension STM

Design an STM for a customer dimension with fields: customer_key (PK), natural_key, email_normalized, lifecycle_segment, signup_date. Sources: crm.customers(id, email, created_at, status). Rules: email_normalized is lower-cased and trimmed; lifecycle_segment is 'ACTIVE' if status in ('ACTIVE','PAID'), else 'INACTIVE'; signup_date is date(created_at). Add DQ rules and load mode.

  • I wrote one mapping row per target column.
  • I included explicit CAST/TRY_CAST.
  • I added at least three DQ checks.

Practical projects

Project 1 — Sales mart STM
  1. Model dim_customer, dim_product, fact_sales.
  2. Profile CRM and OMS sources; note quirks.
  3. Create detailed STM with transformations and DQ.
  4. Have a peer review it; iterate once.
Project 2 — CDC pipeline STM
  1. Define target for incremental loads (watermark last_updated).
  2. Document CDC keys, deleted-flag logic, and late-arriving handling.
  3. Specify idempotency and deduplication rules.
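Project 2's watermark, idempotency, and deduplication rules can be sketched as a filter plus last-write-wins dedup (column names are illustrative):

```python
def incremental_batch(rows, watermark):
    """Keep rows newer than the watermark; dedupe by key, keeping the latest update."""
    fresh = [r for r in rows if r["last_updated"] > watermark]
    latest = {}
    for r in sorted(fresh, key=lambda r: r["last_updated"]):
        latest[r["id"]] = r  # later updates overwrite earlier ones
    new_watermark = max((r["last_updated"] for r in fresh), default=watermark)
    return list(latest.values()), new_watermark

rows = [
    {"id": 1, "last_updated": "2025-01-01"},
    {"id": 1, "last_updated": "2025-01-03"},
    {"id": 2, "last_updated": "2024-12-31"},
]
batch, wm = incremental_batch(rows, "2024-12-31")
print(len(batch), wm)  # 1 2025-01-03
```

Because the batch is deduplicated and the watermark only advances, re-running the same extract produces the same target state, which is the idempotency property the STM should promise.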
Project 3 — Events flattening STM
  1. List JSON paths to extract.
  2. Standardize timestamps and device fields.
  3. Document schema evolution policy and nullable handling.

Learning path

  • Start: Build an STM for one dimension (customers).
  • Next: Add one fact table with a lookup/join and DQ rules.
  • Advance: Incorporate CDC, SCD Type 2, and currency conversions.
  • Polish: Add versioning notes and example test cases.

Next steps

  • Learn dimensional modeling patterns and SCD strategies.
  • Practice writing SQL transforms that mirror your STM.
  • Adopt a consistent STM template for your team.
  • Automate checks where possible (pre/post-load validations).

Mini challenge

Pick any target field that uses multiple sources (e.g., choosing the freshest email from CRM or Support). Write a short STM rule to resolve conflicts: source priority, freshness window, and a DQ rule to flag mismatches.
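One possible shape for that rule, sketched in Python (the source priorities and the 90-day freshness window are illustrative choices, not a prescription):

```python
from datetime import date, timedelta

PRIORITY = {"crm": 1, "support": 2}  # lower number wins ties

def resolve_email(candidates, as_of, freshness_days=90):
    """Pick the freshest email within the window; break ties by source priority."""
    window = [c for c in candidates
              if (as_of - c["updated"]) <= timedelta(days=freshness_days)]
    if not window:
        return None
    best = min(window, key=lambda c: (-c["updated"].toordinal(), PRIORITY[c["source"]]))
    mismatch = len({c["email"] for c in window}) > 1  # DQ flag: sources disagree
    return {"email": best["email"], "mismatch": mismatch}

print(resolve_email(
    [{"source": "crm", "email": "a@x.com", "updated": date(2025, 1, 1)},
     {"source": "support", "email": "b@x.com", "updated": date(2025, 1, 5)}],
    as_of=date(2025, 1, 8),
))
# -> {'email': 'b@x.com', 'mismatch': True}
```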


Practice Exercises

1 exercise to complete

Instructions

Create an STM for target table dim_customer with columns:
customer_key (PK), natural_key, email_normalized, lifecycle_segment, signup_date.

Sources: crm.customers(id, email, created_at, status)

  • email_normalized = LOWER(TRIM(email))
  • lifecycle_segment = 'ACTIVE' if status in ('ACTIVE','PAID') else 'INACTIVE'
  • signup_date = CAST(created_at AS DATE)

Include data types, nullability, DQ checks (at least 3), and load mode (full vs incremental, with key).

Expected Output
A clear mapping row for each target column, explicit transformations, three data quality rules, and a stated load mode with keys/watermarks.

Source To Target Mapping — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

