
Source To Target Documentation

Learn Source To Target Documentation for free with explanations, exercises, and a quick test (for ETL Developers).

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

Source-to-Target (STT) documentation is the contract that explains how data moves, transforms, and lands in your target models. As an ETL Developer, you rely on it to build pipelines, enable QA to test, help analysts trace lineage, and ensure smooth handovers and maintenance.

  • Add a new KPI or column? STT defines exact calculation and provenance.
  • Migrate systems or refactor pipelines? STT shows all downstream impacts.
  • Audit or incident review? STT proves where values come from and why.

Who this is for

  • ETL / ELT Developers and Data Engineers who design and build pipelines.
  • BI Developers who source data for semantic models and reports.
  • QA Engineers who validate transformations and lineage.
  • Analysts who need unambiguous definitions of metrics and attributes.

Prerequisites

  • Comfort with SQL (joins, aggregations, window functions).
  • Basic data modeling concepts (dimensions, facts, keys, constraints).
  • Familiarity with data types and null/empty value handling.

Concept explained simply

Think of STT mapping as a per-column recipe. For each target field, you list where it comes from, how it is transformed, when it is loaded, and how it is validated. If someone new joins the team, they can implement or change the pipeline just by reading the STT doc.

Mental model

Picture a map with arrows from sources to targets. Each arrow has a label: the transformation rule, filters, and business meaning. If any arrow is missing or unlabeled, the pipeline will be ambiguous.

Core elements of an STT map

  • Target table and column (name, data type, length/precision, nullable).
  • Source(s): table/view, column(s), and join keys.
  • Transformation/business rule: expressions, lookups, mappings, aggregations.
  • Filters and grain: row filters, aggregation level, deduplication rules.
  • Keys and constraints: primary key, surrogate keys, uniqueness expectations.
  • Default/null rules: what to do when data is missing or invalid.
  • Load type and timing: full vs incremental, SCD behavior if applicable.
  • Lineage note: brief why/where rationale (e.g., based on contract X).
  • Test cases: sample inputs and expected outputs.
Simple STT row template
Target: dim_customer.customer_type (VARCHAR(20), NOT NULL)
Source: crm.customers.customer_segment
Rule: UPPER(TRIM(customer_segment)); if NULL then 'UNKNOWN'
Filter/Grain: One row per customer_id (dedupe by updated_at desc)
Keys: PK dim_customer.customer_sk; NK crm.customers.customer_id
Load Type: Incremental by updated_at
Lineage Note: Follows CRM data contract v2.1
Test Case: ' retail ' -> 'RETAIL'; NULL -> 'UNKNOWN'
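
As a sanity check, the template row above translates to a short SQL sketch covering both the rule and the dedup (one possible implementation, not the only one):
SELECT
  COALESCE(UPPER(TRIM(customer_segment)), 'UNKNOWN') AS customer_type
FROM (
  SELECT c.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY updated_at DESC) AS rn
  FROM crm.customers c
) latest
WHERE rn = 1  -- one row per customer_id, latest by updated_at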

Worked examples

Example 1: Dimension attribute with lookup

Target: dim_customer.country_code (CHAR(2), NOT NULL)

  • Source: crm.customers.country_name
  • Rule: Map country name to ISO-2 via ref.country_lookup; default 'ZZ' if not found.
  • Join: crm.customers.country_name = ref.country_lookup.country_name
  • Load: Incremental, latest by updated_at
  • Test: 'United States' -> 'US'; 'Brasil' -> 'BR'; unknown -> 'ZZ'
Mapping snippet:
SELECT COALESCE(l.iso2, 'ZZ') AS country_code
FROM crm.customers c
LEFT JOIN ref.country_lookup l
  ON UPPER(TRIM(c.country_name)) = UPPER(TRIM(l.country_name))
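
To make the test cases executable, you can push sample values through the same expression; a minimal sketch using a PostgreSQL-style VALUES list ('Atlantis' stands in for an unknown country):
WITH sample(country_name) AS (
  VALUES ('United States'), ('Brasil'), ('Atlantis')
)
SELECT s.country_name,
       COALESCE(l.iso2, 'ZZ') AS country_code
FROM sample s
LEFT JOIN ref.country_lookup l
  ON UPPER(TRIM(s.country_name)) = UPPER(TRIM(l.country_name))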

Example 2: Fact with surrogate key lookups

Target: fact_orders

  • Columns: order_sk (PK), customer_sk, order_date_sk, revenue_amount
  • Sources: sales.orders, dim_customer, dim_date
  • Rules:
    • customer_sk: lookup by natural key customer_id; if no match, send to quarantine stream.
    • order_date_sk: join dim_date on order_date.
    • revenue_amount: CAST(total as DECIMAL(18,2)); negative values rejected.
  • Load: Incremental by order_id > max_loaded_id
  • Test: order_id 123, customer_id 45, order_date 2023-09-01 -> customer_sk 991, order_date_sk 20230901
Mapping snippet:
SELECT
  o.order_id,
  dc.customer_sk,                                  -- natural-key lookup; unmatched rows go to quarantine
  dd.date_sk AS order_date_sk,                     -- date dimension lookup
  CAST(o.total AS DECIMAL(18,2)) AS revenue_amount
FROM sales.orders o
LEFT JOIN dim_customer dc
  ON dc.customer_id = o.customer_id AND dc.is_current = TRUE
LEFT JOIN dim_date dd
  ON dd.date = o.order_date
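
The quarantine rule for unmatched customers can be a separate routing step; a sketch, assuming a quarantine table such as etl.quarantine_orders (name hypothetical):
INSERT INTO etl.quarantine_orders
SELECT o.*
FROM sales.orders o
LEFT JOIN dim_customer dc
  ON dc.customer_id = o.customer_id AND dc.is_current = TRUE
WHERE dc.customer_sk IS NULL  -- no current surrogate key match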

Example 3: Conversions and defaults

Target: dim_product.is_active (BOOLEAN, NOT NULL)

  • Source: erp.products.active_flag (CHAR(1))
  • Rule: 'Y' -> TRUE, 'N' -> FALSE; anything else maps to NULL, then defaults to FALSE in the load step with a data quality alert.
  • Test: 'y' -> TRUE; 'N' -> FALSE; '' -> FALSE with DQ alert
Mapping snippet:
CASE UPPER(TRIM(active_flag))
  WHEN 'Y' THEN TRUE
  WHEN 'N' THEN FALSE
  ELSE NULL
END -- Default to FALSE in load step if NULL; emit DQ event
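
The load-step default can be applied with COALESCE; a sketch (the DQ event emission itself is assumed to live elsewhere in the pipeline, not in this expression):
SELECT
  COALESCE(
    CASE UPPER(TRIM(active_flag))
      WHEN 'Y' THEN TRUE
      WHEN 'N' THEN FALSE
    END,            -- unmapped values yield NULL here
    FALSE           -- ...and default to FALSE
  ) AS is_active
FROM erp.products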

Step-by-step: build an STT map

  1. Identify the target table and its grain. Create one mapping row per target column.
  2. List all source systems, tables, and key joins.
  3. Write clear transformation rules for each target column.
  4. Define filters and deduplication logic (see the dedup sketch after this list).
  5. Specify keys and constraints (PK, FK, uniqueness).
  6. Set defaults and null handling. State what happens and why.
  7. Clarify load type: full, incremental, CDC, SCD behavior (see the incremental sketch after the checklist).
  8. Add lineage notes and the business rationale.
  9. Create at least two test cases per non-trivial rule.
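
For step 4, deduplication is often expressed with a window function; a minimal sketch, assuming a staging table stg.customers (name hypothetical) with an updated_at column:
WITH ranked AS (
  SELECT c.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY updated_at DESC) AS rn
  FROM stg.customers c
)
SELECT * FROM ranked WHERE rn = 1  -- latest row per customer wins
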
Checklist to finish your STT
  • [ ] Every target column has a rule and a source
  • [ ] Keys and constraints documented
  • [ ] Null/default handling stated
  • [ ] Filters and grain are explicit
  • [ ] Incremental logic and SCD noted (if relevant)
  • [ ] At least 2 test cases per complex field
  • [ ] Version/date and owner recorded
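
For step 7, an incremental load usually filters on a watermark; a sketch, assuming a bookkeeping table etl.load_state (table and column names hypothetical):
SELECT o.*
FROM sales.orders o
WHERE o.order_id > (
  SELECT max_loaded_id
  FROM etl.load_state
  WHERE table_name = 'sales.orders'
)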

Exercises

Do these exercises. Then check the solutions. The quick test at the end is available to everyone for free; only logged-in users get saved progress.

Exercise 1

Design an STT mapping for target table analytics.orders_clean built from sources sales.orders_raw and crm.customers:

  • Target columns:
    • order_id (BIGINT, PK)
    • customer_sk (BIGINT, NOT NULL)
    • order_ts_utc (TIMESTAMP, NOT NULL)
    • status (VARCHAR(12), NOT NULL)
    • total_amount (DECIMAL(18,2), NOT NULL)
    • is_new_customer (BOOLEAN, NOT NULL)
  • Business rules:
    • Join customers on orders_raw.customer_id -> dim_customer.customer_sk where is_current = TRUE.
    • Convert order_time (stored as UTC string) to TIMESTAMP; reject invalid timestamps.
    • Status mapping: 'paid' -> 'PAID'; 'shipped' -> 'SHIPPED'; else 'PENDING'.
    • total_amount: CAST(total as DECIMAL(18,2)); default 0.00 if NULL and log.
    • is_new_customer: TRUE if customer's first_order_date = DATE(order_time), else FALSE.
Hints
  • Document default handling and DQ alerts.
  • State the join and the grain (one row per order_id).
  • Add at least two sample test cases.

Expected output: a clear STT mapping listing sources and joins, per-column rules, defaults/DQ alerts, grain and load type, plus at least two sample test cases.

Exercise 2

Fix the issues in this STT snippet for dim_product:

Problem snippet
Target: dim_product.product_price (DECIMAL(10,2), NULL)
Source: erp.products.price
Rule: price
Notes: -
---
Target: dim_product.product_category (VARCHAR(50), NOT NULL)
Source: erp.products.cat
Rule: TRIM(cat)
Default: NULL
---
Target: dim_product.product_sk (BIGINT, PK)
Source: -
Rule: use erp.products.product_id

Identify and correct at least three issues related to data type/precision, defaults, and key strategy. Provide the corrected rules.

Hints
  • Consider precision for price and nullability.
  • Default for NOT NULL must be non-null and meaningful.
  • Surrogate keys should not reuse natural keys.

Common mistakes and how to self-check

  • Missing grain: If you don't specify grain/deduplication, duplicates will slip in. Self-check: Can two source rows map to one target row? How is the winner chosen?
  • Hidden defaults: Defaults embedded in code but not in STT cause surprises. Self-check: For every nullable source, is the default documented?
  • Ambiguous joins: Unstated join keys produce wrong matches. Self-check: Are all join keys and filters explicit?
  • Type drift: Target types/lengths not stated or mismatched. Self-check: Do all casts/precisions appear in the mapping?
  • No test cases: Without examples, QA can't validate. Self-check: Does each complex rule have sample inputs and outputs?
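
A duplicate-grain self-check can be a one-liner against the target; a sketch (table and key names are illustrative):
SELECT order_id, COUNT(*) AS row_count
FROM analytics.orders_clean
GROUP BY order_id
HAVING COUNT(*) > 1  -- any result rows mean the one-row-per-order grain is broken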

Practical projects

  • Build STT for a customer dimension (SCD Type 2) plus a small fact table referencing it, including surrogate key lookups and effective dating.
  • Write STT for a GDPR delete/soft-delete process describing how identifiers propagate and how NULL/defaults are applied to sensitive attributes.
  • Create STT for a backfill job that re-aggregates 12 months of orders; include filters, aggregation grain, and reconciliation rules.

Handover and collaboration tips

  • Version the STT: include version, owner, and date. Note breaking changes.
  • Keep a consistent naming convention for sources/targets and keys.
  • Attach test data and expected output samples directly in the STT doc.
  • Review checklist before handover: completeness, constraints, null rules, lineage notes, and test coverage.

Learning path

  • Before this: Basic SQL transformations and data modeling.
  • Now: Source-to-Target documentation (this lesson).
  • Next: Operational runbooks and incident playbooks; Data contracts and change management.

Next steps

  • Write an STT for one existing pipeline in your team. Ask a peer to implement it using only your doc.
  • Introduce a Defaults and DQ section to every STT row you own.
  • Schedule a 20-minute review to align business rules with analytics stakeholders.

Mini challenge

Add target column fact_orders.tax_amount with the rule tax_amount = ROUND(total_amount * tax_rate, 2), where tax_rate comes from ref.tax_rate looked up by country_code and order_date. Default to 0.00 if tax_rate is missing and log a DQ alert. Write two test cases.
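
One possible shape for the join (a sketch only; the valid_from/valid_to columns on ref.tax_rate, and country_code/order_date on fact_orders, are assumptions, not given above):
SELECT
  o.order_id,
  ROUND(o.total_amount * COALESCE(t.tax_rate, 0.00), 2) AS tax_amount
FROM fact_orders o
LEFT JOIN ref.tax_rate t
  ON t.country_code = o.country_code
 AND o.order_date BETWEEN t.valid_from AND t.valid_to
-- COALESCE applies the 0.00 default; emit a DQ alert where t.tax_rate IS NULL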

Quick Test note

Take the quick test below to check your understanding. It's available to everyone for free; only logged-in users get saved progress.


Source To Target Documentation — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

