Why this matters
Clear modeling and transformation notes make BI work repeatable, auditable, and easy to hand off. You will use them to explain how data flows from sources to your dimensional model, why certain rules exist, and how to safely change things without breaking dashboards.
- Handing over a dataset to engineers or another analyst
- Explaining a KPI definition and its lineage to stakeholders
- Debugging data issues (e.g., sudden drops due to a new filter)
- Complying with audits where every metric needs a defined source and rule set
Concept explained simply
Modeling notes describe how data is structured for analysis (entities, relationships, grain). Transformation notes describe how raw inputs become analytics-ready tables (rules, joins, filters, casts, SCD handling).
Mental model: a travel itinerary for your data
Think of your dataset as a traveler. Modeling notes define where the traveler wants to end up (star schema, facts, dimensions). Transformation notes define the route and transport used (steps, rules, checkpoints). Together, they make the trip repeatable and understandable to anyone.
Core elements of modeling and transformation notes
- Purpose: Why this model or transformation exists
- Scope and grain: What is in/out, and the exact row-level grain (e.g., one row per order line)
- Inputs: Source tables/views and key fields
- Business rules: Filters, mappings, calculations, and assumptions
- Join logic: Keys, join type, and conflict handling
- Data types and units: Casts, currencies, time zones
- Slowly Changing Dimensions (if any): SCD type, keys, validity fields
- Data quality checks: Nulls, duplicates, referential integrity
- Outputs: Columns with descriptions and sample values
- Lineage and ownership: Where data came from and who owns maintenance
- Change log: What changed, when, and why
Worked examples
Example 1: Fact Sales modeling note (star schema)
Purpose: Support revenue reporting and conversion analysis.
Grain: One row per order_line_id.
Inputs: src.orders, src.order_lines, dim_customers, dim_products, dim_date
Join logic: order_lines.order_id = orders.id (inner), product_id to dim_products (left), customer_id to dim_customers (left), order_date to dim_date (left).
Business rules: Exclude test orders (orders.is_test = false). Use net_amount = gross_amount - discount_amount. Currency normalized to USD using fx_rate on order_date.
Columns: order_line_id (PK), order_id, customer_key, product_key, date_key, qty, net_amount_usd, channel, country
Data quality: Reject rows with null order_line_id or qty < 0.
Change log: 2024-10-10: Added currency normalization.
Example 2: Customer dedup transformation note
Purpose: Create unique customers by merging CRM and checkout users.
Inputs: crm.contacts, web.users
Matching rules (priority order): 1) email exact (lowercased); 2) phone E.164 match; 3) same name + same billing_postal + created_at within 7 days.
Survivorship: Prefer CRM name/phone; prefer web marketing_consent; prefer most recent address.
Outputs: dim_customers with customer_key (hash of canonical_id), is_active flag, source_system
Data quality: Flag if a source row maps to multiple canonical_ids.
Example 3: SCD Type 2 for Product Prices
Purpose: Track historical price changes for analysis.
Keys: product_id (business key), price_key (surrogate)
SCD: Type 2 using valid_from, valid_to, is_current
Rules: Close previous record when price changes; no gaps/overlaps permitted per product_id.
Checks: For each product_id, at most one is_current = true.
How to write a good note (quick steps)
- State the goal: One sentence on why this exists.
- Declare the grain: Exactly one row represents what?
- List inputs and ownership: Table names and responsible teams.
- Enumerate rules: Filters, mappings, calculations, join types.
- Describe outputs: Columns with definitions and units.
- Add checks: Nulls, duplicates, referential constraints, row count deltas.
- Record changes: Date, change, reason, impact.
Copy-ready template
Title: [Model/Transformation Name] Purpose: Scope & Grain: Inputs (owner): Join Logic: Business Rules: Data Types & Units: SCD Handling (if any): Outputs (field: description, example): Data Quality Checks: Lineage: Change Log:
Exercises
Do these to build muscle memory. Tip: time-box each to 15–20 minutes.
Exercise 1 — Deduplicate customers (mirrors Exercise ex1)
Write a transformation note for merging duplicate customers from two sources with conflicting fields and inconsistent casing.
- Goal: canonical dim_customers with one row per real person
- Sources: crm.contacts, ecommerce.users
- Edge cases: same email with different capitalization, phone formats, missing names
What to include
- Grain
- Match rules and priority
- Survivorship rules
- Outputs and data quality checks
Exercise 2 — Sales model (mirrors Exercise ex2)
Draft modeling notes for a sales fact table with discounts and refunds.
- Define grain precisely
- State business rules for net revenue
- List inputs and join logic
- Add data quality checks
Self-check checklist
- Grain is explicit and testable
- Every filter has a reason
- Join keys and types are unambiguous
- Data types/units are stated (currency, time zone)
- Outputs list includes definitions and examples
- Change log contains at least one entry
Common mistakes and how to self-check
- Vague grain: Fix by adding a unique row identifier and a one-line definition.
- Hidden filters: List every filter and why it exists; note potential impact.
- Ambiguous joins: Specify keys, join direction, and conflict handling.
- Missing units/time zones: Always document currency, fx rate date, and time zone.
- No data quality checks: Add null/duplicate checks and threshold alerts.
- No change log: Add date, change, reason, and impacted outputs.
Quick self-audit mini-task
Scan your last note and highlight any sentence that could be interpreted in two ways. Rewrite it to remove ambiguity.
Practical projects
- Create a complete star schema note for a small ecommerce dataset (orders, order_lines, customers, products, date). Include three DQ checks.
- Document a currency normalization transformation with a fallback rule when fx rate is missing.
- Write SCD Type 2 notes for a product attribute (e.g., category). Include validity logic and tests.
Who this is for
- BI Analysts and Data Analysts who maintain datasets and dashboards
- Analytics Engineers who want handoff-ready documentation
- Team leads ensuring auditability
Prerequisites
- Basic SQL and familiarity with joins
- Understanding of dimensional modeling (facts, dimensions)
- Comfort reading ERDs or schema diagrams
Learning path
- Start with the template above and fill it for one simple table.
- Add rules and checks; review with a teammate.
- Extend to a small star schema (1 fact, 3 dimensions).
- Introduce an SCD and document it end-to-end.
- Refine with change logs and lineage notes.
Next steps
- Apply the template to one live dataset in your workspace.
- Share your note with a stakeholder and ask for one ambiguity they spotted.
- Set a recurring reminder to update the change log after deployments.
Mini challenge
In 6 bullet points, document how you would transform raw clickstream events into a sessionized table. Include grain, inputs, session logic, and DQ checks.
Quick test is available to everyone; only logged-in users have their progress saved.