Why this matters
Lineage from source to BI shows how data moves from original systems through transformations into dashboards. As an Analytics Engineer, you use lineage to troubleshoot data issues, plan changes safely, estimate effort, and communicate the impact of upstream changes to stakeholders.
- Real task: A sales field is renamed in the CRM. With lineage, you quickly find all models, metrics, and dashboards affected.
- Real task: A KPI looks off. Lineage helps you trace how it’s calculated and where it might have drifted.
- Real task: You’re adding a new dimension. Lineage tells you which models to update and tests to add.
Who this is for
Analytics Engineers, BI Developers, and Data Analysts who maintain pipelines or dashboards and need clear, reliable documentation of dependencies.
Prerequisites
- Comfort reading SQL (SELECT, JOIN, CTEs).
- Basic understanding of your data warehouse layers (staging, models, marts).
- Familiarity with your BI tool’s datasets/metrics or dashboards.
Concept explained simply
Data lineage is a map of where data comes from, how it changes, and where it ends up.
Mental model
Think of your data platform as a metro map:
- Stations = datasets (tables/views, BI datasets, dashboards)
- Lines = transformations (SQL models, joins, metrics logic)
- Direction = upstream to downstream (source → warehouse → BI)
When a station closes (a schema change), you can instantly see which lines and stations are impacted.
Levels of lineage
- System-level: Source systems (CRM, app events, ERP)
- Dataset-level: Raw/staging, intermediate models, marts
- Column-level: Which columns feed which calculations
- Metric-level: How metrics are defined and aggregated
- BI-asset-level: Datasets, dashboards, tiles/charts
Key terms
- Upstream: Inputs that feed a dataset or metric
- Downstream: Assets that depend on a dataset or metric
- Data contract: Agreement on the schema, types, and SLAs of a source (see the sketch after this list)
- Freshness SLA: The maximum acceptable delay before data is refreshed (e.g., "no more than 2 hours stale")
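For example, a minimal data contract for a CRM source might look like the sketch below; every field name, type, and SLA value is illustrative, not a standard.
contract_for: crm.leads_raw
schema:
  - lead_id: string, not null, unique
  - status: string, one of [New, Qualified, Converted, Lost]
  - created_at: timestamp
freshness_sla: 2h
breaking_change_policy: 30-day notice to downstream owners before renames or type changes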
Minimal lineage spec you can use today
Use this compact spec to document any node (table/model/metric/dashboard). Keep it consistent and visible to your team.
node_name: <unique name>
node_type: source|staging|model|mart|metric|bi_dataset|dashboard
system: <where it lives, e.g., warehouse or BI>
owner: <team or person>
freshness_sla: <e.g., 2h>
pii: none|low|high
business_definition: <clear, human-readable meaning>
dependencies:
  - <upstream node name>
outputs:
  - <downstream node name>
columns:  # optional but recommended for key columns
  - name: <column>
    source_of: [<downstream column(s)/metric(s)>]
tests:
  - <freshness/schema/unique/not_null/business tests>
change_log:
  - <date>: <what changed and why>
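Here is the spec filled in for a hypothetical staging model (the same stg_leads used in Example 1 below); every name, owner, date, and SLA is illustrative:
node_name: stg_leads
node_type: staging
system: warehouse
owner: analytics-engineering
freshness_sla: 2h
pii: low
business_definition: One row per lead from crm.leads_raw, with types cast and status values standardized.
dependencies:
  - crm.leads_raw
outputs:
  - fct_lead_events
columns:
  - name: lead_id
    source_of: [total_leads]
tests:
  - unique(lead_id)
  - not_null(lead_id)
change_log:
  - 2024-05-01: Initial version.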
Worked examples
Example 1 — CRM Leads to BI Conversion Rate
Goal: Document lineage for a BI Conversion Rate tile.
- Source: crm.leads_raw
- Staging: stg_leads (clean types, standardize status)
- Model: fct_lead_events (one row per lead per status transition)
- Mart: mart_marketing (agg_lead_conversion)
- Metric: conversion_rate = converted_leads / total_leads
- BI: Dashboard tile "Lead Conversion Rate"
Key column-level lineage:
- converted_leads: count of fct_lead_events rows where status = 'Converted'
- total_leads: count of distinct stg_leads.lead_id
Common pitfall: A missing upstream filter on active campaigns can inflate the conversion rate. Capture that filter in business_definition and cover it with a test.
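Captured in the minimal spec, the metric node might look like the sketch below; the owner, SLA, and change_log entry are illustrative, and the active-campaign filter is the one called out in the pitfall above:
node_name: conversion_rate
node_type: metric
system: BI
owner: marketing-analytics
freshness_sla: 24h
pii: none
business_definition: Share of leads on active campaigns that reach 'Converted' status; converted_leads / total_leads.
dependencies:
  - mart_marketing.agg_lead_conversion
outputs:
  - dashboard tile "Lead Conversion Rate"
tests:
  - business: conversion_rate between 0 and 1
change_log:
  - 2024-06-10: Restricted both counts to active campaigns.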
Example 2 — App Events to Active Users Metric
- Source: events.raw_app_events
- Staging: stg_events (deduplicate, enforce required fields)
- Model: fct_sessions (sessionization)
- Mart: mart_product (daily_active_users)
- Metric: DAU = count(distinct user_id) where event_date = today
- BI: DAU chart
Column-level lineage:
- DAU.user_id ← stg_events.user_id (after dedupe rules)
- DAU.event_date ← stg_events.event_ts::date via fct_sessions rules
Note: Document the session timeout assumption (e.g., 30 minutes) in business_definition.
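A spec entry for the sessionization model makes that assumption explicit; the owner, SLA, and 30-minute timeout below are assumptions carried over from this example:
node_name: fct_sessions
node_type: model
system: warehouse
owner: product-analytics
freshness_sla: 6h
pii: low
business_definition: One row per session; a session ends after 30 minutes of inactivity (timeout assumption).
dependencies:
  - stg_events
outputs:
  - mart_product.daily_active_users
columns:
  - name: user_id
    source_of: [DAU]
  - name: event_date
    source_of: [DAU.event_date]
tests:
  - not_null(user_id)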
Example 3 — Payments to Revenue Dashboard
- Source: erp.payments_raw
- Staging: stg_payments
- Model: fct_payments (currency normalization)
- Mart: finance.revenue_daily
- Metric: revenue = sum(amount_usd) where status = 'captured'
- BI: Revenue dashboard (Total Revenue, Revenue by Region)
Column-level lineage:
- amount_usd ← stg_payments.amount_local × fx_rates.rate
- region ← customers.region via a join (document the join keys)
Risk: Changing FX source breaks comparability. Add a change_log entry and alert downstream owners before switching.
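The columns and change_log sections of the spec guard against exactly this risk; a sketch, with illustrative join keys and dates:
node_name: fct_payments
node_type: model
system: warehouse
owner: finance-analytics
business_definition: One row per captured payment, normalized to USD via daily FX rates.
dependencies:
  - stg_payments
  - fx_rates
  - customers
# join keys: stg_payments.customer_id = customers.customer_id;
#            stg_payments.currency + payment date match fx_rates
columns:
  - name: amount_usd
    source_of: [revenue]
  - name: region
    source_of: [Revenue by Region]
tests:
  - not_null(amount_usd)
change_log:
  - 2024-07-01: Switched FX rate source; notified revenue dashboard owners in advance.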
How to capture lineage in practice
- Inventory your BI assets: List dashboards and tiles that matter to stakeholders.
- Backtrack dependencies: For each tile/metric, trace the BI dataset → mart → model → staging → source.
- Record a minimal spec: For each node, fill in the spec above. Start with dataset-level lineage; add column-level detail for key metrics (see the sketch after this list).
- Add ownership and SLAs: Document owner and freshness expectations to reduce firefighting.
- Automate where possible: Use your transformation tool’s compiled DAG and your BI’s dataset-to-dashboard mapping to keep docs current.
- Publish and review: Store the spec with your project and review on schema changes or sprint reviews.
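Applied to one dashboard tile, the backtracking and recording steps might yield a pair of entries like the sketch below (all names are assumed; node granularity is up to you):
node_name: lead_conversion_rate_tile
node_type: dashboard
system: BI
owner: marketing-analytics
dependencies:
  - marketing_funnel

node_name: marketing_funnel
node_type: bi_dataset
system: BI
owner: marketing-analytics
dependencies:
  - mart_marketing.agg_lead_conversion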
Common mistakes and self-checks
- Mistake: Stopping at table-level lineage for critical KPIs. Self-check: For your top 5 KPIs, do you know the exact column derivations and filters?
- Mistake: No owner or SLA. Self-check: Can you name a person/team for each mart and BI dataset, and a freshness target?
- Mistake: Hidden filters. Self-check: Are all KPI filters and exclusions documented in business_definition?
- Mistake: Missing change_log. Self-check: Can you tell when and why the metric changed last?
- Mistake: Not documenting join keys. Self-check: Are primary keys and join conditions recorded for major joins?
Practical projects
- Project 1: Choose one executive dashboard. Document end-to-end lineage for 3 tiles using the minimal spec.
- Project 2: Add column-level lineage for one KPI with currency conversion or time-window logic.
- Project 3: Implement a freshness test and display the SLA status in your BI dashboard description.
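If your transformation tool is dbt, Project 3's freshness test can reuse dbt's built-in source freshness check; the source and column names below are assumptions, while the freshness syntax is dbt's own:
version: 2
sources:
  - name: crm
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 2, period: hour}
      error_after: {count: 6, period: hour}
    tables:
      - name: leads_raw
Run it with dbt source freshness and surface the result in the dashboard description.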
Exercises
These exercises mirror the tasks in the page's Exercises section. Work through them here, then check your answers.
Exercise 1 — Map lineage for a revenue KPI
Given nodes: orders_raw, customers_raw → stg_orders, stg_customers → fct_orders → fct_revenue → BI tile "Total Revenue".
Task: List nodes and edges from source to BI, and map the column-level lineage for the KPI value "total_revenue_usd" visible in BI.
Template:
Nodes: [...]
Edges: upstream -> downstream [...]
Column mapping:
  BI.total_revenue_usd <- ...
  joins & filters: ...
SLA/Owner notes: ...
Exercise 2 — Write a one-page lineage spec
For metric "Active Subscribers (28d)", write a YAML-like spec: dependencies, business definition, key columns, tests, and ownership.
- [ ] Document upstream and downstream for each BI tile.
- [ ] Capture owners and freshness SLAs.
- [ ] Note business definitions and filters for each KPI.
- [ ] Include join keys for any joins.
- [ ] Add at least one test per critical node.
Mini challenge
Pick one KPI from your environment and produce a single, screenshot-worthy lineage diagram: 6–10 nodes, arrows from source to BI, and a 3-line business definition. Timebox it to 15 minutes, then share it with your team for feedback.
Learning path
- Start: Document one BI tile end-to-end (dataset-level).
- Next: Add column-level lineage for its key metric.
- Then: Add owners, SLAs, and a freshness test.
- Finally: Repeat for the top 5 KPIs and schedule periodic reviews.
Next steps
- Choose your first dashboard and apply the minimal spec.
- Automate partial lineage using your transformation DAG and BI datasets.
- Add change_log entries when KPIs or upstream sources change.