Topic 5 of 8

Lineage From Source To BI

Learn Lineage From Source To BI for free with explanations, exercises, and a quick test for Analytics Engineers.

Published: December 23, 2025 | Updated: December 23, 2025

Why this matters

Lineage from source to BI shows how data moves from original systems through transformations into dashboards. As an Analytics Engineer, you use lineage to troubleshoot data issues, plan changes safely, estimate effort, and communicate the impact of upstream changes to stakeholders.

  • Real task: A sales field is renamed in the CRM. With lineage, you quickly find all models, metrics, and dashboards affected.
  • Real task: A KPI looks off. Lineage helps you trace how it’s calculated and where it might have drifted.
  • Real task: You’re adding a new dimension. Lineage tells you which models to update and tests to add.

Who this is for

Analytics Engineers, BI Developers, and Data Analysts who maintain pipelines or dashboards and need clear, reliable documentation of dependencies.

Prerequisites

  • Comfort reading SQL (SELECT, JOIN, CTEs).
  • Basic understanding of your data warehouse layers (staging, models, marts).
  • Familiarity with your BI tool’s datasets/metrics or dashboards.

Concept explained simply

Data lineage is a map of where data comes from, how it changes, and where it ends up.

Mental model

Think of your data platform as a metro map:

  • Stations = datasets (tables/views, BI datasets, dashboards)
  • Lines = transformations (SQL models, joins, metrics logic)
  • Direction = upstream to downstream (source → warehouse → BI)

When a station closes (a schema change), you can instantly see which lines and stations are impacted.

Levels of lineage

  • System-level: Source systems (CRM, app events, ERP)
  • Dataset-level: Raw/staging, intermediate models, marts
  • Column-level: Which columns feed which calculations
  • Metric-level: How metrics are defined and aggregated
  • BI-asset-level: Datasets, dashboards, tiles/charts

Key terms

  • Upstream: Inputs that feed a dataset or metric
  • Downstream: Assets that depend on a dataset or metric
  • Data contract: Agreement on schema, types, and SLAs of a source
  • Freshness SLA: Expected max delay before data is updated

Minimal lineage spec you can use today

Use this compact spec to document any node (table/model/metric/dashboard). Keep it consistent and visible to your team.

node_name: <unique name>
node_type: source|staging|model|mart|metric|bi_dataset|dashboard
system: <where it lives (e.g., warehouse/BI)>
owner: <team or person>
freshness_sla: <e.g., 2h>
pii: none|low|high
business_definition: <clear, human-readable meaning>
dependencies:
  - <upstream node name>
outputs:
  - <downstream node name>
columns:  # optional but recommended for key columns
  - name: <column>
    source_of: [ <downstream column(s)/metric(s)> ]
tests:
  - <freshness/schema/unique/not_null/business tests>
change_log:
  - <date>: <what changed and why>
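The spec above can also be kept as structured data and checked programmatically. Below is a minimal sketch in Python: the field names mirror the template, but the node values and the `validate_spec` helper are illustrative, not part of any specific tool.

```python
# Minimal lineage spec as a dict, mirroring the template above.
REQUIRED_FIELDS = {"node_name", "node_type", "owner", "dependencies"}

spec = {
    "node_name": "mart_marketing",
    "node_type": "mart",
    "system": "warehouse",
    "owner": "analytics-team",
    "freshness_sla": "2h",
    "pii": "none",
    "business_definition": "Aggregated lead conversion by campaign.",
    "dependencies": ["fct_lead_events"],
    "outputs": ["dashboard_lead_conversion"],
}

def validate_spec(spec):
    """Return the sorted list of missing required fields (empty means valid)."""
    return sorted(REQUIRED_FIELDS - spec.keys())

print(validate_spec(spec))  # [] when all required fields are present
```

A check like this can run in CI so that a new node cannot be merged without an owner and declared dependencies.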

Worked examples

Example 1 — CRM Leads to BI Conversion Rate

Goal: Document lineage for a BI Conversion Rate tile.

  • Source: crm.leads_raw
  • Staging: stg_leads (clean types, standardize status)
  • Model: fct_lead_events (one row per lead per status transition)
  • Mart: mart_marketing (agg_lead_conversion)
  • Metric: conversion_rate = converted_leads / total_leads
  • BI: Dashboard tile "Lead Conversion Rate"

Key column-level lineage:

  • converted_leads derives from fct_lead_events.status = 'Converted'
  • total_leads derives from count distinct of stg_leads.lead_id

Common pitfall: Missing a filter on active campaigns upstream can inflate conversion. Capture that in business_definition and tests.
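Once the chain above is written down as edges, impact analysis becomes a graph walk. This sketch uses the node names from Example 1 (the `dashboard:` prefix is an illustrative convention) and finds everything downstream of a changed node:

```python
from collections import deque

# Edges for Example 1, upstream -> downstream (names from the example above).
edges = [
    ("crm.leads_raw", "stg_leads"),
    ("stg_leads", "fct_lead_events"),
    ("fct_lead_events", "mart_marketing"),
    ("mart_marketing", "conversion_rate"),
    ("conversion_rate", "dashboard:Lead Conversion Rate"),
]

def downstream(node, edges):
    """Breadth-first walk of everything affected by a change to `node`."""
    children = {}
    for up, down in edges:
        children.setdefault(up, []).append(down)
    seen, queue = [], deque([node])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

print(downstream("stg_leads", edges))
# ['fct_lead_events', 'mart_marketing', 'conversion_rate', 'dashboard:Lead Conversion Rate']
```

This is exactly the "renamed CRM field" task from the introduction: a schema change to `stg_leads` immediately lists every model, metric, and dashboard it touches.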

Example 2 — App Events to Active Users Metric

  • Source: events.raw_app_events
  • Staging: stg_events (deduplicate, enforce required fields)
  • Model: fct_sessions (sessionization)
  • Mart: mart_product (daily_active_users)
  • Metric: DAU = count(distinct user_id) where event_date = today
  • BI: DAU chart

Column-level lineage:

  • DAU.user_id ← stg_events.user_id (after dedupe rules)
  • DAU.event_date ← stg_events.event_ts::date via fct_sessions rules

Note: Document the session timeout assumption (e.g., 30 minutes) in business_definition.
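As a toy sketch of the DAU definition above (with a fixed day standing in for `today`, and field names following `stg_events`):

```python
from datetime import datetime, date

# Toy events after dedupe; field names follow stg_events in the example.
events = [
    {"user_id": "u1", "event_ts": datetime(2025, 1, 5, 9, 0)},
    {"user_id": "u1", "event_ts": datetime(2025, 1, 5, 17, 30)},
    {"user_id": "u2", "event_ts": datetime(2025, 1, 5, 12, 0)},
    {"user_id": "u3", "event_ts": datetime(2025, 1, 4, 23, 59)},
]

def dau(events, day):
    """DAU = count(distinct user_id) where event_ts::date = day."""
    return len({e["user_id"] for e in events if e["event_ts"].date() == day})

print(dau(events, date(2025, 1, 5)))  # 2
```

Note how `u1` counts once despite two events, and `u3` falls on the previous date: both behaviors come straight from the `distinct` and `event_ts::date` steps in the column-level lineage.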

Example 3 — Payments to Revenue Dashboard

  • Source: erp.payments_raw
  • Staging: stg_payments
  • Model: fct_payments (currency normalization)
  • Mart: finance.revenue_daily
  • Metric: revenue = sum(amount_usd) where status = 'captured'
  • BI: Revenue dashboard (Total Revenue, Revenue by Region)

Column-level lineage:

  • amount_usd ← stg_payments.amount_local × fx_rates.rate
  • region ← customers.region from a join path (ensure join keys documented)

Risk: Changing FX source breaks comparability. Add a change_log entry and alert downstream owners before switching.
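The revenue derivation above can be sketched as follows. The FX rates and payment rows are illustrative fixtures, and the single-rate-per-currency lookup is a simplification (a real `fct_payments` would join rates by date):

```python
# Toy FX normalization; mirrors amount_usd = amount_local * fx_rate.
fx_rates = {"EUR": 1.25, "GBP": 1.5, "USD": 1.0}  # illustrative rates

payments = [
    {"amount_local": 100.0, "currency": "EUR", "status": "captured"},
    {"amount_local": 50.0, "currency": "USD", "status": "captured"},
    {"amount_local": 200.0, "currency": "GBP", "status": "refunded"},
]

def revenue_usd(payments, fx_rates):
    """revenue = sum(amount_usd) where status = 'captured'."""
    return sum(
        p["amount_local"] * fx_rates[p["currency"]]
        for p in payments
        if p["status"] == "captured"
    )

print(revenue_usd(payments, fx_rates))  # 175.0 (100 * 1.25 + 50 * 1.0; refund excluded)
```

Swapping `fx_rates` for a different source changes every historical `amount_usd`, which is why the FX source belongs in the change_log.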

How to capture lineage in practice

  1. Inventory your BI assets: List dashboards and tiles that matter to stakeholders.
  2. Backtrack dependencies: For each tile/metric, trace the BI dataset → mart → model → staging → source.
  3. Record a minimal spec: For each node, fill the spec above. Start with dataset-level; add column-level for key metrics.
  4. Add ownership and SLAs: Document owner and freshness expectations to reduce firefighting.
  5. Automate where possible: Use your transformation tool’s compiled DAG and your BI’s dataset-to-dashboard mapping to keep docs current.
  6. Publish and review: Store the spec with your project and review on schema changes or sprint reviews.
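Step 2 (backtracking dependencies) can be sketched as a recursive walk over a node-to-parents mapping. The node names below reuse the "Total Revenue" exercise later on this page and are illustrative:

```python
# Step 2 sketch: backtrack dependencies from a BI tile to its sources.
# Mapping is node -> direct upstream dependencies (names are illustrative).
parents = {
    "tile:Total Revenue": ["fct_revenue"],
    "fct_revenue": ["fct_orders"],
    "fct_orders": ["stg_orders", "stg_customers"],
    "stg_orders": ["orders_raw"],
    "stg_customers": ["customers_raw"],
}

def upstream_closure(node, parents):
    """All transitive upstream dependencies of `node`, depth-first."""
    seen = []
    def visit(n):
        for p in parents.get(n, []):
            if p not in seen:
                seen.append(p)
                visit(p)
    visit(node)
    return seen

print(upstream_closure("tile:Total Revenue", parents))
# ['fct_revenue', 'fct_orders', 'stg_orders', 'orders_raw', 'stg_customers', 'customers_raw']
```

In practice the `parents` mapping comes for free from your transformation tool's compiled DAG (step 5); the point is that the trace is mechanical once dependencies are recorded.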

Common mistakes and self-checks

  • Mistake: Stopping at table-level only for critical KPIs. Self-check: For top 5 KPIs, do you know the exact column derivations and filters?
  • Mistake: No owner or SLA. Self-check: Can you name a person/team for each mart and BI dataset, and a freshness target?
  • Mistake: Hidden filters. Self-check: Are all KPI filters and exclusions documented in business_definition?
  • Mistake: Missing change_log. Self-check: Can you tell when and why the metric changed last?
  • Mistake: Not documenting join keys. Self-check: Are primary keys and join conditions recorded for major joins?

Practical projects

  • Project 1: Choose one executive dashboard. Document end-to-end lineage for 3 tiles using the minimal spec.
  • Project 2: Add column-level lineage for one KPI with currency conversion or time-window logic.
  • Project 3: Implement a freshness test and display the SLA status in your BI dashboard description.
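The freshness test in Project 3 reduces to comparing the last load time against the SLA. A minimal sketch, assuming `last_loaded_at` would normally come from warehouse metadata (here it is a fixture):

```python
from datetime import datetime, timedelta

def freshness_status(last_loaded_at, sla, now):
    """Return 'fresh' when the delay is within the SLA, else 'stale'."""
    return "fresh" if now - last_loaded_at <= sla else "stale"

now = datetime(2025, 1, 5, 12, 0)
print(freshness_status(datetime(2025, 1, 5, 11, 0), timedelta(hours=2), now))  # fresh
print(freshness_status(datetime(2025, 1, 5, 8, 0), timedelta(hours=2), now))   # stale
```

The resulting status string is what you would surface in the BI dashboard description, next to the documented SLA.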

Exercises

These exercises mirror the tasks in the Practice Exercises section below. You can do them here and then check your answers.

Exercise 1 — Map lineage for a revenue KPI

Given nodes: orders_raw, customers_raw → stg_orders, stg_customers → fct_orders → fct_revenue → BI tile "Total Revenue".

Task: List nodes and edges from source to BI, and map the column-level lineage for the KPI value "total_revenue_usd" visible in BI.

Template:

Nodes: [...]
Edges: upstream -> downstream [...]
Column mapping:
  BI.total_revenue_usd <- ...
  joins & filters: ...
SLA/Owner notes: ...

Exercise 2 — Write a one-page lineage spec

For metric "Active Subscribers (28d)", write a YAML-like spec: dependencies, business definition, key columns, tests, and ownership.

  • [ ] Document upstream and downstream for each BI tile.
  • [ ] Capture owners and freshness SLAs.
  • [ ] Note business definitions and filters for each KPI.
  • [ ] Include join keys for any joins.
  • [ ] Add at least one test per critical node.


Mini challenge

Pick one KPI from your environment and produce a single screenshot-worthy lineage: 6–10 nodes, arrows from source to BI, and a 3-line business definition. Keep it under 15 minutes. Share with your team for feedback.

Learning path

  • Start: Document one BI tile end-to-end (dataset-level).
  • Next: Add column-level lineage for its key metric.
  • Then: Add owners, SLAs, and a freshness test.
  • Finally: Repeat for the top 5 KPIs and schedule periodic reviews.

Next steps

  • Choose your first dashboard and apply the minimal spec.
  • Automate partial lineage using your transformation DAG and BI datasets.
  • Add change_log entries when KPIs or upstream sources change.

Practice Exercises

2 exercises to complete

Instructions

Map full lineage from source to BI for BI tile "Total Revenue" with column-level mapping.

Given nodes and hints:

  • Sources: orders_raw(order_id, customer_id, amount_local, currency), customers_raw(customer_id, region)
  • Staging: stg_orders (type fixes), stg_customers (dedupe)
  • Models: fct_orders (join customers), fct_revenue (convert to USD, filter status = 'paid')
  • BI: Dashboard tile "Total Revenue" (sum of amount_usd)

Deliverables:

  • List of Nodes
  • Edges as pairs upstream -> downstream
  • Column mapping for BI.total_revenue_usd
  • Join keys and filters
  • Owner and freshness notes

Expected Output

A clean list of 7–10 nodes, edges from sources to BI, a column mapping showing BI.total_revenue_usd <- fct_revenue.amount_usd <- stg_orders.amount_local × fx_rate, plus join keys (order_id, customer_id), filters (status = 'paid'), and owner and SLA notes.

Lineage From Source To BI — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

