Why this matters
Source-to-dashboard lineage shows exactly how a number on a dashboard is produced—from raw sources, through transformations, to the final visual. As a BI Analyst, this:
- Speeds up debugging when numbers look wrong.
- Enables impact analysis before you change a field or table.
- Builds trust with stakeholders through transparent, repeatable definitions.
- Helps onboard teammates faster with clear, accurate documentation.
Who this is for
- BI Analysts and Analytics Engineers who create or maintain dashboards.
- Data PMs and Stakeholders who need to understand where metrics come from.
Prerequisites
- Basic SQL (SELECT, JOIN, GROUP BY).
- Understanding of your BI tool concepts (measures, dimensions, filters).
- Familiarity with your data stack stages (ingestion, staging, transformation, marts/models).
Concept explained simply
Lineage is the breadcrumb trail of a metric. It answers: Where did this data start, how was it changed, and where is it used?
Mental model
Imagine a relay race:
- Runner 1: Source system (app/db/file) hands off raw data.
- Runner 2: Ingestion lands data in a warehouse (staging).
- Runner 3: Transform logic cleans and aggregates (models/marts).
- Runner 4: BI tool defines metrics and shows visuals.
Lineage documents each handoff with who, what, when, and how.
Glossary (quick)
- Source: Original data location (e.g., app_events, billing API).
- Ingestion: How data is loaded (e.g., batch daily at 02:00 UTC).
- Staging: Lightly processed copies of sources.
- Transform: Business rules applied (joins, filters, aggregations).
- Mart/Model: Curated tables for analytics (e.g., fact_sales).
- BI Semantic Layer: Measures/dimensions definitions.
- Dashboard: Final visuals and filters.
What to document in lineage
- Purpose: What question does this metric/visual answer?
- Sources: Systems/tables/files feeding the pipeline.
- Transform steps: Each rule with concise SQL-like description.
- Field-level mapping: Key fields traced from source to final.
- Filters and exclusions: Business rules that remove data.
- Refresh & SLAs: How often data updates and expected latency.
- Owners: Tech owner (data), Business owner (meaning).
- Data quality checks: Row counts, null checks, uniqueness tests.
- Known caveats: Assumptions, edge cases, historical backfills.
- Consumers: Dashboards, tiles, alerts using the data.
- Change log: Date, change summary, impact, reviewer.
Simple template you can copy
Metric/Visual: [Name] Purpose: [What decision it supports] Owners: Tech [name/team], Business [name/team] Refresh: [e.g., daily 03:00 UTC], SLA: [e.g., by 07:00 UTC] Lineage: 1) Source(s): [system.table] – key fields: [...] 2) Ingestion: [batch/stream], schedule: [...] 3) Staging: [schema.table], notes: [...] 4) Transform step(s): - T1: [logic] - T2: [logic] 5) Mart/Model: [schema.table/view], primary key: [...] 6) BI Metric: [definition], filters: [...] 7) Dashboard: [Page/Tile], filters/segments: [...] Field-level mapping (example): final_field <- mart.field <- staging.field <- source.field Data Quality: - Checks: [...] - Alerts: [...] Known caveats: [...] Change log: - [YYYY-MM-DD] [change] [impact]
Worked examples
Example 1: Daily Revenue (Paid transactions)
- Source: payments.transactions (columns: id, user_id, amount_cents, currency, status, paid_at)
- Ingestion: Batch hourly, loads to wh_staging.payments_transactions
- Transform:
- T1: Filter to status = 'paid'
- T2: Convert amount_cents to amount_usd using daily fx_rate table
- T3: Aggregate by paid_date = DATE(paid_at)
- Mart: analytics.fact_revenue_daily (paid_date, total_amount_usd)
- BI Metric: sum(total_amount_usd)
- Dashboard Tile: Line chart by day; filter region optional
Field mapping:
final.total_amount_usd <- mart.total_amount_usd <- SUM(stg.amount_usd) stg.amount_usd <- stg.amount_cents/100 * fx.rate stg.amount_cents <- src.amount_cents
Example 2: Weekly Active Users (WAU)
- Source: app.events (user_id, event_name, event_time)
- Transform:
- T1: Keep events in last 7 days, event_name in ('login','session_start')
- T2: Distinct users per ISO week
- Mart: analytics.user_activity_weekly (iso_week, active_users)
- BI Metric: active_users
- Dashboard: KPI card with week picker
Edge case: users in multiple time zones—normalize event_time to UTC.
Example 3: Signup-to-Paid Conversion Rate
- Sources: app.users (signup_at), payments.transactions (paid_at, status)
- Transform:
- T1: Users cohort by signup_date
- T2: First paid transaction per user within 30 days of signup
- T3: Conversion rate = paid_users / total_signups
- Mart: analytics.cohort_conv30d (signup_date, signups, paid_users, conv_rate)
- BI: conv_rate measure = paid_users / signups
Caveat: Exclude test users by email domain rules; document the exact filter.
Step-by-step: create your lineage now
- Pick one dashboard tile. Write its Purpose in one sentence.
- List all Source tables/files and key fields used.
- Describe each Transform step in one line each (filters, joins, aggregations).
- Identify the Mart/Model consumed by BI and its primary key and grain.
- Write the BI metric definition and dashboard filters that affect it.
- Add Owners (tech + business), Refresh schedule, and SLA.
- Add 2–3 Data Quality checks (e.g., non-decreasing cumulative totals, null-rate thresholds).
- Note Caveats and open questions.
- Save with a Change log entry (date + summary).
Exercises
Do these right after reading. Mirror the outputs in your own notes.
Exercise 1 — Map lineage for “New Subscriptions” KPI
Goal: Draft a concise lineage for a KPI that counts new paid subscriptions yesterday.
Assume these inputs:
- app_events.events (user_id, event_name, event_time)
- billing.charges (charge_id, user_id, amount_cents, currency, status, created_at)
- marketing.campaigns (campaign_id, channel, started_at)
Instructions:
- Produce a 4–7 step lineage: Source → Staging → Transform(s) → Mart → BI → Dashboard.
- Define the metric clearly (what counts as a “new subscription”).
- Add owners, refresh schedule, 2 data quality checks, and 2 assumptions.
Expected output: A clear, bullet-style lineage with field-level mapping for user_id and the date used in the KPI.
Exercise 2 — Backtrace and fix double counting
Scenario: A tile “Daily Active Users” seems inflated. Current transform counts logins:
SELECT date(event_time) AS d, COUNT(user_id) AS dau FROM app.events WHERE event_name = 'login' GROUP BY 1
Task:
- Identify the issue and propose a corrected transform.
- Update the lineage doc to reflect the fix and note impacted dashboards.
- Add a data quality test to prevent regression.
Expected output: New transform (distinct user_id per day), updated lineage steps, and a regression test definition.
Common mistakes and self-check
- Missing grain: Not stating if a table is daily, weekly, or user-level.
- Hidden filters: Filters live in SQL but not in docs; expose them.
- Vague metric definitions: Write exact rules (e.g., status in ('paid','captured')).
- Ignoring BI layer logic: Measures or dashboard filters often change results—document them.
- No owners or SLAs: Leads to confusion when things break.
- Stale docs: No change log; readers can’t trust it.
Self-check:
- Can someone new reproduce the number with only your doc?
- Does every field in the final metric map back to a source?
- Is the data’s refresh timing clear and testable?
- Are caveats and exclusions explicitly listed?
Practical projects
- Document lineage for your top 3 KPIs including owners, SLAs, and tests.
- Create a “field-level mapping” appendix for 10 high-impact fields across sources to dashboards.
- Run an impact analysis: pick a field, simulate a rename or logic change, list affected models and tiles.
Learning path
- Start: Document 1 tile fully (source → dashboard) using the template.
- Scale: Add data quality checks and a change log for each metric.
- Detail: Add field-level lineage for critical fields (ids, amounts, dates).
- Govern: Assign owners; set review cadence (e.g., monthly).
- Harden: Include backtrace procedures for on-call debugging.
Next steps
- Complete the exercises above and compare with the solutions.
- Take the Quick Test to check your understanding. Note: The quick test is available to everyone; only logged-in users have progress saved.
- Apply this to one real dashboard at work or in a portfolio project.
Mini challenge
Pick a dashboard tile you use daily. In 10 minutes, write a 6-line lineage: sources, key transforms, mart, BI metric, dashboard filters, owners. Aim for clarity over completeness. Then refine it tomorrow with field-level mapping.