Why this matters
Clear, consistent documentation is the backbone of Data Governance. As a BI Developer, you will:
- Hand off datasets and dashboards to analysts and business users who weren’t in the build meetings.
- Prove KPI definitions during audits and leadership reviews.
- Onboard new teammates faster and reduce back-and-forth questions.
- Prevent misinterpretation of metrics that drive decisions and incentives.
See a realistic request you might get
“We need a definition page for Monthly Active Users. Finance and Product disagree on which events count. Can you publish it and link it from the dashboard?”
Concept explained simply
Think of documentation as three things:
- Contract: The agreed meaning (KPI definitions, data contracts).
- Map: Where things live and how they connect (lineage, sources, owners).
- Journal: What changed and why (changelog, versioning, decisions).
Mental model: Standardize the minimum set of fields for each doc type, so anyone can answer Who, What, Why, How, When in under a minute.
What to document (scope)
- Dataset README (for marts/models)
- Data dictionary (columns and business meanings)
- Metric/KPI definitions
- Dashboard notes (audience, refresh, caveats)
- Lineage and source systems
- Changelog and versioning
- Access, sensitivity, and retention notes
Minimal Dataset README template
Title: [dataset_name]
Owner: [team/name]
Contact: [email or channel]
Purpose: One sentence on the business question this answers
Grain: One row represents...
Primary keys: [...]
Refresh cadence: [hourly/daily/weekly]
Source(s): [system/table(s)]
Quality checks: [e.g., null rate < 1%, duplicates blocked]
Caveats: [known gaps, edge cases]
Last reviewed: [YYYY-MM-DD]
Minimal KPI definition template
Name: [KPI name]
Business question: What it tells us
Formula: Exact calculation (pseudo-SQL OK)
Filters & exclusions: Precise rules
Grain & time window: e.g., daily, month-to-date
Owner & approver: Who defines it; who signs off
Freshness target: e.g., updated by 08:00 UTC daily
Caveats: What NOT to use it for
Related metrics: Similar or upstream measures
Version: [semver, e.g., 1.2.0]
Change notes: Link or short note
Minimal column dictionary entry
column_name: customer_id
Type: integer
Meaning: Stable internal id for a unique customer
Allowed values / format: positive int
Null policy: never null
Example: 1849327
Lineage note: from crm.customers.id
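A dictionary entry like this doubles as a testable contract. As a sketch (the function name and sample rows are made up for illustration), a check that enforces the documented customer_id rules might look like:

```python
# Enforce the documented customer_id contract (never null, positive integer)
# against a sample of rows. Names and data are illustrative, not a real API.
def check_customer_id(rows):
    """Return a list of (row_index, message) violations of the contract."""
    violations = []
    for i, row in enumerate(rows):
        value = row.get("customer_id")
        if value is None:
            violations.append((i, "null customer_id"))  # Null policy: never null
        elif not isinstance(value, int) or value <= 0:
            violations.append((i, f"invalid customer_id: {value!r}"))  # positive int
    return violations

sample = [{"customer_id": 1849327}, {"customer_id": None}, {"customer_id": -5}]
print(check_customer_id(sample))  # flags rows 1 and 2
```

Wiring a check like this into your pipeline keeps the dictionary honest: if the doc and the data disagree, the check fails in the same change that introduced the drift.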
Standards to apply
- Plain language: Short sentences, no internal jargon unless defined.
- Consistent naming: snake_case for columns/tables; Title Case for docs.
- Single source of truth: One canonical doc per dataset/KPI; reference it everywhere.
- Versioning: Semantic Versioning (MAJOR.MINOR.PATCH). Breaking definition change = MAJOR bump.
- Dates: ISO 8601 (YYYY-MM-DD) for clarity.
- Contacts: Always include an owner and contact method.
- Findability: Use the same section order and headings for all docs.
- Docs as close to code as possible: Keep docs next to transformation logic, in the same version control, and update them in the same change.
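The versioning standard above can be made mechanical. A minimal sketch (the function and change labels are illustrative, not a standard API) of mapping a change type to a semantic version bump:

```python
def bump_version(version, change):
    """Bump a MAJOR.MINOR.PATCH version string per the standards above:
    a breaking definition change bumps MAJOR, an additive change bumps
    MINOR, and a doc-only or cosmetic fix bumps PATCH."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "additive":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(bump_version("1.2.0", "breaking"))  # 2.0.0
```

Used consistently, this makes a changelog scannable: any MAJOR bump signals "re-check your dashboards before trusting the numbers."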
Example of clear vs unclear KPI definition
Unclear: “Active Users are users who did something in the app last month.”
Clear: “Active User (MAU) = distinct user_id with at least one event_name in {login, add_to_cart, checkout} between first and last calendar day of the month, in timezone UTC. Exclude test accounts where email like '%@example.test'.”
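A definition this precise can be executed directly. Here is a sketch of the MAU definition above run against a toy in-memory table with sqlite3; the table name, columns, and rows are made up for illustration:

```python
import sqlite3

# Toy events table mirroring the MAU definition's fields; data is made up.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE events (user_id INT, event_name TEXT, event_date TEXT, email TEXT)"
)
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "login",       "2025-03-05", "a@real.com"),
        (1, "checkout",    "2025-03-20", "a@real.com"),
        (2, "add_to_cart", "2025-03-11", "b@real.com"),
        (3, "login",       "2025-03-02", "qa@example.test"),  # test account: excluded
        (4, "page_view",   "2025-03-09", "c@real.com"),       # event not in the set
    ],
)
# The definition, term by term: distinct users, the three qualifying events,
# the calendar month window, and the test-account exclusion.
mau = con.execute("""
    SELECT COUNT(DISTINCT user_id)
    FROM events
    WHERE event_name IN ('login', 'add_to_cart', 'checkout')
      AND event_date BETWEEN '2025-03-01' AND '2025-03-31'
      AND email NOT LIKE '%@example.test'
""").fetchone()[0]
print(mau)  # 2 (users 1 and 2)
```

If two teams can both run the query and get the same number, the definition has done its job as a contract.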
Worked examples
Example 1: Dataset README for a dimension table
Title: dim_customer
Owner: BI Team
Contact: bi@company.com
Purpose: Master record of customers used to join facts and compute cohort metrics.
Grain: One row per unique customer_id
Primary keys: customer_id
Refresh cadence: Daily at 06:00 UTC
Source(s): crm.customers, billing.accounts
Quality checks: customer_id unique; email format valid; country in ISO-3166 list
Caveats: Some legacy accounts lack country; backfill in progress
Last reviewed: 2025-06-10
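Two of this README's quality checks can be sketched in a few lines. The sample rows and the abbreviated country list are assumptions for illustration, not real dim_customer data:

```python
# Sketch of two dim_customer quality checks: customer_id unique,
# country in an ISO-3166-style list (abbreviated here for the sketch).
ISO_COUNTRIES = {"US", "DE", "FR", "GB"}

rows = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "DE"},
    {"customer_id": 2, "country": "XX"},  # duplicate id, invalid country
]
ids = [r["customer_id"] for r in rows]
has_duplicate_ids = len(ids) != len(set(ids))
bad_countries = [r["customer_id"] for r in rows if r["country"] not in ISO_COUNTRIES]
print(has_duplicate_ids, bad_countries)  # True [2]
```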
Example 2: KPI definition
Name: On-Time Delivery Rate (OTD)
Business question: How reliably do we deliver orders by the promised date?
Formula: sum(case when delivered_at <= promised_date then 1 else 0 end) / count(*)
Filters & exclusions: Exclude cancelled orders; exclude test orders (order_id in test range)
Grain & time window: Monthly, by order completion month (timezone: UTC)
Owner & approver: Ops
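The OTD formula and its filters can be checked on a handful of rows. The orders below are made up, and the "test range" cutoff of 900000 is an assumption for the sketch (the real range would come from the KPI doc):

```python
from datetime import date

# Made-up orders; cancelled and test orders are excluded per the documented filters.
orders = [
    {"order_id": 1,      "delivered_at": date(2025, 3, 3), "promised_date": date(2025, 3, 5), "status": "complete"},
    {"order_id": 2,      "delivered_at": date(2025, 3, 9), "promised_date": date(2025, 3, 7), "status": "complete"},
    {"order_id": 3,      "delivered_at": None,             "promised_date": date(2025, 3, 8), "status": "cancelled"},
    {"order_id": 900001, "delivered_at": date(2025, 3, 1), "promised_date": date(2025, 3, 2), "status": "complete"},  # assumed test range
]
eligible = [o for o in orders if o["status"] != "cancelled" and o["order_id"] < 900000]
on_time = sum(1 for o in eligible if o["delivered_at"] <= o["promised_date"])
otd = on_time / len(eligible)
print(otd)  # 0.5
```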
Example 3: Changelog entry for a breaking change
[2025-03-01] Version 2.0.0 (MAJOR)
Artifact: fact_orders
Change: order_status now excludes 'returned'; new column is_returned added
Reason: Align with Finance definitions
Impact: Downstream dashboards using order_status counts may change by ~2%
Action: Use is_returned for return logic; update filters
Exercises
Do these now. You’ll mirror the templates above.
Exercise 1
Create a concise README for a new dataset mart.sales_orders using the minimal template. Assume sources: erp.orders, erp.order_items. Daily refresh at 05:00 UTC. Known caveat: historical orders before 2022-01-01 have incomplete discount data.
Exercise 2
Write a KPI definition for “Repeat Purchase Rate (RPR)” with a monthly grain: percent of customers with 2+ orders in a month. Exclude internal test customers with emails ending @example.test. Include owner, formula, caveats, and version.
Pre-publish checklist
- Owner and contact present
- Grain and refresh cadence stated
- Primary keys and filters clear
- Caveats listed and dated
- Version or last reviewed date included
Common mistakes and how to self-check
- Vague definitions: Replace words like “some,” “often,” “active” with explicit filters and time windows.
- Missing grain: Always state what one row or one metric unit represents.
- No owner: Add a team and a real contact so questions get answered.
- No versioning: Note version or last reviewed date to signal freshness.
- Docs drift from code: Update docs in the same change as model or metric updates.
Self-check prompts
- Could a new analyst reproduce the metric from your doc alone?
- Would two teams get the same number?
- Does the doc tell me who to ping when numbers look off?
Practical projects
- Adopt-a-dataset: Pick one core dataset, add a README, column dictionary for 10 key fields, and a 3-line lineage note.
- KPI alignment sprint: Document 5 top KPIs with formula, filters, and caveats. Review with Finance/Product.
- Changelog discipline: For one month, log every breaking and non-breaking change with semantic versions.
Mini challenge
Your PM says “Churn Rate dropped 3%” but you discover cancellations paused for maintenance for 48 hours. Write a caveat paragraph you would add to the KPI doc, and propose a version bump type. Keep it under 70 words.
Learning path
- Now: Apply these templates to your top 3 datasets and 3 KPIs.
- Next: Add lineage notes and quality checks for each dataset.
- Later: Standardize dashboard doc blocks (audience, refresh, caveats) across all BI dashboards.
Who this is for and prerequisites
- Who: BI Developers, Analytics Engineers, and Data Analysts who publish datasets or dashboards.
- Prerequisites: Basic SQL, familiarity with your data sources, and access to your BI/ETL tooling.
Next steps
- Pick one live dataset and publish a README today.
- Schedule a 15-minute review with a stakeholder to validate one KPI definition.
- Start a simple CHANGELOG file and use semantic versioning.