Why documentation matters for Analytics Engineers
Documentation is your contract with the business and future-you. It reduces support requests, speeds up onboarding, prevents metric drift, and makes changes safer. Good docs turn a data platform into a product that people trust.
- Fewer Slack questions and ad-hoc meetings
- Predictable releases and safer refactors
- Clear ownership and accountability
- Self-serve analytics and consistent metrics
Quick mindset shift
Treat docs as part of the deliverable. A model or metric is not done until it is reliably discoverable, understandable, and maintainable.
Who this is for
- Analytics Engineers building and maintaining models, metrics, and BI layers
- Data/BI Engineers who want stronger collaboration with analysts and business teams
- Senior Analysts transitioning to an engineering approach
Prerequisites
- Comfort with SQL and basic data modeling (staging → marts)
- Familiarity with version control (Git) and code review flow
- Basic knowledge of your BI tool and pipeline orchestrator
What you will be able to do
- Write concise table and column docs that answer business questions
- Define consistent metrics with clear governance
- Capture lineage from source to BI for audits and impact analysis
- Create runbooks for failures and backfills
- Keep docs in sync with code through reviews and automation
Learning path (practical roadmap)
1) Inventory and triage
List your top 10 business-critical models and dashboards. Note missing docs, unclear owners, and metric inconsistencies.
2) Data dictionary foundation
Add model descriptions and column-level docs for the top models. Focus on purpose, grain, filters, and caveats; a dbt-style sketch follows this list.
3) Lineage and sources
Document each source system, refresh SLAs, and owners. Sketch lineage from source to BI for your critical flows.
4) Metric definitions
Standardize core metrics (e.g., Revenue, Active Users). Define formula, filters, grain, and authoritative owner/approver.
5) Runbooks
Write failure and backfill runbooks for 3 critical pipelines. Include validation and rollback steps.
6) Onboarding guides
Create a short path for new analysts: where to find data, how to request changes, and example queries.
7) Keep docs in sync
Add doc checks to code review, require owners, and schedule quarterly audits.
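If your stack uses dbt, the dictionary from step 2 and the sync habit from step 7 can live beside the code in a schema.yml. A minimal sketch, assuming dbt; the meta keys and Slack channel are illustrative, not a required convention:

models:
  - name: fct_orders
    description: One row per order at final state; used for revenue and fulfillment reporting.
    meta:
      owner: Commerce Analytics
      slack: "#commerce-analytics"
    columns:
      - name: order_id
        description: Unique order identifier from the commerce system.
        tests: [unique, not_null]

Because this file is versioned with the model, doc changes go through the same review as SQL changes.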
Mini task: start your doc backlog
Create a 2-column list: Model/Dashboard and Missing/Confusing Docs. Limit to 10 items. This is your starting backlog for the next two weeks.
Worked examples
Example 1: Model description (data dictionary entry)
model: fct_orders
purpose: One row per order at final state; used for revenue and fulfillment reporting.
primary_key: order_id
grain: order_id (unique)
refresh: hourly (source: orders_api)
source_trust: medium (refunds can arrive late)
used_by: Finance, Operations BI
caveats:
  - Excludes test orders (keeps only rows where is_test = false)
  - Currency is stored in order_currency; revenue_usd is FX-adjusted
owner:
  name: Commerce Analytics
  contact: "#commerce-analytics Slack channel"
Why this works
It answers the top questions: what it is, how fresh it is, what the gotchas are, and who owns it.
Example 2: Column-level documentation
columns:
  - name: order_id
    type: string
    description: Unique order identifier from the commerce system.
    tests: [unique, not_null]
  - name: revenue_usd
    type: numeric(38,2)
    description: Net revenue in USD after discounts and refunds.
    formula: gross_amount - discounts - refunds_fx_adjusted
    caveats: FX rates come from a daily snapshot; late refunds update this value.
  - name: order_status
    type: enum('placed','fulfilled','refunded','cancelled')
    description: Final order status.
    allowed_values_source: dim_order_status
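If you run dbt, the allowed_values_source idea maps to an accepted_values test, so the documented values are also enforced. A minimal sketch using the values from the example above:

columns:
  - name: order_status
    description: Final order status.
    tests:
      - not_null
      - accepted_values:
          values: ['placed', 'fulfilled', 'refunded', 'cancelled']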
Example 3: Metric definition and governance
metric: gross_profit_margin
friendly_name: Gross Profit Margin
formula: (revenue - cogs) / revenue
filters:
  - exclude is_test = true
  - include region in ('NA','EU') for global rollups
time_grain: day
entity_grain: order_id
status: approved
source_of_truth: fct_orders joined to fct_cogs
owner: Finance Analytics
approver: Director of Finance
change_control: "PR approval by owner + approver; notify #metrics-change-log"
examples:
  - daily dashboard tile: "Gross Profit % by Region"
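Most semantic layers can encode a spec like this directly. A sketch in the style of dbt's MetricFlow ratio metrics, assuming revenue and gross_profit are metrics already defined elsewhere and that the is_test dimension exists; treat the names as placeholders:

metrics:
  - name: gross_profit_margin
    label: Gross Profit Margin
    type: ratio
    type_params:
      numerator: gross_profit   # assumed metric: revenue - cogs
      denominator: revenue
    filter: |
      {{ Dimension('order__is_test') }} = false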
Example 4: Source definition and ownership
source: orders_api
system: Shopify Plus
refresh: every 15 minutes
SLA: < 30 minutes end-to-end to fct_orders
contracts:
  - order_id: string, not null
  - status: enum, not null
  - total_amount: numeric(38,2)
upstream_owner: Commerce Platform Team
contact: on-call runbook "orders_api_oncall"
change_notes: breaking field rename requires 2-week notice
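In dbt, the cadence and SLA above can be checked automatically with source freshness (run via dbt source freshness). A minimal sketch; the schema and audit column are assumptions about your landing zone:

sources:
  - name: orders_api
    schema: raw_commerce          # hypothetical landing schema
    loaded_at_field: _loaded_at   # hypothetical audit timestamp
    freshness:
      warn_after: {count: 30, period: minute}
      error_after: {count: 60, period: minute}
    tables:
      - name: orders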
Example 5: Runbook for a failed load and backfill
incident: fct_orders stale >= 2 hours
impact: Revenue dashboards show partial data
steps:
1) Pause downstream dashboards (set banner "Data freshness issue")
2) Check orchestrator logs for job orders_stage_load
3) If API rate-limited: retry with backoff (max 3 attempts)
4) Run backfill for missing window: --start 2024-10-01 --end 2024-10-01
5) Validate: row_count matches source delta ±2%; last_updated_at < 15m
6) Remove dashboard banner and post summary in #data-status
rollback:
- If validation fails, revert to last good snapshot (s3://warehouse/snapshots/fct_orders/prev)
owners: Data Platform On-call
Example 6: Lineage from source to BI (text map)
orders_api --> stg_orders --> int_orders_clean --> fct_orders --> dim_customer_orders --> BI: Revenue Overview
notes:
- stg_orders: schema alignment and type casting
- int_orders_clean: deduplicate, late-arriving refunds
- fct_orders: business logic for revenue_usd and profit
- BI tile "Revenue by Region" selects from dim_customer_orders
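If you use dbt, the final hop to BI can be declared as an exposure, which puts the dashboard in the lineage graph for impact analysis. A minimal sketch; the URL and email are placeholders:

exposures:
  - name: revenue_overview
    label: Revenue Overview
    type: dashboard
    url: https://bi.example.com/revenue-overview   # placeholder
    depends_on:
      - ref('dim_customer_orders')
    owner:
      name: Commerce Analytics
      email: analytics@example.com                 # placeholder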
Mini task: your first lineage
Choose one BI dashboard tile and write a one-line lineage from source to tile, including key models. Note one transformation at each hop.
Drills and exercises
- [ ] Write a 5-sentence model description for one production table.
- [ ] Add descriptions to 5 high-usage columns, including formula and caveats.
- [ ] Define 1 core metric with owner, formula, and filters.
- [ ] Create a 6-step failure runbook for a job you own.
- [ ] Map lineage for one dashboard end-to-end.
- [ ] Add an “owner” field to two undocumented models.
Stretch drills
- [ ] Add data quality tests for the columns you documented.
- [ ] Create a quarterly doc audit checklist for your team.
- [ ] Propose a doc template and get it approved in code review.
Common mistakes and debugging tips
- Mistake: Writing docs only once. Tip: Treat docs as code—review and version them.
- Mistake: Definitions without grain or filters. Tip: Always state grain and explicit include/exclude rules.
- Mistake: No owner listed. Tip: Require an owner and approver for every model and metric.
- Mistake: Copying SQL into docs without context. Tip: Start with purpose and business question, then add formulas.
- Mistake: Hidden caveats. Tip: Add a “Caveats” line; incomplete is better than silent.
- Mistake: Docs drift from code. Tip: Block merges if docs or tests are missing for changed columns/metrics; see the CI sketch below.
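One way to block merges on missing docs is a pre-commit hook suite such as dbt-checkpoint (one community option among several). A sketch; pin a release that matches your setup:

repos:
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v2.0.1   # replace with a real release tag for your setup
    hooks:
      - id: check-model-has-description
      - id: check-model-has-tests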
Debugging doc drift
When a dashboard looks wrong: compare metric definition vs. current SQL; check filters, time grain, and joins. If they differ, file a change with owner approval and update docs before merging.
Mini project: Document a critical data flow
Goal: Produce a complete documentation pack for one critical flow (source → marts → BI).
- Deliverables:
  - Model description for two models (purpose, grain, caveats)
  - Column docs for 8–12 columns across those models
  - Source definition (refresh, contracts, owner)
  - Metric spec for 1 KPI used on the related dashboard
  - Runbook for a likely failure and a 1-day backfill
  - One-page onboarding note for analysts: how to query, known pitfalls, examples
- Acceptance criteria:
  - Every artifact lists an owner
  - Definitions and SQL are consistent
  - Docs stored beside code and reviewed via PR
Optional extras
Add a quarterly doc review reminder and a simple checklist that you paste into each PR.
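The PR checklist can be as small as this; adapt the items to your stack:

Docs checklist (paste into the PR description):
- [ ] Model and column descriptions updated for anything that changed
- [ ] Metric definitions still match the SQL
- [ ] Owner listed for any new model or metric
- [ ] Runbook updated if failure modes changed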
Practical projects (ideas)
- Metric catalog pilot: Define 5 core metrics with owners and publish to the BI home page.
- Runbook library: Convert tribal knowledge into 5 reusable runbooks with validation steps.
- Lineage spotlight: Create a visual or text map for the top 3 dashboards and attach it to each dashboard description.
Subskills
- Data Dictionary And Model Descriptions — Purpose, grain, refresh, caveats, and owner for each model.
- Column Level Documentation — Clear, concise column descriptions with data types, formulas, and constraints.
- Source Definitions And Ownership — Source system metadata, SLAs, contracts, and accountable owners.
- Lineage From Source To BI — Trace how data flows and transforms to support audits and impact analysis.
- Metric Definitions And Governance — Standardize formulas, filters, grain, owner, and change control.
- Runbooks For Failures And Backfills — Step-by-step recovery with validation, rollback, and communication.
- Onboarding Guides For Analysts — Short guides to find data, query safely, and follow team norms.
- Keeping Docs In Sync With Code — PR templates, CI checks, and scheduled audits to prevent drift.
Next steps
- Pick one critical dashboard and complete the mini project.
- Add owners to all high-impact models and metrics.
- Set a recurring doc review (quarterly) and add checks to your PR template.
Exam readiness
When you can define a metric with grain and filters, write a 6-step runbook, and map lineage end-to-end, you are ready to take the skill exam.