Why documentation matters for Analytics Engineers
Documentation is your contract with the business and future-you. It reduces support requests, speeds up onboarding, prevents metric drift, and makes changes safer. Good docs turn a data platform into a product that people trust.
- Fewer Slack questions and ad-hoc meetings
- Predictable releases and safer refactors
- Clear ownership and accountability
- Self-serve analytics and consistent metrics
Quick mindset shift
Treat docs as part of the deliverable. A model or metric is not done until it is reliably discoverable, understandable, and maintainable.
Who this is for
- Analytics Engineers building and maintaining models, metrics, and BI layers
- Data/BI Engineers who want stronger collaboration with analysts and business teams
- Senior Analysts transitioning to an engineering approach
Prerequisites
- Comfort with SQL and basic data modeling (staging → marts)
- Familiarity with version control (Git) and code review flow
- Basic knowledge of your BI tool and pipeline orchestrator
What you will be able to do
- Write concise table and column docs that answer business questions
- Define consistent metrics with clear governance
- Capture lineage from source to BI for audits and impact analysis
- Create runbooks for failures and backfills
- Keep docs in sync with code through reviews and automation
Learning path (practical roadmap)
1) Inventory and triage
List your top 10 business-critical models and dashboards. Note missing docs, unclear owners, and metric inconsistencies.
2) Data dictionary foundation
Add model descriptions and column-level docs for the top models. Focus on purpose, grain, filters, and caveats; a dbt-style sketch follows this list.
3) Lineage and sources
Document each source system, refresh SLAs, and owners. Sketch lineage from source to BI for your critical flows.
4) Metric definitions
Standardize core metrics (e.g., Revenue, Active Users). Define formula, filters, grain, and authoritative owner/approver.
5) Runbooks
Write failure and backfill runbooks for 3 critical pipelines. Include validation and rollback steps.
6) Onboarding guides
Create a short path for new analysts: where to find data, how to request changes, and example queries.
7) Keep docs in sync
Add doc checks to code review, require owners, and schedule quarterly audits.
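If your stack uses dbt, the dictionary from step 2 and the sync habit from step 7 can live beside the code in a schema.yml. A minimal sketch, assuming dbt; the meta keys and Slack channel are illustrative, not a required convention:

models:
  - name: fct_orders
    description: One row per order at final state; used for revenue and fulfillment reporting.
    meta:
      owner: Commerce Analytics
      slack: "#commerce-analytics"
    columns:
      - name: order_id
        description: Unique order identifier from the commerce system.
        tests: [unique, not_null]

Because this file is versioned with the model, doc changes go through the same review as SQL changes.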
Mini task: start your doc backlog
Create a 2-column list: Model/Dashboard and Missing/Confusing Docs. Limit to 10 items. This is your starting backlog for the next two weeks.
Worked examples
Example 1: Model description (data dictionary entry)
model: fct_orders
purpose: One row per order at final state; used for revenue and fulfillment reporting.
primary_key: order_id
grain: order_id (unique)
refresh: hourly (source: orders_api)
source_trust: medium (refunds can arrive late)
used_by: Finance, Operations BI
caveats:
  - Excludes test orders (keeps only rows where is_test = false)
  - Currency is stored in order_currency; revenue_usd is FX-adjusted
owner:
  name: Commerce Analytics
  contact: "#commerce-analytics Slack channel"
Why this works
It answers the top questions: what it is, how fresh it is, what the gotchas are, and who owns it.
Example 2: Column-level documentation
columns:
  - name: order_id
    type: string
    description: Unique order identifier from the commerce system.
    tests: [unique, not_null]
  - name: revenue_usd
    type: numeric(38,2)
    description: Net revenue in USD after discounts and refunds.
    formula: gross_amount - discounts - refunds_fx_adjusted
    caveats: FX rates come from a daily snapshot; late refunds update this value.
  - name: order_status
    type: enum('placed','fulfilled','refunded','cancelled')
    description: Final order status.
    allowed_values_source: dim_order_status
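If you run dbt, the allowed_values_source idea maps to an accepted_values test, so the documented values are also enforced. A minimal sketch using the values from the example above:

columns:
  - name: order_status
    description: Final order status.
    tests:
      - not_null
      - accepted_values:
          values: ['placed', 'fulfilled', 'refunded', 'cancelled']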
Example 3: Metric definition and governance
metric: gross_profit_margin
friendly_name: Gross Profit Margin
formula: (revenue - cogs) / revenue
filters:
  - exclude is_test = true
  - include region in ('NA','EU') for global rollups
time_grain: day
entity_grain: order_id
status: approved
source_of_truth: fct_orders joined to fct_cogs
owner: Finance Analytics
approver: Director of Finance
change_control: "PR approval by owner + approver; notify #metrics-change-log"
examples:
  - daily dashboard tile: "Gross Profit % by Region"
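Most semantic layers can encode a spec like this directly. A sketch in the style of dbt's MetricFlow ratio metrics, assuming revenue and gross_profit are metrics already defined elsewhere and that the is_test dimension exists; treat the names as placeholders:

metrics:
  - name: gross_profit_margin
    label: Gross Profit Margin
    type: ratio
    type_params:
      numerator: gross_profit   # assumed metric: revenue - cogs
      denominator: revenue
    filter: |
      {{ Dimension('order__is_test') }} = false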
Example 4: Source definition and ownership
source: orders_api
system: Shopify Plus
refresh: every 15 minutes
SLA: < 30 minutes end-to-end to fct_orders
contracts:
  - order_id: string, not null
  - status: enum, not null
  - total_amount: numeric(38,2)
upstream_owner: Commerce Platform Team
contact: on-call runbook "orders_api_oncall"
change_notes: breaking field rename requires 2-week notice
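In dbt, the cadence and SLA above can be checked automatically with source freshness (run via dbt source freshness). A minimal sketch; the schema and audit column are assumptions about your landing zone:

sources:
  - name: orders_api
    schema: raw_commerce          # hypothetical landing schema
    loaded_at_field: _loaded_at   # hypothetical audit timestamp
    freshness:
      warn_after: {count: 30, period: minute}
      error_after: {count: 60, period: minute}
    tables:
      - name: orders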
Example 5: Runbook for a failed load and backfill
incident: fct_orders stale >= 2 hours
impact: Revenue dashboards show partial data
steps:
1) Pause downstream dashboards (set banner "Data freshness issue")
2) Check orchestrator logs for job orders_stage_load
3) If API rate-limited: retry with backoff (max 3 attempts)
4) Run backfill for missing window: --start 2024-10-01 --end 2024-10-01
5) Validate: row_count matches source delta ±2%; last_updated_at < 15m
6) Remove dashboard banner and post summary in #data-status
rollback:
- If validation fails, revert to last good snapshot (s3://warehouse/snapshots/fct_orders/prev)
owners: Data Platform On-call
Example 6: Lineage from source to BI (text map)
orders_api --> stg_orders --> int_orders_clean --> fct_orders --> dim_customer_orders --> BI: Revenue Overview
notes:
- stg_orders: schema alignment and type casting
- int_orders_clean: deduplicate, late-arriving refunds
- fct_orders: business logic for revenue_usd and profit
- BI tile "Revenue by Region" selects from dim_customer_orders
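If you use dbt, the final hop to BI can be declared as an exposure, which puts the dashboard in the lineage graph for impact analysis. A minimal sketch; the URL and email are placeholders:

exposures:
  - name: revenue_overview
    label: Revenue Overview
    type: dashboard
    url: https://bi.example.com/revenue-overview   # placeholder
    depends_on:
      - ref('dim_customer_orders')
    owner:
      name: Commerce Analytics
      email: analytics@example.com                 # placeholder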
Mini task: your first lineage
Choose one BI dashboard tile and write a one-line lineage from source to tile, including key models. Note one transformation at each hop.
Drills and exercises
- [ ] Write a 5-sentence model description for one production table.
- [ ] Add descriptions to 5 high-usage columns, including formula and caveats.
- [ ] Define 1 core metric with owner, formula, and filters.
- [ ] Create a 6-step failure runbook for a job you own.
- [ ] Map lineage for one dashboard end-to-end.
- [ ] Add an “owner” field to two undocumented models.
Stretch drills
- [ ] Add data quality tests for the columns you documented.
- [ ] Create a quarterly doc audit checklist for your team.
- [ ] Propose a doc template and get it approved in code review.
Common mistakes and debugging tips
- Mistake: Writing docs only once. Tip: Treat docs as code—review and version them.
- Mistake: Definitions without grain or filters. Tip: Always state grain and explicit include/exclude rules.
- Mistake: No owner listed. Tip: Require an owner and approver for every model and metric.
- Mistake: Copying SQL into docs without context. Tip: Start with purpose and business question, then add formulas.
- Mistake: Hidden caveats. Tip: Add a “Caveats” line; incomplete is better than silent.
- Mistake: Docs drift from code. Tip: Block merges if docs or tests are missing for changed columns/metrics; see the CI sketch below.
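One way to block merges on missing docs is a pre-commit hook suite such as dbt-checkpoint (one community option among several). A sketch; pin a release that matches your setup:

repos:
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v2.0.1   # replace with a real release tag for your setup
    hooks:
      - id: check-model-has-description
      - id: check-model-has-tests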
Debugging doc drift
When a dashboard looks wrong: compare metric definition vs. current SQL; check filters, time grain, and joins. If they differ, file a change with owner approval and update docs before merging.
Mini project: Document a critical data flow
Goal: Produce a complete documentation pack for one critical flow (source → marts → BI).
- Deliverables:
  - Model description for two models (purpose, grain, caveats)
  - Column docs for 8–12 columns across those models
  - Source definition (refresh, contracts, owner)
  - Metric spec for 1 KPI used on the related dashboard
  - Runbook for a likely failure and a 1-day backfill
  - One-page onboarding note for analysts: how to query, known pitfalls, examples
- Acceptance criteria:
  - Every artifact lists an owner
  - Definitions and SQL are consistent
  - Docs stored beside code and reviewed via PR
Optional extras
Add a quarterly doc review reminder and a simple checklist that you paste into each PR.
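The PR checklist can be as small as this; adapt the items to your stack:

Docs checklist (paste into the PR description):
- [ ] Model and column descriptions updated for anything that changed
- [ ] Metric definitions still match the SQL
- [ ] Owner listed for any new model or metric
- [ ] Runbook updated if failure modes changed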
Practical projects (ideas)
- Metric catalog pilot: Define 5 core metrics with owners and publish to the BI home page.
- Runbook library: Convert tribal knowledge into 5 reusable runbooks with validation steps.
- Lineage spotlight: Create a visual or text map for the top 3 dashboards and attach it to each dashboard description.
Subskills
- Data Dictionary And Model Descriptions — Purpose, grain, refresh, caveats, and owner for each model.
- Column Level Documentation — Clear, concise column descriptions with data types, formulas, and constraints.
- Source Definitions And Ownership — Source system metadata, SLAs, contracts, and accountable owners.
- Lineage From Source To BI — Trace how data flows and transforms to support audits and impact analysis.
- Metric Definitions And Governance — Standardize formulas, filters, grain, owner, and change control.
- Runbooks For Failures And Backfills — Step-by-step recovery with validation, rollback, and communication.
- Onboarding Guides For Analysts — Short guides to find data, query safely, and follow team norms.
- Keeping Docs In Sync With Code — PR templates, CI checks, and scheduled audits to prevent drift.
Next steps
- Pick one critical dashboard and complete the mini project.
- Add owners to all high-impact models and metrics.
- Set a recurring doc review (quarterly) and add checks to your PR template.
Exam readiness
When you can define a metric with grain and filters, write a 6-step runbook, and map lineage end-to-end, you are ready to take the skill exam.