Why this matters
Clear data dictionaries and model descriptions turn your warehouse into a product people trust. In daily Analytics Engineering work, you will:
- Define business terms once and reuse them across models and dashboards.
- Unblock analysts and stakeholders who need to know what a column means.
- Reduce bugs by making assumptions explicit (nullable, allowed values, grain).
- Speed up onboarding and handoffs with consistent, searchable docs.
Concept explained simply
A data dictionary is a catalog of tables and columns: names, definitions, data types, allowed values, nullability, examples, source, and sensitivity. Model descriptions explain a model as a whole: purpose, grain, primary keys, important measures and dimensions, lineage, known limitations, and owners.
Mental model
- Data dictionary: glossary for every field.
- Model description: README for each model.
- Together: a single source of truth that answers What, How, and Why.
Use this lightweight template
Copy and adapt as needed.
Model Description - Name: - Purpose (1–2 sentences): - Grain: - Primary key(s): - Freshness/Update cadence: - Upstream sources: - Downstream consumers: - Business logic highlights (rules, filters): - Measures & Dimensions (key ones): - Data quality tests (key checks): - Sensitivity/Privacy: - Owner(s): - Last reviewed (date): Data Dictionary (per column) - Column name: - Data type: - Business definition: - Allowed values / Range: - Nullability: - Example value: - Calculation/source: - Notes (edge cases, timezone, units): - Sensitivity (PII/PHI/None):
Worked examples
Example 1 — Dimension table: dim_customer
Model: dim_customer Purpose: One row per customer for analytics and segmentation. Grain: Customer (customer_id) Primary key(s): customer_id Upstream sources: raw.customers, stg_subscriptions Business logic highlights: active_customer = last_order_date within 365 days OR subscription_status = 'active'. Owner(s): Analytics Engineering Last reviewed: 2025-01-10 Columns - customer_id Type: string Definition: Stable unique ID for a customer across systems. Allowed values: Non-empty alphanumeric Nullability: Not null Example: C_004923 Sensitivity: None - email Type: string Definition: Primary contact email at time of latest record. Allowed values: RFC 5322 compliant Nullability: Nullable (not all customers provide email) Example: a.user@example.com Sensitivity: PII - signup_date Type: date Definition: First successful registration date (UTC). Nullability: Not null Example: 2024-05-14 Notes: UTC timezone - active_customer Type: boolean Definition: True if last_order_date within 365 days OR subscription_status = 'active'. Allowed values: true/false Nullability: Not null Example: true Tests: assert boolean and non-null
Example 2 — Fact table: fct_orders
Model: fct_orders Purpose: One row per order; supports revenue reporting. Grain: Order (order_id) Primary key(s): order_id Measures: order_total_usd, discount_usd, tax_usd Dimensions: order_date, channel, region Data quality tests: uniqueness(order_id), not_null(order_id), non_negative(order_total_usd) Owner(s): Analytics Engineering Columns - order_id: string | Not null | Unique id for each order. - order_date: date | Not null | Order creation date (UTC). - customer_id: string | Not null | Links to dim_customer.customer_id. - order_total_usd: numeric(12,2) | Not null | Net total after discounts but before tax. - tax_usd: numeric(12,2) | Nullable | Applied tax amount. - discount_usd: numeric(12,2) | Nullable | Discounts applied at checkout. - channel: string | Nullable | 'web', 'ios', 'android', 'partner'.
Example 3 — Business rule clarity: active subscription
Term: active_subscription
Definition: A customer is active if they have at least one subscription with status 'active' and end_date in the future.
Common pitfalls: 'trialing' is NOT active; paused counts as inactive.
Where used: dim_customer.active_customer, churn dashboard KPI.
Owner(s): Finance + Analytics
Tests: Ensure values only {true,false} and definition applied consistently across models.
How to write great model descriptions
- State the model's purpose in one sentence.
- Declare the grain and primary keys.
- List key measures and dimensions (not every column, just the important ones here).
- Note critical business rules and filters.
- Capture upstream inputs and main downstream consumers.
- Add data quality checks that protect the definition.
- Assign owners and a review date.
Quality checklist
- Each model has a purpose, grain, and primary key documented.
- Each column has a clear business definition and data type.
- Nullability and allowed values are explicit (including units and timezone).
- Business rules are written in plain language and are testable.
- Owners and last reviewed dates are filled in.
- Sensitive fields are tagged (PII/PHI) to guide access controls.
- Examples provided for ambiguous fields.
Common mistakes and how to self-check
- Mistake: Using vague terms like recent or large. Fix: Specify exact windows and thresholds (e.g., 30 days, amount >= 100).
- Mistake: Skipping nullability. Fix: Document when nulls are expected and why.
- Mistake: Mixing technical SQL with business language. Fix: Put the business rule first; include SQL only if needed for clarity.
- Mistake: Duplicated definitions across models. Fix: Centralize shared terms and reference them consistently.
- Mistake: Not documenting units/timezones. Fix: Add units (USD, kg) and timezone (UTC) everywhere.
Self-check mini audit
- Pick one top dashboard and trace each metric to a model and column. Are all definitions findable?
- Search for active in your docs. Are there conflicting definitions?
- Randomly select 5 columns. Do they each have type, definition, nullability, and example?
Exercises
Do these right below or in your notes. The Quick Test at the end is available to everyone; saving progress requires login.
-
Exercise ex1: Write a data dictionary for a customer table with columns: customer_id, email, signup_date, marketing_opt_in (Y/N), country.
See expected output format
Use the template: column name, type, business definition, allowed values, nullability, example, sensitivity.
-
Exercise ex2: Draft a model description for a model named mart_customer_health that classifies customers into health_buckets based on NPS and activity in the last 90 days.
See expected output format
Include purpose, grain, keys, key measures/dimensions, business rules, inputs, quality checks, owner, last reviewed.
Practical projects
- Create a complete dictionary for your top 5 models (one dimension, one fact, three marts). Add at least one example value for each ambiguous column.
- Standardize business terms (active, churned, new) in a shared glossary section and reference them from model descriptions.
- Add basic data quality tests aligned with your definitions (e.g., non-null PK, value in set, non-negative amounts) and document them.
Who this is for
- Analytics Engineers and BI Developers creating or maintaining warehouse models.
- Analysts who need to formalize metric and dimension definitions.
Prerequisites
- Comfort reading SQL schemas (tables, columns, types).
- Basic understanding of dimensional modeling (facts, dimensions, grain).
Learning path
- Start with a small, high-impact model (e.g., fct_orders).
- Document 10 most-used columns fully.
- Add model description (purpose, grain, PK, business rules).
- Expand to related dimension model; align shared terms.
- Review with stakeholders; refine unclear terms.
Mini challenge
Pick a model you did not build. In 30 minutes, write a draft description and dictionary for 8 columns. Then ask a stakeholder to spot anything unclear. Update accordingly.
Next steps
- Apply the template to one more model this week.
- Schedule a quarterly doc review for critical metrics.
- Add tags for sensitivity and ownership to guide access and support.
Quick Test
Take the short test to check your understanding. Anyone can take it; login is only needed to save progress.