luvv to helpDiscover the Best Free Online Tools
Topic 1 of 8

Data Dictionary And Model Descriptions

Learn Data Dictionary And Model Descriptions for free with explanations, exercises, and a quick test (for Analytics Engineer).

Published: December 23, 2025 | Updated: December 23, 2025

Why this matters

Clear data dictionaries and model descriptions turn your warehouse into a product people trust. In daily Analytics Engineering work, you will:

  • Define business terms once and reuse them across models and dashboards.
  • Unblock analysts and stakeholders who need to know what a column means.
  • Reduce bugs by making assumptions explicit (nullable, allowed values, grain).
  • Speed up onboarding and handoffs with consistent, searchable docs.

Concept explained simply

A data dictionary is a catalog of tables and columns: names, definitions, data types, allowed values, nullability, examples, source, and sensitivity. Model descriptions explain a model as a whole: purpose, grain, primary keys, important measures and dimensions, lineage, known limitations, and owners.

Mental model

  • Data dictionary: glossary for every field.
  • Model description: README for each model.
  • Together: a single source of truth that answers What, How, and Why.
Use this lightweight template

Copy and adapt as needed.

Model Description
- Name:
- Purpose (1–2 sentences):
- Grain:
- Primary key(s):
- Freshness/Update cadence:
- Upstream sources:
- Downstream consumers:
- Business logic highlights (rules, filters):
- Measures & Dimensions (key ones):
- Data quality tests (key checks):
- Sensitivity/Privacy:
- Owner(s):
- Last reviewed (date):

Data Dictionary (per column)
- Column name:
- Data type:
- Business definition:
- Allowed values / Range:
- Nullability:
- Example value:
- Calculation/source:
- Notes (edge cases, timezone, units):
- Sensitivity (PII/PHI/None):

Worked examples

Example 1 — Dimension table: dim_customer
Model: dim_customer
Purpose: One row per customer for analytics and segmentation.
Grain: Customer (customer_id)
Primary key(s): customer_id
Upstream sources: raw.customers, stg_subscriptions
Business logic highlights: active_customer = last_order_date within 365 days OR subscription_status = 'active'.
Owner(s): Analytics Engineering
Last reviewed: 2025-01-10

Columns
- customer_id
  Type: string
  Definition: Stable unique ID for a customer across systems.
  Allowed values: Non-empty alphanumeric
  Nullability: Not null
  Example: C_004923
  Sensitivity: None

- email
  Type: string
  Definition: Primary contact email at time of latest record.
  Allowed values: RFC 5322 compliant
  Nullability: Nullable (not all customers provide email)
  Example: a.user@example.com
  Sensitivity: PII

- signup_date
  Type: date
  Definition: First successful registration date (UTC).
  Nullability: Not null
  Example: 2024-05-14
  Notes: UTC timezone

- active_customer
  Type: boolean
  Definition: True if last_order_date within 365 days OR subscription_status = 'active'.
  Allowed values: true/false
  Nullability: Not null
  Example: true
  Tests: assert boolean and non-null
Example 2 — Fact table: fct_orders
Model: fct_orders
Purpose: One row per order; supports revenue reporting.
Grain: Order (order_id)
Primary key(s): order_id
Measures: order_total_usd, discount_usd, tax_usd
Dimensions: order_date, channel, region
Data quality tests: uniqueness(order_id), not_null(order_id), non_negative(order_total_usd)
Owner(s): Analytics Engineering

Columns
- order_id: string | Not null | Unique id for each order.
- order_date: date | Not null | Order creation date (UTC).
- customer_id: string | Not null | Links to dim_customer.customer_id.
- order_total_usd: numeric(12,2) | Not null | Net total after discounts but before tax.
- tax_usd: numeric(12,2) | Nullable | Applied tax amount.
- discount_usd: numeric(12,2) | Nullable | Discounts applied at checkout.
- channel: string | Nullable | 'web', 'ios', 'android', 'partner'.
Example 3 — Business rule clarity: active subscription
Term: active_subscription
Definition: A customer is active if they have at least one subscription with status 'active' and end_date in the future.
Common pitfalls: 'trialing' is NOT active; paused counts as inactive.
Where used: dim_customer.active_customer, churn dashboard KPI.
Owner(s): Finance + Analytics
Tests: Ensure values only {true,false} and definition applied consistently across models.

How to write great model descriptions

  1. State the model's purpose in one sentence.
  2. Declare the grain and primary keys.
  3. List key measures and dimensions (not every column, just the important ones here).
  4. Note critical business rules and filters.
  5. Capture upstream inputs and main downstream consumers.
  6. Add data quality checks that protect the definition.
  7. Assign owners and a review date.

Quality checklist

  • Each model has a purpose, grain, and primary key documented.
  • Each column has a clear business definition and data type.
  • Nullability and allowed values are explicit (including units and timezone).
  • Business rules are written in plain language and are testable.
  • Owners and last reviewed dates are filled in.
  • Sensitive fields are tagged (PII/PHI) to guide access controls.
  • Examples provided for ambiguous fields.

Common mistakes and how to self-check

  • Mistake: Using vague terms like recent or large. Fix: Specify exact windows and thresholds (e.g., 30 days, amount >= 100).
  • Mistake: Skipping nullability. Fix: Document when nulls are expected and why.
  • Mistake: Mixing technical SQL with business language. Fix: Put the business rule first; include SQL only if needed for clarity.
  • Mistake: Duplicated definitions across models. Fix: Centralize shared terms and reference them consistently.
  • Mistake: Not documenting units/timezones. Fix: Add units (USD, kg) and timezone (UTC) everywhere.
Self-check mini audit
  • Pick one top dashboard and trace each metric to a model and column. Are all definitions findable?
  • Search for active in your docs. Are there conflicting definitions?
  • Randomly select 5 columns. Do they each have type, definition, nullability, and example?

Exercises

Do these right below or in your notes. The Quick Test at the end is available to everyone; saving progress requires login.

  1. Exercise ex1: Write a data dictionary for a customer table with columns: customer_id, email, signup_date, marketing_opt_in (Y/N), country.
    See expected output format

    Use the template: column name, type, business definition, allowed values, nullability, example, sensitivity.

  2. Exercise ex2: Draft a model description for a model named mart_customer_health that classifies customers into health_buckets based on NPS and activity in the last 90 days.
    See expected output format

    Include purpose, grain, keys, key measures/dimensions, business rules, inputs, quality checks, owner, last reviewed.

Practical projects

  • Create a complete dictionary for your top 5 models (one dimension, one fact, three marts). Add at least one example value for each ambiguous column.
  • Standardize business terms (active, churned, new) in a shared glossary section and reference them from model descriptions.
  • Add basic data quality tests aligned with your definitions (e.g., non-null PK, value in set, non-negative amounts) and document them.

Who this is for

  • Analytics Engineers and BI Developers creating or maintaining warehouse models.
  • Analysts who need to formalize metric and dimension definitions.

Prerequisites

  • Comfort reading SQL schemas (tables, columns, types).
  • Basic understanding of dimensional modeling (facts, dimensions, grain).

Learning path

  1. Start with a small, high-impact model (e.g., fct_orders).
  2. Document 10 most-used columns fully.
  3. Add model description (purpose, grain, PK, business rules).
  4. Expand to related dimension model; align shared terms.
  5. Review with stakeholders; refine unclear terms.

Mini challenge

Pick a model you did not build. In 30 minutes, write a draft description and dictionary for 8 columns. Then ask a stakeholder to spot anything unclear. Update accordingly.

Next steps

  • Apply the template to one more model this week.
  • Schedule a quarterly doc review for critical metrics.
  • Add tags for sensitivity and ownership to guide access and support.

Quick Test

Take the short test to check your understanding. Anyone can take it; login is only needed to save progress.

Practice Exercises

2 exercises to complete

Instructions

You have a table dim_customer with columns: customer_id, email, signup_date, marketing_opt_in (Y/N), country. Write a data dictionary entry for each column using the template. Be explicit about types, nullability, allowed values, examples, and sensitivity.

Expected Output
Columns - customer_id | string | Unique customer id across systems | Not null | Example: C_10023 | Sensitivity: None - email | string | Primary contact email | Nullable | Example: a.user@example.com | Allowed: RFC 5322 | Sensitivity: PII - signup_date | date | First successful registration date (UTC) | Not null | Example: 2024-11-02 | Sensitivity: None - marketing_opt_in | string | Customer consent to marketing communications | Allowed: {'Y','N'} | Not null | Example: 'Y' | Sensitivity: PII - country | string | ISO 3166-1 alpha-2 country code at signup | Nullable | Allowed: 2-letter codes | Example: 'US' | Sensitivity: None

Data Dictionary And Model Descriptions — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

7 questions70% to pass

Have questions about Data Dictionary And Model Descriptions?

AI Assistant

Ask questions about this tool