luvv to helpDiscover the Best Free Online Tools
Topic 4 of 8

Data Dictionary And Business Glossary

Learn Data Dictionary And Business Glossary for free with explanations, exercises, and a quick test (for Data Platform Engineer).

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

As a Data Platform Engineer, you build trustworthy data. A data dictionary and business glossary are how teams agree on what data means and how it should be used. They reduce rework, prevent broken dashboards, and make compliance simpler.

  • Real tasks you will do: define column-level metadata; capture metric formulas; tag PII; align teams on terms like "Active Customer"; document lineage for audits; standardize naming.
  • Outcome: faster onboarding, fewer conflicting definitions, and a safer, compliant platform.

Who this is for

  • Data Platform Engineers and Data Engineers who publish datasets.
  • Analytics Engineers and BI Developers defining metrics and models.
  • Data Stewards and Product Analysts who curate definitions.

Prerequisites

  • Basic SQL (SELECT, JOIN, COUNT).
  • Understanding of tables, columns, data types, and constraints.
  • Familiarity with your organization’s main data domains (e.g., customers, orders, subscriptions).

Concept explained simply

Think of the data dictionary as the parts list for your data tables (column-level details). The business glossary is the rulebook of shared meanings (business terms and metrics). Together, they ensure that when someone says "Active Customer," everyone calculates it the same way and knows which columns and tables support it.

Mental model

Use a two-layer mental model:

  • Layer 1: Physical layer (Data Dictionary) — what exists in storage: columns, types, nullability, allowed values, owners, quality checks, sensitivity.
  • Layer 2: Semantic layer (Business Glossary) — shared meaning: term definitions, formulas, scope, exclusions, synonyms, and source-of-truth datasets.

Map terms in the glossary to columns and datasets in the dictionary. That creates traceability from concept to implementation.

Key components

Data Dictionary — include these fields
  • System/Table/Column names
  • Business-friendly label
  • Data type and format (e.g., varchar(36) UUID)
  • Nullability, uniqueness, primary key
  • Allowed values / reference table
  • Example values
  • Calculation logic (if derived)
  • Lineage (upstream tables, transformations)
  • Data quality checks (constraints, tests)
  • Security classification (e.g., PII, Confidential)
  • Owner and steward
  • Retention policy
  • Last updated
Business Glossary — include these fields
  • Term name and concise definition
  • Business context/scope (what’s included/excluded)
  • Formula (for metrics) with exact logic
  • Time window and grain (e.g., daily, monthly)
  • Source of truth (dataset/view name)
  • Related terms and synonyms
  • Approval status and steward
  • Change history / last review date

Standards and good practices

  • Naming: use clear, lowercase_snake_case for columns; use Title Case for terms.
  • Be unambiguous: include inclusion/exclusion rules; avoid vague words like "recent" or "significant" without exact ranges.
  • Keep versions: when definitions change, version and date them to protect historical reports.
  • Link terms to columns and datasets: every metric should reference the exact dataset and columns used.
  • Classify sensitivity: tag PII/PHI/PCI and define handling rules.
  • Review cadence: set quarterly reviews for top terms and critical columns.

Worked examples

Example 1 — Column dictionary entry
System: analytics_warehouse
Table: dim_customer
Column: customer_id
Label: Customer ID
Type: string (UUID), format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Nullability: NOT NULL
Uniqueness: unique per row (primary key)
Allowed values: must match UUID format
Example values: "a3b72e84-1c9a-4f11-9a7e-f4c1b2e91e33"
Calculation logic: n/a (ingested from crm.customer)
Lineage: crm.customer.customer_id -> stg_crm__customer -> dim_customer.customer_id
Quality checks: not_null, unique, valid_uuid
Security classification: Low
Owner: Data Platform Team
Steward: CRM Operations
Retention: 7 years
Last updated: 2025-09-15
Example 2 — Metric term in glossary
Term: Active Customer
Definition: A customer with at least one billable transaction in the last 90 days.
Scope: Excludes refunded-only transactions; test accounts excluded.
Formula: count_distinct(customer_id) where exists transaction with amount > 0 and transaction_date >= today - 90 days.
Time grain: Daily snapshot
Source of truth: mart_metrics.active_customers_daily
Synonyms: Engaged Customer (deprecated)
Owner: VP Sales Operations
Steward: Analytics Engineering
Approval status: Approved (v1.2 on 2025-10-01)
Related terms: Churned Customer
Example 3 — Handling synonyms and conflicts

Situation: Marketing says "Active User" = logged in in last 30 days. Finance says "Active Customer" = billable transaction in last 90 days. Resolution:
- Keep both terms with different names and scopes.
- Add cross-references and a warning: "Not interchangeable."
- Link each term to its source-of-truth dataset and dashboards.

Example 4 — PII tagging in dictionary
Table: dim_customer
Column: email
Type: string
Security classification: PII (contact)
Policy: Hash for analytics; restrict raw values to role: customer_support
Quality checks: not_null for active customers, valid_email_pattern
Owner: Data Protection Officer
Last updated: 2025-07-08

Build your first entries (step-by-step)

Step 1 — Pick a small scope

Select 1–2 core tables (e.g., dim_customer, fct_orders) and 2–3 key business terms (Active Customer, Monthly Recurring Revenue).

Step 2 — Draft column entries

For each critical column, fill: label, type, nullability, uniqueness, allowed values, lineage, quality checks, owner, sensitivity.

Step 3 — Draft term entries

Write a one-sentence definition, then detail scope, exact formula, time grain, source-of-truth dataset, synonyms, approvals.

Map terms to datasets/columns. Ask a business stakeholder and a data engineer to review for clarity and feasibility.

Step 5 — Publish and version

Publish to your catalog. Record version and last updated. Add a reminder to review quarterly.

Practical projects

  • Create a minimal dictionary for two tables with 15 columns fully documented.
  • Define 5 glossary terms including one with a non-trivial formula (e.g., MRR with proration rules).
  • Add PII classification to all customer attributes and state handling rules.
  • Set up a weekly checklist review with one stakeholder from Sales or Finance.

Common mistakes and self-checks

  • Mistake: Vague definitions ("recent"). Self-check: Replace with a hard number (e.g., 30 days).
  • Mistake: No owner. Self-check: Every entry lists owner and steward.
  • Mistake: Missing formula details (e.g., how to treat refunds). Self-check: Add inclusions/exclusions and edge cases.
  • Mistake: Not linking terms to datasets. Self-check: Each term references a dataset and key columns.
  • Mistake: Ignoring versioning. Self-check: Include version and last updated date.
  • Mistake: Unlabeled PII. Self-check: Tag sensitivity and access policy.

Exercises

The exercises below are available to everyone. If you log in, your progress is saved so you can resume later.

Exercise 1: Column dictionary entry (id: ex1)

Write a complete data dictionary entry for a column customer_status in dim_customer.

  • Include: label, data type, allowed values, nullability, default, example values, lineage, quality checks, sensitivity, owner, retention, last updated.
  • Decide if it’s derived or ingested, and document logic if derived.
Template hint
Table: dim_customer
Column: customer_status
Label:
Type:
Nullability:
Allowed values:
Default:
Example values:
Calculation logic:
Lineage:
Quality checks:
Security classification:
Owner:
Retention:
Last updated:
  • [ ] Allowed values are explicit and mutually exclusive.
  • [ ] Quality checks include at least not_null or accepted_values.
  • [ ] Lineage lists upstream sources or states "authoritative source".

Exercise 2: Term definition (id: ex2)

Define the business term Monthly Recurring Revenue (MRR).

  • Provide: definition, scope, formula, proration/discount treatment, time grain, source of truth, synonyms, owner, steward, approval status.
  • State how upgrades/downgrades mid-month are handled.
Formula hint

Sum over active subscriptions: normalized monthly price after discounts, pro-rated by active_days_in_month / days_in_month.

  • [ ] The formula is executable (SQL-ready or pseudo-SQL).
  • [ ] Exclusions (e.g., free trials) are stated.
  • [ ] Source-of-truth dataset is named.

Quick test

Take the quick test below to check your understanding. Available to everyone; log in to save your progress and see it on your dashboard.

Learning path

  • Step 1: Document 2 critical tables in the data dictionary.
  • Step 2: Define top 5 business terms and link them to datasets.
  • Step 3: Add PII classifications and access notes.
  • Step 4: Set a review cadence and version your entries.
  • Step 5: Expand coverage to the next domain (e.g., billing).

Next steps

  • Run a 30-minute review with a business stakeholder to validate one term.
  • Add two data quality checks to a critical column and record them in the dictionary.
  • Schedule quarterly reviews for your top 10 terms.

Mini challenge

Pick one ambiguous term used in your org (e.g., "Churn"). Draft two competing definitions from different teams’ perspectives. Merge them into one approved definition with precise scope and formula, and link it to specific columns and a source-of-truth dataset.

Practice Exercises

2 exercises to complete

Instructions

Write a full dictionary entry for dim_customer.customer_status. Decide allowed values and whether it is derived. Include lineage and quality checks.

  • Fields to include: label, data type, allowed values, nullability, default, example values, calculation logic, lineage, quality checks, sensitivity, owner, retention, last updated.
  • State handling for unexpected values.
Expected Output
A structured entry covering all required fields with unambiguous allowed values, at least one quality check, and a clear lineage statement.

Data Dictionary And Business Glossary — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

7 questions70% to pass

Have questions about Data Dictionary And Business Glossary?

AI Assistant

Ask questions about this tool