Who this is for
- BI Developers who build dashboards and datasets that must be trusted across teams.
- Analytics Engineers documenting models for reuse.
- Data Stewards formalizing definitions to reduce metric confusion.
Prerequisites
- Basic SQL (SELECT, JOIN).
- Familiarity with your data warehouse tables and BI metrics.
- Ability to edit a spreadsheet or shared doc.
Why this matters
In BI work, most rework comes from unclear definitions: different teams use different filters for the same KPI, or columns are misused because their meaning isn’t documented. A data dictionary and business glossary fix this by defining fields and terms once, and making them discoverable. You’ll use them to:
- Design trustworthy dashboards (definitions match stakeholder expectations).
- Onboard new teammates quickly (clear column/metric meaning).
- Prevent metric drift (one place to propose and approve changes).
- Speed up debugging (lineage, owners, and quality rules are known).
Concept explained simply
Data Dictionary = technical facts about data fields (columns): name, data type, format, meaning, allowed values, source, transformations, freshness, owners.
Business Glossary = business terms and metrics: what the term means in plain language, exact formula/filters, scope, synonyms, stakeholders, and examples.
Mental model
Think of the glossary as the dictionary at the front of a book (shared language for readers), and the data dictionary as the detailed appendix (field-level specs for builders). The glossary tells everyone what “Active Customer” means. The data dictionary tells the developer which table and column implement it, the type, constraints, and where it came from.
What to capture
Data dictionary essentials
- Table and column name (with friendly name if needed).
- Business meaning (1–2 sentence plain-language description).
- Data type and format (e.g., DECIMAL(12,2), ISO-8601 date, currency code).
- Allowed values and constraints (ENUM list, NOT NULL, unique, ranges, reference table).
- Source system and lineage (where it originally came from, major transforms).
- Granularity (row = one order line, one customer-day, etc.).
- Units/timezone (USD, kg, UTC).
- Refresh cadence and last load timestamp field.
- Quality rules (e.g., must be non-negative; foreign key must exist).
- Owner/steward and contact.
- Sensitivity classification (PII, confidential).
Business glossary essentials
- Term/metric name and concise definition.
- Exact formula or logical rule (with filters and exclusions).
- Scope (e.g., “ecommerce only”, “US market”).
- Freshness expectation (e.g., daily by 07:00 UTC).
- Synonyms and deprecated terms (“Revenue” synonym of “Net Sales”?)
- Examples and edge cases (what is excluded?).
- Owner/steward and review cycle.
Worked examples
Example 1: Column in the dictionary
Table: fact_sales Column: order_total_amount Friendly name: Order Total Business meaning: Final charged amount per order after discounts and before tax. Type: DECIMAL(12,2) USD Allowed values: >= 0 Source: checkout_service.order_total Lineage: from raw_orders.total less discount_total; tax excluded in transform step Granularity: 1 row per order Refresh: Hourly; last_load_ts column Quality rules: Non-negative; if currency != 'USD', convert using fx_rates table Owner: Sales Analytics Sensitivity: Non-PII
Example 2: Metric in the glossary
Term: Active Customer Definition: A customer with at least one completed order in the last 30 calendar days. Formula: COUNT(DISTINCT customer_id) where order_status = 'completed' and order_date >= current_date - 30 Scope: Global ecommerce only Freshness: Daily by 07:00 UTC Synonyms: none; DO NOT use 'Returning User' as a synonym Examples: A customer with only refunded orders is NOT active Owner: Growth Analytics
Example 3: Resolve synonyms
Problem: Finance says “Revenue”, Sales says “Net Sales”.
Glossary decision: Keep “Net Sales” as the primary term. Set “Revenue” as a deprecated synonym with a redirect note to “Net Sales”. The dictionary then points to the implemented field or metric view that calculates Net Sales with consistent filters.
Example 4: Allowed values and reference mapping
Term: Order Status
Allowed values: pending, completed, cancelled, refunded
Mapping: raw_system: {PEND → pending, DONE → completed, CANC → cancelled, RFND → refunded}
Rule: Any new status must be added here before appearing in dashboardsBuild it step-by-step
- Pick a high-impact table and 3–5 key metrics (start small).
- Draft plain-language definitions with stakeholders (sales, finance, ops).
- Fill technical fields for the data dictionary (type, constraints, lineage, owners).
- Validate with real rows (sample data) and at least one dashboard that uses them.
- Publish in a shared place; add owners and a change review cadence.
- Adopt naming rules (singular vs plural, snake_case, disambiguate with prefixes).
- Add change logs and version notes when definitions evolve.
Templates you can copy
Data dictionary (columns)
table | column | friendly_name | business_meaning | data_type | format | allowed_values | source | lineage | granularity | unit | timezone | refresh | quality_rules | owner | sensitivity
Business glossary (metrics/terms)
term | definition | formula_or_logic | scope | freshness | synonyms | deprecated_terms | examples | owner | review_cycle
Exercises
Note: The quick test is available to everyone; only logged-in users will have results saved.
Exercise 1 — Create a mini data dictionary
Using the template above, document the following columns for table fact_sales:
- order_id, order_date, customer_id, order_total_amount, currency_code, order_status
Include: friendly name, business meaning (1–2 sentences), data type, allowed values or reference table, source, lineage note, refresh cadence, owner, sensitivity.
Show checklist
- Each column has a clear, plain-language meaning.
- Type and constraints are specified.
- Status values and mapping rules are listed.
- Owner and sensitivity are filled.
Exercise 2 — Write glossary entries
Create glossary entries for these terms: Net Sales, Gross Sales, Refund Rate, Active Customer, Average Order Value.
For each, write a precise definition, formula with filters, scope, synonyms or deprecated terms, and one example/edge case.
Show checklist
- Definition is unambiguous and testable.
- Formula/filters are explicit (no hidden assumptions).
- Scope, freshness, and owner provided.
- Synonyms/deprecations clarified.
Common mistakes and how to self-check
- Vague definitions (“usually”, “around”, “often”). Fix: use exact filters, time windows, and inclusions/exclusions.
- Hiding filters in SQL only. Fix: write filters in the glossary so non-technical readers understand them.
- Forgetting refresh expectations. Fix: add freshness targets and last-load fields.
- No owner. Fix: assign owner/steward; log review dates.
- Not documenting units/timezones. Fix: always specify currency, unit, and timezone.
- Duplicated terms. Fix: merge and mark one as deprecated with redirect.
Self-check mini list
- Can a new analyst compute the metric with only your glossary?
- Can a developer validate values and constraints from your dictionary?
- Is there exactly one canonical name per metric?
Practical projects
- Project 1: Document one fact table and two dimension tables end-to-end. Include quality rules and lineage notes.
- Project 2: Standardize three revenue-related terms with Finance and Sales; publish synonyms and deprecations.
- Project 3: Add a weekly change log to your glossary and run a 15-min review ritual with stakeholders.
Mini challenge
Your marketing dashboard shows “Active Users” higher than Product’s report. Propose a glossary entry that explains the difference. Include scope, filters, and a note on why they differ. Then update the dictionary to point to the correct source fields for each version.
Learning path
- Learn the difference between dictionary (fields) and glossary (terms).
- Document a single critical table and 3–5 KPIs.
- Add owners, freshness, and quality rules.
- Introduce change control and deprecate duplicates.
- Automate pulls for data types and freshness where possible.
- Expand coverage and integrate with dashboards as tooltips or field help.
Next steps
- Extend documentation to lineage and ownership across pipelines.
- Add data quality checks that reference your definitions.
- Embed definitions into BI fields and dashboard tooltips.