luvv to helpDiscover the Best Free Online Tools
Topic 1 of 8

Data Dictionary And Metric Glossary

Learn Data Dictionary And Metric Glossary for free with explanations, exercises, and a quick test (for BI Analyst).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

As a BI Analyst, you turn raw data into reliable answers. A clear Data Dictionary and Metric Glossary make your dashboards trustworthy, keep SQL consistent across teams, and reduce back-and-forth with stakeholders. You will use them to onboard new datasets, prevent double counting (like revenue), and document the exact logic behind KPIs such as Active Users or Conversion Rate.

  • Real task: Onboard a new ecommerce dataset and define fields like order_id, user_id, and order_status.
  • Real task: Standardize what "Revenue" means across Sales and Finance so dashboards match.
  • Real task: Explain why two dashboards show different numbers and resolve by aligning metric definitions.

Concept explained simply

Data Dictionary: A plain-language reference for your data structures. It lists tables, columns, data types, formats, allowed values, and what each field means.

Metric Glossary: The canonical guide to how metrics are calculated. It includes the business definition, SQL sketch, filters, time grain, exclusions, owners, and caveats.

Mental model

Think of the Data Dictionary as the "map" of your data, and the Metric Glossary as the "rules of the game." Together, they form a contract between data and business. If everyone uses the same map and rules, numbers line up.

Core components you should capture

  • Name: Stable, descriptive, lowercase_with_underscores.
  • Type and format: int, decimal(10,2), date (YYYY-MM-DD), string, boolean.
  • Allowed values and examples: e.g., order_status in {placed, paid, shipped, cancelled}.
  • Business definition: What this means to non-technical stakeholders.
  • Calculation logic (for metrics): SQL sketch or formula including filters and joins.
  • Time grain and time zone: e.g., daily in UTC.
  • Owner and approver: Person or team responsible.
  • Source and lineage: Where the field/metric comes from (table/view, upstream system).
  • Freshness/SLA: Expected update frequency and delay (e.g., daily by 03:00 UTC).
  • Quality caveats: Known issues, exclusions, edge cases.
  • Status and version: draft, approved, deprecated; version numbers for major changes.

Naming, versions, and statuses

  • Be explicit: total_revenue_usd, active_users_7d.
  • Avoid ambiguity: Do not reuse the same name for different logic.
  • Version breaking changes: e.g., net_revenue_v2. Mark old version as deprecated with a removal date.
  • Statuses: draft (under review), approved (use in dashboards), deprecated (do not use; has replacement).

Worked examples

Example 1: Data Dictionary entry for orders.order_id
  • Name: order_id
  • Table: orders
  • Type: string
  • Business definition: Unique identifier for a customer order at the header level.
  • Allowed values: Non-null, unique within orders.
  • Example: ORD_102938
  • Owner: Data Platform
  • Status: approved
Example 2: Metric Glossary entry for total_revenue_usd
  • Business definition: Sum of item-level revenue in USD from paid or shipped orders, excluding cancelled or test orders.
  • SQL sketch: sum(oi.quantity * oi.unit_price_usd) from order_items oi join orders o on oi.order_id = o.order_id where o.order_status in ('paid','shipped') and o.is_test = false.
  • Time grain: daily (order_date in UTC).
  • Owner: Finance Analytics
  • Caveats: Refunds handled in refunds table; use net_revenue_usd if refunds included.
  • Status: approved
Example 3: Metric Glossary entry for active_users_7d
  • Business definition: Count of distinct users with at least one qualifying session event in the last 7 days (rolling window).
  • SQL sketch: count(distinct user_id) from events where event_name in ('session_start','purchase') and event_timestamp between current_date - interval '6 day' and current_date.
  • Time grain: daily, timezone UTC.
  • Owner: Product Analytics
  • Caveats: Web and app events unified; exclude internal users by email domain filter.
  • Status: approved

Step-by-step: create your first entries

  1. Pick scope: Choose one table and two key metrics.
  2. Draft field entries: For each column, fill name, type, definition, allowed values, examples.
  3. Draft metric entries: For each metric, write business definition, filters, SQL sketch, time grain, owner, caveats.
  4. Review with stakeholders: Confirm meanings with Data Engineering and business owners.
  5. Approve and version: Mark status approved. If you change logic later, create v2 and deprecate v1 with a sunset date.
  6. Publish and use: Reference the glossary in dashboards and pull requests.

Exercises

These mirror the exercises below. Do them directly in your notes or tool of choice.

Exercise 1: Mini Data Dictionary (ex1)

Dataset context: ecommerce with tables customers, orders, order_items, products, events.

  • Document at least 6 fields across the tables.
  • Include: name, table, type, definition, allowed values or example, owner, status.
Quality checklist for ex1
  • Names use lowercase_with_underscores.
  • Definitions are business-friendly, not just technical.
  • Each field has an example or allowed values.
  • Owner and status provided.

Exercise 2: Two Metrics, Zero Ambiguity (ex2)

Define the following metrics with precision:

  • gross_revenue_usd
  • conversion_rate_session_to_order

For each metric, include: business definition, filters/exclusions, SQL sketch, time grain, owner, caveats.

Quality checklist for ex2
  • Includes time grain and timezone.
  • Join keys and distinctness are explicit.
  • Edge cases handled (cancelled orders, test traffic, refunds).

Common mistakes and how to self-check

  • Ambiguous names: Fix by adding units or windows (e.g., revenue_usd_daily).
  • Missing filters: Always call out exclusions (cancelled orders, test users).
  • Double counting: State the grain and when to use distincts.
  • No owner: Assign a responsible person/team to every entry.
  • Stale entries: Add review dates or freshness notes and deprecate outdated definitions.
Self-check prompts
  • Could two analysts read this and produce the same SQL?
  • Does the dashboard number match when using this definition?
  • Are units, windows, and time zones explicit?

Practical projects

  • Project 1: Document a sales table and three key fields, then get sign-off from a business partner.
  • Project 2: Standardize "Revenue" across two teams and publish a single approved metric.
  • Project 3: Build a mini glossary for user engagement (DAU, WAU, Active Users 7d) and add caveats for bots/internal users.

Learning path

  • Start: Data Dictionary basics (this lesson) and one approved metric.
  • Next: Governance (statuses, review cadence, versioning).
  • Then: Modeling and metrics layer concepts (grains, joins, slowly changing dimensions, deduping).
  • Finally: Dashboard specifications that reference glossary IDs to keep alignment.

Who this is for

  • BI Analysts, Data Analysts, and Analytics Engineers who need consistent, trusted reporting.

Prerequisites

  • Basic SQL and knowledge of your data sources.
  • Understanding of business KPIs in your domain.

Next steps

  • Complete the exercises and self-check.
  • Take the quick test to validate your understanding.

Quick Test and progress note

The quick test is available to everyone. Only logged-in users will have their progress saved.

Mini challenge

Pick one metric used by multiple teams that often disagrees (e.g., Active Users). Write two competing definitions you find in the wild, then reconcile them into one approved entry with explicit filters, time grain, and ownership. Mark the old version deprecated with a planned removal date.

Practice Exercises

2 exercises to complete

Instructions

Create a concise Data Dictionary (6–8 fields) for these tables: customers, orders, order_items, products, events.

  • For each field include: name, table, type, business definition, allowed values or example, owner, status.
  • Use lowercase_with_underscores and simple, precise language.
Expected Output
6–8 well-formed entries with clear definitions, examples/allowed values, owner, and status. Names are consistent and unambiguous.

Data Dictionary And Metric Glossary — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Data Dictionary And Metric Glossary?

AI Assistant

Ask questions about this tool