What is Metadata Management for BI Developers?
Metadata management is the practice of defining, organizing, and maintaining information about your data assets—what fields mean, where data comes from, how KPIs are calculated, who owns reports, and which datasets are certified. For a BI Developer, strong metadata makes dashboards trustworthy, speeds up development, and reduces rework.
Why it matters
- Build reliable dashboards: consistent KPI definitions and clear field descriptions prevent contradictory numbers.
- Debug faster: documented lineage shows where a number originated and what transformed it.
- Improve governance: certified datasets and report ownership reduce risk and duplication.
- Onboard quickly: a data dictionary and catalog let teammates self-serve.
Who this is for
- BI Developers and Analytics Engineers building reports, models, and semantic layers.
- Data Analysts who need consistent metrics and clear glossary terms.
- Team leads standardizing KPIs and certifying datasets.
Prerequisites
- SQL: joins, aggregations, window functions.
- Basic data modeling concepts: staging vs. marts, dimensions/facts.
- Familiarity with your BI tool semantics (metrics/measures, dimensions, hierarchies).
Learning path (roadmap)
- Start a baseline catalog
Outcome: A minimal data dictionary and report catalog.- Create a data dictionary for top 10 tables and top 50 fields used in reporting.
- Document owners and refresh cadence for critical reports.
- Standardize language
Outcome: Shared glossary and naming rules.- Publish a glossary of core business terms and synonyms.
- Adopt field naming conventions and apply them in new models.
- Lock in KPI definitions
Outcome: Versioned KPI specs and calculations.- Write one canonical definition per KPI, including filters, grain, and edge cases.
- Refactor dashboards to use the canonical metric wherever possible.
- Annotate models and lineage
Outcome: Model descriptions and basic lineage maps.- Add column-level descriptions and model purpose notes.
- Sketch lineage from sources to marts and key dashboards.
- Certify critical datasets
Outcome: A certification workflow and badges in BI. - Operationalize updates
Outcome: Review cadence, ownership, and change logs for metadata.
Milestones checklist
- At least 80% of dashboard KPIs mapped to a single definition.
- Data dictionary covers the top 20 revenue-impacting tables.
- Lineage diagrams exist for all executive dashboards.
- Certified datasets identified and labeled in BI.
- Named owners for every published report.
Worked examples (copy, adapt, reuse)
1) Create a simple Data Dictionary table in your warehouse
This stores descriptions, owners, and freshness for fields.
-- Table to store column-level metadata
create table if not exists analytics_meta.data_dictionary (
schema_name varchar,
table_name varchar,
column_name varchar,
data_type varchar,
description varchar,
pii boolean,
owner_team varchar,
last_reviewed_at timestamp,
primary key (schema_name, table_name, column_name)
);
insert into analytics_meta.data_dictionary
(schema_name, table_name, column_name, data_type, description, pii, owner_team, last_reviewed_at)
values
('mart_sales','fct_orders','order_id','number','Unique order identifier',false,'BI',current_timestamp),
('mart_sales','fct_orders','order_status','varchar','Finalized status at time of fulfillment',false,'BI',current_timestamp),
('mart_sales','dim_customer','email','varchar','Customer email address',true,'Security',current_timestamp);
Tip: Keep descriptions short, action-oriented, and unambiguous.
2) Canonical KPI definition (YAML + SQL)
Write the business spec in YAML and implement in SQL.
# metrics.yml
metrics:
- name: gross_revenue
label: Gross Revenue
calculation: sum(unit_price * quantity)
filters:
- field: order_status
include: ["completed", "shipped"]
grain: daily
currency: USD
edge_cases:
- exclude test orders where is_test = true
- refunds not netted here (see net_revenue)
-- metric implementation view
create or replace view mart_sales.metrics__gross_revenue as
select
order_date::date as metric_date,
sum(unit_price * quantity) as gross_revenue
from mart_sales.fct_orders
where order_status in ('completed','shipped')
and coalesce(is_test,false) = false
group by 1;
Now every dashboard uses mart_sales.metrics__gross_revenue, ensuring consistency.
3) Field naming conventions
Adopt predictable, machine-friendly names: lower_snake_case; suffixes like _id for keys; _at for timestamps; avoid ambiguous abbreviations.
-- Example: rename ambiguous columns to conventions
create or replace view mart_sales.fct_orders_v2 as
select
id as order_id,
created as created_at,
custId as customer_id,
amt_total as total_amount,
currency as currency_code
from raw.orders;
Document these names in your dictionary the same day you release them.
4) Model descriptions and column annotations
Keep purpose, grain, and refresh rules close to the model.
-- SQL comments at model and column level
/*
Model: mart_sales.fct_orders
Purpose: Fact table of finalized orders.
Grain: One row per order.
Refresh: Hourly incremental.
*/
create or replace table mart_sales.fct_orders as
select /* order_id: unique key */
o.order_id,
/* customer_id: FK to dim_customer */
o.customer_id,
o.order_date,
o.order_status,
o.unit_price,
o.quantity,
(o.unit_price * o.quantity) as line_amount
from staging.orders_clean o;
5) Lightweight lineage register (JSON)
Capture source-to-model relationships without a special tool.
-- Table to store lineage relationships
create table if not exists analytics_meta.lineage (
parent_fqn varchar,
child_fqn varchar,
relation varchar, -- e.g., "transforms", "aggregates"
notes varchar,
primary key (parent_fqn, child_fqn)
);
insert into analytics_meta.lineage (parent_fqn, child_fqn, relation, notes) values
('raw.orders','staging.orders_clean','transforms','Standardizes status codes'),
('staging.orders_clean','mart_sales.fct_orders','aggregates','Derives line_amount'),
('mart_sales.fct_orders','dashboards.exec_revenue','feeds','Exec revenue tiles');
Combine this with your dictionary to answer: where did this number come from?
Drills and exercises
- Pick a high-impact dashboard and list every KPI it shows. For each KPI, write a one-sentence business definition and the exact SQL grain.
- Document the top 20 columns used across the last quarter’s ad-hoc queries, including descriptions and PII flags.
- Create or update a report catalog: title, owner, contact, refresh cadence, source dataset, and certification status.
- Add a "certified" flag to at least two core semantic views and route dashboards to them.
- Sketch lineage for a critical metric from source to dashboard. Note any unclear transformations to fix.
Common mistakes and how to fix them
- Vague KPI definitions: If a KPI says "revenue" without net/gross and refund handling, expect disputes. Fix by adding filters, inclusions/exclusions, and grain.
- Descriptions out of sync: People update SQL but not the dictionary. Fix by adding metadata updates to the PR checklist and release notes.
- Inconsistent naming: Mixed casing, inconsistent suffixes. Fix by adopting a simple naming RFC and lint new models for conformance.
- Too much metadata: Bloated docs nobody reads. Focus on fields used in top reports; iterate.
- No ownership: Reports without owners become stale. Assign a named owner and backup for every asset.
Debugging tips
- Find conflicting metrics: search dashboards for duplicate KPI names pointing to different tables; consolidate to one certified view.
- Trace anomalies: walk lineage from dashboard tile to source; compare row counts at each hop to spot drops or duplicates.
- Check freshness: add a "last successful load" timestamp and alert if stale.
Mini project: BI Catalog-in-a-Box
Goal: Ship a minimal but useful metadata system for your BI environment in one week.
- Schema: Create tables for data_dictionary, lineage, report_catalog, and datasets with a certified flag.
- Seed: Load entries for your top 10 tables, 50 columns, and 15 reports.
- KPI canon: Create 5 canonical metric views with clear YAML/SQL definitions.
- Certify: Mark 3 core datasets as certified and map which dashboards must use them.
- Review loop: Add a monthly reminder to review descriptions changed in the last 30 days.
Suggested table templates
create table if not exists analytics_meta.report_catalog (
report_id varchar primary key,
title varchar,
owner varchar,
owner_contact varchar,
source_dataset varchar,
refresh_cadence varchar,
certified boolean,
last_reviewed_at timestamp
);
create table if not exists analytics_meta.datasets (
dataset_fqn varchar primary key,
purpose varchar,
grain varchar,
certified boolean,
owner_team varchar,
certified_at timestamp,
notes varchar
);
Next steps
- Adopt a "docs-first metric" habit: new metric requests require a one-paragraph spec before coding.
- Automate freshness checks and make them visible in dashboards.
- Run a quarterly KPI review with business stakeholders.
Subskills
- Data Dictionary And Glossary — Build shared language for fields and business terms.
- Consistent KPI Definitions — One calculation, one owner, one source.
- Field Naming Conventions — Predictable, searchable, enforceable names.
- Model Descriptions And Annotations — Purpose, grain, and column notes close to code.
- Tracking Data Lineage Basics — Document hops from source to dashboard.
- Report Catalog And Ownership — Who owns what, how often it refreshes.
- Managing Certified Datasets — Mark trusted sources and route dashboards to them.
- Keeping Metadata Updated — Review cadence, change logs, and ownership.