Why this matters
Conformed dimensions let different teams (Sales, Marketing, Support, Finance) analyze the same entities consistently. When Customer, Product, or Calendar dimensions are aligned, dashboards agree, joins are simple, and metrics are trusted. As a data architect, you will be asked to design and govern these shared dimensions so that data marts across teams integrate without surprises.
- Eliminate conflicting KPI numbers across dashboards.
- Enable self-serve analytics with predictable joins.
- Reduce duplicated ETL logic and schema drift.
- Support cross-domain analysis (e.g., marketing spend to sales revenue by product and region).
Real tasks you might do
- Define the canonical Customer dimension used by Sales and Support, including SCD policy.
- Create a Product dimension grain (Product vs SKU/Variant) that works for Analytics, Finance, and Inventory.
- Publish mapping tables from domain natural keys to shared surrogate keys.
- Set tests that ensure new sources don;t break conformance.
Concept explained simply
A conformed dimension is a shared lookup table (Customer, Product, Date, Geography, Channel, etc.) with the same business meaning, grain, and coding across all data marts. When two teams say Customer_Segment, they mean the same thing, computed the same way, with values that match.
- Same meaning: agreed definitions for each attribute.
- Same grain: one row per business entity at the same level (e.g., one per Customer, not per Account+Customer in one team and per Customer in another).
- Same keys: a consistent surrogate key (or a stable mapping to one).
- Same SCD policy: identical change-tracking rules (e.g., Type 2 for demographics).
Common conformance patterns:
- Central canonical dimension: publish a single golden dimension with surrogate keys and SCD columns. All marts use it.
- Distributed with mappings: domains keep their own dimensions but publish mapping tables to a shared reference dimension. Consumers join through the mapping.
- Semantic aliasing: one physical dimension can be used in multiple roles (e.g., Order Date, Ship Date) via role-playing.
Mental model
Think of conformed dimensions like a shared dictionary. Each team might speak a dialect (their source system), but they translate to the same dictionary entries before analysis. The surrogate key is like a passport numberstable, unique, and recognized everywhere.
Quick DOs and DONTs
- Do fix the grain and SCD policy up front.
- Do publish mapping tables and data contracts.
- Do version attribute definitions when meaning changes.
- Dont rely on natural keys across teams; they drift.
- Dont standardize just column names; standardize meaning and values.
Worked examples
Example 1: Customer (Sales CRM + Support)
Goal: Sales and Support should report by the same Customer and Segment.
-- Canonical dimension (Type 2)
CREATE TABLE dim_customer (
customer_sk BIGINT PRIMARY KEY,
universal_id TEXT, -- business key after ID resolution
source_bk_hash TEXT, -- hash of contributing natural keys
name TEXT,
email_normalized TEXT,
segment TEXT, -- conformed values: Enterprise/SMB/Consumer
country_code TEXT, -- ISO-3166
effective_from TIMESTAMP,
effective_to TIMESTAMP,
is_current BOOLEAN
);
-- Mapping from CRM and Support to the conformed surrogate key
CREATE TABLE map_crm_customer_to_dim (
crm_customer_id TEXT,
customer_sk BIGINT,
valid_from TIMESTAMP,
valid_to TIMESTAMP
);
CREATE TABLE map_support_user_to_dim (
support_user_id TEXT,
customer_sk BIGINT,
valid_from TIMESTAMP,
valid_to TIMESTAMP
);
-- Consumer join pattern
SELECT f.fact_id, d.segment, d.country_code
FROM f_sales AS f
JOIN map_crm_customer_to_dim m ON m.crm_customer_id = f.crm_customer_id AND f.order_ts BETWEEN m.valid_from AND m.valid_to
JOIN dim_customer d ON d.customer_sk = m.customer_sk AND d.is_current = TRUE;
Notes:
- Type 2 used so historical segment changes are preserved.
- Mapping tables are time-bounded to match historical facts correctly.
- Segment values come from a definition doc and are computed uniformly.
Example 2: Product (Catalog + Inventory)
Decision: Choose grain. Analytics wants per SKU (variant), Finance wants per Product (parent). Strategy: Conform at SKU grain and publish a parent mapping for rollups.
CREATE TABLE dim_sku (
sku_sk BIGINT PRIMARY KEY,
sku_code TEXT, -- natural key in catalog
product_id TEXT, -- parent product
product_name TEXT,
variant_color TEXT,
variant_size TEXT,
status TEXT, -- Active/Discontinued (conformed)
effective_from TIMESTAMP,
effective_to TIMESTAMP,
is_current BOOLEAN
);
CREATE TABLE map_inventory_sku_to_dim (
inv_item_id TEXT,
sku_sk BIGINT,
valid_from TIMESTAMP,
valid_to TIMESTAMP
);
-- Roll up to product when needed
SELECT d.product_id, SUM(f.units) AS units
FROM f_shipments f
JOIN map_inventory_sku_to_dim m ON m.inv_item_id = f.inv_item_id AND f.ship_ts BETWEEN m.valid_from AND m.valid_to
JOIN dim_sku d ON d.sku_sk = m.sku_sk AND d.is_current = TRUE
GROUP BY d.product_id;
Notes:
- Conform at the lowest common useful grain (SKU). Provide rollups for Product.
- Ensure status codes align across catalog and inventory.
Example 3: Calendar (Gregorian + Fiscal)
Some teams report by fiscal year; others by calendar. Strategy: One Date dimension with role-playing columns or separate conformed dimensions with a shared key per date.
CREATE TABLE dim_date (
date_sk INT PRIMARY KEY,
date DATE,
day_of_week INT,
month INT,
quarter INT,
year INT,
fiscal_month INT,
fiscal_quarter INT,
fiscal_year INT,
iso_week INT
);
-- Role-playing in marts: order_date_sk, ship_date_sk, invoice_date_sk all refer to dim_date.date_sk
Notes:
- Align fiscal definitions across teams; if not possible, version them (e.g., fiscal_calendar = A vs B) and document.
How to implement
- Set scope and grain
Decide the business entity and level
- Customer: person/company? one row per customer, not per account x customer.
- Product: SKU or parent product? Choose lowest useful level and provide rollups.
- Date: single date row re-used via role-playing keys.
- Define attribute semantics
Write unambiguous definitions
- Segment: Enterprise (1000+ employees), SMB (50-999), Consumer (<50).
- Status: Active if available for sale; Discontinued if retired; Preorder allowed? Define.
- Country: ISO-3166 alpha-2.
- Choose SCD policy
Pick Type 1/2 per attribute
- Type 2: changes you want to analyze historically (segment, region).
- Type 1: corrections/fixes (typos).
- Keying and ID resolution
Stabilize joins
- Assign surrogate keys in the conformed dimension.
- Create mapping tables from each domains natural keys to the surrogate key.
- For person/company, add identity resolution (email normalization, dedupe rules).
- Publish contract and tests
Make it safe to consume
- Publish schema, attribute definitions, allowed values, SCD policy, refresh cadence.
- Add data tests: uniqueness, nullability, valid codes, mapping coverage >= target.
- Change management
Version attributes and communicate
- When a definition changes meaning, version it (segment_v2) and provide a migration plan.
- Announce deprecations with dates.
Governance and change
- Stewardship: name owners for Customer, Product, Date.
- Approval: changes to grain, SCD, or definitions require review.
- SLAs: refresh frequency, data delay expectations, and uptime targets.
- Observability: freshness checks, row count stability, distribution drift alerts.
Exercises
Practice mirrors the graded exercise below. You can do it offline, then compare with the provided solution.
Exercise 1: Conform Customer across CRM and Support
Design a conformed Customer dimension and mapping tables so that Sales (CRM) and Support (Ticketing) report the same segments.
- Choose grain and SCD policy.
- Define 6-8 attributes with clear meanings and allowed values.
- Propose table DDL for dim_customer and mapping tables for both systems.
- Show a sample SQL join pattern for both facts (orders and tickets).
- Describe data tests to ensure conformance (coverage, codes, uniqueness).
Checklist
- One row per customer at chosen grain.
- SCD Type 2 for analytical attributes.
- Mapping tables with time bounds.
- Uniform segment rule and country codes.
- Coverage test target (e.g., > 98%).
Sample solution (short)
- Grain: customer (company), SCD2 for segment, region; Type1 for name fixes.
- dim_customer columns: customer_sk, universal_id, name, segment, region, country_code, effective_from, effective_to, is_current.
- map_crm_customer_to_dim(crm_customer_id, customer_sk, valid_from, valid_to), map_support_user_to_dim(support_user_id, customer_sk, valid_from, valid_to).
- Join: fact x mapping (time-bounded) x dim_customer (current or historical as needed).
- Tests: unique(customer_sk), not null keys, allowed values for segment, coverage >= 98% for both mappings, drift alert on segment distribution.
Common mistakes and self-checks
- Mistake: Same column name, different meaning. Fix: Publish definitions and examples; add data tests on value sets.
- Mistake: Mismatched grain (SKU vs Product). Fix: Pick a lowest common grain and provide rollups.
- Mistake: Inconsistent SCD policy. Fix: Decide per attribute; enforce via modeling and tests.
- Mistake: Joining with natural keys across teams. Fix: Use surrogate keys via mapping tables.
- Mistake: Missing time bounds in mappings. Fix: Include valid_from/valid_to and join facts by timestamp.
Self-check prompts
- Do two dashboards using different marts show the same totals when sliced by the conformed dimension?
- Is mapping coverage high and monitored?
- Are attribute distributions stable, or did something drift?
Mini challenge
Youre asked to add Channel (Web, Retail, Partner) across Marketing and Sales. Marketing has 12 detailed sources; Sales has 3. In two steps, propose: (1) the conformed values and mapping strategy; (2) tests you will add to prevent unmapped new channels from slipping through.
Hint
Introduce a conformed set of 5-7 values, add a channel mapping table with a not-null, foreign-key-like validation and a daily unmapped alert.
Who this is for
- Data Architects designing shared analytics models.
- Analytics Engineers building cross-team marts.
- Data Stewards defining and governing shared entities.
Prerequisites
- Basic dimensional modeling (facts, dimensions, SCD types).
- SQL joins and surrogate vs natural keys.
- Understanding of your organizations source systems (at a high level).
Learning path
- Review dimensional modeling basics and SCD types.
- List candidate conformed dimensions (Customer, Product, Date, Geography, Channel).
- Draft grain and attribute definitions; align with stakeholders.
- Build one conformed dimension + mapping; add tests.
- Roll out to more domains and track adoption.
Practical projects
- Implement a conformed Customer dimension and integrate it into two existing marts.
- Refactor a Product dimension to SKU grain and add a parent-product rollup bridge.
- Add a conformed Channel dimension with automated unmapped-value alerts.
Next steps
- Introduce a semantic layer to standardize metric definitions on top of conformed dimensions.
- Automate data contracts and validation for mapping tables.
- Set SLAs and observability for conformed dimensions (freshness, coverage, drift).
About the Quick Test
The Quick Test is available to everyone for free. Log in to save your progress and track completion across lessons.