Why this matters
Analytics Engineers often build multiple star schemas for different teams (Sales, Marketing, Finance). Conformed dimensions let all these marts speak the same language: a single Customer, Product, Date, or Geography definition that works everywhere. This enables consistent metrics, easy cross-mart joins, and trustworthy dashboards.
- Real tasks: unifying customer identities across CRM and app events.
- Real tasks: sharing a single Product hierarchy across Sales and Marketing.
- Real tasks: validating that Order Date and Ship Date both reuse the same Date dimension (role-playing).
Concept explained simply
A conformed dimension is a dimension table that is built once and reused by multiple fact tables and marts without changing its meaning. It preserves the same keys, attributes, and definitions everywhere.
Mental model
Think of a conformed dimension as a passport. No matter which country (mart) you visit, your passport is recognized. The unique surrogate key is the passport number; the person is the business entity; visas are the slowly changing dimension (SCD) history rows.
Core principles
- Same grain and meaning: a Customer row always represents the same level (e.g., one real-world customer) across marts.
- Shared surrogate key: the same surrogate key is used to join any fact to the conformed dimension.
- Stable business key: use a reliable business key or crosswalk to link multiple source systems to the same entity.
- Unified SCD strategy: choose and document SCD Type 1 vs Type 2 rules; apply consistently across marts.
- Consistent attribute definitions: names, data types, allowed values, and hierarchies match everywhere.
- Shrunken dimensions: smaller, filtered or rolled-up versions that remain conformed to the base dimension.
- Role-playing: reuse the same dimension for different roles (Order Date, Ship Date) via views or aliases.
- Governance: tests to ensure uniqueness, non-null keys, and aligned counts across marts.
Worked examples
Example 1: Conformed Customer for Sales and Support
Scenario: CRM has customers by email; Support has accounts by domain. We need one Customer dimension for Sales fact_sales and Support fact_tickets.
- Business key: unified by a crosswalk mapping CRM customer_id and Support account_id to a canonical customer_uuid derived from verified email domain + dedup rules.
- Surrogate key: dim_customer_sk assigned centrally and reused across marts.
- SCD: Type 2 for changes in industry or segment; Type 1 for corrected typos.
dim_customer (sk, customer_uuid, name, email_domain, segment, industry, is_active, scd_valid_from, scd_valid_to, scd_is_current)
Both fact tables join on dim_customer_sk, delivering consistent segmentation across Sales and Support reports.
Example 2: Conformed Product for Sales and Marketing with a shrunken dimension
Scenario: Sales needs full Product attributes; Marketing only needs Category and Brand.
- Base: dim_product with full attributes and stable surrogate key.
- Shrunken: dim_product_mkt with only product_sk, category, brand. It remains conformed because it reuses product_sk and definitions.
- Marketing fact_campaigns joins to dim_product_mkt on product_sk.
dim_product (product_sk, product_code, name, brand, category, subcategory, launch_date, ...)
dim_product_mkt (product_sk, brand, category)
Both marts use the same category definitions, avoiding mismatches.
Example 3: Conformed Date and Geography across Finance and Warehouse Ops
Date is a classic conformed dimension. Finance fact_pnl and Ops fact_shipments both use the same dim_date. Geography is also conformed: ISO country codes, region rollups, and names match across marts.
dim_date (date_sk, date, year, quarter, month, week, day_of_week)
dim_geo (geo_sk, iso_country_code, country_name, region, subregion)
Result: calendars and regions remain consistent so cross-mart comparisons are valid.
Design patterns you can use
Central golden dimension + published views
- Build a single golden dimension table (e.g., dim_customer) in a shared schema.
- Publish role-specific views (e.g., sales.dim_customer_v, support.dim_customer_v) that select the same keys/attributes.
- Keep surrogate key generation in one place.
Crosswalk mapping for multiple source systems
Create a crosswalk table mapping source business keys to a canonical ID.
customer_crosswalk (
canonical_customer_uuid,
source_system, source_customer_id,
match_confidence
)
Use the canonical ID to assign the dimension surrogate key.
SCD alignment (Type 1 vs Type 2)
- Type 2 for meaningful history (segment, industry, region).
- Type 1 for corrections (spelling, casing).
- Apply the same logic in every mart that consumes the dimension.
Shrunken dimension
Create a smaller dimension for specific marts using the same surrogate key and a subset of attributes.
Role-playing dimension
Use the same dim_date as order_date, ship_date via separate foreign keys or views.
Step-by-step implementation
- List subject areas: build a simple bus matrix of facts vs dimensions to identify shared dims.
- Define the grain: one row per real-world entity (customer, product, date, geo level).
- Choose business keys: if multiple, create a canonical key + crosswalk.
- Decide SCD strategy: document Type 1/2 columns and rules.
- Model the golden dimension: columns, datatypes, conformed attribute names.
- Load process: deduplicate, assign surrogate keys, maintain SCD history.
- Publish to marts: either reference the golden table or create conformed views/shrunken dims.
- Test and monitor: uniqueness, not nulls, cross-mart key reuse, row count expectations.
SQL-style sketches
Conformed Customer (SCD2) skeleton
-- Skeleton only; adapt to your warehouse dialect
CREATE TABLE dim_customer (
customer_sk BIGINT PRIMARY KEY,
customer_uuid STRING NOT NULL,
name STRING,
email_domain STRING,
segment STRING,
industry STRING,
is_active BOOLEAN,
scd_valid_from TIMESTAMP,
scd_valid_to TIMESTAMP,
scd_is_current BOOLEAN
);
-- Fact joining example
-- fact_sales.customer_sk -> dim_customer.customer_sk
Shrunken Product dimension view
CREATE VIEW dim_product_mkt AS
SELECT product_sk, brand, category
FROM dim_product;
Basic conformance tests
-- No duplicate surrogate keys
dselect customer_sk, count(*) c from dim_customer group by 1 having c > 1;
-- Exactly one current row per business key
select customer_uuid, count_if(scd_is_current) c
from dim_customer
group by 1
having c != 1;
-- Cross-mart foreign keys must exist
select f.customer_sk
from fact_sales f
left join dim_customer d on d.customer_sk = f.customer_sk
where d.customer_sk is null;
Common mistakes and self-checks
- Mistake: Different attribute definitions per mart (e.g., different category mappings). Fix: Maintain a single mapping table used by all marts.
- Mistake: Multiple surrogate key generators. Fix: Centralize SK assignment; never generate SKs in each mart separately.
- Mistake: Inconsistent SCD rules. Fix: Document and apply the same Type 1/2 logic.
- Mistake: Joining facts to different customer grains. Fix: Verify grain consistency and business key mapping.
- Mistake: Silent dimension drift. Fix: Add automated tests to detect new nulls, duplicate SKs, or broken crosswalks.
Self-check checklist
- [ ] One golden dimension per entity (Customer/Product/Date/Geo)
- [ ] Surrogate keys unique and reused across marts
- [ ] SCD rules documented and enforced
- [ ] Crosswalks for multi-source keys
- [ ] Tests for uniqueness, not-null FKs, conformed attributes
- [ ] Shrunken and role-playing dimensions reference the same SKs
Exercises
Do these to solidify skills. The quick test is available to everyone; logged-in users will have their progress saved.
Exercise 1: Build a conformed Product dimension from two sources
Goal: Map ERP products (by product_code) and eCommerce products (by sku) to a single dim_product with a shared surrogate key and consistent Category hierarchy.
- Create a crosswalk that maps ERP.product_code and ECO.sku to a canonical product_uuid.
- Design dim_product with product_sk, product_uuid, product_code, sku, name, brand, category, subcategory, scd columns.
- Define SCD: Type 2 for category/subcategory; Type 1 for name corrections.
- Produce a shrunken dim_product_mkt with product_sk, brand, category.
Hints
- Start by standardizing category values (e.g., using a mapping table).
- Use the same surrogate key for all marts; do not regenerate per mart.
Expected output
A populated dim_product with one current row per product_uuid and a dim_product_mkt view that selects product_sk, brand, category.
Show solution
-- Crosswalk
CREATE TABLE product_crosswalk AS
SELECT coalesce(erp.product_code, eco.sku) as product_uuid,
erp.product_code,
eco.sku
FROM erp_products erp
FULL OUTER JOIN eco_products eco
ON erp.product_code = eco.product_code_alt
;
-- Golden dimension (sk generated via sequence/identity)
CREATE TABLE dim_product (
product_sk BIGINT,
product_uuid STRING NOT NULL,
product_code STRING,
sku STRING,
name STRING,
brand STRING,
category STRING,
subcategory STRING,
scd_valid_from TIMESTAMP,
scd_valid_to TIMESTAMP,
scd_is_current BOOLEAN
);
-- Shrunken dimension
CREATE VIEW dim_product_mkt AS
SELECT product_sk, brand, category FROM dim_product WHERE scd_is_current;
Exercise 2: Conform Customer across CRM and Support
Goal: Produce dim_customer and ensure fact_sales and fact_tickets both join to the same surrogate keys.
- Define customer_uuid using a deduped, verified email-based identity or account domain.
- Load dim_customer with SCD2 for segment and industry.
- Validate: both fact tables have zero orphan customer_sk.
Hints
- Ensure exactly one current row per customer_uuid.
- Use a small validation query to find orphan keys.
Expected output
dim_customer with unique customer_sk and one current row per business key; validation returns zero orphans.
Show solution
-- Crosswalk
CREATE TABLE customer_crosswalk AS
SELECT canonical_uuid as customer_uuid, 'CRM' as source_system, crm_id as source_id FROM crm_customers
UNION ALL
SELECT canonical_uuid, 'SUPPORT', acct_id FROM support_accounts;
-- Golden dimension
CREATE TABLE dim_customer (
customer_sk BIGINT,
customer_uuid STRING NOT NULL,
name STRING,
email_domain STRING,
segment STRING,
industry STRING,
scd_valid_from TIMESTAMP,
scd_valid_to TIMESTAMP,
scd_is_current BOOLEAN
);
-- Validation
SELECT f.customer_sk
FROM fact_sales f
LEFT JOIN dim_customer d ON d.customer_sk = f.customer_sk
WHERE d.customer_sk IS NULL;
Exercise checklist
- [ ] One golden dimension per entity
- [ ] Consistent surrogate key reused in all marts
- [ ] SCD rules applied as defined
- [ ] Shrunken/role-playing dims built from the golden dim
- [ ] Validation queries pass with zero orphans
Mini challenge
Marketing wants to roll subcategory into category for simpler reporting, but Sales needs both. Propose a design that keeps both teams happy and preserves conformance. Write 3–5 sentences describing which tables you will create or views you will publish, how keys are reused, and how you will test that both marts still align on brand and category.
Who this is for
- Analytics Engineers building multiple star schemas.
- BI Developers who need consistent dimensions across dashboards.
- Data Modelers standardizing entity definitions.
Prerequisites
- Basic star schema concepts (facts, dimensions, grain).
- Comfort with SQL joins and window functions.
- Familiarity with Slowly Changing Dimensions (Type 1 and Type 2).
Learning path
- Star schema fundamentals.
- SCD patterns and implementation.
- Conformed dimensions across marts (this lesson).
- Data quality testing and monitoring for dimensional models.
Practical projects
- Retail: Build conformed Product and Date dims used by Sales and Marketing marts.
- SaaS: Unify Customer across CRM, billing, and app events; validate metrics across marts.
- Geo reporting: Standardize dim_geo with ISO codes and reuse across Finance and Ops.
Next steps
- Harden your conformance tests and add them to CI for every dimension change.
- Create a short data dictionary for shared dimensions (definitions, SCD rules, allowed values).
- Take the quick test to check understanding. The test is available to everyone; logged-in users will have their progress saved.