Topic Not Found

Who this is for

Data Architects and Analytics Engineers defining enterprise data models.
Data Engineers implementing star schemas in warehouses or lakehouses.
BI Developers building semantic layers and dashboards.

Prerequisites

Basic understanding of star schemas (facts vs. dimensions).
Comfort with SQL DDL/DML (CREATE TABLE, INSERT, MERGE/UPSERT).
Familiarity with ETL/ELT pipelines and data quality checks.

Why this matters

Dimensions and hierarchies are how people slice and drill into data. If you get them right, analysts can answer questions quickly and consistently. As a Data Architect, you will:

Define conformed dimensions used across multiple fact tables.
Model slowly changing dimensions (SCD) to preserve historical truth.
Create usable hierarchies (e.g., Date: Year > Quarter > Month > Day; Product: Category > Subcategory > Product).
Handle ragged and many-to-many relationships without breaking reports.

Concept explained simply

A dimension is a descriptive table about a business entity (customer, product, date, region). It has:

A surrogate key (integer generated by the warehouse) to join facts reliably.
Attributes (names, types, labels) used to filter and group metrics.
Optionally, history tracking (SCD) for changes over time.

A hierarchy is an ordered set of attributes that form drill paths (e.g., Country > State > City). Hierarchies can be:

Balanced (every branch has the same depth) or ragged (some levels missing).
Simple level-based columns or parent-child structures with helper columns.

Mental model

Think of a dimension like a well-organized card catalog. Each card has a stable ID (surrogate key), a real-world reference (natural key), helpful labels (attributes), and sometimes previous versions (history). A hierarchy is like the folder structure that lets you drill down from broad to specific. Your job is to make these cards and folders consistent, easy to navigate, and durable over time.

Core building blocks

Surrogate key (e.g., product_key): warehouse-generated integer; never changes.
Natural/business key (e.g., product_code): the source identifier; may change or be reused.
Conformed dimension: shared across facts (e.g., Date, Product, Customer).
SCD Type 1: overwrite attributes; no history (for corrections/typos).
SCD Type 2: insert new row with effective dates and current flag; keeps history.
Default/Unknown members: include records like -1 "Unknown" to avoid join breaks.
Hierarchies: level-based columns (year, quarter, month) or parent-child with helper columns (parent_key, path, depth, is_leaf).
Bridge tables: resolve many-to-many or variable-depth scenarios (e.g., product with multiple categories) with weighting if needed.

Quick reference: When to use which SCD

Type 1: Fix data quality (name casing), non-analytical corrections.
Type 2: Preserve analytical truth (price changes, segment changes).
Hybrid: Type 2 for critical attributes; Type 1 for cosmetic ones.

Worked examples

Example 1 — Date dimension with hierarchy

Goal: Create a reusable date dimension with common attributes and a clear drill path.

Hierarchy: year > quarter > month > day
Attributes: day_of_week, week_of_year, is_weekend, fiscal variants if needed

Sample DDL

-- Dimension: dim_date
-- Include an "unknown" row (-1)
CREATE TABLE dim_date (
  date_key INT PRIMARY KEY, -- e.g., 20250131
  date_value DATE NOT NULL,
  year INT NOT NULL,
  quarter INT NOT NULL,
  month INT NOT NULL,
  day INT NOT NULL,
  day_name VARCHAR(10),
  week_of_year INT,
  is_weekend BOOLEAN,
  fiscal_year INT,
  fiscal_quarter INT
);

Tips

Pre-populate date rows for a wide range (e.g., 20 years).
Use numeric keys like YYYYMMDD for efficient joins to facts.

Example 2 — Customer dimension (hybrid SCD)

Goal: Track customer segment changes over time (Type 2), but overwrite minor formatting fixes (Type 1).

Sample DDL

CREATE TABLE dim_customer (
  customer_key BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  customer_id VARCHAR(64) NOT NULL, -- natural key from CRM
  customer_name VARCHAR(255),
  email VARCHAR(255), -- Type 1
  segment VARCHAR(50), -- Type 2
  country VARCHAR(50), -- Type 2
  effective_from DATE NOT NULL,
  effective_to DATE,
  is_current BOOLEAN NOT NULL DEFAULT TRUE
);
CREATE UNIQUE INDEX ux_customer_nk_version
  ON dim_customer(customer_id, effective_from);

Sample upsert logic (pseudocode)

-- If Type 2 attributes changed, close current row and insert new version
IF existing.is_current AND (existing.segment != incoming.segment OR existing.country != incoming.country) THEN
  UPDATE dim_customer
    SET effective_to = today - 1, is_current = FALSE
    WHERE customer_key = existing.customer_key;
  INSERT INTO dim_customer (..., effective_from=today, effective_to=NULL, is_current=TRUE);
ELSE
  -- Type 1 changes
  UPDATE dim_customer SET email = incoming.email WHERE customer_key = existing.customer_key;
END IF;

Example 3 — Product hierarchy (ragged + parent-child)

Goal: Support Category > Subcategory > Product, with some categories skipping levels and some products belonging to multiple categories.

Design options

Level-based columns in dim_product: category_name, subcategory_name, product_name (simple; fast for BI).
Parent-child structure in a separate dim_category with helper columns: category_key, parent_category_key, level_depth, hierarchy_path.
Bridge table product_category_bridge(product_key, category_key, weight_ratio) for many-to-many and weighted attribution.

Sample DDL

CREATE TABLE dim_product (
  product_key BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  product_code VARCHAR(64) NOT NULL,
  product_name VARCHAR(255),
  category_name VARCHAR(255),
  subcategory_name VARCHAR(255),
  -- optional SCD fields if names change materially
  effective_from DATE,
  effective_to DATE,
  is_current BOOLEAN
);
CREATE TABLE dim_category (
  category_key BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  category_id VARCHAR(64) NOT NULL,
  category_name VARCHAR(255) NOT NULL,
  parent_category_key BIGINT NULL,
  level_depth INT NOT NULL,
  hierarchy_path VARCHAR(4000), -- e.g., '/10/42/'
  is_leaf BOOLEAN
);
CREATE TABLE product_category_bridge (
  product_key BIGINT NOT NULL,
  category_key BIGINT NOT NULL,
  weight_ratio DECIMAL(5,4) DEFAULT 1.0 -- optional for allocation
);

Reporting tip

Provide both: easy level columns in dim_product for most BI use-cases, plus the category dimension + bridge for advanced analyses.

Step-by-step: Build a dimension and hierarchy

Define the business grain: one row per what? (e.g., per product, per customer version, per calendar date)
Choose keys: add a surrogate key; preserve natural key; add an Unknown row (-1).
Select attributes: include names, codes, and analysis-friendly flags.
Decide SCD strategy: Type 1, Type 2, or hybrid. Identify which columns drive new versions.
Design hierarchies: level columns or parent-child; plan for ragged cases.
Add audit fields: effective_from/to, is_current, load_ts, source_system.
Populate and validate: load sample data; test joins to facts; check hierarchy rollups.

Self-check checklist

One clear business grain per dimension.
Surrogate key is stable and never reused.
Natural key uniqueness is enforced per version window.
Unknown/default member exists and is used by ETL on missing keys.
Hierarchy has no cycles; ragged paths handled gracefully (NULLs or parent-child).
SCD rules documented and tested.

Exercises

Do them in your own SQL environment or on paper. Then compare with the suggested solution.

Exercise 1 — Geography dimension with ragged hierarchy

Design dim_geography to support Country > Region/State > City. Some countries have no state. Use Type 1 for City name corrections; Type 2 for Region boundary changes.

Deliver: DDL, unknown member, example rows for two countries (one with states, one without).
Include a clear drill path Country > Region > City in level columns.

Hints

Denormalize levels into one table for BI simplicity; allow NULL for region when not applicable.
Use effective_from/to and is_current for Type 2 attributes.

Exercise 2 — Subscription plan (hybrid SCD)

Build dim_subscription where price changes are Type 2, description changes are Type 1.

Deliver: DDL, sample upsert (pseudocode), and two example versions for the same plan_code with different prices.

Hints

Plan_code is the natural key; subscription_key is surrogate.
Close the current row before inserting a new Type 2 version.

Common mistakes and how to self-check

Mixing grains: e.g., one row per customer and sometimes per customer version. Self-check: Does each row represent exactly one business instance?
Relying on natural keys in facts: they can change. Self-check: Facts always join via surrogate keys.
No Unknown member: leads to broken joins. Self-check: Insert -1 Unknown rows and use them in ETL fallback logic.
Overusing SCD Type 2: unnecessary bloat. Self-check: Only version attributes that affect analytics.
Ragged hierarchies causing NULL rollups. Self-check: BI layer handles NULLs, or use parent-child with helper columns.
Missing uniqueness checks. Self-check: Unique constraints on (natural_key, effective_from) or current-window logic.

Practical projects

Retail mart: dim_date, dim_store (with Country > Region > City), dim_product + product_category_bridge; connect to a sales fact and validate drilldowns.
SaaS mart: dim_customer (Type 2 on segment), dim_subscription (Type 2 on price), fact_billing; verify historical revenue by segment.
Org hierarchy: dim_employee with parent_employee_key and hierarchy_path; create a headcount fact and test rollups by department.

Learning path

Before: Star schema basics, identifying facts and conformed dimensions.
Now: Building dimensions and hierarchies (this lesson).
Next: Fact table design, degenerate dimensions, and bridge tables for allocations.

Mini challenge

Given a Product that can belong to multiple Subcategories, design the minimal set of tables/columns to support both simple drilldowns and many-to-many attribution. List the fields and describe how a dashboard would roll up revenue by Category.

One possible approach

Keep level columns in dim_product for easy drill (category_name, subcategory_name), and use product_category_bridge for many-to-many attribution with weight_ratio. Dashboards use dim_product for default drill; advanced reports join the bridge for weighted rollups.

Next steps

Review your current dimensions: add Unknown rows and document SCD rules.
Refactor one hierarchy to be explicit (either level-based or parent-child with helper columns).
Prepare test data to validate drilldowns and SCD behavior.

Ready to test yourself?

Take the quick test below to check understanding. Available to everyone; only logged-in users get saved progress.

Menu

Building Dimensions And Hierarchies

Table of Contents

Who this is for

Prerequisites

Why this matters

Concept explained simply

Mental model

Core building blocks

Worked examples

Example 1 — Date dimension with hierarchy

Example 2 — Customer dimension (hybrid SCD)

Example 3 — Product hierarchy (ragged + parent-child)

Step-by-step: Build a dimension and hierarchy

Exercises

Exercise 1 — Geography dimension with ragged hierarchy

Exercise 2 — Subscription plan (hybrid SCD)

Common mistakes and how to self-check

Practical projects

Learning path

Mini challenge

Next steps

Ready to test yourself?

Practice Exercises

Geography dimension with ragged hierarchy

Instructions

Expected Output

Subscription plan (hybrid SCD)

Building Dimensions And Hierarchies — Quick Test

Have questions about Building Dimensions And Hierarchies?

AI Assistant