Menu

Topic 6 of 8

Building Dimensions And Hierarchies

Learn Building Dimensions And Hierarchies for free with explanations, exercises, and a quick test (for Data Architect).

Published: January 18, 2026 | Updated: January 18, 2026

Who this is for

  • Data Architects and Analytics Engineers defining enterprise data models.
  • Data Engineers implementing star schemas in warehouses or lakehouses.
  • BI Developers building semantic layers and dashboards.

Prerequisites

  • Basic understanding of star schemas (facts vs. dimensions).
  • Comfort with SQL DDL/DML (CREATE TABLE, INSERT, MERGE/UPSERT).
  • Familiarity with ETL/ELT pipelines and data quality checks.

Why this matters

Dimensions and hierarchies are how people slice and drill into data. If you get them right, analysts can answer questions quickly and consistently. As a Data Architect, you will:

  • Define conformed dimensions used across multiple fact tables.
  • Model slowly changing dimensions (SCD) to preserve historical truth.
  • Create usable hierarchies (e.g., Date: Year > Quarter > Month > Day; Product: Category > Subcategory > Product).
  • Handle ragged and many-to-many relationships without breaking reports.

Concept explained simply

A dimension is a descriptive table about a business entity (customer, product, date, region). It has:

  • A surrogate key (integer generated by the warehouse) to join facts reliably.
  • Attributes (names, types, labels) used to filter and group metrics.
  • Optionally, history tracking (SCD) for changes over time.

A hierarchy is an ordered set of attributes that form drill paths (e.g., Country > State > City). Hierarchies can be:

  • Balanced (every branch has the same depth) or ragged (some levels missing).
  • Simple level-based columns or parent-child structures with helper columns.

Mental model

Think of a dimension like a well-organized card catalog. Each card has a stable ID (surrogate key), a real-world reference (natural key), helpful labels (attributes), and sometimes previous versions (history). A hierarchy is like the folder structure that lets you drill down from broad to specific. Your job is to make these cards and folders consistent, easy to navigate, and durable over time.

Core building blocks

  • Surrogate key (e.g., product_key): warehouse-generated integer; never changes.
  • Natural/business key (e.g., product_code): the source identifier; may change or be reused.
  • Conformed dimension: shared across facts (e.g., Date, Product, Customer).
  • SCD Type 1: overwrite attributes; no history (for corrections/typos).
  • SCD Type 2: insert new row with effective dates and current flag; keeps history.
  • Default/Unknown members: include records like -1 "Unknown" to avoid join breaks.
  • Hierarchies: level-based columns (year, quarter, month) or parent-child with helper columns (parent_key, path, depth, is_leaf).
  • Bridge tables: resolve many-to-many or variable-depth scenarios (e.g., product with multiple categories) with weighting if needed.
Quick reference: When to use which SCD
  • Type 1: Fix data quality (name casing), non-analytical corrections.
  • Type 2: Preserve analytical truth (price changes, segment changes).
  • Hybrid: Type 2 for critical attributes; Type 1 for cosmetic ones.

Worked examples

Example 1 — Date dimension with hierarchy

Goal: Create a reusable date dimension with common attributes and a clear drill path.

  • Hierarchy: year > quarter > month > day
  • Attributes: day_of_week, week_of_year, is_weekend, fiscal variants if needed
Sample DDL
-- Dimension: dim_date
-- Include an "unknown" row (-1)
CREATE TABLE dim_date (
  date_key INT PRIMARY KEY, -- e.g., 20250131
  date_value DATE NOT NULL,
  year INT NOT NULL,
  quarter INT NOT NULL,
  month INT NOT NULL,
  day INT NOT NULL,
  day_name VARCHAR(10),
  week_of_year INT,
  is_weekend BOOLEAN,
  fiscal_year INT,
  fiscal_quarter INT
);
Tips
  • Pre-populate date rows for a wide range (e.g., 20 years).
  • Use numeric keys like YYYYMMDD for efficient joins to facts.

Example 2 — Customer dimension (hybrid SCD)

Goal: Track customer segment changes over time (Type 2), but overwrite minor formatting fixes (Type 1).

Sample DDL
CREATE TABLE dim_customer (
  customer_key BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  customer_id VARCHAR(64) NOT NULL, -- natural key from CRM
  customer_name VARCHAR(255),
  email VARCHAR(255), -- Type 1
  segment VARCHAR(50), -- Type 2
  country VARCHAR(50), -- Type 2
  effective_from DATE NOT NULL,
  effective_to DATE,
  is_current BOOLEAN NOT NULL DEFAULT TRUE
);
CREATE UNIQUE INDEX ux_customer_nk_version
  ON dim_customer(customer_id, effective_from);
Sample upsert logic (pseudocode)
-- If Type 2 attributes changed, close current row and insert new version
IF existing.is_current AND (existing.segment != incoming.segment OR existing.country != incoming.country) THEN
  UPDATE dim_customer
    SET effective_to = today - 1, is_current = FALSE
    WHERE customer_key = existing.customer_key;
  INSERT INTO dim_customer (..., effective_from=today, effective_to=NULL, is_current=TRUE);
ELSE
  -- Type 1 changes
  UPDATE dim_customer SET email = incoming.email WHERE customer_key = existing.customer_key;
END IF;

Example 3 — Product hierarchy (ragged + parent-child)

Goal: Support Category > Subcategory > Product, with some categories skipping levels and some products belonging to multiple categories.

Design options
  • Level-based columns in dim_product: category_name, subcategory_name, product_name (simple; fast for BI).
  • Parent-child structure in a separate dim_category with helper columns: category_key, parent_category_key, level_depth, hierarchy_path.
  • Bridge table product_category_bridge(product_key, category_key, weight_ratio) for many-to-many and weighted attribution.
Sample DDL
CREATE TABLE dim_product (
  product_key BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  product_code VARCHAR(64) NOT NULL,
  product_name VARCHAR(255),
  category_name VARCHAR(255),
  subcategory_name VARCHAR(255),
  -- optional SCD fields if names change materially
  effective_from DATE,
  effective_to DATE,
  is_current BOOLEAN
);
CREATE TABLE dim_category (
  category_key BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  category_id VARCHAR(64) NOT NULL,
  category_name VARCHAR(255) NOT NULL,
  parent_category_key BIGINT NULL,
  level_depth INT NOT NULL,
  hierarchy_path VARCHAR(4000), -- e.g., '/10/42/'
  is_leaf BOOLEAN
);
CREATE TABLE product_category_bridge (
  product_key BIGINT NOT NULL,
  category_key BIGINT NOT NULL,
  weight_ratio DECIMAL(5,4) DEFAULT 1.0 -- optional for allocation
);
Reporting tip

Provide both: easy level columns in dim_product for most BI use-cases, plus the category dimension + bridge for advanced analyses.

Step-by-step: Build a dimension and hierarchy

  1. Define the business grain: one row per what? (e.g., per product, per customer version, per calendar date)
  2. Choose keys: add a surrogate key; preserve natural key; add an Unknown row (-1).
  3. Select attributes: include names, codes, and analysis-friendly flags.
  4. Decide SCD strategy: Type 1, Type 2, or hybrid. Identify which columns drive new versions.
  5. Design hierarchies: level columns or parent-child; plan for ragged cases.
  6. Add audit fields: effective_from/to, is_current, load_ts, source_system.
  7. Populate and validate: load sample data; test joins to facts; check hierarchy rollups.
Self-check checklist
  • One clear business grain per dimension.
  • Surrogate key is stable and never reused.
  • Natural key uniqueness is enforced per version window.
  • Unknown/default member exists and is used by ETL on missing keys.
  • Hierarchy has no cycles; ragged paths handled gracefully (NULLs or parent-child).
  • SCD rules documented and tested.

Exercises

Do them in your own SQL environment or on paper. Then compare with the suggested solution.

Exercise 1 — Geography dimension with ragged hierarchy

Design dim_geography to support Country > Region/State > City. Some countries have no state. Use Type 1 for City name corrections; Type 2 for Region boundary changes.

  • Deliver: DDL, unknown member, example rows for two countries (one with states, one without).
  • Include a clear drill path Country > Region > City in level columns.
Hints
  • Denormalize levels into one table for BI simplicity; allow NULL for region when not applicable.
  • Use effective_from/to and is_current for Type 2 attributes.

Exercise 2 — Subscription plan (hybrid SCD)

Build dim_subscription where price changes are Type 2, description changes are Type 1.

  • Deliver: DDL, sample upsert (pseudocode), and two example versions for the same plan_code with different prices.
Hints
  • Plan_code is the natural key; subscription_key is surrogate.
  • Close the current row before inserting a new Type 2 version.

Common mistakes and how to self-check

  • Mixing grains: e.g., one row per customer and sometimes per customer version. Self-check: Does each row represent exactly one business instance?
  • Relying on natural keys in facts: they can change. Self-check: Facts always join via surrogate keys.
  • No Unknown member: leads to broken joins. Self-check: Insert -1 Unknown rows and use them in ETL fallback logic.
  • Overusing SCD Type 2: unnecessary bloat. Self-check: Only version attributes that affect analytics.
  • Ragged hierarchies causing NULL rollups. Self-check: BI layer handles NULLs, or use parent-child with helper columns.
  • Missing uniqueness checks. Self-check: Unique constraints on (natural_key, effective_from) or current-window logic.

Practical projects

  • Retail mart: dim_date, dim_store (with Country > Region > City), dim_product + product_category_bridge; connect to a sales fact and validate drilldowns.
  • SaaS mart: dim_customer (Type 2 on segment), dim_subscription (Type 2 on price), fact_billing; verify historical revenue by segment.
  • Org hierarchy: dim_employee with parent_employee_key and hierarchy_path; create a headcount fact and test rollups by department.

Learning path

  • Before: Star schema basics, identifying facts and conformed dimensions.
  • Now: Building dimensions and hierarchies (this lesson).
  • Next: Fact table design, degenerate dimensions, and bridge tables for allocations.

Mini challenge

Given a Product that can belong to multiple Subcategories, design the minimal set of tables/columns to support both simple drilldowns and many-to-many attribution. List the fields and describe how a dashboard would roll up revenue by Category.

One possible approach

Keep level columns in dim_product for easy drill (category_name, subcategory_name), and use product_category_bridge for many-to-many attribution with weight_ratio. Dashboards use dim_product for default drill; advanced reports join the bridge for weighted rollups.

Next steps

  • Review your current dimensions: add Unknown rows and document SCD rules.
  • Refactor one hierarchy to be explicit (either level-based or parent-child with helper columns).
  • Prepare test data to validate drilldowns and SCD behavior.

Ready to test yourself?

Take the quick test below to check understanding. Available to everyone; only logged-in users get saved progress.

Practice Exercises

2 exercises to complete

Instructions

Design dim_geography to support Country > Region/State > City. Requirements:

  • Use a surrogate key geography_key and preserve a natural key if available.
  • Type 1: City name corrections overwrite in place.
  • Type 2: Region boundary changes create a new version.
  • Support countries without regions (NULL allowed).
  • Provide DDL, an Unknown row (-1), and 4 example rows across two countries.
Expected Output
A CREATE TABLE statement with SCD fields, an Unknown member, and example rows covering both with-region and no-region cases.

Building Dimensions And Hierarchies — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Building Dimensions And Hierarchies?

AI Assistant

Ask questions about this tool