Why this matters for Data Architects
Dimensional modeling turns raw data into reliable, fast analytics. As a Data Architect, you design fact and dimension tables so teams can answer business questions consistently—without fragile, one-off SQL. This skill unlocks: clear business definitions, governed star schemas, semantic-model-ready datasets, and trustworthy metrics across BI tools.
Who this is for
- Data Architects planning analytics platforms and semantic layers.
- Analytics Engineers and Data Engineers building BI-ready marts.
- Tech Leads standardizing metrics across teams.
Prerequisites
- Comfort with SQL joins, aggregations, and window functions.
- Basic data modeling concepts (entities, keys, normalization).
- Familiarity with the business domain you’re modeling (sales, product, finance, etc.).
Learning path
Worked examples
1) Define grain and measures for an Orders fact
Business process: e-commerce ordering. Grain: one row per order line (order_id + product_id).
-- Grain: 1 row per order line item
create table dim_product (
product_sk bigint primary key,
product_id text unique,
category text,
subcategory text
);
create table dim_customer (
customer_sk bigint primary key,
customer_id text unique,
segment text,
country text
);
create table fact_order_line (
order_date date,
order_id text,
product_sk bigint,
customer_sk bigint,
quantity int,
unit_price numeric(12,2),
discount numeric(5,2),
revenue numeric(14,2) generated always as (quantity * unit_price * (1 - discount)) stored,
constraint pk_fact primary key (order_id, product_sk)
);
Notes: Dimensions use surrogate keys to decouple facts from source system keys. Measures are atomic and additive where possible.
2) Build a Date dimension with hierarchies
create table dim_date as
select
d::date as date_value,
extract(year from d)::int as year,
to_char(d, 'Q')::int as quarter,
extract(month from d)::int as month,
to_char(d, 'Mon') as month_name,
extract(isodow from d)::int as iso_dow,
to_char(d, 'Dy') as dow_name,
case when extract(isodow from d) in (6,7) then true else false end as is_weekend
from generate_series(date '2023-01-01', date '2030-12-31', interval '1 day') as g(d);
Hierarchy examples: Year > Quarter > Month > Date. Keep friendly names for BI.
3) SCD Type 2 for Customer dimension
-- dim_customer_scd2
create table dim_customer_scd2 (
customer_sk bigint generated by default as identity primary key,
customer_id text,
segment text,
country text,
valid_from timestamp not null,
valid_to timestamp,
is_current boolean not null default true
);
-- Close out previous version and insert a new one when attributes change
with changed as (
select s.* from staging_customer s
join dim_customer_scd2 d
on d.customer_id = s.customer_id and d.is_current
where (s.segment, s.country) is distinct from (d.segment, d.country)
)
update dim_customer_scd2 d
set valid_to = now(), is_current = false
from changed c
where d.customer_id = c.customer_id and d.is_current;
insert into dim_customer_scd2 (customer_id, segment, country, valid_from, valid_to, is_current)
select c.customer_id, c.segment, c.country, now(), null, true
from changed c;
Querying current vs historical: filter is_current = true for latest, or join on date between valid_from and coalesce(valid_to, '9999-12-31').
4) Snapshot vs Accumulating facts
-- Daily inventory snapshot (periodic snapshot)
create table fact_inventory_snapshot as
select
d.date_value as snapshot_date,
p.product_sk,
sum(coalesce(inv.on_hand_qty,0)) as on_hand_qty
from dim_date d
left join product_inventory inv on inv.snapshot_date = d.date_value
left join dim_product p on p.product_id = inv.product_id
group by 1,2;
-- Accumulating fact for order fulfillment (one row per order, milestones update)
create table fact_order_fulfillment (
order_id text primary key,
customer_sk bigint,
order_date date,
picked_date date,
packed_date date,
shipped_date date,
delivered_date date,
days_to_ship int generated always as (shipped_date - order_date) stored,
days_to_deliver int generated always as (delivered_date - order_date) stored
);
Choose snapshots for periodic states (inventory, subscription MRR). Choose accumulating facts for processes with defined milestones.
5) Conformed dimensions across Sales and Support
Goal: Sales and Support both analyze by the same Customer definition.
-- A single conformed dim_customer used by multiple marts
create view sales.dim_customer as
select customer_sk, customer_id, segment, country from dim_customer_scd2 where is_current;
create view support.dim_customer as
select customer_sk, customer_id, segment, country from dim_customer_scd2 where is_current;
-- Both sales.fact_order_line and support.fact_tickets reference customer_sk from the same dimension.Conformance means shared keys, consistent attribute names, and identical business definitions so cross-domain metrics align.
Drills and exercises
- Write a grain statement for two processes: Orders and Website Sessions.
- List additive, semi-additive, and non-additive measures in your domain.
- Create a dim_date table with fiscal calendar attributes.
- Implement SCD Type 2 for one dimension; validate historical queries.
- Build one transaction fact and one periodic snapshot; compare query patterns.
- Define three governed metrics (formula, grain, filters, time).
Common mistakes and debugging tips
Mistake: Unclear grain
Symptom: Duplicated counts or inconsistent revenue. Fix: Write a one-sentence grain statement for each fact. Verify keys that uniquely identify a row.
Mistake: Snowflaking dimensions prematurely
Symptom: Slow BI queries with many joins. Fix: Prefer denormalized dimensions for analytics unless size is prohibitive.
Mistake: Missing surrogate keys
Symptom: Source key reuse breaks joins. Fix: Use surrogate keys; keep natural keys as attributes.
Mistake: Wrong SCD type
Symptom: Lost history or inflated counts. Fix: Use Type 2 for attributes that change and affect analysis; Type 1 for minor corrections.
Debugging joins
Check row counts at each step. Validate distinct business keys. Look for many-to-many joins—resolve via bridge tables or proper grain alignment.
Mini project: Retail analytics mart
Build a BI-ready retail mart covering orders, customers, and products.
- Write grain statements for fact_order_line (order_id + product_id) and fact_inventory_snapshot (date + product).
- Create dim_date, dim_customer (SCD2), dim_product.
- Load sample data and implement surrogate keys.
- Build fact_order_line (transaction) and fact_inventory_snapshot (daily snapshot).
- Define three metrics: gross_revenue, avg_order_value, repeat_rate. Document filters and time grains.
- Test: a) daily revenue trend, b) revenue by segment and category, c) inventory on-hand vs sales.
Acceptance checklist
- Facts have clear, documented grain.
- Dimensions have surrogate keys and friendly names.
- SCD2 implemented for customer; current and historical queries work.
- Metrics return identical results across two different queries.
Practical projects
- Subscription analytics mart: customers, plans, invoices, MRR daily snapshots.
- Logistics mart: orders, shipments (accumulating fact), carriers, lanes.
- Product analytics mart: sessions, events, users, feature adoption cohorts.
Subskills
- Defining Grain And Facts: Choose transaction/snapshot/accumulating types and write precise grain statements with additive measures.
- Building Dimensions And Hierarchies: Design denormalized dimensions with surrogate keys and intuitive hierarchies (date, product, geography).
- Conformed Dimensions Across Teams: Standardize shared dimensions and keys to align Sales, Support, Finance, and Product analytics.
- Slowly Changing Dimensions Patterns: Apply SCD 0/1/2/3 and query with valid_from/valid_to or is_current.
- Snapshot And Accumulating Facts: Model state over time and milestone-driven processes for performance and clarity.
- Metric Consistency By Design: Define metrics once; control grain, filters, and aggregation behavior to avoid double counting.
- Semantic Model Readiness: Ensure types, naming, and relationships are BI/semantic-layer friendly.
- Star Schema Governance: Establish conventions, ownership, and review to keep marts consistent and trustworthy.
Next steps
- Apply these patterns to one real business process and validate with stakeholders.
- Create a short design doc per star: grain, keys, dimensions, measures, and metric definitions.
- Take the skill exam to check your understanding. Anyone can take it for free; sign in if you want your progress saved.