Why this matters
As an Analytics Engineer, you turn raw application tables into clean, trustworthy star schemas that power BI dashboards, forecasting models, and business decisions. Common tasks include:
- Designing star schemas with clear grain and conformed dimensions
- Building Type 1 and Type 2 dimensions with surrogate keys
- Creating fact tables with accurate measures and foreign keys to dimensions
- Handling incremental loads and late-arriving data
- Ensuring data quality with self-checks and reconciliation
Real-world examples
- E-commerce: customer, product, and date dimensions; orders and payments facts
- SaaS: account, plan, and user dimensions; subscriptions and usage facts
- Finance: chart of accounts dimension; journal entries fact
Concept explained simply
A dimension table describes things (who, what, when, where). A fact table records events or measurements at a defined granularity (grain).
Mental model: store receipts
Each shopping receipt line is a fact (one row per item bought). The product catalog, customers, and calendar are dimensions. Facts point to dimensions via keys and store quantities and amounts.
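The receipt analogy maps directly onto table shapes. A minimal sketch (table and column names here are illustrative, not tied to a specific warehouse):

```sql
-- One row per product (dimension: describes things)
CREATE TABLE dim_product (
    product_sk   INTEGER PRIMARY KEY,  -- surrogate key
    product_id   VARCHAR,              -- natural key from the source system
    product_name VARCHAR,
    category     VARCHAR
);

-- One row per receipt line (fact: records events at a defined grain)
CREATE TABLE fact_sales (
    order_id    VARCHAR,
    line_id     INTEGER,
    product_sk  INTEGER REFERENCES dim_product (product_sk),
    date_sk     INTEGER,               -- points at a calendar dimension
    quantity    INTEGER,
    line_amount DECIMAL(12, 2)
);
```

Note that the fact table stores keys and numbers only; all descriptive text lives in the dimensions.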
Who this is for
- Aspiring and junior Analytics Engineers
- Data Analysts moving toward modeling
- BI Developers who need robust back-end models
Prerequisites
- Solid SQL basics (SELECT, JOIN, GROUP BY, window functions)
- Comfort with data types, NULLs, and casting
- Understanding of business domain terms (orders, customers, etc.)
Learning path
- Define star schema and grains
- Build core dimensions (Type 1), then add Type 2 where history matters
- Create fact tables with measures and foreign keys
- Add incremental logic and QA checks
- Optimize joins and document grain/columns
Key steps to build dimensions
- Define the grain: what makes one row unique? (e.g., one row per customer)
- Select attributes: stable, descriptive columns (avoid volatile metrics)
- Create a surrogate key: customer_sk as an integer or a stable hash
- Deduplicate: use ROW_NUMBER() by natural key and keep the latest row
- Choose SCD type:
  - Type 1: overwrite changes (no history)
  - Type 2: track history with effective dates or an is_current flag
Type 2 details
- Columns: effective_from, effective_to, is_current
- Join facts to the correct historical row using the event timestamp between effective_from and effective_to
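This historical lookup is a range join. A sketch, assuming a staging table of fact events with an event_timestamp column (both names are placeholders):

```sql
SELECT f.order_id, d.product_sk, d.category
FROM stg_order_events f
JOIN dim_product d
  ON d.product_id_nk = f.product_id
 AND f.event_timestamp >= d.effective_from
 AND f.event_timestamp <  d.effective_to;  -- half-open interval
```

Using a half-open interval (`>= from`, `< to`) avoids double-matching an event that lands exactly on a version boundary; if your validity dates are inclusive on both ends, BETWEEN works too.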
Key steps to build facts
- Pick the grain: e.g., one row per order line
- Measures: amounts and counts, computed at the same grain
- Foreign keys: join to dimensions using surrogate keys
- Filters: exclude test data and, if the business requires it, canceled rows
- Incremental: load only new/changed data using watermarks
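The watermark pattern from the last step can be sketched as follows; stg_order_lines and the etl_watermarks bookkeeping table are assumptions for illustration:

```sql
-- Pull only rows changed since the last successful load
INSERT INTO fact_orders (order_id, line_id, quantity, unit_price, line_amount)
SELECT ol.order_id, ol.line_id, ol.quantity, ol.unit_price,
       ol.quantity * ol.unit_price
FROM stg_order_lines ol
WHERE ol.updated_at > (SELECT last_loaded_at
                       FROM etl_watermarks
                       WHERE table_name = 'fact_orders');

-- Advance the watermark only after the load succeeds
UPDATE etl_watermarks
SET last_loaded_at = (SELECT MAX(updated_at) FROM stg_order_lines)
WHERE table_name = 'fact_orders';
```

In practice you would wrap both statements in a transaction (or use your orchestrator's state) so a failed load does not advance the watermark.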
Performance tips
- Prefer integer surrogate keys for joins
- Pre-aggregate to the needed grain for dashboards
- Partition or cluster large facts by date
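As an example of pre-aggregation, a dashboard that only needs daily totals never has to scan line-level facts. A sketch, assuming a line-grain fact_orders table:

```sql
-- Rollup to daily grain for dashboard queries
CREATE TABLE fact_orders_daily AS
SELECT
    order_date,
    COUNT(DISTINCT order_id) AS order_count,
    SUM(line_amount)         AS revenue
FROM fact_orders
GROUP BY order_date;
```

Rebuild or incrementally refresh the rollup after each fact load so the two grains stay reconciled.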
Worked examples
Example 1: Customer Dimension (Type 1)
-- Keep latest version per natural key (customer_id), overwrite changes
WITH src AS (
SELECT customer_id, email, first_name, last_name, country, updated_at,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM stg_customers
)
SELECT
MD5(COALESCE(LOWER(email), CAST(customer_id AS VARCHAR))) AS customer_sk, -- deterministic hash surrogate key
customer_id AS customer_id_nk,
email,
first_name,
last_name,
country,
CURRENT_DATE AS record_loaded_at
FROM src
WHERE rn = 1;

Why this works
ROW_NUMBER() keeps the latest row per customer. A surrogate key is generated deterministically from a natural identifier; in production, many warehouses use sequences/identity columns instead.
Example 2: Product Dimension (Type 2)
-- Build SCD2 for products tracking price and category changes
WITH staged AS (
SELECT * FROM stg_products
), dedup AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY product_id, valid_from ORDER BY updated_at DESC) AS rn
FROM staged
), base AS (
SELECT product_id, product_name, category, price, valid_from, valid_to
FROM dedup WHERE rn = 1
), with_bounds AS (
SELECT *,
COALESCE(valid_to, DATE '9999-12-31') AS effective_to,
valid_from AS effective_from
FROM base
)
SELECT
MD5(CAST(product_id AS VARCHAR) || '|' || CAST(effective_from AS VARCHAR)) AS product_sk, -- hash of natural key plus validity start
product_id AS product_id_nk,
product_name,
category,
price,
effective_from,
effective_to,
CASE WHEN effective_to = DATE '9999-12-31' THEN TRUE ELSE FALSE END AS is_current
FROM with_bounds;

Notes
- Use business validity dates when available; otherwise derive with window functions on change points.
- Facts must join to the historically correct dimension row using the event timestamp (a time-travel join).
Example 3: Orders Fact (one row per order line)
WITH order_lines AS (
SELECT ol.order_id, ol.line_id, ol.product_id, ol.quantity, ol.unit_price,
o.customer_id, o.order_timestamp::DATE AS order_date
FROM stg_order_lines ol
JOIN stg_orders o ON o.order_id = ol.order_id
WHERE o.is_test = FALSE
), product_dim AS (
SELECT * FROM dim_product -- SCD2 with effective dates
), customer_dim AS (
SELECT * FROM dim_customer -- Type 1 is fine for many cases
)
SELECT
ol.order_id,
ol.line_id,
cd.customer_sk,
pd.product_sk,
CAST(ol.order_date AS DATE) AS order_date,
ol.quantity,
ol.unit_price,
(ol.quantity * ol.unit_price) AS line_amount
FROM order_lines ol
LEFT JOIN customer_dim cd
ON cd.customer_id_nk = ol.customer_id
LEFT JOIN product_dim pd
ON pd.product_id_nk = ol.product_id
AND CAST(ol.order_date AS DATE) BETWEEN pd.effective_from AND pd.effective_to;

Why LEFT JOIN?
To avoid dropping fact rows when a dimension row is temporarily missing. You can later track missing keys via data quality checks.
Common mistakes and self-checks
- Mistake: Undefined grain. Fix: Write a one-sentence grain statement for every table.
- Mistake: Joining facts to Type 2 dims without date filters. Fix: Join using event timestamp between effective dates.
- Mistake: Using natural keys in facts. Fix: Map to surrogate keys for stable joins.
- Mistake: Measures not at the fact grain. Fix: Recompute or pre-aggregate appropriately.
- Mistake: Duplicates in dimensions. Fix: Deduplicate using ROW_NUMBER() by natural key.
Quick self-check queries
- Fact row count vs source: sums should match after filters.
- Check for NULL foreign keys in facts and track rates.
- Ensure effective_from <= effective_to and exactly one current row per natural key for Type 2.
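The checks above can be written directly against the example tables (dim_product as the SCD2 dimension, fact_orders as the fact; adapt names to your schema):

```sql
-- NULL foreign keys in the fact: should trend toward zero
SELECT COUNT(*) AS missing_product_keys
FROM fact_orders
WHERE product_sk IS NULL;

-- SCD2 validity: no inverted date ranges
SELECT COUNT(*) AS bad_ranges
FROM dim_product
WHERE effective_from > effective_to;

-- SCD2 uniqueness: exactly one current row per natural key
-- (any rows returned here indicate a problem)
SELECT product_id_nk, COUNT(*) AS current_rows
FROM dim_product
WHERE is_current
GROUP BY product_id_nk
HAVING COUNT(*) <> 1;
```

Run these after every load and alert on nonzero counts rather than inspecting manually.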
Exercises
Try these before looking at any solutions.
- Exercise 1: Build a Type 1 dim_customer from stg_customers with deduplication and a surrogate key.
- Exercise 2: Create an orders_fact table at order-line grain and map it to the product and customer dimensions.
- Checklist to complete each exercise:
- Grain statement written in a comment
- All joins explicit and justified
- NULL handling and filters are documented
- Foreign keys map to surrogate keys
Practical projects
- E-commerce star schema: build dim_customer, dim_product, dim_date, and fact_orders with incremental loads.
- SaaS subscriptions: dim_account (Type 2 on plan), fact_subscriptions, fact_usage_daily.
- Finance mini-warehouse: dim_account, dim_date, fact_journal_entries with debits/credits validation.
Next steps
- Add data quality tests (row counts, null rates, uniqueness)
- Document tables: grain, columns, and business rules
- Introduce incremental patterns and partitioning for large facts
Mini challenge
Your PM wants historical product category reporting. Convert your product dimension from Type 1 to Type 2 and update the fact join to time-travel. Validate that last quarter's category revenue matches after the change.
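One way to validate the challenge: run the same revenue-by-category query before and after the migration and diff the results. A sketch, with placeholder dates standing in for last quarter:

```sql
-- Revenue by category for a fixed window; results should match
-- before and after the Type 1 -> Type 2 conversion
SELECT pd.category, SUM(f.line_amount) AS revenue
FROM fact_orders f
JOIN dim_product pd ON pd.product_sk = f.product_sk
WHERE f.order_date >= DATE '2024-01-01'
  AND f.order_date <  DATE '2024-04-01'
GROUP BY pd.category
ORDER BY pd.category;
```

Any difference means the fact is now resolving to different historical product rows, which is exactly what the time-travel join should prevent for past periods.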