Why Data Modeling matters for a BI Developer
Great BI is impossible without a clean data model. As a BI Developer, your dashboards, metrics, and self-serve exploration all depend on a model that is consistent, predictable, and fast. Good models reduce ad-hoc fixes, prevent double counting, and let business users answer their own questions safely.
- Unlocks trustworthy KPIs (one definition, reused everywhere)
- Speeds up dashboards and queries (star schemas, proper grain)
- Enables self-serve analytics (semantic models with business names)
- Simplifies maintenance (clear relationships and change handling)
What you’ll learn
- Designing star schemas for reporting
- Defining the correct grain of fact tables
- Building conformed dimensions and hierarchies
- Handling Slowly Changing Dimensions (SCD) basics
- Designing semantic models for self-serve analytics
- Separating measures from dimensions cleanly
- Using surrogate keys and designing relationships
- Mapping business logic into the model safely
Who this is for
- BI Developers building dashboards, reports, and semantic layers
- Analysts moving from ad-hoc SQL toward robust, reusable models
- Data practitioners who support self-serve analytics in a BI tool
Prerequisites
- Comfort with SQL SELECT, JOIN, GROUP BY, and basic window functions
- Understanding of your business domains (sales, marketing, support, etc.)
- Basic knowledge of your BI tool’s semantic layer and data sources
Learning path (roadmap)
- Capture reporting needs: list core KPIs, supporting dimensions (who/what/when/where), and how users slice results.
- Choose a star schema: identify fact tables (events, transactions, snapshots) and conformed dimensions (Date, Customer, Product, etc.).
- Define fact grain: write one sentence per fact ("One row per <thing> per <time unit>") and validate it against the KPIs.
- Build dimensions and hierarchies: add surrogate keys, natural keys, attributes, and hierarchies (e.g., Country → Region).
- Handle changes (SCD): start with SCD Type 2 for business-critical attributes; add valid_from/valid_to/is_current.
- Separate measures vs dimensions: make numeric aggregations measures; keep descriptors as attributes. Avoid mixed-use fields.
- Design the semantic model: expose friendly names, set default aggregations, hide technical columns, define relationships.
- Validate and iterate: recalculate KPIs, reconcile to source-of-truth, and add tests for counts, uniqueness, and referential integrity.
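As a concrete example of the validation step, a minimal grain/uniqueness test checks that the declared grain key is unique in the fact table. The table and column names below follow the fact_sales example used later in this guide; adapt them to your own model.
-- Grain test: expect zero rows; any result is a grain violation
SELECT order_line_id, COUNT(*) AS row_count
FROM fact_sales
GROUP BY order_line_id
HAVING COUNT(*) > 1;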
Quick checks you can do at each milestone
- Can you describe the grain out loud in one sentence?
- Is every measure tied to the correct fact table?
- Are dimensions conformed across facts (one Customer dimension, reused)?
- Are changing attributes handled (SCD) where needed?
- Do default aggregations match business expectations?
Worked examples
Example 1: Sales star schema from requirements
Requirement: Report Revenue, Orders, Units by Product, Customer, and Day. Drill down by Category and Region.
Grain choice: FactSales = one row per order_line_id.
-- Dimensions (surrogate keys, conformed for reuse)
CREATE TABLE dim_date (
date_key INT PRIMARY KEY, -- e.g., 20250131
date_value DATE NOT NULL,
day_of_week INT,
week_of_year INT,
month INT,
quarter INT,
year INT
);
CREATE TABLE dim_product (
product_sk BIGINT PRIMARY KEY,
product_nk VARCHAR(50) UNIQUE, -- natural key from source
product_name VARCHAR(255),
category VARCHAR(100),
subcategory VARCHAR(100)
);
CREATE TABLE dim_customer (
customer_sk BIGINT PRIMARY KEY,
customer_nk VARCHAR(50) UNIQUE,
customer_name VARCHAR(255),
region VARCHAR(100)
);
-- Fact at order line grain
CREATE TABLE fact_sales (
order_line_id BIGINT PRIMARY KEY,
order_nk VARCHAR(50),
date_key INT NOT NULL REFERENCES dim_date(date_key),
product_sk BIGINT NOT NULL REFERENCES dim_product(product_sk),
customer_sk BIGINT NOT NULL REFERENCES dim_customer(customer_sk),
quantity INT,
unit_price NUMERIC(12,2),
revenue NUMERIC(14,2) GENERATED ALWAYS AS (quantity * unit_price) STORED -- computed once, stored with the row
);
Measures: Revenue (SUM), Units (SUM), Orders (COUNT DISTINCT order_nk).
Why order line grain?
It avoids double counting when multiple products exist per order and supports product-level analysis while still enabling order-level KPIs with COUNT DISTINCT.
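For example, line-level and order-level KPIs can be read from the same line-grain fact; a minimal sketch against the tables above:
-- Revenue, Units, and Orders by day from the order-line grain
SELECT
  d.date_value,
  SUM(fs.revenue) AS revenue,             -- line-level measure, safe to sum
  SUM(fs.quantity) AS units,
  COUNT(DISTINCT fs.order_nk) AS orders   -- order-level KPI without double counting
FROM fact_sales fs
JOIN dim_date d ON d.date_key = fs.date_key
GROUP BY d.date_value;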
Example 2: SCD Type 2 basics for Customer
CREATE TABLE dim_customer_scd (
customer_sk BIGINT PRIMARY KEY,
customer_nk VARCHAR(50),
customer_name VARCHAR(255),
region VARCHAR(100), -- may change over time
valid_from TIMESTAMP NOT NULL,
valid_to TIMESTAMP,
is_current BOOLEAN NOT NULL DEFAULT TRUE
);
-- Pseudocode for an upsert pattern
-- 1) End current record when a change is detected
UPDATE dim_customer_scd d
SET valid_to = NOW(), is_current = FALSE
FROM staging_customer s
WHERE d.customer_nk = s.customer_nk
AND d.is_current = TRUE
AND (d.region IS DISTINCT FROM s.region
OR d.customer_name IS DISTINCT FROM s.customer_name);
-- 2) Insert a new current record
INSERT INTO dim_customer_scd (
customer_sk, customer_nk, customer_name, region, valid_from, valid_to, is_current)
SELECT nextval('customer_sk_seq'), s.customer_nk, s.customer_name, s.region, NOW(), NULL, TRUE
FROM staging_customer s
LEFT JOIN dim_customer_scd d
ON d.customer_nk = s.customer_nk AND d.is_current = TRUE
WHERE d.customer_nk IS NULL
OR (d.region IS DISTINCT FROM s.region
OR d.customer_name IS DISTINCT FROM s.customer_name);
Querying with SCD2
Join facts to the current dimension version with is_current = TRUE, or to the historically accurate version by joining on the natural key and filtering the fact's transaction date BETWEEN valid_from AND COALESCE(valid_to, '9999-12-31').
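A minimal sketch of both patterns, assuming the fact also carries the customer natural key (customer_nk) and that transaction dates come from dim_date; the fact_sales DDL above does not include customer_nk, so adjust to however your fact stores it:
-- As-is reporting: every sale attributed to the customer's current region
SELECT c.region, SUM(fs.revenue) AS revenue
FROM fact_sales fs
JOIN dim_customer_scd c
  ON c.customer_nk = fs.customer_nk   -- assumed natural key on the fact
 AND c.is_current = TRUE
GROUP BY c.region;
-- As-was reporting: every sale attributed to the region valid at the time of sale
SELECT c.region, SUM(fs.revenue) AS revenue
FROM fact_sales fs
JOIN dim_date d ON d.date_key = fs.date_key
JOIN dim_customer_scd c
  ON c.customer_nk = fs.customer_nk
 AND d.date_value BETWEEN c.valid_from AND COALESCE(c.valid_to, TIMESTAMP '9999-12-31')
GROUP BY c.region;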
Example 3: Measures vs dimensions separation
-- View to expose clean measures
CREATE VIEW mart_sales_measures AS
SELECT
fs.product_sk,
fs.customer_sk,
fs.date_key,
SUM(fs.quantity) AS units,
SUM(fs.revenue) AS revenue,
COUNT(DISTINCT fs.order_nk) AS orders,
CASE WHEN SUM(fs.quantity) = 0 THEN NULL
ELSE SUM(fs.revenue) / SUM(fs.quantity) END AS avg_price
FROM fact_sales fs
GROUP BY 1,2,3;
Keep descriptors (like product_name, region) in dimensions; keep aggregations in a measures view or semantic model. This avoids mixed-use fields and keeps default aggregations in the BI tool predictable.
Example 4: Semantic model for self-serve
Expose business-friendly names, hide technical keys, and set default aggregations.
-- Example: final semantic view for BI
CREATE VIEW bi_sales AS
SELECT
d.date_value AS "Date",
p.product_name AS "Product",
p.category AS "Category",
c.customer_name AS "Customer",
c.region AS "Region",
m.units AS "Units", -- SUM default
m.revenue AS "Revenue", -- SUM default
m.orders AS "Orders", -- COUNT default
m.avg_price AS "Avg Price" -- AVERAGE default (precomputed)
FROM mart_sales_measures m
JOIN dim_date d ON d.date_key = m.date_key
JOIN dim_product p ON p.product_sk = m.product_sk
JOIN dim_customer c ON c.customer_sk = m.customer_sk;
Tip: role-playing dates
If you need Order Date and Ship Date, add two foreign keys in the fact pointing to the same dim_date and alias the columns as Order Date and Ship Date in the semantic view.
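A minimal sketch of the pattern, assuming the fact carries an additional ship_date_key column (not in the fact_sales DDL above):
-- Role-playing dates: join the same dim_date twice under different aliases
SELECT
  od.date_value AS "Order Date",
  sd.date_value AS "Ship Date",
  SUM(fs.revenue) AS "Revenue"
FROM fact_sales fs
JOIN dim_date od ON od.date_key = fs.date_key       -- order date role
JOIN dim_date sd ON sd.date_key = fs.ship_date_key  -- ship date role (assumed column)
GROUP BY od.date_value, sd.date_value;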
Example 5: Many-to-many with a bridge
Customers can belong to multiple Segments; you want Segment analysis on sales.
CREATE TABLE dim_segment (
segment_sk BIGINT PRIMARY KEY,
segment_nk VARCHAR(50) UNIQUE,
segment_name VARCHAR(100)
);
CREATE TABLE bridge_customer_segment (
customer_sk BIGINT NOT NULL REFERENCES dim_customer(customer_sk),
segment_sk BIGINT NOT NULL REFERENCES dim_segment(segment_sk),
PRIMARY KEY (customer_sk, segment_sk)
);
-- Segment-level revenue
SELECT s.segment_name, SUM(fs.revenue) AS revenue
FROM fact_sales fs
JOIN bridge_customer_segment b ON b.customer_sk = fs.customer_sk
JOIN dim_segment s ON s.segment_sk = b.segment_sk
GROUP BY 1;
Avoid double counting
When combining Segment with other many-to-many attributes, restrict to one bridge at a time or pre-aggregate to the appropriate grain to avoid duplication.
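A minimal sketch of the pre-aggregation approach: collapse the fact to the customer grain first, then attach segments, so combining the bridge with other attributes cannot multiply fact rows any further. A customer in N segments still contributes its full revenue to each of those segments, which is expected for this kind of analysis.
-- Pre-aggregate to customer grain, then join the bridge
WITH revenue_by_customer AS (
  SELECT customer_sk, SUM(revenue) AS revenue
  FROM fact_sales
  GROUP BY customer_sk
)
SELECT s.segment_name, SUM(r.revenue) AS revenue
FROM revenue_by_customer r
JOIN bridge_customer_segment b ON b.customer_sk = r.customer_sk
JOIN dim_segment s ON s.segment_sk = b.segment_sk
GROUP BY s.segment_name;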
Drills and exercises
- Write the grain statement for two facts you own (e.g., “one row per invoice line”).
- List 5 measures and map each to its correct fact table.
- Identify 3 attributes that change slowly; decide if they need SCD2.
- Create a small Date dimension with year/quarter/month/day and a surrogate key.
- Design a simple bridge table for one many-to-many relationship in your domain.
- Rename columns in a semantic view to business-friendly labels.
- Add a default aggregation for every numeric field in your BI tool.
Common mistakes and how to debug
- Undefined grain: Symptoms include inconsistent counts. Fix by writing a one-sentence grain and aligning all joins to it.
- Mixing measures and attributes: Causes wrong aggregations. Separate into measure fields and dimensional attributes.
- No conformed dimensions: Different Customer tables across facts. Create a single dim_customer reused by all facts.
- Ignoring history: Overwrites break historical reports. Use SCD2 for critical attributes (region, segment).
- Many-to-many double counting: Duplicated totals when joining bridges. Aggregate at the right grain or pre-aggregate first.
- Leaking technical keys: Confuses users. Hide surrogate keys in the semantic layer; expose business names.
- Missing default aggregations: Averages where sums are expected. Set explicit defaults (SUM, COUNT, NONE).
Debugging checklist
- Recalculate a KPI three ways (SQL, BI visual, exported CSV) and compare.
- Check referential integrity: any fact rows with missing dimension keys? (Query sketches follow this checklist.)
- Validate today’s numbers against a trusted baseline (finance close, source app).
- Scan for non-unique natural keys in dimensions; dedupe if needed.
- Test historical joins with BETWEEN valid_from AND COALESCE(valid_to, '9999-12-31').
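Two of the checks above as runnable sketches against the example schema (adapt table and key names to your model):
-- Referential integrity: fact rows whose customer key has no matching dimension row (expect 0 rows)
SELECT fs.*
FROM fact_sales fs
LEFT JOIN dim_customer c ON c.customer_sk = fs.customer_sk
WHERE c.customer_sk IS NULL;
-- SCD sanity: more than one current row per natural key (expect 0 rows; dedupe if any appear)
SELECT customer_nk, COUNT(*) AS current_rows
FROM dim_customer_scd
WHERE is_current = TRUE
GROUP BY customer_nk
HAVING COUNT(*) > 1;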
Mini project: Sales Analytics Mart
Goal: Build a star schema for sales with a self-serve semantic view that supports Revenue, Orders, Units by Product, Customer, and Date.
- Define grain: one row per order line in fact_sales.
- Create dim_date, dim_product, dim_customer (with SCD2 for region).
- Load fact_sales and compute revenue.
- Add a bridge for customer-to-segment (optional).
- Create a measures view with revenue, orders, units, and avg_price.
- Design a semantic view with friendly names and default aggregations.
- Validate totals against the source system for a chosen period.
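A reconciliation sketch for the validation step, assuming a source-of-truth table such as source_orders with order_date and order_total columns (illustrative names; use whatever your source system exposes):
-- Compare mart revenue to the source for one month; aim for a difference under 0.5%
WITH mart AS (
  SELECT SUM(fs.revenue) AS revenue
  FROM fact_sales fs
  JOIN dim_date d ON d.date_key = fs.date_key
  WHERE d.year = 2025 AND d.month = 1
),
src AS (
  SELECT SUM(order_total) AS revenue   -- source_orders is an assumed table
  FROM source_orders
  WHERE order_date >= DATE '2025-01-01'
    AND order_date < DATE '2025-02-01'
)
SELECT m.revenue AS mart_revenue,
       s.revenue AS source_revenue,
       ABS(m.revenue - s.revenue) / NULLIF(s.revenue, 0) AS pct_diff
FROM mart m CROSS JOIN src s;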
Acceptance criteria
- All keys resolve; no orphaned fact rows.
- Revenue total matches within 0.5% of the source for last month.
- Filters by Category and Region work without double counting.
- Both current and historical region analysis are possible.
Subskills
- Star Schema For Reporting — Design facts and conformed dimensions for simple, fast reporting.
- Defining Grain For Facts — Make one clear statement per fact; drive all joins and measures by it.
- Building Dimensions And Hierarchies — Create attributes and drill paths that match how the business thinks.
- Handling Slowly Changing Dimensions Basics — Track attribute history (e.g., region changes) safely.
- Designing Semantic Models For Self Serve — Expose business-friendly fields with correct default aggregations.
- Measures Versus Dimensions Separation — Keep numeric aggregations separate from descriptive attributes.
- Surrogate Keys And Relationship Design — Use stable keys and clear relationships across tables.
- Mapping Business Logic Into The Model — Implement metric definitions and rules once, reuse everywhere.
Next steps
- Practice by modeling one domain end-to-end (orders, subscriptions, or support tickets).
- Take the skill exam to validate your understanding.
- Iterate with stakeholder feedback and add tests for grain, uniqueness, and relationships.