Why this matters
Well-modeled data makes BI dashboards fast, accurate, and easy to extend. As a Data Analyst, you will routinely need to: diagnose why a KPI looks wrong, add a new field without breaking existing visuals, join multiple sources while keeping numbers consistent, and make dashboards refresh quickly for stakeholders. Solid data modeling skills turn scattered tables into a reliable, analytical dataset.
Who this is for
- Data Analysts building or maintaining BI dashboards
- Business Analysts who need trustworthy metrics
- Anyone preparing to collaborate with data engineers on analytics-ready datasets
Prerequisites
- Basic SQL (SELECT, JOIN, GROUP BY)
- Comfort with spreadsheet pivot logic (rows, columns, values)
- Familiarity with a BI tool (any) is helpful but not required
Concept explained simply
Data modeling is arranging tables and relationships so analysis is simple and correct. Most BI models use a star schema: one big fact table of events or numeric records, linked to smaller dimension tables that describe those events (like Date, Customer, Product).
Mental model
Imagine a worksheet of transactions (fact) and several lookup sheets (dimensions). You keep the facts tall and skinny (many rows, few computed columns), and the dimensions short and wide (fewer rows, lots of descriptive attributes). You connect lookups to facts with keys so you can slice and aggregate reliably.
Key analogy
Fact table = receipts; Dimension tables = catalogs that describe things on those receipts. The receipt lines store quantities and amounts; catalogs store names, categories, and attributes.
Core building blocks
- Grain: a precise statement of what a single row in your fact represents (for example, one order line).
- Fact table: numeric events at a chosen grain (for example, sales_amount, quantity, cost).
- Dimension tables: descriptive attributes (for example, product_category, customer_segment, region).
- Keys: surrogate keys (integers) link facts to dimensions. Natural keys (like product_code) are kept as attributes.
- Relationships: many-to-one from fact to each dimension; filtering generally flows from dimensions to facts.
- Measures: aggregations defined on fact columns (for example, SUM(sales_amount), AVG(discount)).
- Slowly Changing Dimensions (SCD): preserve history when attributes change; Type 2 adds new rows with effective dates.
Worked examples
Example 1: Retail sales dashboard
- Grain: one row per order line.
- Fact: fact_sales with columns order_id, order_line_id, date_key, store_key, product_key, salesperson_key, quantity, sales_amount, discount_amount.
- Dimensions: dim_date, dim_store, dim_product, dim_salesperson.
- Why this works: You can slice sales by store, product, day, or salesperson, and add new dimensions later without changing the fact’s grain.
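The retail star above can be sketched end to end with a tiny in-memory database. This is a minimal sketch: `sqlite3` stands in for whatever warehouse or BI engine you actually use, and the table names, keys, and sample rows are invented for illustration (a trimmed-down `fact_sales` with two dimensions instead of four).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimensions are short and wide: one surrogate key per row, descriptive attributes.
# The fact is tall and skinny: one row per order line (the grain), with numeric
# measures and foreign keys out to each dimension.
cur.executescript("""
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_code TEXT, category TEXT);

CREATE TABLE fact_sales (
    order_id      INTEGER,
    order_line_id INTEGER,
    store_key     INTEGER REFERENCES dim_store(store_key),
    product_key   INTEGER REFERENCES dim_product(product_key),
    quantity        INTEGER,
    sales_amount    REAL,
    discount_amount REAL
);
""")

cur.executemany("INSERT INTO dim_store VALUES (?,?,?)",
                [(1, "Downtown", "East"), (2, "Mall", "West")])
cur.executemany("INSERT INTO dim_product VALUES (?,?,?)",
                [(1, "P-100", "Coffee"), (2, "P-200", "Tea")])
cur.executemany("INSERT INTO fact_sales VALUES (?,?,?,?,?,?,?)", [
    (1001, 1, 1, 1, 2,  9.00, 0.00),
    (1001, 2, 1, 2, 1,  4.50, 0.50),
    (1002, 1, 2, 1, 3, 13.50, 1.00),
])

# Slice a fact measure by a dimension attribute: total sales by region.
# Filtering flows from the dimension (region) down to the fact rows.
rows = cur.execute("""
    SELECT s.region, SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(rows)  # [('East', 13.5), ('West', 13.5)]
```

Adding a new dimension later (say, `dim_salesperson`) only means adding one foreign key column to the fact; the grain, and every existing query, stays the same.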
Example 2: Support tickets performance
- Goal: Track time to resolution by priority and team.
- Grain: one row per ticket status change (to analyze the lifecycle), or, more simply, one row per ticket.
- Fact: fact_tickets with ticket_key, created_date_key, resolved_date_key, team_key, priority_key, status, time_to_resolve_hours.
- Dimensions: dim_date, dim_team, dim_priority.
- Tip: If you need lifecycle details (created, assigned, resolved), consider a status-change fact with one row per status transition.
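The ticket-grain option can be sketched the same way. This is a hedged simplification of the `fact_tickets` design above: it stores raw created/resolved timestamps instead of date keys, and derives the resolution-hours measure at query time with SQLite's `julianday`; the priorities and timestamps are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_priority (priority_key INTEGER PRIMARY KEY, priority_name TEXT);

-- Grain: one row per ticket.
CREATE TABLE fact_tickets (
    ticket_key   INTEGER PRIMARY KEY,
    priority_key INTEGER REFERENCES dim_priority(priority_key),
    created_at   TEXT,
    resolved_at  TEXT
);
""")

cur.executemany("INSERT INTO dim_priority VALUES (?,?)",
                [(1, "High"), (2, "Low")])
cur.executemany("INSERT INTO fact_tickets VALUES (?,?,?,?)", [
    (1, 1, "2024-01-01 09:00", "2024-01-01 13:00"),  # 4 hours
    (2, 1, "2024-01-02 10:00", "2024-01-02 16:00"),  # 6 hours
    (3, 2, "2024-01-03 08:00", "2024-01-04 08:00"),  # 24 hours
])

# Measure: average resolution time in hours, sliced by priority.
# julianday() returns fractional days, so the difference * 24 gives hours.
rows = cur.execute("""
    SELECT p.priority_name,
           ROUND(AVG((julianday(f.resolved_at) - julianday(f.created_at)) * 24), 1)
               AS avg_hours
    FROM fact_tickets f
    JOIN dim_priority p ON p.priority_key = f.priority_key
    GROUP BY p.priority_name
    ORDER BY p.priority_name
""").fetchall()
print(rows)  # [('High', 5.0), ('Low', 24.0)]
```

If you switch to the status-change grain, the same query pattern still works; you would just filter to the transitions you care about before aggregating.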
Example 3: Marketing funnel
- Goal: Analyze conversions by campaign and day.
- Option A Grain: one row per session. Measures: session_count, conversion_count (for example, a SUM over a converted_flag column).
- Option B Grain: one row per event (page_view, sign_up). Then build measures that count specific events.
- Trade-off: Session grain is simpler; event grain is more flexible for multi-step funnels.
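Option B's flexibility can be seen in a short sketch. Assumptions here are mine, not from the source: a two-column `dim_campaign`, an event-grain `fact_events` carrying a `session_id` and an `event_type`, and invented sample events. The key move is that session-level measures are derived from event rows with conditional counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_campaign (campaign_key INTEGER PRIMARY KEY, campaign_name TEXT);

-- Grain: one row per event.
CREATE TABLE fact_events (
    event_id     INTEGER PRIMARY KEY,
    campaign_key INTEGER REFERENCES dim_campaign(campaign_key),
    session_id   TEXT,
    event_type   TEXT   -- e.g. 'page_view' or 'sign_up'
);
""")

cur.executemany("INSERT INTO dim_campaign VALUES (?,?)",
                [(1, "Spring Sale"), (2, "Newsletter")])
cur.executemany("INSERT INTO fact_events VALUES (?,?,?,?)", [
    (1, 1, "s1", "page_view"),
    (2, 1, "s1", "sign_up"),
    (3, 1, "s2", "page_view"),
    (4, 2, "s3", "page_view"),
])

# Measures built from event counts: sessions, and sessions that converted.
# CASE returns NULL for non-matching rows, which COUNT(DISTINCT ...) ignores.
rows = cur.execute("""
    SELECT c.campaign_name,
           COUNT(DISTINCT f.session_id) AS sessions,
           COUNT(DISTINCT CASE WHEN f.event_type = 'sign_up'
                               THEN f.session_id END) AS converted_sessions
    FROM fact_events f
    JOIN dim_campaign c ON c.campaign_key = f.campaign_key
    GROUP BY c.campaign_name
    ORDER BY c.campaign_name
""").fetchall()
print(rows)  # [('Newsletter', 1, 0), ('Spring Sale', 2, 1)]
```

Notice that the session-grain answer (Option A) falls out of the event-grain model for free, which is exactly the trade-off described above: more rows, more flexibility.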
Common mistakes and how to self-check
- Unclear grain: If you cannot describe what one fact row represents in one sentence, stop and define it.
- Snowflaked dimensions in BI: Over-normalizing dimensions can slow filters and complicate the model. Prefer a denormalized star for analysis.
- Mixing periodic snapshots with transactions: Do not sum balances across days; use last-known balance or changes, depending on metric.
- Missing surrogate keys: Relying only on natural keys can break joins when source systems change values.
- Incorrect SCD handling: Overwriting attributes instead of preserving history causes point-in-time reports to lie.
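The SCD mistake is easiest to see in code. Below is a minimal Type 2 sketch in plain Python rather than a specific ETL tool: the dimension is a list of row dicts, and the column names (`effective_from`, `effective_to`, `is_current`) follow the common SCD2 convention. The product and dates are hypothetical.

```python
from datetime import date

# SCD Type 2 product dimension: the surrogate key identifies a *version*
# of a product, while product_code is the natural key shared by all versions.
dim_product = [
    {"product_key": 1, "product_code": "P-100", "category": "Coffee",
     "effective_from": date(2023, 1, 1), "effective_to": None, "is_current": True},
]

def apply_scd2_change(dim, product_code, new_attrs, change_date, next_key):
    """Close out the current row and append a new version.

    The wrong (Type 1) move would be to overwrite 'category' in place,
    destroying the history that point-in-time reports depend on.
    """
    for row in dim:
        if row["product_code"] == product_code and row["is_current"]:
            row["effective_to"] = change_date
            row["is_current"] = False
    dim.append({"product_key": next_key, "product_code": product_code,
                **new_attrs,
                "effective_from": change_date, "effective_to": None,
                "is_current": True})

apply_scd2_change(dim_product, "P-100", {"category": "Beverages"},
                  date(2024, 6, 1), next_key=2)

# History survives: the old category is still queryable by date range.
assert len(dim_product) == 2
assert dim_product[0]["category"] == "Coffee" and not dim_product[0]["is_current"]
assert dim_product[1]["category"] == "Beverages" and dim_product[1]["is_current"]
```

Facts created before the change keep pointing at surrogate key 1 and report "Coffee"; facts created after it point at key 2 and report "Beverages".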
Self-check checklist
- I have a one-sentence grain statement.
- All fact-to-dimension links are many-to-one.
- Dimensions contain descriptive attributes; facts contain numeric measures and foreign keys.
- Surrogate keys exist and are unique per dimension row.
- Historical attribute changes are handled (SCD) where needed.
- Aggregations produce expected totals on sample data.
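Several of these checks are mechanical enough to script. A minimal sketch on plain Python rows, with invented sample tables: surrogate keys must be unique per dimension, every fact foreign key must hit exactly one dimension row (many-to-one), and totals on sample data should match a hand calculation.

```python
# Hypothetical sample tables for the self-check.
dim_store = [{"store_key": 1, "region": "East"},
             {"store_key": 2, "region": "West"}]
fact_sales = [{"store_key": 1, "sales_amount": 9.0},
              {"store_key": 1, "sales_amount": 4.5},
              {"store_key": 2, "sales_amount": 13.5}]

def surrogate_keys_unique(dim, key):
    """Every dimension row must have its own surrogate key."""
    keys = [row[key] for row in dim]
    return len(keys) == len(set(keys))

def facts_many_to_one(fact, dim, key):
    """Every fact foreign key must resolve to exactly one dimension row."""
    dim_keys = {row[key] for row in dim}
    return all(row[key] in dim_keys for row in fact)

assert surrogate_keys_unique(dim_store, "store_key")
assert facts_many_to_one(fact_sales, dim_store, "store_key")

# Totals on sample data match a hand calculation: 9.0 + 4.5 + 13.5 = 27.0.
assert sum(r["sales_amount"] for r in fact_sales) == 27.0
```

In practice you would run the same assertions as SQL against your warehouse (a `GROUP BY ... HAVING COUNT(*) > 1` on the key, and a left anti-join from fact to dimension), but the logic is identical.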
Exercises
Work through these on your own, then compare with the solutions in the Exercises section below.
- ex1 — Define the grain and dimensions for a retail sales dataset with fields: date, store, product, salesperson, quantity, revenue, discount, return_flag. Write your grain statement and list necessary dimensions and fact columns.
- ex2 — Sketch a star schema for subscriptions (customers on plans). Identify fact_subscriptions columns and dimension tables (customer, plan, date). Note keys and cardinalities.
- ex3 — Design a Product dimension with SCD Type 2. Specify surrogate key and the columns needed to track history.
Hints
- Start from the reporting questions: what will you slice by? Those are dimensions.
- One row = one something. Lock that down first.
- For SCD2, think: effective_from, effective_to, is_current.
Practice checklist
- I wrote a precise grain statement for each fact.
- I separated numeric facts from descriptive attributes.
- I introduced surrogate keys for each dimension.
- I noted which attributes require historical tracking.
- I validated that each relationship is many-to-one from fact to dimension.
Practical projects
- Store Sales Mini-Mart: Build fact_sales and dimensions (date, store, product, cashier). Create measures: Total Sales, Avg Discount %, Returns Count. Validate filters across all dims.
- Ticket SLA Model: Build fact_tickets (one row per ticket) and dims (date, team, priority). Create measures: Avg Resolution Hours, % Breached SLA. Add a calculated column for business hours if needed.
- Marketing Events Model: Build event-grain fact_events with dims (date, campaign, channel, device). Measures: Sessions, Sign-ups, Conversion Rate. Compare with a session-grain version and note pros/cons.
Learning path
- Data Modeling Basics (this lesson)
- Joins and Cardinality in BI models
- Measures and Calculations (for example, DAX or calculated fields)
- Performance tuning: partitions, aggregations, and model size
- Governance: conformed dimensions, data quality checks, documentation
Next steps
- Finish the exercises and confirm with the checklist.
- Take the Quick Test to check understanding.
- Apply the model to a small real dataset at work or a sample CSV.
Mini challenge
Design a model for subscription revenue with upgrades and downgrades. Choose a fact grain (for example, one row per subscription change). List required dimensions (date, customer, plan, reason). Define measures: MRR, Net MRR Change, Churn MRR. Write your grain statement in one sentence.
Quick Test
The quick test below checks your understanding of grain, facts, dimensions, and keys.