luvv to helpDiscover the Best Free Online Tools
Topic 8 of 12

Data Modeling Basics

Learn Data Modeling Basics for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

Well-modeled data makes BI dashboards fast, accurate, and easy to extend. As a Data Analyst, you will routinely need to: diagnose why a KPI looks wrong, add a new field without breaking existing visuals, join multiple sources while keeping numbers consistent, and make dashboards refresh quickly for stakeholders. Solid data modeling skills turn scattered tables into a reliable, analytical dataset.

Who this is for

  • Data Analysts building or maintaining BI dashboards
  • Business Analysts who need trustworthy metrics
  • Anyone preparing to collaborate with data engineers on analytics-ready datasets

Prerequisites

  • Basic SQL (SELECT, JOIN, GROUP BY)
  • Comfort with spreadsheet pivot logic (rows, columns, values)
  • Familiarity with a BI tool (any) is helpful but not required

Concept explained simply

Data modeling is arranging tables and relationships so analysis is simple and correct. Most BI models use a star schema: one big fact table of events or numeric records, linked to smaller dimension tables that describe those events (like Date, Customer, Product).

Mental model

Imagine a worksheet of transactions (fact) and several lookup sheets (dimensions). You keep the facts tall and skinny (many rows, few computed columns), and the dimensions short and wide (fewer rows, lots of descriptive attributes). You connect lookups to facts with keys so you can slice and aggregate reliably.

Key analogy

Fact table = receipts; Dimension tables = catalogs that describe things on those receipts. The receipt lines store quantities and amounts; catalogs store names, categories, and attributes.

Core building blocks

  • Grain: a precise statement of what a single row in your fact represents (for example, one order line).
  • Fact table: numeric events at a chosen grain (for example, sales_amount, quantity, cost).
  • Dimension tables: descriptive attributes (for example, product_category, customer_segment, region).
  • Keys: surrogate keys (integers) link facts to dimensions. Natural keys (like product_code) are kept as attributes.
  • Relationships: many-to-one from fact to each dimension; filtering generally flows from dimensions to facts.
  • Measures: aggregations defined on fact columns (for example, SUM(sales_amount), AVG(discount)).
  • Slowly Changing Dimensions (SCD): preserve history when attributes change; Type 2 adds new rows with effective dates.

Worked examples

Example 1: Retail sales dashboard

  • Grain: one row per order line.
  • Fact: fact_sales with columns order_id, order_line_id, date_key, store_key, product_key, salesperson_key, quantity, sales_amount, discount_amount.
  • Dimensions: dim_date, dim_store, dim_product, dim_salesperson.
  • Why this works: You can slice sales by store, product, day, or salesperson, and add new dimensions later without changing the fact’s grain.

Example 2: Support tickets performance

  • Goal: Track time to resolution by priority and team.
  • Grain: one row per ticket status change (to analyze lifecycle), or simpler: one row per ticket.
  • Fact: fact_tickets with ticket_key, created_date_key, resolved_date_key, team_key, priority_key, status, time_to_resolve_hours.
  • Dimensions: dim_date, dim_team, dim_priority.
  • Tip: If you need lifecycle details (created, assigned, resolved), consider a status-change fact with one row per status transition.

Example 3: Marketing funnel

  • Goal: Analyze conversions by campaign and day.
  • Option A Grain: one row per session. Measures: session_count, converted_flag.
  • Option B Grain: one row per event (page_view, sign_up). Then build measures that count specific events.
  • Trade-off: Session grain is simpler; event grain is more flexible for multi-step funnels.

Common mistakes and how to self-check

  • Unclear grain: If you cannot describe what one fact row represents in one sentence, stop and define it.
  • Snowflaked dimensions in BI: Over-normalizing dims can slow filters. Prefer denormalized star for analysis.
  • Mixing periodic snapshots with transactions: Do not sum balances across days; use last-known balance or changes, depending on metric.
  • Missing surrogate keys: Relying only on natural keys can break joins when source systems change values.
  • Incorrect SCD handling: Overwriting attributes instead of preserving history causes time-travel reports to lie.
Self-check checklist
  • I have a one-sentence grain statement.
  • All fact-to-dimension links are many-to-one.
  • Dimensions contain descriptive attributes; facts contain numeric measures and foreign keys.
  • Surrogate keys exist and are unique per dimension row.
  • Historical attribute changes are handled (SCD) where needed.
  • Aggregations produce expected totals on sample data.

Exercises

Do these directly, then compare with the solutions in the Exercises section below.

  1. ex1 — Define the grain and dimensions for a retail sales dataset with fields: date, store, product, salesperson, quantity, revenue, discount, return_flag. Write your grain statement and list necessary dimensions and fact columns.
  2. ex2 — Sketch a star schema for subscriptions (customers on plans). Identify fact_subscriptions columns and dimension tables (customer, plan, date). Note keys and cardinalities.
  3. ex3 — Design a Product dimension with SCD Type 2. Specify surrogate key and the columns needed to track history.
Hints
  • Start from the reporting questions: what will you slice by? Those are dimensions.
  • One row = one something. Lock that down first.
  • For SCD2, think: effective_from, effective_to, is_current.

Practice checklist

  • I wrote a precise grain statement for each fact.
  • I separated numeric facts from descriptive attributes.
  • I introduced surrogate keys for each dimension.
  • I noted which attributes require historical tracking.
  • I validated that each relationship is many-to-one from fact to dimension.

Practical projects

  • Store Sales Mini-Mart: Build fact_sales and dimensions (date, store, product, cashier). Create measures: Total Sales, Avg Discount %, Returns Count. Validate filters across all dims.
  • Ticket SLA Model: Build fact_tickets (one row per ticket) and dims (date, team, priority). Create measures: Avg Resolution Hours, % Breached SLA. Add a calculated column for business hours if needed.
  • Marketing Events Model: Build event-grain fact_events with dims (date, campaign, channel, device). Measures: Sessions, Sign-ups, Conversion Rate. Compare with a session-grain version and note pros/cons.

Learning path

  1. Data Modeling Basics (this lesson)
  2. Joins and Cardinality in BI models
  3. Measures and Calculations (for example, DAX or calculated fields)
  4. Performance tuning: partitions, aggregations, and model size
  5. Governance: conformed dimensions, data quality checks, documentation

Next steps

  • Finish the exercises and confirm with the checklist.
  • Take the Quick Test to check understanding.
  • Apply the model to a small real dataset at work or a sample CSV.

Mini challenge

Design a model for subscription revenue with upgrades and downgrades. Choose a fact grain (for example, one row per subscription change). List required dimensions (date, customer, plan, reason). Define measures: MRR, Net MRR Change, Churn MRR. Write your grain statement in one sentence.

Quick Test

The quick test below is available to everyone. Only logged-in users have their progress saved.

Practice Exercises

3 exercises to complete

Instructions

You have fields: date, store, product, salesperson, quantity, revenue, discount, return_flag.

  • Write a one-sentence grain statement for the fact table.
  • List the dimension tables you need and their key attributes.
  • List the fact table columns (keys and numeric fields).
Expected Output
A clear grain statement, a list of dimensions (date, store, product, salesperson), and a fact table spec with foreign keys and numeric measures.

Data Modeling Basics — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Data Modeling Basics?

AI Assistant

Ask questions about this tool