Why this matters
For a BI Developer, model size directly impacts refresh time, memory usage, and query speed. Smaller models load dashboards faster, reduce costs on shared capacity, and let the DAX/SQL engine scan less data. Typical tasks include trimming wide fact tables, choosing efficient data types, offloading high-cardinality text to dimensions, and building aggregate tables for heavy reports.
- [ ] Speed up refreshes from 60 min to under 15 by reducing model size
- [ ] Fit your model into memory limits (e.g., shared capacity or on-prem servers)
- [ ] Improve end-user experience with faster visuals and fewer timeouts
Concept explained simply
Modern BI engines are mostly columnar and dictionary-encoded: each column stores a dictionary of its unique values, and each row keeps only a reference into that dictionary. The bigger the dictionary, and the more column segments the engine has to store, the larger and slower your model.
Mental model
Think of each column as a glossary:
- The glossary (dictionary) lists every unique value once.
- Each row stores only a small pointer to the glossary entry.
- If your column has many unique values (high cardinality), the glossary grows and so does the storage footprint.
Therefore: reduce the number of unique values, reduce the number of columns, and use compact data types.
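To see which columns will build the biggest dictionaries, profile distinct counts before modeling. A minimal sketch, assuming a fact table named fact_sales with the columns shown; substitute your own table and column names.

```sql
-- Columns with many distinct values build large dictionaries and compress poorly.
SELECT
    COUNT(*)                                      AS total_rows,
    COUNT(DISTINCT customer_guid)                 AS distinct_customer_guid,   -- likely near the row count
    COUNT(DISTINCT product_name)                  AS distinct_product_name,    -- text: candidate for a dimension
    COUNT(DISTINCT order_timestamp)               AS distinct_order_timestamp, -- second-level precision inflates this
    COUNT(DISTINCT CAST(order_timestamp AS DATE)) AS distinct_order_date       -- usually far smaller
FROM fact_sales;
```

Columns whose distinct count approaches the row count are the first candidates for surrogate keys, bucketing, or a move to a dimension.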
Core techniques (actionable)
- Drop and hide
- Remove columns not used by visuals, measures, filters, or relationships.
- Hide columns that are only needed for relationships from report view.
- Use compact data types (see the DDL sketch after this list)
- Prefer integers for keys instead of GUIDs or long text.
- Use date instead of datetime if time is not needed.
- Reduce decimals to the necessary precision (e.g., 2 decimal places).
- Use booleans/flags (0/1) instead of "Yes/No" text.
- Reduce cardinality
- Split descriptive text into a dimension table; keep only the integer key in the fact.
- Bucket or round continuous values when appropriate (e.g., ages into ranges).
- Trim whitespace and normalize casing so variants like "US" and "us" do not count as separate distinct values.
- Right-size granularity
- Filter rows you truly do not need (e.g., old inactive entities, test data).
- Create aggregate tables for common queries (e.g., daily product sales) while keeping a detailed fact for drill-through.
- Shift calculations out of the model
- Prefer measures over calculated columns for on-the-fly logic.
- Precompute stable derived fields in ETL so the BI model stores less.
- Model structure
- Use a star schema: narrow fact + descriptive dimensions.
- Avoid many-to-many relationships unless they are necessary and well understood.
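To make the data-type and structure points concrete, here is a minimal DDL sketch of a narrow star-schema fact. Table and column names are illustrative assumptions; adjust the types to your warehouse's dialect.

```sql
-- Descriptive text is stored once, in the dimension.
CREATE TABLE dim_product (
    product_key  INT          NOT NULL PRIMARY KEY,  -- INT surrogate key instead of GUID/text
    product_name VARCHAR(200) NOT NULL,
    category     VARCHAR(100) NOT NULL
);

-- The fact stays narrow: integer keys, compact types, no free text.
CREATE TABLE fact_sales (
    order_date   DATE          NOT NULL,             -- DATE, not DATETIME, when time is not needed
    product_key  INT           NOT NULL REFERENCES dim_product (product_key),
    customer_key INT           NOT NULL,             -- references dim_customer (not shown)
    is_returned  SMALLINT      NOT NULL,             -- 0/1 flag instead of 'Yes'/'No' text
    quantity     INT           NOT NULL,
    sales_amount DECIMAL(10,2) NOT NULL              -- two decimal places is usually enough
);
```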
Quick checklist to apply today
- [ ] List every column used by visuals/measures/filters; drop the rest.
- [ ] Replace GUID/text keys with INT surrogate keys.
- [ ] Convert datetime to date where possible.
- [ ] Round or bucket high-cardinality numeric columns (see the sketch after this checklist).
- [ ] Move textual attributes into dimension tables.
- [ ] Add aggregate tables for frequent rollups.
- [ ] Remove historical rows not needed for analysis.
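One way to act on the rounding, bucketing, and row-trimming items, sketched as a staging query. The bucket boundaries, the two-year cutoff, and the table/column names are assumptions, and the date functions use SQL Server syntax; swap in your engine's equivalents.

```sql
-- Slimmer staging query: bucketed age, normalized country text, and only recent history.
SELECT
    customer_key,
    CASE                                      -- bucket a high-cardinality number into a few ranges
        WHEN age IS NULL THEN 'Unknown'
        WHEN age < 25  THEN 'Under 25'
        WHEN age < 45  THEN '25-44'
        WHEN age < 65  THEN '45-64'
        ELSE '65+'
    END AS age_band,
    UPPER(TRIM(country)) AS country_code,     -- 'us ' and 'US' collapse into one distinct value
    CAST(order_timestamp AS DATE) AS order_date,
    sales_amount
FROM stg_sales
WHERE order_timestamp >= DATEADD(YEAR, -2, CAST(GETDATE() AS DATE));  -- keep only the last two years
```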
Worked examples
Example 1: Replacing text keys and trimming text
Before: fact_sales has CustomerId as a 36-char GUID and ProductName as text. Both are in the fact.
After: Create dim_customer (INT surrogate key) and dim_product; replace GUID and name with their INT keys in fact_sales. Keep text in dimensions only.
Result: Fact table shrinks significantly; dimensions handle descriptive text once.
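A sketch of how Example 1 could be implemented in SQL. The staging table stg_sales, the column names, and the use of ROW_NUMBER for surrogate keys (instead of an IDENTITY column) are assumptions to keep the example self-contained; CREATE TABLE AS syntax varies by engine (SQL Server uses SELECT ... INTO).

```sql
-- Build the customer dimension once, with an INT surrogate key.
CREATE TABLE dim_customer AS
SELECT
    ROW_NUMBER() OVER (ORDER BY customer_guid) AS customer_key,  -- compact INT key
    customer_guid,                                               -- keep the GUID for lineage/lookups
    customer_name
FROM (
    SELECT DISTINCT customer_guid, customer_name                 -- assumes one name per GUID
    FROM stg_sales
) AS c;

-- Rebuild the fact with the narrow key; the 36-char GUID and the name stay out of it.
CREATE TABLE fact_sales AS
SELECT
    d.customer_key,
    s.product_key,
    s.order_date,
    s.quantity,
    s.sales_amount
FROM stg_sales AS s
JOIN dim_customer AS d
  ON d.customer_guid = s.customer_guid;
```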
Example 2: Reducing datetime precision
Before: OrderTimestamp stores second-level precision but reports filter by date.
After: Store OrderDate (DATE), and add a separate OrderTime (TIME) column only if reports actually need the time of day. If time is not used, keep just OrderDate.
Result: Fewer distinct values and smaller dictionaries.
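Example 2 as a minimal sketch, assuming a staging table stg_orders with an OrderTimestamp column; the optional time split is shown commented out.

```sql
SELECT
    CAST(OrderTimestamp AS DATE) AS OrderDate        -- 2024-03-07 14:31:09 -> 2024-03-07
    -- , CAST(OrderTimestamp AS TIME) AS OrderTime   -- add only if reports filter by time of day
FROM stg_orders;
```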
Example 3: Aggregation table
Before: Queries sum over 500M fact rows by day and product.
After: Build fact_sales_daily (one row per product, per day). Point summary visuals to the aggregate table; keep detailed fact for drill-through.
Result: Scans are against a much smaller table, speeding up visuals and reducing memory pressure.
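A sketch of the aggregate in Example 3, assuming fact_sales carries order_date and product_key; the grain is one row per product per day.

```sql
CREATE TABLE fact_sales_daily AS
SELECT
    order_date,
    product_key,
    SUM(quantity)     AS total_quantity,
    SUM(sales_amount) AS total_sales,
    COUNT(*)          AS order_lines       -- row count is handy for averages and ratios
FROM fact_sales
GROUP BY order_date, product_key;
```

Summary visuals point at this table; drill-through stays on the detailed fact_sales.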
Practice exercises
Complete the exercises below. Then check your answers in the provided solutions.
Exercise 1 preview
Identify which columns to drop, convert, split into dimensions, and which aggregates to introduce to reduce model size while keeping needed analytics.
Exercise 2 preview
Write SQL to implement surrogate keys, type changes, rounding, a dimension split, and a daily aggregate table.
- [ ] Do Exercise 1: propose a reduction plan
- [ ] Do Exercise 2: implement with SQL
Common mistakes and how to self-check
- Mistake: Keeping long text in the fact table.
  Fix: Move it to a dimension; keep only integer keys in the fact.
- Mistake: Using datetime when date is enough.
  Fix: Convert to date; split time out only if required.
- Mistake: Calculated columns for dynamic logic.
  Fix: Use measures or precompute in ETL to avoid inflating the model.
- Mistake: High-precision decimals for currency.
  Fix: Round to cents or store as integer minor units (e.g., cents).
- Mistake: Not removing unused columns.
  Fix: Audit dependencies; drop columns not used in visuals/relationships.
- Mistake: Over-aggregating and losing drill detail.
  Fix: Keep a detailed fact for drill-through and a separate aggregate table for visuals.
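The currency fix above, sketched for an assumed sales_amount column: ROUND then CAST keeps cent-level precision while the stored value becomes a compact integer.

```sql
-- Store currency as integer minor units (cents) instead of high-precision decimals.
SELECT
    CAST(ROUND(sales_amount * 100, 0) AS BIGINT) AS sales_amount_cents  -- 19.99 -> 1999
FROM stg_sales;
-- A measure (or the reporting layer) divides by 100.0 to display currency again.
```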
Self-check routine
- List every column: is it used in a visual, filter, join, or measure? If not, remove it.
- For each remaining column: can the type be smaller (INT, DATE, BOOLEAN)?
- Scan top 10 high-cardinality columns: can any be bucketed or moved to a dimension?
- Identify top 3 summary queries: create aggregate tables for them.
- Refresh and compare: model size, refresh duration, and time-to-first-visual.
Practical projects
- Retail Sales Slim-Down: Reduce a 200M-row sales model by moving all text attributes to dimensions, converting GUIDs to INT, and introducing a daily aggregate table. Measure size and refresh time before vs after.
- Support Tickets Cleanup: For a ticketing model, replace free-text statuses with a dimension, bucket resolution times into ranges, and remove attachments/blobs. Validate no visual breaks.
- Finance Precision Review: Convert currency decimals to two places or store as integer cents. Prove no material metric change and document performance impact.
Mini challenge
You inherit a fact table with 150 columns, including email, phone, notes, and JSON blobs. You must deliver a dashboard with daily revenue, by product and region, with drill to customer level. In one paragraph, outline your reduction plan: what to drop, what to dimensionalize, which aggregates to create, and any type changes. Keep it specific.
Who this is for
BI Developers, Analytics Engineers, and Data Analysts who build models for dashboards and need faster refresh and query performance.
Prerequisites
- Basic SQL (SELECT, JOIN, GROUP BY)
- Understanding of star schema (facts and dimensions)
- Familiarity with your BI tool's data types and relationships
Learning path
- Review model usage: visuals, filters, measures
- Apply data type optimization and column pruning
- Split text to dimensions; add surrogate keys
- Introduce aggregate tables for common rollups
- Validate metrics, refresh, and performance
Next steps
- Complete the exercises below and take the Quick Test
- Run your self-check routine on a real model at work
- Document changes and their impact for your team
Progress & test
The quick test is available to everyone. If you log in, your exercise and test progress will be saved automatically.