Why this matters
For a BI Developer, model size directly impacts refresh time, memory usage, and query speed. Smaller models load dashboards faster, reduce costs on shared capacity, and let the DAX/SQL engine scan less data. Typical tasks include trimming wide fact tables, choosing efficient data types, offloading high-cardinality text to dimensions, and building aggregate tables for heavy reports.
- [ ] Speed up refreshes from 60 min to under 15 by reducing model size
- [ ] Fit your model into memory limits (e.g., shared capacity or on-prem servers)
- [ ] Improve end-user experience with faster visuals and fewer timeouts
Concept explained simply
Modern BI engines are mostly columnar and dictionary-encoded: each column stores a dictionary of its unique values, and each row keeps only a reference into that dictionary. The bigger the dictionary, and the more column segments the engine has to store, the larger and slower your model.
Mental model
Think of each column as a glossary:
- The glossary (dictionary) lists every unique value once.
- Each row stores only a small pointer to the glossary entry.
- If your column has many unique values (high cardinality), the glossary grows and so does the storage footprint.
Therefore: reduce the number of unique values, reduce the number of columns, and use compact data types.
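To see which columns will build the biggest dictionaries, profile distinct counts before modeling. A minimal sketch, assuming a fact table named fact_sales with the columns shown; substitute your own table and column names.

```sql
-- Columns with many distinct values build large dictionaries and compress poorly.
SELECT
    COUNT(*)                                      AS total_rows,
    COUNT(DISTINCT customer_guid)                 AS distinct_customer_guid,   -- likely near the row count
    COUNT(DISTINCT product_name)                  AS distinct_product_name,    -- text: candidate for a dimension
    COUNT(DISTINCT order_timestamp)               AS distinct_order_timestamp, -- second-level precision inflates this
    COUNT(DISTINCT CAST(order_timestamp AS DATE)) AS distinct_order_date       -- usually far smaller
FROM fact_sales;
```

Columns whose distinct count approaches the row count are the first candidates for surrogate keys, bucketing, or a move to a dimension.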
Core techniques (actionable)
- Drop and hide
- Remove columns not used by visuals, measures, filters, or relationships.
- Hide columns that are only needed for relationships from report view.
- Use compact data types (see the DDL sketch after this list)
- Prefer integers for keys instead of GUIDs or long text.
- Use date instead of datetime if time is not needed.
- Reduce decimals to the necessary precision (e.g., 2 decimal places).
- Use booleans/flags (0/1) instead of "Yes/No" text.
- Reduce cardinality
- Split descriptive text into a dimension table; keep only the integer key in the fact.
- Bucket or round continuous values when appropriate (e.g., ages into ranges).
- Trim whitespace and normalize casing so variants like "US" and "us" do not count as separate distinct values.
- Right-size granularity
- Filter rows you truly do not need (e.g., old inactive entities, test data).
- Create aggregate tables for common queries (e.g., daily product sales) while keeping a detailed fact for drill-through.
- Shift calculations out of the model
- Prefer measures over calculated columns for on-the-fly logic.
- Precompute stable derived fields in ETL so the BI model stores less.
- Model structure
- Use a star schema: narrow fact + descriptive dimensions.
- Avoid many-to-many relationships unless they are necessary and well understood.
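To make the data-type and structure points concrete, here is a minimal DDL sketch of a narrow star-schema fact. Table and column names are illustrative assumptions; adjust the types to your warehouse's dialect.

```sql
-- Descriptive text is stored once, in the dimension.
CREATE TABLE dim_product (
    product_key  INT          NOT NULL PRIMARY KEY,  -- INT surrogate key instead of GUID/text
    product_name VARCHAR(200) NOT NULL,
    category     VARCHAR(100) NOT NULL
);

-- The fact stays narrow: integer keys, compact types, no free text.
CREATE TABLE fact_sales (
    order_date   DATE          NOT NULL,             -- DATE, not DATETIME, when time is not needed
    product_key  INT           NOT NULL REFERENCES dim_product (product_key),
    customer_key INT           NOT NULL,             -- references dim_customer (not shown)
    is_returned  SMALLINT      NOT NULL,             -- 0/1 flag instead of 'Yes'/'No' text
    quantity     INT           NOT NULL,
    sales_amount DECIMAL(10,2) NOT NULL              -- two decimal places is usually enough
);
```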
Quick checklist to apply today
- [ ] List every column used by visuals/measures/filters; drop the rest.
- [ ] Replace GUID/text keys with INT surrogate keys.
- [ ] Convert datetime to date where possible.
- [ ] Round or bucket high-cardinality numeric columns (see the sketch after this checklist).
- [ ] Move textual attributes into dimension tables.
- [ ] Add aggregate tables for frequent rollups.
- [ ] Remove historical rows not needed for analysis.
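One way to act on the rounding, bucketing, and row-trimming items, sketched as a staging query. The bucket boundaries, the two-year cutoff, and the table/column names are assumptions, and the date functions use SQL Server syntax; swap in your engine's equivalents.

```sql
-- Slimmer staging query: bucketed age, normalized country text, and only recent history.
SELECT
    customer_key,
    CASE                                      -- bucket a high-cardinality number into a few ranges
        WHEN age IS NULL THEN 'Unknown'
        WHEN age < 25  THEN 'Under 25'
        WHEN age < 45  THEN '25-44'
        WHEN age < 65  THEN '45-64'
        ELSE '65+'
    END AS age_band,
    UPPER(TRIM(country)) AS country_code,     -- 'us ' and 'US' collapse into one distinct value
    CAST(order_timestamp AS DATE) AS order_date,
    sales_amount
FROM stg_sales
WHERE order_timestamp >= DATEADD(YEAR, -2, CAST(GETDATE() AS DATE));  -- keep only the last two years
```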
Worked examples
Example 1: Replacing text keys and trimming text
Before: fact_sales has CustomerId as a 36-char GUID and ProductName as text. Both are in the fact.
After: Create dim_customer (INT surrogate key) and dim_product; replace GUID and name with their INT keys in fact_sales. Keep text in dimensions only.
Result: Fact table shrinks significantly; dimensions handle descriptive text once.
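A sketch of how Example 1 could be implemented in SQL. The staging table stg_sales, the column names, and the use of ROW_NUMBER for surrogate keys (instead of an IDENTITY column) are assumptions to keep the example self-contained; CREATE TABLE AS syntax varies by engine (SQL Server uses SELECT ... INTO).

```sql
-- Build the customer dimension once, with an INT surrogate key.
CREATE TABLE dim_customer AS
SELECT
    ROW_NUMBER() OVER (ORDER BY customer_guid) AS customer_key,  -- compact INT key
    customer_guid,                                               -- keep the GUID for lineage/lookups
    customer_name
FROM (
    SELECT DISTINCT customer_guid, customer_name                 -- assumes one name per GUID
    FROM stg_sales
) AS c;

-- Rebuild the fact with the narrow key; the 36-char GUID and the name stay out of it.
CREATE TABLE fact_sales AS
SELECT
    d.customer_key,
    s.product_key,
    s.order_date,
    s.quantity,
    s.sales_amount
FROM stg_sales AS s
JOIN dim_customer AS d
  ON d.customer_guid = s.customer_guid;
```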
Example 2: Reducing datetime precision
Before: OrderTimestamp stores second-level precision but reports filter by date.
After: Store OrderDate (DATE), and add a separate OrderTime (TIME) column only if reports actually need the time of day. If time is not used, keep just OrderDate.
Result: Fewer distinct values and smaller dictionaries.
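Example 2 as a minimal sketch, assuming a staging table stg_orders with an OrderTimestamp column; the optional time split is shown commented out.

```sql
SELECT
    CAST(OrderTimestamp AS DATE) AS OrderDate        -- 2024-03-07 14:31:09 -> 2024-03-07
    -- , CAST(OrderTimestamp AS TIME) AS OrderTime   -- add only if reports filter by time of day
FROM stg_orders;
```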
Example 3: Aggregation table
Before: Queries sum over 500M fact rows by day and product.
After: Build fact_sales_daily (one row per product, per day). Point summary visuals to the aggregate table; keep detailed fact for drill-through.
Result: Scans are against a much smaller table, speeding up visuals and reducing memory pressure.
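A sketch of the aggregate in Example 3, assuming fact_sales carries order_date and product_key; the grain is one row per product per day.

```sql
CREATE TABLE fact_sales_daily AS
SELECT
    order_date,
    product_key,
    SUM(quantity)     AS total_quantity,
    SUM(sales_amount) AS total_sales,
    COUNT(*)          AS order_lines       -- row count is handy for averages and ratios
FROM fact_sales
GROUP BY order_date, product_key;
```

Summary visuals point at this table; drill-through stays on the detailed fact_sales.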
Practice exercises
Complete the exercises below. Then check your answers in the provided solutions.
Exercise 1 preview
Identify which columns to drop, convert, split into dimensions, and which aggregates to introduce to reduce model size while keeping needed analytics.
Exercise 2 preview
Write SQL to implement surrogate keys, type changes, rounding, a dimension split, and a daily aggregate table.
- [ ] Do Exercise 1: propose a reduction plan
- [ ] Do Exercise 2: implement with SQL
Common mistakes and how to self-check
- Mistake: Keeping long text in the fact table.
  Fix: Move it to a dimension; keep only integer keys in the fact.
- Mistake: Using datetime when date is enough.
  Fix: Convert to date; split time out only if required.
- Mistake: Calculated columns for dynamic logic.
  Fix: Use measures or precompute in ETL to avoid inflating the model.
- Mistake: High-precision decimals for currency.
  Fix: Round to cents or store as integer minor units (e.g., cents).
- Mistake: Not removing unused columns.
  Fix: Audit dependencies; drop columns not used in visuals/relationships.
- Mistake: Over-aggregating and losing drill detail.
  Fix: Keep a detailed fact for drill-through and a separate aggregate table for visuals.
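The currency fix above, sketched for an assumed sales_amount column: ROUND then CAST keeps cent-level precision while the stored value becomes a compact integer.

```sql
-- Store currency as integer minor units (cents) instead of high-precision decimals.
SELECT
    CAST(ROUND(sales_amount * 100, 0) AS BIGINT) AS sales_amount_cents  -- 19.99 -> 1999
FROM stg_sales;
-- A measure (or the reporting layer) divides by 100.0 to display currency again.
```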
Self-check routine
- List every column: is it used in a visual, filter, join, or measure? If not, remove it.
- For each remaining column: can the type be smaller (INT, DATE, BOOLEAN)?
- Scan top 10 high-cardinality columns: can any be bucketed or moved to a dimension?
- Identify top 3 summary queries: create aggregate tables for them.
- Refresh and compare: model size, refresh duration, and time-to-first-visual.
Practical projects
- Retail Sales Slim-Down: Reduce a 200M-row sales model by moving all text attributes to dimensions, converting GUIDs to INT, and introducing a daily aggregate table. Measure size and refresh time before vs after.
- Support Tickets Cleanup: For a ticketing model, replace free-text statuses with a dimension, bucket resolution times into ranges, and remove attachments/blobs. Validate no visual breaks.
- Finance Precision Review: Convert currency decimals to two places or store as integer cents. Prove no material metric change and document performance impact.
Mini challenge
You inherit a fact table with 150 columns, including email, phone, notes, and JSON blobs. You must deliver a dashboard with daily revenue, by product and region, with drill to customer level. In one paragraph, outline your reduction plan: what to drop, what to dimensionalize, which aggregates to create, and any type changes. Keep it specific.
Who this is for
BI Developers, Analytics Engineers, and Data Analysts who build models for dashboards and need faster refresh and query performance.
Prerequisites
- Basic SQL (SELECT, JOIN, GROUP BY)
- Understanding of star schema (facts and dimensions)
- Familiarity with your BI tool's data types and relationships
Learning path
- Review model usage: visuals, filters, measures
- Apply data type optimization and column pruning
- Split text to dimensions; add surrogate keys
- Introduce aggregate tables for common rollups
- Validate metrics, refresh, and performance
Next steps
- Complete the exercises below and take the Quick Test
- Run your self-check routine on a real model at work
- Document changes and their impact for your team
Progress & test
The quick test is available to everyone. If you log in, your exercise and test progress will be saved automatically.