
Aggregation Tables Concepts

Learn Aggregation Tables Concepts for free with explanations, exercises, and a quick test (for BI Developers).

Published: December 24, 2025 | Updated: December 24, 2025


Why this matters

BI dashboards slow down when every chart scans billions of raw rows. Aggregation tables pre-compute common summaries (daily, monthly, by product, by region), so dashboards query thousands of rows instead of billions. This cuts costs and latency, stabilizes SLAs, and keeps stakeholders confident in your data.

  • Real task: Make a sales dashboard load under 2 seconds during peak hours.
  • Real task: Reduce warehouse costs by avoiding repeated large scans for the same metrics.
  • Real task: Provide consistent definitions for KPIs across multiple reports.

Concept explained simply

An aggregation table is a summary table built at a higher grain (e.g., daily by product) than your raw fact table (e.g., every transaction). You choose grouping keys and measures, compute them ahead of time, and read from this smaller table in dashboards.

Mental model

  • Think of a pyramid: at the base are the detailed facts; above them sit daily, weekly, and monthly rollups; the higher you go, the fewer rows and the faster the queries.
  • Trade-off: speed vs flexibility. Higher aggregation gives faster reads but less drill-down detail.
  • Rule of thumb: design aggregates that answer 80% of dashboard questions without touching raw facts.

Key building blocks

  • Grain: the exact level of detail per row (e.g., one row per day per product per country).
  • Grouping keys: dimensions you group by (date, product_id, country, channel).
  • Measures: pre-computed metrics (sum_revenue, order_count, avg_price, distinct_buyers).
  • Time windows: daily, weekly, monthly; align with reporting needs.
  • Storage type: table or materialized view; both are fine if refreshed reliably.
  • Refresh strategy: full refresh daily vs incremental upserts by date partition.
  • Naming: include grain and keys in the name (e.g., agg_sales_d_product_country).
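Putting the building blocks together, a minimal DDL sketch might look like the following. The types are illustrative assumptions; what matters is that the name encodes the grain and keys, and the primary key enforces the declared grain.

```sql
-- Sketch only: one row per day per product per country,
-- matching the naming convention agg_<subject>_<grain>_<keys>.
CREATE TABLE agg_sales_d_product_country (
  day          DATE,
  product_id   BIGINT,
  country      VARCHAR(2),
  sum_revenue  DECIMAL(18, 2),   -- pre-computed measure
  order_count  BIGINT,           -- pre-computed measure
  PRIMARY KEY (day, product_id, country)  -- enforces the grain
);
```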

When to create an aggregation table

  • A chart reads the same measure groups repeatedly (e.g., daily sales by product).
  • Queries scan large fact tables with filters limited to popular dimensions.
  • Dashboards time out or cost spikes appear in query logs.
  • You need consistent metric logic shared across reports.

Worked examples

Example 1: Daily sales by product

Goal: Speed up charts that show daily revenue by product and country.

-- Grain: 1 row per day, product_id, country
-- Measures: sum_revenue, order_count
CREATE TABLE agg_sales_d_product_country AS
SELECT
  order_date AS day,
  product_id,
  country,
  SUM(revenue) AS sum_revenue,
  COUNT(*) AS order_count
FROM fact_sales
GROUP BY order_date, product_id, country;

-- Dashboard query becomes:
SELECT day, product_id, country, sum_revenue
FROM agg_sales_d_product_country
WHERE day BETWEEN '2025-01-01' AND '2025-01-31' AND country = 'US';

Effect: the dashboard scans thousands of rows instead of billions; speedups of 50–100× are common for this pattern.

Example 2: Monthly active users (MAU) by region

Distinct counts are expensive. Pre-aggregating MAU turns heavy queries into quick lookups.

-- Grain: 1 row per month, region
-- Measure: distinct active users
CREATE TABLE agg_mau_m_region AS
SELECT
  DATE_TRUNC('month', activity_ts) AS month,
  region,
  COUNT(DISTINCT user_id) AS mau
FROM fact_events
GROUP BY 1, 2;

Use this for tiles that show MAU trends without re-counting distinct users.
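One caveat: distinct counts are not additive, so you cannot sum these monthly MAU rows into a quarterly figure; if you need a coarser grain, build a separate aggregate from the raw events. A dashboard tile reads the table directly, for example:

```sql
-- Read the pre-computed MAU trend for one region.
-- 'EMEA' is an illustrative region value, not from the source schema.
SELECT month, mau
FROM agg_mau_m_region
WHERE region = 'EMEA'
ORDER BY month;
```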

Example 3: Conversion rate by day and channel

Precompute both numerator and denominator to avoid SUM-of-ratios errors.

-- Grain: 1 row per day, channel
-- Measures: signups, sessions, plus the derived rate
CREATE TABLE agg_conv_d_channel AS
SELECT
  DATE_TRUNC('day', session_ts) AS day,
  channel,
  SUM(CASE WHEN event = 'signup' THEN 1 ELSE 0 END) AS signups,
  SUM(CASE WHEN event = 'session' THEN 1 ELSE 0 END) AS sessions,
  1.0 * SUM(CASE WHEN event = 'signup' THEN 1 ELSE 0 END)
      / NULLIF(SUM(CASE WHEN event = 'session' THEN 1 ELSE 0 END), 0) AS conv_rate
FROM fact_web
GROUP BY 1, 2;

This ensures consistent conversion rate across dashboards.
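Because the components are stored, coarser rollups stay correct: re-derive the rate from summed signups and sessions instead of averaging the stored conv_rate column, which would weight every day equally regardless of traffic. A weekly rollup of the table above might look like:

```sql
-- Weekly conversion from the daily aggregate (sketch).
SELECT
  DATE_TRUNC('week', day) AS week,
  channel,
  SUM(signups) AS signups,
  SUM(sessions) AS sessions,
  1.0 * SUM(signups) / NULLIF(SUM(sessions), 0) AS conv_rate
FROM agg_conv_d_channel
GROUP BY 1, 2;
```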

Design checklist

  • Define the grain in one sentence: One row per [time] per [dimension1] per [dimension2].
  • List grouping keys used by the slowest dashboards.
  • Precompute stable measures (sums, counts, distincts) used repeatedly.
  • Validate row counts vs source (no unexpected duplication).
  • Choose refresh window (e.g., last 7 days incremental, older data frozen).
  • Document naming, grain, keys, and refresh schedule in the model doc.
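One way to check the "no unexpected duplication" item is a query that flags any key combination appearing more than once, sketched here against the Example 1 table:

```sql
-- Any rows returned mean the declared grain is violated.
SELECT day, product_id, country, COUNT(*) AS n
FROM agg_sales_d_product_country
GROUP BY day, product_id, country
HAVING COUNT(*) > 1;
```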

Exercises

Do these now. They mirror the interactive exercises below and in the Exercises panel.

Exercise 1: Design a daily sales aggregate

From a transactional fact_sales(order_ts, product_id, country, channel, quantity, revenue), design an aggregation table to speed up daily revenue charts by product category and channel.

  • Pick the grain and grouping keys.
  • List measures.
  • Sketch SQL to create/populate it.
  • Describe refresh cadence.

Hints

  • Favor daily grain; category can be joined from a dim table at build time.
  • Precompute both sums and counts where useful.
  • Incremental loads by day reduce costs.

Exercise 2: Choose aggregates for a dashboard

Dashboard tiles needed: (a) Revenue last 7 days by country, (b) Top 10 products last 30 days, (c) Daily conversion rate by channel last 90 days.

  • Propose 1–2 aggregate tables that cover all tiles.
  • Define columns and measures for each.
  • Explain how each tile would query them.

Hints

  • Prefer fewer aggregates that each serve multiple tiles.
  • Top 10 can be computed on a pre-aggregated daily or monthly table.
  • Avoid deriving ratios from already-aggregated ratios; combine stored numerators and denominators instead.

Self-check before moving on

  • Can you state the grain precisely for each proposed aggregate?
  • Do measures avoid double-counting when joined to dims?
  • Is there a clear incremental refresh plan?

Common mistakes and how to self-check

  • Unclear grain: If you cannot complete the sentence 'one row per ...', stop and clarify.
  • Re-aggregated ratios: summing or averaging pre-aggregated ratios gives wrong results. Store the components (numerator, denominator) alongside the ratio.
  • Too many keys: Adding rarely-used dimensions explodes row counts. Keep keys minimal.
  • No incremental strategy: Full refreshes on large tables are costly. Use partitioned incremental updates.
  • Stale data: Define freshness targets per dashboard (e.g., daily by 06:00).
  • Metric drift: If logic exists in multiple places, centralize it in the aggregate build.
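An incremental strategy along the lines suggested above can be sketched as a delete-and-reload of recent partitions (exact syntax varies by warehouse; MERGE is another common option):

```sql
-- Rebuild only the last 7 days of the daily aggregate (sketch).
DELETE FROM agg_sales_d_product_country
WHERE day >= CURRENT_DATE - 7;

INSERT INTO agg_sales_d_product_country
SELECT order_date, product_id, country,
       SUM(revenue), COUNT(*)
FROM fact_sales
WHERE order_date >= CURRENT_DATE - 7
GROUP BY order_date, product_id, country;
```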

Quick self-audit

  • Compare row counts vs expected (e.g., days × products × countries).
  • Recompute key measures from raw facts for a small date range and compare results.
  • Run 3–5 dashboard queries and confirm they hit the aggregate (check EXPLAIN or the query plan).
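The recompute-and-compare audit can be a single query that joins a fresh aggregation of raw facts to the stored table for a small date range and surfaces mismatches (a sketch using the Example 1 table; the date range is illustrative):

```sql
-- Rows returned indicate disagreement between aggregate and raw facts.
WITH fresh AS (
  SELECT order_date AS day, product_id, country,
         SUM(revenue) AS sum_revenue
  FROM fact_sales
  WHERE order_date BETWEEN DATE '2025-01-01' AND DATE '2025-01-07'
  GROUP BY 1, 2, 3
)
SELECT f.day, f.product_id, f.country,
       f.sum_revenue AS raw_value,
       a.sum_revenue AS agg_value
FROM fresh f
JOIN agg_sales_d_product_country a
  ON  a.day = f.day
  AND a.product_id = f.product_id
  AND a.country = f.country
WHERE f.sum_revenue <> a.sum_revenue;
```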

Practical projects

  • Project 1: Build a daily revenue by product-country aggregate and swap 3 dashboard charts to use it. Measure latency and cost before/after.
  • Project 2: Create a monthly active users by region aggregate with incremental refresh of the last 2 months.
  • Project 3: Implement a conversion aggregate with numerator/denominator and validate against raw facts for a sample month.

Who this is for

  • BI Developers and Analytics Engineers optimizing dashboards and semantic layers.
  • Data Analysts needing faster, reliable KPI reads.

Prerequisites

  • Comfortable with SQL GROUP BY, window functions basics, and joins.
  • Understanding of facts, dimensions, and time-based partitioning concepts.

Learning path

  1. Learn grains and grouping keys.
  2. Design measures and handle ratios safely.
  3. Define refresh strategies (incremental vs full).
  4. Implement 1–2 aggregates and measure impact.
  5. Generalize naming and documentation patterns for reuse.

Next steps

  • Do the exercises below and take the Quick Test.
  • Apply one aggregate to a real dashboard and benchmark performance.
  • Document grain, keys, measures, and refresh for your team.

Mini challenge

Your marketing dashboard times out when filtering last 90 days by channel and country. Propose a single aggregate that would fix 80% of queries. Define its grain, keys, and at least 3 measures. Keep row counts reasonable.

Practice Exercises

2 exercises to complete

Instructions

You have fact_sales(order_ts, product_id, country, channel, quantity, revenue) and dim_product(product_id, category). Design an aggregate to speed daily revenue charts by category and channel.

  • Define the grain and grouping keys.
  • List measures to store.
  • Provide example SQL to create and populate the table.
  • Describe an incremental refresh strategy.

Expected Output

A clear definition of grain (one row per day-category-channel), a column list with measures (sum_revenue, order_count, sum_qty), example SQL with a join to dim_product, and an incremental plan refreshing the last N days.

Aggregation Tables Concepts — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

