Why this matters
As a BI Developer, you turn data into dashboards and reports. Slow queries mean slow dashboards, timeouts during stakeholder demos, and blown refresh windows. Knowing the basics of query performance helps you deliver responsive reports, reduce warehouse costs, and meet SLAs.
- Make daily dashboard refreshes complete on time.
- Keep ad-hoc exploration snappy so analysts don’t wait minutes per click.
- Cut compute costs by scanning fewer rows and columns.
Who this is for
- BI Developers, Analytics Engineers, and SQL users building dashboards, extracts, and scheduled jobs.
Prerequisites
- Comfortable writing SELECT with WHERE, JOIN, GROUP BY, ORDER BY
- Basic understanding of star schema (fact + dimensions)
- Can read simple query plans (optional but helpful)
Learning path
- Understand the performance mindset (scan less, filter early, return fewer columns).
- Learn sargability and indexing basics.
- Practice with star-schema queries (joins + filters + aggregates).
- Use EXPLAIN to verify improvements.
- Apply to a small project and measure time saved.
Concept explained simply
Most SQL engines spend time reading data. If you can make the engine read fewer rows and fewer columns, it usually runs much faster. You do that by:
- Filtering early with sargable conditions (search-argument-able).
- Joining efficiently (correct keys, selective filters first).
- Selecting only needed columns (avoid SELECT *).
- Aggregating after you reduce the data.
What is sargability?
A condition is sargable when the database can use an index or partition pruning to find rows directly. Example: WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31' is sargable; WHERE DATE(sale_date) = '2024-01-15' is not, because applying a function to the column prevents the engine from using an index on sale_date.
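You can see sargability directly in a query plan. Here is a minimal sketch using Python's built-in sqlite3 (table and index names are made up for illustration): the range filter is answered through the index, while the DATE()-wrapped filter forces a full scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER, sale_date TEXT, revenue REAL)")
con.execute("CREATE INDEX ix_sales_date ON sales(sale_date)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# Sargable: the engine can seek directly into ix_sales_date
sargable = plan("SELECT SUM(revenue) FROM sales "
                "WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31'")

# Not sargable: DATE() hides the column, so the index cannot be used
wrapped = plan("SELECT SUM(revenue) FROM sales "
               "WHERE DATE(sale_date) = '2024-01-15'")

print(sargable)  # SEARCH ... USING INDEX ix_sales_date ...
print(wrapped)   # SCAN ...
```

The same check works in any engine that exposes a plan; only the EXPLAIN syntax and the plan wording differ.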
Mental model
Imagine a conveyor belt of rows. Each operator (filter, join, group) touches those rows. Your job: shrink the belt early. Use precise WHERE clauses, avoid functions on the left-hand side, and choose columns wisely so the system can skip large chunks of data.
Core techniques that matter
- Filter early and sargably: Avoid wrapping columns in functions inside WHERE.
- Return only needed columns: Prevent wide scans.
- Join on keys with supporting indexes or partition alignment.
- Aggregate after filtering. Use DISTINCT carefully.
- Use approximate functions when acceptable (e.g., approximate count) for exploratory BI.
- Use EXPLAIN to confirm row counts drop where expected.
Index selection tips
- Create indexes that match frequent WHERE and JOIN columns.
- In composite indexes (for B-Tree engines), lead with equality-filtered, selective columns; put range-filtered columns later.
- For columnar warehouses, prefer sort/cluster keys and partitioning over many secondary indexes.
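The first two tips can be sketched with sqlite3 (table and index names are hypothetical): a composite index whose leading column carries the equality filter lets a single index answer both predicates, and since COUNT(*) needs no other columns it even serves as a covering index here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (store_id INT, date_id INT, revenue REAL)")
# Equality column first, range column second: both predicates can narrow the search
con.execute("CREATE INDEX ix_store_date ON fact_sales(store_id, date_id)")

details = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(*) FROM fact_sales "
    "WHERE store_id = 1001 AND date_id BETWEEN 20240101 AND 20240131"
)]
print(details)  # SEARCH ... USING COVERING INDEX ix_store_date ...
```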
Worked examples
Assume a star schema:
fact_sales(sale_id, date_id, product_id, store_id, customer_id, qty, unit_price, revenue)
dim_date(date_id, date, year, month)
dim_product(product_id, category, brand, sku)
dim_store(store_id, region, country)
Example 1 — Sargable date filters and avoiding SELECT *
Goal: Revenue by category for the last 90 days.
Slow approach:
SELECT *
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
WHERE DATE(d.date) >= CURRENT_DATE - INTERVAL '90 day';
Issues: SELECT * is wide (it drags every column from all three tables); DATE(d.date) blocks index use and partition pruning; the aggregation is left to the BI tool instead of the database. Faster:
-- Only needed columns, sargable filter
SELECT p.category, SUM(f.revenue) AS revenue_90d
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
WHERE d.date BETWEEN CURRENT_DATE - INTERVAL '90 day' AND CURRENT_DATE
GROUP BY p.category;
Optional supporting structures
-- Row-store example
CREATE INDEX ix_dim_date_date ON dim_date(date);
CREATE INDEX ix_fact_sales_product ON fact_sales(product_id);
In columnar warehouses, prefer date partitioning and clustering keys.
Example 2 — Push filters before joins
Goal: Top 10 brands by revenue this month.
Better to reduce early:
WITH recent AS (
    SELECT f.product_id, f.revenue
    FROM fact_sales f
    JOIN dim_date d ON f.date_id = d.date_id
    WHERE d.year = EXTRACT(YEAR FROM CURRENT_DATE)
      AND d.month = EXTRACT(MONTH FROM CURRENT_DATE)
)
SELECT p.brand, SUM(r.revenue) AS rev
FROM recent r
JOIN dim_product p ON r.product_id = p.product_id
GROUP BY p.brand
ORDER BY rev DESC
LIMIT 10;
We filter to current month before joining to product, reducing rows early.
Why not compute brand first?
If dim_product is huge and the current-month filter removes most fact rows, filtering by date first shrinks the intermediate result. Confirm with EXPLAIN that row counts drop where you expect.
Example 3 — Avoid functions on columns in WHERE
Compare:
-- Not sargable
WHERE LOWER(p.brand) = 'acme'
-- Sargable (pre-normalize or store canonical form)
WHERE p.brand = 'Acme'
If you must case-fold, consider storing a normalized column (brand_norm) and index it, or use a case-insensitive collation.
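The normalized-column idea above can be sketched with sqlite3 (brand_norm and the index name are hypothetical; stored generated columns need SQLite 3.31+): the pre-lowercased copy gets its own index, so the case-folded lookup stays sargable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# brand_norm is a stored, pre-lowercased copy of brand (requires SQLite 3.31+)
con.execute("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        brand      TEXT,
        brand_norm TEXT GENERATED ALWAYS AS (LOWER(brand)) STORED
    )
""")
con.execute("CREATE INDEX ix_product_brand_norm ON dim_product(brand_norm)")
con.execute("INSERT INTO dim_product (product_id, brand) VALUES (1, 'Acme'), (2, 'Globex')")

# Sargable: plain column on the left, constant on the right
rows = con.execute(
    "SELECT product_id, brand FROM dim_product WHERE brand_norm = 'acme'"
).fetchall()
print(rows)  # [(1, 'Acme')]
```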
Example 4 — EXISTS vs IN for large subqueries
-- Often faster with large candidate sets
SELECT COUNT(*)
FROM fact_sales f
WHERE EXISTS (
    SELECT 1
    FROM dim_store s
    WHERE s.store_id = f.store_id
      AND s.region = 'EMEA'
);
EXISTS lets the engine stop probing after the first match for each outer row.
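The EXISTS pattern above can be run end to end with sqlite3 and toy data: only sales whose store matches the EMEA probe are counted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_store (store_id INT PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (sale_id INT, store_id INT);
    INSERT INTO dim_store VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fact_sales VALUES (10, 1), (11, 1), (12, 2);
""")

# Correlated EXISTS: each fact row is kept once a matching EMEA store is found
(count,) = con.execute("""
    SELECT COUNT(*) FROM fact_sales f
    WHERE EXISTS (
        SELECT 1 FROM dim_store s
        WHERE s.store_id = f.store_id AND s.region = 'EMEA'
    )
""").fetchone()
print(count)  # 2 (sales 10 and 11 belong to the EMEA store)
```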
Example 5 — COUNT(DISTINCT) and approximations
-- Exact but heavy
SELECT COUNT(DISTINCT customer_id)
FROM fact_sales
WHERE revenue > 0;
-- Acceptable approximation for BI exploration (engine-dependent)
-- SELECT APPROX_COUNT_DISTINCT(customer_id) ...
Approximations can be much faster for exploratory dashboards when exactness is not required. Validate with stakeholders.
EXPLAIN basics
Use EXPLAIN to verify improvements.
- Check estimated vs actual rows at each step (if available).
- Ensure filters reduce rows before big joins.
- Confirm index/partition pruning is used.
Mini checklist for reading plans
- Filters happen early
- Join on keys with stats/index/cluster usage
- No unexpected full scans for small lookups
Exercises
These mirror the tasks in the Exercises panel; work through them here first.
Exercise 1 — Rewrite for sargability and narrower scans
You have this slow query:
SELECT *
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
WHERE DATE(d.date) >= CURRENT_DATE - INTERVAL '30 day'
AND LOWER(p.category) = 'electronics'
GROUP BY p.category;
Task:
- Rewrite it to be sargable and to return only needed columns.
- Suggest 1–2 supporting indexes or clustering keys (pick row-store or columnar, your choice).
Hints
- Remove functions from the left side of predicates.
- Select only the columns you need.
- Filter before grouping.
Exercise 2 — Pick useful indexes/sort keys
Given these frequent queries, propose indexes or sort/cluster keys:
-- Q1
SELECT p.brand, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
WHERE d.date BETWEEN CURRENT_DATE - INTERVAL '90 day' AND CURRENT_DATE
GROUP BY p.brand;
-- Q2
SELECT COUNT(*)
FROM fact_sales f
WHERE f.store_id = 1001
AND f.date_id BETWEEN 20240101 AND 20240131;
Hints
- Think about common filters and join keys.
- For columnar warehouses, date partitioning + clustering on (date_id, product_id) often helps.
Common mistakes and self-check
- Using functions on columns in WHERE (kills sargability).
- SELECT * in production dashboards (scans too much).
- Joining large tables before filtering (explodes rows early).
- COUNT(DISTINCT) everywhere when approximate or surrogate counts suffice.
- No EXPLAIN verification.
Self-check
- Did I remove non-sargable predicates?
- Did I return only the needed columns?
- Do filters reduce rows before big joins?
- Did I check the plan and row counts?
Practical projects
- Speed up a monthly sales dashboard: baseline query time, apply sargable filters, drop SELECT *, re-measure.
- Create a small summary table for last 90 days revenue by category and compare dashboard refresh time vs on-the-fly aggregation.
- Write a runbook: common slow patterns and their faster equivalents for your team.
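The summary-table idea in the second project can be sketched with sqlite3 and toy data (a real version would also filter sale_date to the last 90 days and refresh the table on a schedule):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (product_id INT, sale_date TEXT, revenue REAL);
    CREATE TABLE dim_product (product_id INT PRIMARY KEY, category TEXT);
    INSERT INTO dim_product VALUES (1, 'electronics'), (2, 'toys');
    INSERT INTO fact_sales VALUES
        (1, '2024-03-01', 100.0), (1, '2024-03-02', 50.0), (2, '2024-03-01', 25.0);
""")

# Aggregate once into a small table; dashboards then read this instead of the fact table
con.execute("""
    CREATE TABLE summary_cat AS
    SELECT p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category
""")

rows = con.execute("SELECT category, revenue FROM summary_cat ORDER BY category").fetchall()
print(rows)  # [('electronics', 150.0), ('toys', 25.0)]
```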
Mini challenge
Given a report that times out when filtering by brand and month, propose two changes to the SQL and one change to the data model (index/partition/cluster) to make it complete under 10 seconds. Write your reasoning in 3 bullet points.
Next steps
- Practice on your real dashboards: add one sargable fix and measure results.
- Learn window functions performance trade-offs for BI.
- Explore partitioning and clustering strategies for your warehouse.
Quick Test
Take the quick test below to check your understanding.