Why this matters
As a BI Developer, you turn data into dashboards and reports. Slow queries mean slow dashboards, timeouts during stakeholder demos, and blown refresh windows. Knowing the basics of query performance helps you deliver responsive reports, reduce warehouse costs, and meet SLAs.
- Make daily dashboard refreshes complete on time.
- Keep ad-hoc exploration snappy so analysts don’t wait minutes per click.
- Cut compute costs by scanning fewer rows and columns.
Who this is for
- BI Developers, Analytics Engineers, and SQL users building dashboards, extracts, and scheduled jobs.
Prerequisites
- Comfortable writing SELECT with WHERE, JOIN, GROUP BY, ORDER BY
- Basic understanding of star schema (fact + dimensions)
- Can read simple query plans (optional but helpful)
Learning path
- Understand the performance mindset (scan less, filter early, return fewer columns).
- Learn sargability and indexing basics.
- Practice with star-schema queries (joins + filters + aggregates).
- Use EXPLAIN to verify improvements.
- Apply to a small project and measure time saved.
Concept explained simply
Most SQL engines spend time reading data. If you can make the engine read fewer rows and fewer columns, it usually runs much faster. You do that by:
- Filtering early with sargable conditions (search-argument-able).
- Joining efficiently (correct keys, selective filters first).
- Selecting only needed columns (avoid SELECT *).
- Aggregating after you reduce the data.
What is sargability?
A condition is sargable when the database can use an index or partition pruning to find rows directly. Example: WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31' is sargable; WHERE DATE(sale_date) = '2024-01-15' is not, because applying a function to the column prevents the engine from using an index on sale_date.
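You can see sargability directly in a query plan. Here is a minimal sketch using Python's built-in sqlite3 (table and index names are made up for illustration): the range filter is answered through the index, while the DATE()-wrapped filter forces a full scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER, sale_date TEXT, revenue REAL)")
con.execute("CREATE INDEX ix_sales_date ON sales(sale_date)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# Sargable: the engine can seek directly into ix_sales_date
sargable = plan("SELECT SUM(revenue) FROM sales "
                "WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31'")

# Not sargable: DATE() hides the column, so the index cannot be used
wrapped = plan("SELECT SUM(revenue) FROM sales "
               "WHERE DATE(sale_date) = '2024-01-15'")

print(sargable)  # SEARCH ... USING INDEX ix_sales_date ...
print(wrapped)   # SCAN ...
```

The same check works in any engine that exposes a plan; only the EXPLAIN syntax and the plan wording differ.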
Mental model
Imagine a conveyor belt of rows. Each operator (filter, join, group) touches those rows. Your job: shrink the belt early. Use precise WHERE clauses, avoid functions on the left-hand side, and choose columns wisely so the system can skip large chunks of data.
Core techniques that matter
- Filter early and sargably: Avoid wrapping columns in functions inside WHERE.
- Return only needed columns: Prevent wide scans.
- Join on keys with supporting indexes or partition alignment.
- Aggregate after filtering. Use DISTINCT carefully.
- Use approximate functions when acceptable (e.g., approximate count) for exploratory BI.
- Use EXPLAIN to confirm row counts drop where expected.
Index selection tips
- Create indexes that match frequent WHERE and JOIN columns.
- In composite indexes (for B-Tree engines), lead with equality-filtered, selective columns; put range-filtered columns later.
- For columnar warehouses, prefer sort/cluster keys and partitioning over many secondary indexes.
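The first two tips can be sketched with sqlite3 (table and index names are hypothetical): a composite index whose leading column carries the equality filter lets a single index answer both predicates, and since COUNT(*) needs no other columns it even serves as a covering index here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (store_id INT, date_id INT, revenue REAL)")
# Equality column first, range column second: both predicates can narrow the search
con.execute("CREATE INDEX ix_store_date ON fact_sales(store_id, date_id)")

details = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(*) FROM fact_sales "
    "WHERE store_id = 1001 AND date_id BETWEEN 20240101 AND 20240131"
)]
print(details)  # SEARCH ... USING COVERING INDEX ix_store_date ...
```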
Worked examples
Assume a star schema:
fact_sales(sale_id, date_id, product_id, store_id, customer_id, qty, unit_price, revenue)
dim_date(date_id, date, year, month)
dim_product(product_id, category, brand, sku)
dim_store(store_id, region, country)
Example 1 — Sargable date filters and avoiding SELECT *
Goal: Revenue by category for the last 90 days.
Slow approach:
SELECT *
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
WHERE DATE(d.date) >= CURRENT_DATE - INTERVAL '90 day';
Issues: SELECT * is wide (it drags every column from all three tables); DATE(d.date) blocks index use and partition pruning; the aggregation is left to the BI tool instead of the database. Faster:
-- Only needed columns, sargable filter
SELECT p.category, SUM(f.revenue) AS revenue_90d
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
WHERE d.date BETWEEN CURRENT_DATE - INTERVAL '90 day' AND CURRENT_DATE
GROUP BY p.category;
Optional supporting structures
-- Row-store example
CREATE INDEX ix_dim_date_date ON dim_date(date);
CREATE INDEX ix_fact_sales_product ON fact_sales(product_id);
In columnar warehouses, prefer date partitioning and clustering keys.
Example 2 — Push filters before joins
Goal: Top 10 brands by revenue this month.
Better to reduce early:
WITH recent AS (
    SELECT f.product_id, f.revenue
    FROM fact_sales f
    JOIN dim_date d ON f.date_id = d.date_id
    WHERE d.year = EXTRACT(YEAR FROM CURRENT_DATE)
      AND d.month = EXTRACT(MONTH FROM CURRENT_DATE)
)
SELECT p.brand, SUM(r.revenue) AS rev
FROM recent r
JOIN dim_product p ON r.product_id = p.product_id
GROUP BY p.brand
ORDER BY rev DESC
LIMIT 10;
We filter to current month before joining to product, reducing rows early.
Why not compute brand first?
If dim_product is huge and the current-month filter removes most fact rows, filtering by date first shrinks the intermediate result. Confirm with EXPLAIN that row counts drop where you expect.
Example 3 — Avoid functions on columns in WHERE
Compare:
-- Not sargable
WHERE LOWER(p.brand) = 'acme'
-- Sargable (pre-normalize or store canonical form)
WHERE p.brand = 'Acme'
If you must case-fold, consider storing a normalized column (brand_norm) and index it, or use a case-insensitive collation.
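The normalized-column idea above can be sketched with sqlite3 (brand_norm and the index name are hypothetical; stored generated columns need SQLite 3.31+): the pre-lowercased copy gets its own index, so the case-folded lookup stays sargable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# brand_norm is a stored, pre-lowercased copy of brand (requires SQLite 3.31+)
con.execute("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        brand      TEXT,
        brand_norm TEXT GENERATED ALWAYS AS (LOWER(brand)) STORED
    )
""")
con.execute("CREATE INDEX ix_product_brand_norm ON dim_product(brand_norm)")
con.execute("INSERT INTO dim_product (product_id, brand) VALUES (1, 'Acme'), (2, 'Globex')")

# Sargable: plain column on the left, constant on the right
rows = con.execute(
    "SELECT product_id, brand FROM dim_product WHERE brand_norm = 'acme'"
).fetchall()
print(rows)  # [(1, 'Acme')]
```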
Example 4 — EXISTS vs IN for large subqueries
-- Often faster with large candidate sets
SELECT COUNT(*)
FROM fact_sales f
WHERE EXISTS (
    SELECT 1
    FROM dim_store s
    WHERE s.store_id = f.store_id
      AND s.region = 'EMEA'
);
EXISTS lets the engine stop probing after the first match for each outer row.
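The EXISTS pattern above can be run end to end with sqlite3 and toy data: only sales whose store matches the EMEA probe are counted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_store (store_id INT PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (sale_id INT, store_id INT);
    INSERT INTO dim_store VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fact_sales VALUES (10, 1), (11, 1), (12, 2);
""")

# Correlated EXISTS: each fact row is kept once a matching EMEA store is found
(count,) = con.execute("""
    SELECT COUNT(*) FROM fact_sales f
    WHERE EXISTS (
        SELECT 1 FROM dim_store s
        WHERE s.store_id = f.store_id AND s.region = 'EMEA'
    )
""").fetchone()
print(count)  # 2 (sales 10 and 11 belong to the EMEA store)
```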
Example 5 — COUNT(DISTINCT) and approximations
-- Exact but heavy
SELECT COUNT(DISTINCT customer_id)
FROM fact_sales
WHERE revenue > 0;
-- Acceptable approximation for BI exploration (engine-dependent)
-- SELECT APPROX_COUNT_DISTINCT(customer_id) ...
Approximations can be much faster for exploratory dashboards when exactness is not required. Validate with stakeholders.
EXPLAIN basics
Use EXPLAIN to verify improvements.
- Check estimated vs actual rows at each step (if available).
- Ensure filters reduce rows before big joins.
- Confirm index/partition pruning is used.
Mini checklist for reading plans
- Filters happen early
- Join on keys with stats/index/cluster usage
- No unexpected full scans for small lookups
Exercises
These mirror the tasks in the Exercises panel; work through them here first.
Exercise 1 — Rewrite for sargability and narrower scans
You have this slow query:
SELECT *
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
WHERE DATE(d.date) >= CURRENT_DATE - INTERVAL '30 day'
AND LOWER(p.category) = 'electronics'
GROUP BY p.category;
Task:
- Rewrite it to be sargable and to return only needed columns.
- Suggest 1–2 supporting indexes or clustering keys (pick row-store or columnar, your choice).
Hints
- Remove functions from the left side of predicates.
- Select only the columns you need.
- Filter before grouping.
Exercise 2 — Pick useful indexes/sort keys
Given these frequent queries, propose indexes or sort/cluster keys:
-- Q1
SELECT p.brand, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
WHERE d.date BETWEEN CURRENT_DATE - INTERVAL '90 day' AND CURRENT_DATE
GROUP BY p.brand;
-- Q2
SELECT COUNT(*)
FROM fact_sales f
WHERE f.store_id = 1001
AND f.date_id BETWEEN 20240101 AND 20240131;
Hints
- Think about common filters and join keys.
- For columnar warehouses, date partitioning + clustering on (date_id, product_id) often helps.
Common mistakes and self-check
- Using functions on columns in WHERE (kills sargability).
- SELECT * in production dashboards (scans too much).
- Joining large tables before filtering (explodes rows early).
- COUNT(DISTINCT) everywhere when approximate or surrogate counts suffice.
- No EXPLAIN verification.
Self-check
- Did I remove non-sargable predicates?
- Did I return only the needed columns?
- Do filters reduce rows before big joins?
- Did I check the plan and row counts?
Practical projects
- Speed up a monthly sales dashboard: baseline query time, apply sargable filters, drop SELECT *, re-measure.
- Create a small summary table for last 90 days revenue by category and compare dashboard refresh time vs on-the-fly aggregation.
- Write a runbook: common slow patterns and their faster equivalents for your team.
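The summary-table idea in the second project can be sketched with sqlite3 and toy data (a real version would also filter sale_date to the last 90 days and refresh the table on a schedule):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (product_id INT, sale_date TEXT, revenue REAL);
    CREATE TABLE dim_product (product_id INT PRIMARY KEY, category TEXT);
    INSERT INTO dim_product VALUES (1, 'electronics'), (2, 'toys');
    INSERT INTO fact_sales VALUES
        (1, '2024-03-01', 100.0), (1, '2024-03-02', 50.0), (2, '2024-03-01', 25.0);
""")

# Aggregate once into a small table; dashboards then read this instead of the fact table
con.execute("""
    CREATE TABLE summary_cat AS
    SELECT p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category
""")

rows = con.execute("SELECT category, revenue FROM summary_cat ORDER BY category").fetchall()
print(rows)  # [('electronics', 150.0), ('toys', 25.0)]
```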
Mini challenge
Given a report that times out when filtering by brand and month, propose two changes to the SQL and one change to the data model (index/partition/cluster) to make it complete under 10 seconds. Write your reasoning in 3 bullet points.
Next steps
- Practice on your real dashboards: add one sargable fix and measure results.
- Learn window functions performance trade-offs for BI.
- Explore partitioning and clustering strategies for your warehouse.
Quick Test
Take the quick test below to check your understanding.