Why this matters
For a Data Platform Engineer, slow queries delay pipelines, break SLAs, and inflate compute costs. Real tasks include:
- Speeding up BI dashboards that time out during peak hours.
- Reducing ETL model runtimes from 40 minutes to under 10.
- Cutting warehouse spend by fixing queries that scan terabytes unnecessarily.
Mastering a repeatable approach lets you respond fast and prevent regressions.
Concept explained simply
Queries are slow for two big reasons:
- Too much work: reading/scanning many bytes, shuffling/sorting large data, joining huge intermediates.
- Wrong work: poor plans caused by functions on columns, bad join conditions, stale stats, or non-selective filters.
Mental model
Picture your query as a funnel:
- Project less: avoid SELECT *. Keep only required columns.
- Filter early: make filters sargable (index/partition friendly) so the engine skips data.
- Join smart: join small-to-large, use correct keys, and prefer broadcasting tiny tables.
- Aggregate/window late but efficiently: pre-aggregate where possible to shrink inputs.
Always measure with EXPLAIN/PROFILE to see where time and bytes are spent.
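For example, most engines let you compare estimated and actual work directly. Below is a minimal sketch in PostgreSQL-style syntax against the fact_events table used later in Example 1; the exact keyword (EXPLAIN ANALYZE, a profile tab, or query history) varies by warehouse.
-- Sketch: inspect the plan for a candidate rewrite (PostgreSQL-style syntax;
-- other engines expose the same information via EXPLAIN or a query profile UI)
EXPLAIN ANALYZE
SELECT user_id, created_at, revenue
FROM fact_events
WHERE created_at >= TIMESTAMP '2025-05-10 00:00:00'
  AND created_at < TIMESTAMP '2025-05-11 00:00:00';
-- Look for: operators with the highest time, rows/bytes read, and large gaps
-- between estimated and actual row counts.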
A practical checklist
- [ ] Confirm it is slow: compare runtime to historical baseline and note input sizes.
- [ ] Get a plan/profile: run EXPLAIN/EXPLAIN ANALYZE or query profile; record operators with highest time/bytes.
- [ ] Validate filters: avoid functions on columns (e.g., DATE(ts)); use ranges and correct data types.
- [ ] Reduce columns: replace SELECT * with only needed fields.
- [ ] Enable pruning: ensure filters hit partition/sort/cluster keys. Refresh/analyze statistics if estimates are off.
- [ ] Fix joins: correct keys, remove accidental cross joins, broadcast small tables, pre-aggregate large ones.
- [ ] Check skew: identify hot keys; mitigate with salting or adaptive/skew joins if supported.
- [ ] Tame heavy ops: rewrite window functions, decompose complex CTEs, avoid UDFs in tight loops.
- [ ] Soften sorts: avoid global sorts; use TOP-K patterns or partial sorts when possible (see the top-K sketch after this list).
- [ ] Check concurrency: if slow only at peak, consider scaling resources or dedicated queues/warehouses.
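For the top-K item, a minimal sketch reusing the fact_payments table from Example 3 below (the date filter is illustrative): express the top-N directly with ORDER BY ... LIMIT so most engines can keep a small bounded heap instead of performing a full global sort.
-- Sketch: top 20 payments by amount without sorting everything
SELECT user_id, event_time, amount
FROM fact_payments
WHERE event_time >= DATE '2025-06-01'
ORDER BY amount DESC
LIMIT 20;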
Worked examples
Example 1 — Non-sargable date filter causes full scan
-- Before (slow): scans partitions fully
SELECT user_id, created_at, revenue
FROM fact_events
WHERE DATE(created_at) = '2025-05-10';
What the profile shows
Large scan with poor pruning; filter applied after read. High bytes scanned vs small output.
Rewrite
-- After (fast): range filter enables pruning
SELECT user_id, created_at, revenue
FROM fact_events
WHERE created_at >= TIMESTAMP '2025-05-10 00:00:00'
AND created_at < TIMESTAMP '2025-05-11 00:00:00';
-- Also: remove unused columns to reduce I/O
Example 2 — Join explosion from wrong predicate
-- Before (slow): join fan-out and skew from problem join keys
SELECT f.order_id, d.country_name
FROM fact_orders f
JOIN dim_country d
ON f.country_code = d.country_code
WHERE f.order_date >= DATE '2025-06-01';
-- Many fact rows have f.country_code NULL: they never match, yet they are still shuffled,
-- and duplicate/default rows in dim_country turn matching codes into many-to-many fan-out
What the profile shows
Huge intermediate after join, massive shuffle, skew on a few keys.
Rewrite
-- Enforce valid keys and broadcast tiny dim
SELECT /*+ BROADCAST(d) */ f.order_id, d.country_name
FROM fact_orders f
JOIN dim_country d
ON f.country_code = d.country_code
WHERE f.order_date >= DATE '2025-06-01'
AND f.country_code IS NOT NULL;
-- If hints unsupported: pre-filter dim and ensure NOT NULL constraints upstream
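One way to apply that comment, as a minimal sketch assuming the fan-out comes from duplicate/default rows in dim_country: slim the dimension to one row per code before joining and keep the NULL filter. This is not the only option; fixing the data upstream is usually better.
-- Hint-free version: force one (country_code, country_name) row per code so the join cannot fan out
SELECT f.order_id, d.country_name
FROM fact_orders f
JOIN (
  SELECT country_code, MAX(country_name) AS country_name
  FROM dim_country
  WHERE country_code IS NOT NULL
  GROUP BY country_code
) d
  ON f.country_code = d.country_code
WHERE f.order_date >= DATE '2025-06-01'
  AND f.country_code IS NOT NULL;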
Example 3 — Heavy window function
-- Before: window over entire dataset
SELECT user_id, event_time,
SUM(amount) OVER (PARTITION BY user_id ORDER BY event_time
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_amount
FROM fact_payments
WHERE event_time >= DATE '2025-01-01';
What the profile shows
Large sort and spill due to wide partition and unbounded window.
Rewrite
-- Pre-aggregate by day, then do a smaller window
WITH daily AS (
SELECT user_id, DATE(event_time) AS d, SUM(amount) AS amt
FROM fact_payments
WHERE event_time >= DATE '2025-01-01'
GROUP BY user_id, DATE(event_time)
)
SELECT user_id, d,
SUM(amt) OVER (PARTITION BY user_id ORDER BY d
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_amount
FROM daily;
Exercises
Work through these here, then compare your answers with the sample solutions that follow.
Exercise 1 — Make the filter sargable
You have:
SELECT *
FROM events
WHERE DATE(created_at) = '2025-05-10'
AND LOWER(device_type) = 'ios';
- Rewrite to enable partition/index pruning on created_at.
- Avoid scanning unnecessary columns.
- Assume device_type is stored consistently lowercase (no need for LOWER()).
Hint
Use a half-open time range and project only the columns you need.
One possible answer
SELECT user_id, created_at, device_type
FROM events
WHERE created_at >= TIMESTAMP '2025-05-10 00:00:00'
AND created_at < TIMESTAMP '2025-05-11 00:00:00'
AND device_type = 'ios';
Exercise 2 — Choose an efficient join strategy
Tables: fact_sales (~2B rows), dim_store (~5k rows). You need revenue by region for the last 7 days.
-- Current query (slow)
SELECT s.region, SUM(f.revenue) AS total_rev
FROM fact_sales f
JOIN dim_store s ON f.store_id = s.store_id
WHERE f.sale_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY s.region;
- Rewrite to reduce shuffled data. Consider broadcasting dim_store or pre-aggregating fact_sales.
Hint
Aggregate the fact table before the join or broadcast the small dimension.
One possible answer
WITH fact_7d AS (
SELECT store_id, SUM(revenue) AS rev
FROM fact_sales
WHERE sale_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY store_id
)
SELECT /*+ BROADCAST(s) */ s.region, SUM(f.rev) AS total_rev
FROM fact_7d f
JOIN dim_store s ON f.store_id = s.store_id
GROUP BY s.region;
Self-check checklist
- [ ] My rewrites reduced scanned bytes and/or removed unnecessary columns.
- [ ] My joins either broadcasted small tables or shrank large tables before joining.
- [ ] My filters are range-based and do not apply functions to columns.
Common mistakes and how to self-check
- Mistake: Tuning warehouse size first. Fix: Profile and reduce work before scaling compute.
- Mistake: SELECT *. Fix: Project only needed columns and re-measure.
- Mistake: Functions on columns (e.g., DATE(col)). Fix: Use range predicates.
- Mistake: Ignoring skew. Fix: Inspect key distributions; salt or split hot keys (see the salting sketch after this list).
- Mistake: Blindly trusting estimates. Fix: Compare estimated vs actual rows; refresh stats if needed.
- Mistake: Complex CTE chains that block predicate pushdown. Fix: Inline or materialize strategically; verify filter pushdown in profile.
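The salting idea, as a minimal sketch reusing the tables from Example 2: spread the large side of a skewed join across N buckets and replicate the small side into every bucket. The hash/modulo function names vary by engine, and in practice you would usually salt only the identified hot keys rather than every row.
-- Sketch: salt a skewed join key across 4 buckets (engine-specific function names may differ)
WITH fact_salted AS (
  SELECT f.*, MOD(ABS(HASH(f.order_id)), 4) AS salt   -- assign each fact row to a bucket
  FROM fact_orders f
),
dim_salted AS (
  SELECT d.*, s.salt                                   -- replicate each dim row into every bucket
  FROM dim_country d
  CROSS JOIN (SELECT 0 AS salt UNION ALL SELECT 1
              UNION ALL SELECT 2 UNION ALL SELECT 3) s
)
SELECT f.order_id, d.country_name
FROM fact_salted f
JOIN dim_salted d
  ON f.country_code = d.country_code
 AND f.salt = d.salt;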
Practical projects
- Build a query tuning runbook: include screenshots of plans before/after, byte scans, and the exact code changes.
- Create a top-k pattern library: implement fast top-N per group without global sorts; document when to use (a starter sketch follows this list).
- Skew lab: fabricate skewed data and demonstrate two mitigation techniques (salting and pre-splitting hot keys).
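As a starting point for the top-k library, a minimal sketch using fact_payments from Example 3 (the N of 3 is arbitrary): rank rows inside each group so the engine sorts per partition instead of globally. Engines that support QUALIFY can express this without the outer query.
-- Sketch: top 3 payments per user without a global sort
SELECT user_id, event_time, amount
FROM (
  SELECT user_id, event_time, amount,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
  FROM fact_payments
) ranked
WHERE rn <= 3;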
Who this is for, prerequisites, and learning path
Who this is for
Data Platform Engineers, Analytics Engineers, and Data Scientists who own or debug warehouse workloads.
Prerequisites
- Comfort with SQL (joins, aggregates, window functions).
- Basic understanding of your warehouse’s EXPLAIN/PROFILE tools.
- Familiarity with partitioning, clustering, and statistics concepts.
Learning path
- Start here: troubleshooting slow queries.
- Next: partitioning/clustering strategies and cost-aware SQL patterns.
- Then: workload management, concurrency controls, and autoscaling strategies.
Next steps
- Refactor one slow production query using the checklist; document impact.
- Schedule stats maintenance on critical tables (see the sketch after this list).
- Prepare a small internal guide with 5 common rewrites for your team.
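For the stats maintenance step, the exact command is engine-specific; a minimal sketch in PostgreSQL-style syntax with table names taken from the examples above (some cloud warehouses refresh statistics automatically):
-- Refresh planner statistics so row estimates stay close to reality
ANALYZE fact_orders;
ANALYZE fact_payments;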
Mini challenge
Your query selects 8 columns, filters on DATE(event_ts) for yesterday, joins a 1k-row dim table, and does ORDER BY amount DESC LIMIT 20. It scans 1.5 TB and sorts 400M rows. Pick three actions that likely give the biggest win:
- Rewrite the filter to a half-open time range for pruning.
- Project only the columns used by ORDER BY and final output.
- Broadcast the small dimension and pre-aggregate fact data before the sort.
Tip: After changes, re-profile to confirm reduced scanned bytes and sorted rows.