
Troubleshooting Slow Queries

Learn how to troubleshoot slow queries, for free, with explanations, exercises, and a quick test (for Data Platform Engineers).

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

As a Data Platform Engineer, you deal with slow queries that delay pipelines, break SLAs, and inflate compute costs. Real tasks include:

  • Speeding up BI dashboards that time out during peak hours.
  • Reducing ETL model runtimes from 40 minutes to under 10.
  • Cutting warehouse spend by fixing queries that scan terabytes unnecessarily.

Mastering a repeatable approach lets you respond fast and prevent regressions.

Concept explained simply

Queries are slow for two big reasons:

  • Too much work: reading/scanning many bytes, shuffling/sorting large data, joining huge intermediates.
  • Wrong work: poor plans caused by functions on columns, bad join conditions, stale stats, or non-selective filters.

Mental model

Picture your query as a funnel:

  • Project less: avoid SELECT *. Keep only required columns.
  • Filter early: make filters sargable (index/partition friendly) so the engine skips data.
  • Join smart: join small-to-large, use correct keys, and prefer broadcasting tiny tables.
  • Aggregate/window late but efficiently: pre-aggregate where possible to shrink inputs.

Always measure with EXPLAIN/PROFILE to see where time and bytes are spent.
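
For example, in PostgreSQL-style engines you can prefix the query with EXPLAIN ANALYZE to see actual times and row counts; other warehouses expose the same information through their own EXPLAIN variants or a query-profile view, so treat the exact syntax below as a sketch rather than a universal command.

-- PostgreSQL-style sketch; adapt to your engine's EXPLAIN/PROFILE syntax
EXPLAIN ANALYZE
SELECT user_id, SUM(revenue) AS rev
FROM fact_events
WHERE created_at >= TIMESTAMP '2025-05-10 00:00:00'
  AND created_at < TIMESTAMP '2025-05-11 00:00:00'
GROUP BY user_id;
-- Look for the operators with the largest time and bytes, and compare estimated
-- vs actual row counts; large gaps usually point to stale statistics.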

A practical checklist

  • [ ] Confirm it is slow: compare runtime to historical baseline and note input sizes.
  • [ ] Get a plan/profile: run EXPLAIN/EXPLAIN ANALYZE or query profile; record operators with highest time/bytes.
  • [ ] Validate filters: avoid functions on columns (e.g., DATE(ts)); use ranges and correct data types.
  • [ ] Reduce columns: replace SELECT * with only needed fields.
  • [ ] Enable pruning: ensure filters hit partition/sort/cluster keys. Refresh/analyze statistics if estimates are off.
  • [ ] Fix joins: correct keys, remove accidental cross joins, broadcast small tables, pre-aggregate large ones.
  • [ ] Check skew: identify hot keys (a quick check is sketched after this checklist); mitigate with salting or adaptive/skew joins if supported.
  • [ ] Tame heavy ops: rewrite window functions, decompose complex CTEs, avoid UDFs in tight loops.
  • [ ] Soften sorts: avoid global sorts; use TOP-K patterns or partial sorts when possible.
  • [ ] Check concurrency: if slow only at peak, consider scaling resources or dedicated queues/warehouses.
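
A quick way to run the skew check above is to count rows per join key on the large table and look for a handful of values that dwarf the rest (a minimal sketch; the table and column are borrowed from Example 2 below).

-- Hot-key check: a few very large counts (often NULL or a default value) signal skew
SELECT country_code, COUNT(*) AS n
FROM fact_orders
GROUP BY country_code
ORDER BY n DESC
LIMIT 20;
-- If one or two keys dominate, expect skewed shuffles on joins over this key;
-- consider salting, handling hot keys separately, or engine skew-join features.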

Worked examples

Example 1 — Non-sargable date filter causes full scan

-- Before (slow): scans partitions fully
SELECT user_id, created_at, revenue
FROM fact_events
WHERE DATE(created_at) = '2025-05-10';
What the profile shows

Large scan with poor pruning; filter applied after read. High bytes scanned vs small output.

Rewrite
-- After (fast): range filter enables pruning
SELECT user_id, created_at, revenue
FROM fact_events
WHERE created_at >= TIMESTAMP '2025-05-10 00:00:00'
  AND created_at < TIMESTAMP '2025-05-11 00:00:00';
-- Also: remove unused columns to reduce I/O

Example 2 — Join skew from NULL keys and an oversized shuffle

-- Before (slow): large shuffle join with skew on NULL keys
SELECT f.order_id, d.country_name
FROM fact_orders f
JOIN dim_country d
  ON f.country_code = d.country_code
WHERE f.order_date >= DATE '2025-06-01';
-- Many rows have f.country_code NULL; they never match d but still pile onto one shuffle partition
What the profile shows

Massive shuffle of the fact table before the join, with skew concentrated on a few keys (including NULL).

Rewrite
-- Enforce valid keys and broadcast tiny dim
SELECT /*+ BROADCAST(d) */ f.order_id, d.country_name
FROM fact_orders f
JOIN dim_country d
  ON f.country_code = d.country_code
WHERE f.order_date >= DATE '2025-06-01'
  AND f.country_code IS NOT NULL;
-- If hints unsupported: pre-filter dim and ensure NOT NULL constraints upstream

Example 3 — Heavy window function

-- Before: window over entire dataset
SELECT user_id, event_time,
       SUM(amount) OVER (PARTITION BY user_id ORDER BY event_time
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_amount
FROM fact_payments
WHERE event_time >= DATE '2025-01-01';
What the profile shows

Large sort and spill due to wide partition and unbounded window.

Rewrite
-- Pre-aggregate by day, then do a smaller window
WITH daily AS (
  SELECT user_id, DATE(event_time) AS d, SUM(amount) AS amt
  FROM fact_payments
  WHERE event_time >= DATE '2025-01-01'
  GROUP BY user_id, DATE(event_time)
)
SELECT user_id, d,
       SUM(amt) OVER (PARTITION BY user_id ORDER BY d
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_amount
FROM daily;

Step-by-step playbook

1. Capture context: inputs, expected outputs, historical runtime, and when it regressed.
2. Run EXPLAIN/PROFILE. Note operators with top time/bytes and any spills.
3. Confirm row/byte estimates. If far off, refresh/analyze stats.
4. Make filters sargable and prune partitions; fix data types and casts.
5. Project only needed columns.
6. Fix joins: correct keys, pre-aggregate big tables, broadcast small ones, remove cross joins.
7. Reduce sorts/windows: prefer top-K patterns (see the sketch after this playbook), pre-aggregate, limit partitions.
8. Check skew: identify hot keys and mitigate.
9. Re-profile after each change; keep improvements that reduce bytes/time.
10. If only peak-time slowdowns remain, address concurrency/resources.
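
As a sketch of the top-K idea in step 7 (assuming a warehouse with standard window functions; table and column names reuse Example 3), rank within each group and keep only the top rows instead of sorting the whole result:

-- Top 20 payments per user without a single global ORDER BY over all rows
WITH ranked AS (
  SELECT user_id, event_time, amount,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn
  FROM fact_payments
  WHERE event_time >= DATE '2025-01-01'
)
SELECT user_id, event_time, amount
FROM ranked
WHERE rn <= 20;
-- Each user's rows are ranked independently and everything past the top 20 is
-- discarded early, which is usually far cheaper than sorting the full table.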

Exercises

Work through these here, then check your answers in the solution blocks.

Exercise 1 — Make the filter sargable

You have:

SELECT *
FROM events
WHERE DATE(created_at) = '2025-05-10'
  AND LOWER(device_type) = 'ios';
  • Rewrite to enable partition/index pruning on created_at.
  • Avoid scanning unnecessary columns.
  • Assume device_type is stored consistently lowercase (no need for LOWER()).
Hint

Use a half-open time range and project only the columns you need.

One possible answer
SELECT user_id, created_at, device_type
FROM events
WHERE created_at >= TIMESTAMP '2025-05-10 00:00:00'
  AND created_at < TIMESTAMP '2025-05-11 00:00:00'
  AND device_type = 'ios';

Exercise 2 — Choose an efficient join strategy

Tables: fact_sales (~2B rows), dim_store (~5k rows). You need revenue by region for last 7 days.

-- Current query (slow)
SELECT s.region, SUM(f.revenue) AS total_rev
FROM fact_sales f
JOIN dim_store s ON f.store_id = s.store_id
WHERE f.sale_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY s.region;
  • Rewrite to reduce shuffled data. Consider broadcasting dim_store or pre-aggregating fact_sales.
Hint

Aggregate the fact table before the join or broadcast the small dimension.

One possible answer
WITH fact_7d AS (
  SELECT store_id, SUM(revenue) AS rev
  FROM fact_sales
  WHERE sale_ts >= CURRENT_DATE - INTERVAL '7 days'
  GROUP BY store_id
)
SELECT /*+ BROADCAST(s) */ s.region, SUM(f.rev) AS total_rev
FROM fact_7d f
JOIN dim_store s ON f.store_id = s.store_id
GROUP BY s.region;

Self-check checklist

  • [ ] My rewrites reduced scanned bytes and/or removed unnecessary columns.
  • [ ] My joins either broadcasted small tables or shrank large tables before joining.
  • [ ] My filters are range-based and do not apply functions to columns.

Common mistakes and how to self-check

  • Mistake: Tuning warehouse size first. Fix: Profile and reduce work before scaling compute.
  • Mistake: SELECT *. Fix: Project only needed columns and re-measure.
  • Mistake: Functions on columns (e.g., DATE(col)). Fix: Use range predicates.
  • Mistake: Ignoring skew. Fix: Inspect key distributions; salt or split hot keys.
  • Mistake: Blindly trusting estimates. Fix: Compare estimated vs actual rows; refresh stats if needed (a sample command follows this list).
  • Mistake: Complex CTE chains that block predicate pushdown. Fix: Inline or materialize strategically; verify filter pushdown in profile.
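
Refreshing statistics is engine specific; as a hedged example, a PostgreSQL-style command looks like the one below (Spark, Snowflake, BigQuery, and others use different commands or maintain statistics automatically).

-- PostgreSQL-style; other engines use ANALYZE TABLE ... COMPUTE STATISTICS or
-- maintain statistics automatically
ANALYZE fact_orders;
-- Re-run EXPLAIN afterwards and confirm estimated rows now track actual rows.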

Practical projects

  • Build a query tuning runbook: include screenshots of plans before/after, byte scans, and the exact code changes.
  • Create a top-k pattern library: implement fast top-N per group without global sorts; document when to use.
  • Skew lab: fabricate skewed data and demonstrate two mitigation techniques, salting and pre-splitting hot keys (a minimal salting sketch follows).
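
For the skew lab, a minimal salting sketch might look like the following (table names reuse Example 2; the bucket count of 8 and the RANDOM() call are assumptions to adjust for your engine and data):

-- Salting: spread a hot join key across 8 buckets so it no longer lands on one partition
WITH salted_fact AS (
  SELECT f.order_id, f.country_code,
         CAST(FLOOR(RANDOM() * 8) AS INT) AS salt  -- random-function syntax varies by engine
  FROM fact_orders f
),
salts AS (
  SELECT 0 AS salt UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
),
salted_dim AS (
  SELECT d.country_code, d.country_name, s.salt
  FROM dim_country d
  CROSS JOIN salts
)
SELECT f.order_id, d.country_name
FROM salted_fact f
JOIN salted_dim d
  ON f.country_code = d.country_code
 AND f.salt = d.salt;
-- Results match the plain join: each fact row carries exactly one salt and matches
-- exactly one replicated dim row, but the hot key is now spread over 8 partitions.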

Who this is for, prerequisites, and learning path

Who this is for

Data Platform Engineers, Analytics Engineers, and Data Scientists who own or debug warehouse workloads.

Prerequisites

  • Comfort with SQL (joins, aggregates, window functions).
  • Basic understanding of your warehouse’s EXPLAIN/PROFILE tools.
  • Familiarity with partitioning, clustering, and statistics concepts.

Learning path

  • Start here: troubleshooting slow queries.
  • Next: partitioning/clustering strategies and cost-aware SQL patterns.
  • Then: workload management, concurrency controls, and autoscaling strategies.

Next steps

  • Refactor one slow production query using the checklist; document impact.
  • Schedule stats maintenance on critical tables.
  • Prepare a small internal guide with 5 common rewrites for your team.

Mini challenge

Your query selects 8 columns, filters on DATE(event_ts) for yesterday, joins a 1k-row dim table, and does ORDER BY amount DESC LIMIT 20. It scans 1.5 TB and sorts 400M rows. Pick three actions that likely give the biggest win:

  • Rewrite the filter to a half-open time range for pruning.
  • Project only the columns used by ORDER BY and final output.
  • Broadcast the small dimension and pre-aggregate fact data before the sort.

Tip: After changes, re-profile to confirm reduced scanned bytes and sorted rows.


Troubleshooting Slow Queries — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

