Why this matters
As a Product Analyst, your queries power dashboards, experiments, and decisions. Slow or inefficient SQL blocks teammates and inflates costs. Performance-aware queries help you:
- Refresh dashboards on time (daily/near-real-time).
- Analyze large event tables without timeouts.
- Run A/B test metrics across millions of rows quickly.
- Build reliable, reproducible queries that scale as data grows.
Concept explained simply
Performance-aware SQL means shaping your query so the database reads less, compares less, sorts less, and returns exactly what you need—no more. Think of it like packing a carry-on: remove the non-essentials early, keep items organized, and avoid last-minute repacking.
Mental model
Picture a pipeline of stages: Read → Filter → Join → Aggregate → Window → Sort → Return. Each stage can expand or shrink rows. Your job: shrink early, avoid unnecessary expansions, and only sort when truly needed.
- Reduce early: limit the time range and columns before joins.
- Join small-to-big: filter facts first, then join to dimensions.
- Aggregate before expensive steps if it reduces rows safely.
- Sort only when your output requires ordering.
- Use a plan (EXPLAIN) to verify how rows flow.
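To make the pipeline concrete, here is a small annotated sketch (table and column names match the examples used later in this lesson):
SELECT u.country, COUNT(*) AS events -- Aggregate → Return
FROM analytics.events e -- Read
JOIN analytics.users u ON e.user_id = u.user_id -- Join
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days' -- Filter: shrink early
GROUP BY u.country -- Aggregate
ORDER BY events DESC; -- Sort: only because ranked output is wanted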
Core techniques (with quick tips)
Filter early and sargably (predicate pushdown)
- Filter by partition/time first: e.g., event_time >= CURRENT_DATE - INTERVAL '30 days'.
- Avoid wrapping columns in functions in WHERE, which can block index/partition pruning (e.g., prefer event_time >= DATE '2025-01-01' over DATE(event_time) >= ...).
- Use range filters that align to how data is stored/partitioned.
Select only what you need
- Avoid SELECT *. Pick the columns required for downstream steps.
- Wide rows cost memory and I/O, especially before joins and sorts.
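For example, a minimal sketch against the analytics.events table from this lesson (event_name is a hypothetical column used only for illustration):
-- Less efficient: every column is carried into later joins and sorts
SELECT *
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days';
-- Leaner: only the columns the downstream steps actually use
SELECT user_id, event_name, event_time
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days';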
Join efficiently
- Reduce the fact table before joining to dimensions.
- Join on keys with good selectivity and appropriate data types.
- Consider EXISTS for membership checks to avoid huge DISTINCTs.
Aggregate smartly
- Pre-aggregate to one row per entity before joining to other large sets when possible.
- Count distinct carefully; pre-aggregate or approximate if your engine supports it and precision is not critical.
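A sketch of the approximate option, assuming your engine provides such a function (the name below, APPROX_COUNT_DISTINCT, exists in engines like BigQuery and Snowflake; others use different names or may not offer it):
-- Exact, but can be expensive at very large scale
SELECT CAST(event_time AS DATE) AS d, COUNT(DISTINCT user_id) AS dau
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;
-- Approximate: often much cheaper when a small error margin is acceptable
SELECT CAST(event_time AS DATE) AS d, APPROX_COUNT_DISTINCT(user_id) AS approx_dau
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;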
Use window functions wisely
- Partition by the minimal keys necessary.
- Only include ORDER BY inside OVER() when truly required (it can force large sorts).
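For instance, to get each user's most recent event in the last 7 days, a minimal sketch: filter first so the window scans fewer rows, partition only by user_id, and keep ORDER BY inside OVER() only because "most recent" genuinely needs it:
WITH recent AS (
SELECT user_id, event_time
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT user_id, event_time AS last_event_time
FROM (
SELECT user_id, event_time,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) AS rn
FROM recent
) t
WHERE rn = 1;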
Sort and limit intentionally
- ORDER BY on big datasets is expensive; do it last and only if needed.
- LIMIT reduces returned rows, not always scanned rows—combine with selective filters and order on indexed/partition keys when possible.
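A small sketch of aggregate first, sort last: the GROUP BY shrinks the result to at most ~90 rows, so the final ORDER BY and LIMIT are cheap:
-- Top 10 days by event volume in the last 90 days
SELECT CAST(event_time AS DATE) AS d, COUNT(*) AS events
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY 1
ORDER BY events DESC
LIMIT 10;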
CTEs and subqueries
- CTEs can improve readability; engines may inline or materialize them. Check with EXPLAIN.
- Keep CTEs small and purposeful; avoid stacking many heavy CTEs if one can be simplified.
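As one engine-specific illustration, PostgreSQL 12+ lets you request inlining or materialization of a CTE explicitly; most warehouses make this choice for you, so treat this as a sketch and confirm the actual behavior with EXPLAIN:
-- PostgreSQL 12+ only: NOT MATERIALIZED asks the planner to inline the CTE
WITH recent AS NOT MATERIALIZED (
SELECT user_id, event_time
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT COUNT(*) AS recent_events
FROM recent;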
Read the plan (EXPLAIN) like a pro
- Look at estimated rows per step; a sudden blow-up in row counts usually signals a problem (for example, an unintended many-to-many join).
- Check join methods and key usage.
- Watch for large sorts or scans on full tables/partitions.
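A sketch of the workflow in PostgreSQL-style syntax (the keyword and output format vary by engine; EXPLAIN ANALYZE, where supported, also runs the query and reports actual row counts):
-- Inspect the plan without running the query
EXPLAIN
SELECT u.country, COUNT(*) AS events
FROM analytics.events e
JOIN analytics.users u ON e.user_id = u.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY u.country;
-- Then check: is the time filter pruning partitions or using an index?
-- Do estimated rows shrink after the WHERE, or explode after the JOIN?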
Worked examples
1) Time filter: avoid wrapping the column
Goal: last 30 days of events per day.
Less efficient:
SELECT DATE(event_time) AS d, COUNT(*)
FROM analytics.events
WHERE DATE(event_time) >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;
More performance-aware (filter is sargable):
SELECT CAST(event_time AS DATE) AS d, COUNT(*)
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;
Key idea: The WHERE clause avoids wrapping the column, enabling better partition/index pruning. Casting for GROUP BY is usually fine.
2) Join after reducing the fact table
Goal: DAU with country from users for last 7 days.
Less efficient:
SELECT u.country, COUNT(DISTINCT e.user_id) AS dau
FROM analytics.events e
JOIN analytics.users u ON e.user_id = u.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY u.country;
More performance-aware (pre-filter and slim columns):
WITH recent_events AS (
SELECT user_id
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY user_id
)
SELECT u.country, COUNT(re.user_id) AS dau
FROM recent_events re
JOIN analytics.users u ON re.user_id = u.user_id
GROUP BY u.country;
Key idea: Shrink to one row per user before joining to the dimension table.
3) EXISTS instead of DISTINCT-heavy joins
Goal: Count signups who converted (made at least one order) per campaign last 14 days.
Less efficient (the join multiplies rows, forcing a costly DISTINCT):
SELECT s.campaign_id, COUNT(DISTINCT s.user_id) AS converted_users
FROM analytics.signups s
JOIN analytics.orders o ON o.user_id = s.user_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '14 days'
GROUP BY s.campaign_id;
More performance-aware (semi-join via EXISTS):
SELECT s.campaign_id, COUNT(*) AS converted_users
FROM (
SELECT DISTINCT user_id, campaign_id
FROM analytics.signups
) s
WHERE EXISTS (
SELECT 1
FROM analytics.orders o
WHERE o.user_id = s.user_id
AND o.created_at >= CURRENT_DATE - INTERVAL '14 days'
)
GROUP BY s.campaign_id;
Key idea: Use EXISTS to check membership without generating a large joined intermediate.
Exercises
Try each exercise before opening the solution.
Checklist before you run:
- Time filter is on the column, not wrapping the column with a function.
- Only required columns are selected before joins.
- Pre-aggregate to one row per entity when it reduces size safely.
- Use EXISTS for membership checks instead of DISTINCT-heavy joins.
Exercise 1 — Optimize DAU for last 30 days
Original query:
SELECT DATE(event_time) AS d, COUNT(DISTINCT user_id) AS dau
FROM analytics.events
WHERE DATE(event_time) >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1;
Task: Rewrite it to be more performance-aware while returning the same result.
Hints
- Make the WHERE clause sargable on event_time.
- Consider pre-aggregating to one row per user per day.
Solution
WITH recent AS (
SELECT user_id, CAST(event_time AS DATE) AS d
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
), per_day AS (
SELECT d, user_id
FROM recent
GROUP BY d, user_id
)
SELECT d, COUNT(*) AS dau
FROM per_day
GROUP BY d
ORDER BY d;
We filter on the raw column first, then collapse to one row per user per day before the final count.
Exercise 2 — Conversions by campaign with EXISTS
Original query:
SELECT s.campaign_id, COUNT(DISTINCT s.user_id) AS converted
FROM analytics.signups s
JOIN analytics.orders o ON o.user_id = s.user_id
GROUP BY s.campaign_id;
Task: Count signups that placed an order in the last 7 days, per campaign, using a more efficient pattern.
Hints
- Prefilter orders to last 7 days.
- Use EXISTS to avoid duplicate-creating joins.
Solution
WITH recent_orders AS (
SELECT DISTINCT user_id
FROM analytics.orders
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT s.campaign_id, COUNT(*) AS converted
FROM (
SELECT DISTINCT user_id, campaign_id
FROM analytics.signups
) s
WHERE EXISTS (
SELECT 1
FROM recent_orders ro
WHERE ro.user_id = s.user_id
)
GROUP BY s.campaign_id;
We reduce both sides first and use a semi-join pattern.
Common mistakes and self-check
- Wrapping columns in functions in WHERE. Self-check: Is your filter written as column OP constant?
- SELECT * before a join. Self-check: Do you only select columns used later?
- Counting distinct after a 1-to-many join. Self-check: Can you pre-aggregate to one row per entity first?
- Sorting unnecessarily. Self-check: Do you really need ORDER BY in the final output?
- Overusing CTEs that cause large intermediates. Self-check: Does EXPLAIN show materialization or large scans for the CTE?
Who this is for
- Product Analysts querying large event and transaction datasets.
- Anyone building dashboards, experiment readouts, and cohort analyses.
Prerequisites
- Comfort with SELECT, WHERE, JOIN, GROUP BY, and basic window functions.
- Basic understanding of your warehouse partitioning and table conventions.
Learning path
- Practice sargable filters on time-partitioned tables.
- Rewrite joins to pre-aggregate and/or use EXISTS where appropriate.
- Optimize window queries by reducing partitions and ORDER BY usage.
- Use EXPLAIN to confirm row counts and join methods.
- Apply patterns to a real dashboard or experiment query.
Practical projects
- Speed up a DAU/MAU dashboard query by 2× by pushing filters and pre-aggregating.
- Rewrite an A/B test metric query to avoid DISTINCT-heavy joins.
- Build a daily cohort table that runs under a set time budget (e.g., 2 minutes).
Mini challenge
Pick one frequently used query at your organization that runs slowly. Apply at least three techniques from this lesson (filter pushdown, pre-aggregation, EXISTS). Document the before/after runtime, row counts at each step, and final correctness check (same result set).
Next steps
- Make a personal checklist you run through before saving a query to shared dashboards.
- Practice reading EXPLAIN plans for three of your team's heaviest queries.
- Share optimized patterns with teammates to standardize faster queries.
Quick Test
Take the Quick Test to check your understanding.