Topic 8 of 8

Performance Aware Queries

Learn Performance Aware Queries for free with explanations, exercises, and a quick test (for Product Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

As a Product Analyst, your queries power dashboards, experiments, and decisions. Slow or inefficient SQL blocks teammates and inflates costs. Performance-aware queries help you:

  • Refresh dashboards on time (daily/near-real-time).
  • Analyze large event tables without timeouts.
  • Run A/B test metrics across millions of rows quickly.
  • Build reliable, reproducible queries that scale as data grows.

Concept explained simply

Performance-aware SQL means shaping your query so the database reads less, compares less, sorts less, and returns exactly what you need—no more. Think of it like packing a carry-on: remove the non-essentials early, keep items organized, and avoid last-minute repacking.

Mental model

Picture a pipeline of stages: Read → Filter → Join → Aggregate → Window → Sort → Return. Each stage can expand or shrink rows. Your job: shrink early, avoid unnecessary expansions, and only sort when truly needed.

  • Reduce early: limit the time range and columns before joins.
  • Join small-to-big: filter facts first, then join to dimensions.
  • Aggregate before expensive steps if it reduces rows safely.
  • Sort only when your output requires ordering.
  • Use a plan (EXPLAIN) to verify how rows flow.
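
The stage order above maps directly onto the clauses of an ordinary query. Here is a minimal sketch using the events table from the worked examples (column choices are illustrative):

```sql
SELECT CAST(event_time AS DATE) AS d,                 -- Return: only the columns you need
       COUNT(*) AS events
FROM analytics.events                                  -- Read
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'   -- Filter: shrink early
GROUP BY 1                                             -- Aggregate: shrink again
ORDER BY d;                                            -- Sort: only because the output needs it
```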

Core techniques (with quick tips)

Filter early and sargably (predicate pushdown)
  • Filter by partition/time first: e.g., event_time >= CURRENT_DATE - INTERVAL '30 days'.
  • Avoid wrapping columns in functions in WHERE, which can block index/partition pruning (e.g., prefer event_time >= DATE '2025-01-01' over DATE(event_time) >= ...).
  • Use range filters that align to how data is stored/partitioned.
Select only what you need
  • Avoid SELECT *. Pick the columns required for downstream steps.
  • Wide rows cost memory and I/O, especially before joins and sorts.
Join efficiently
  • Reduce the fact table before joining to dimensions.
  • Join on keys with good selectivity and appropriate data types.
  • Consider EXISTS for membership checks to avoid huge DISTINCTs.
Aggregate smartly
  • Pre-aggregate to one row per entity before joining to other large sets when possible.
  • Count distinct carefully; pre-aggregate or approximate if your engine supports it and precision is not critical.
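
As a sketch of the approximate-count option: the exact function name varies by engine (APPROX_COUNT_DISTINCT in BigQuery and Snowflake, approx_distinct in Trino/Presto; PostgreSQL needs an extension), so check your warehouse's documentation before relying on it.

```sql
-- Exact distinct count (heavy on very large tables):
SELECT CAST(event_time AS DATE) AS d, COUNT(DISTINCT user_id) AS dau
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;

-- Approximate alternative when precision is not critical
-- (function name is engine-specific; APPROX_COUNT_DISTINCT shown here):
SELECT CAST(event_time AS DATE) AS d, APPROX_COUNT_DISTINCT(user_id) AS dau_approx
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;
```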
Use window functions wisely
  • Partition by the minimal keys necessary.
  • Only include ORDER BY inside OVER() when truly required (it can force large sorts).
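
For instance, to get each user's most recent event, a window function needs both a partition and an ordering, but the sort is justified because the result depends on it. A sketch using the events table:

```sql
-- One row per user: their most recent event in the window.
-- PARTITION BY uses only user_id (the minimal key); the ORDER BY
-- inside OVER() is required here, so the sort is justified.
WITH ranked AS (
  SELECT user_id,
         event_time,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) AS rn
  FROM analytics.events
  WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'  -- shrink before windowing
)
SELECT user_id, event_time AS last_event_time
FROM ranked
WHERE rn = 1;
```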
Sort and limit intentionally
  • ORDER BY on big datasets is expensive; do it last and only if needed.
  • LIMIT reduces returned rows, not always scanned rows—combine with selective filters and order on indexed/partition keys when possible.
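
A typical "top N" query illustrates the combination: the time filter limits what is scanned, while LIMIT only trims what is returned, so the aggregation and sort still run over the filtered set. A sketch using the orders table:

```sql
-- Top 10 users by order count in the last 7 days.
SELECT user_id, COUNT(*) AS orders
FROM analytics.orders
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'  -- limits scanned rows
GROUP BY user_id
ORDER BY orders DESC                                   -- sort runs on the reduced set
LIMIT 10;                                              -- trims only the returned rows
```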
CTEs and subqueries
  • CTEs can improve readability; engines may inline or materialize them. Check with EXPLAIN.
  • Keep CTEs small and purposeful; avoid stacking many heavy CTEs if one can be simplified.
Read the plan (EXPLAIN) like a pro
  • Look at estimated rows per step; big blow-ups warn of problems.
  • Check join methods and key usage.
  • Watch for large sorts or scans on full tables/partitions.
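
To inspect a plan, prefix the query with EXPLAIN. Exact syntax and output differ by engine (for example, EXPLAIN ANALYZE in PostgreSQL also executes the query and reports actual row counts), so treat this as a sketch:

```sql
-- Shows the plan without running the query (engine-dependent output).
EXPLAIN
SELECT u.country, COUNT(*) AS dau
FROM analytics.events e
JOIN analytics.users u ON e.user_id = u.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY u.country;
-- In the output: compare estimated rows at each step, check which join
-- method was chosen, and look for full-table scans where you expected
-- partition or index pruning.
```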

Worked examples

1) Time filter: avoid wrapping the column

Goal: last 30 days of events per day.

Less efficient:

SELECT DATE(event_time) AS d, COUNT(*)
FROM analytics.events
WHERE DATE(event_time) >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;

More performance-aware (filter is sargable):

SELECT CAST(event_time AS DATE) AS d, COUNT(*)
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;

Key idea: The WHERE clause avoids wrapping the column, enabling better partition/index pruning. Casting for GROUP BY is usually fine.

2) Join after reducing the fact table

Goal: DAU with country from users for last 7 days.

Less efficient:

SELECT u.country, COUNT(DISTINCT e.user_id) AS dau
FROM analytics.events e
JOIN analytics.users u ON e.user_id = u.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY u.country;

More performance-aware (pre-filter and slim columns):

WITH recent_events AS (
  SELECT user_id
  FROM analytics.events
  WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
  GROUP BY user_id
)
SELECT u.country, COUNT(re.user_id) AS dau
FROM recent_events re
JOIN analytics.users u ON re.user_id = u.user_id
GROUP BY u.country;

Key idea: Shrink to one row per user before joining to the dimension table.

3) EXISTS instead of DISTINCT-heavy joins

Goal: Count signups who converted (made at least one order) per campaign last 14 days.

Less efficient (can create duplicates):

SELECT s.campaign_id, COUNT(DISTINCT s.user_id) AS converted_users
FROM analytics.signups s
JOIN analytics.orders o ON o.user_id = s.user_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '14 days'
GROUP BY s.campaign_id;

More performance-aware (semi-join via EXISTS):

SELECT s.campaign_id, COUNT(*) AS converted_users
FROM (
  SELECT DISTINCT user_id, campaign_id
  FROM analytics.signups
) s
WHERE EXISTS (
  SELECT 1
  FROM analytics.orders o
  WHERE o.user_id = s.user_id
    AND o.created_at >= CURRENT_DATE - INTERVAL '14 days'
)
GROUP BY s.campaign_id;

Key idea: Use EXISTS to check membership without generating a large joined intermediate.

Exercises

These mirror the practice exercises at the end of the lesson. Try them before opening the solutions.

  • Checklist before you run:
    • Time filter is on the column, not wrapping the column with a function.
    • Only required columns are selected before joins.
    • Pre-aggregate to one row per entity when it reduces size safely.
    • Use EXISTS for membership checks instead of DISTINCT-heavy joins.

Exercise 1 — Optimize DAU for last 30 days

Original query:

SELECT DATE(event_time) AS d, COUNT(DISTINCT user_id) AS dau
FROM analytics.events
WHERE DATE(event_time) >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1;

Task: Rewrite it to be more performance-aware while returning the same result.

Hints
  • Make the WHERE clause sargable on event_time.
  • Consider pre-aggregating to one row per user per day.
Solution:
WITH recent AS (
  SELECT user_id, CAST(event_time AS DATE) AS d
  FROM analytics.events
  WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
), per_day AS (
  SELECT d, user_id
  FROM recent
  GROUP BY d, user_id
)
SELECT d, COUNT(*) AS dau
FROM per_day
GROUP BY d
ORDER BY d;

We filter on the raw column first, then collapse to one row per user per day before the final count.

Exercise 2 — Conversions by campaign with EXISTS

Original query:

SELECT s.campaign_id, COUNT(DISTINCT s.user_id) AS converted
FROM analytics.signups s
JOIN analytics.orders o ON o.user_id = s.user_id
GROUP BY s.campaign_id;

Task: Count signups that placed an order in the last 7 days, per campaign, using a more efficient pattern.

Hints
  • Prefilter orders to last 7 days.
  • Use EXISTS to avoid duplicate-creating joins.
Solution:
WITH recent_orders AS (
  SELECT DISTINCT user_id
  FROM analytics.orders
  WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT s.campaign_id, COUNT(*) AS converted
FROM (
  SELECT DISTINCT user_id, campaign_id
  FROM analytics.signups
) s
WHERE EXISTS (
  SELECT 1
  FROM recent_orders ro
  WHERE ro.user_id = s.user_id
)
GROUP BY s.campaign_id;

We reduce both sides first and use a semi-join pattern.

Common mistakes and self-check

  • Wrapping columns in functions in WHERE. Self-check: Is your filter written as column OP constant?
  • SELECT * before a join. Self-check: Do you only select columns used later?
  • Counting distinct after a 1-to-many join. Self-check: Can you pre-aggregate to one row per entity first?
  • Sorting unnecessarily. Self-check: Do you really need ORDER BY in the final output?
  • Overusing CTEs that cause large intermediates. Self-check: Does EXPLAIN show materialization or large scans for the CTE?

Who this is for

  • Product Analysts querying large event and transaction datasets.
  • Anyone building dashboards, experiment readouts, and cohort analyses.

Prerequisites

  • Comfort with SELECT, WHERE, JOIN, GROUP BY, and basic window functions.
  • Basic understanding of your warehouse partitioning and table conventions.

Learning path

  1. Practice sargable filters on time-partitioned tables.
  2. Rewrite joins to pre-aggregate and/or use EXISTS where appropriate.
  3. Optimize window queries by reducing partitions and ORDER BY usage.
  4. Use EXPLAIN to confirm row counts and join methods.
  5. Apply patterns to a real dashboard or experiment query.

Practical projects

  • Speed up a DAU/MAU dashboard query by 2× by pushing filters and pre-aggregating.
  • Rewrite an A/B test metric query to avoid DISTINCT-heavy joins.
  • Build a daily cohort table that runs under a set time budget (e.g., 2 minutes).

Mini challenge

Pick one frequently used query at your organization that runs slowly. Apply at least three techniques from this lesson (filter pushdown, pre-aggregation, EXISTS). Document the before/after runtime, row counts at each step, and final correctness check (same result set).

Next steps

  • Make a personal checklist you run through before saving a query to shared dashboards.
  • Practice reading EXPLAIN plans for three of your team's heaviest queries.
  • Share optimized patterns with teammates to standardize faster queries.

Quick Test

Take the Quick Test to check your understanding. Everyone can take it for free; if you are logged in, your progress will be saved.

Practice Exercises

2 exercises to complete. They mirror Exercises 1 and 2 above: rewrite the DAU query with a sargable WHERE clause and row reduction before the final count, and rewrite the conversion query using the EXISTS pattern.

Expected output for Exercise 1: 30 rows, each date with its DAU count, identical to the original query's results.

Performance Aware Queries — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

