Why this matters
As a Product Analyst, your queries power dashboards, experiments, and decisions. Slow or inefficient SQL blocks teammates and inflates costs. Performance-aware queries help you:
- Refresh dashboards on time (daily/near-real-time).
- Analyze large event tables without timeouts.
- Run A/B test metrics across millions of rows quickly.
- Build reliable, reproducible queries that scale as data grows.
Concept explained simply
Performance-aware SQL means shaping your query so the database reads less, compares less, sorts less, and returns exactly what you need—no more. Think of it like packing a carry-on: remove the non-essentials early, keep items organized, and avoid last-minute repacking.
Mental model
Picture a pipeline of stages: Read → Filter → Join → Aggregate → Window → Sort → Return. Each stage can expand or shrink rows. Your job: shrink early, avoid unnecessary expansions, and only sort when truly needed.
- Reduce early: limit the time range and columns before joins.
- Join small-to-big: filter facts first, then join to dimensions.
- Aggregate before expensive steps if it reduces rows safely.
- Sort only when your output requires ordering.
- Use a plan (EXPLAIN) to verify how rows flow.
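To make the pipeline concrete, here is a small annotated sketch (table and column names match the examples used later in this lesson):
SELECT u.country, COUNT(*) AS events -- Aggregate → Return
FROM analytics.events e -- Read
JOIN analytics.users u ON e.user_id = u.user_id -- Join
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days' -- Filter: shrink early
GROUP BY u.country -- Aggregate
ORDER BY events DESC; -- Sort: only because ranked output is wanted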
Core techniques (with quick tips)
Filter early and sargably (predicate pushdown)
- Filter by partition/time first: e.g., event_time >= CURRENT_DATE - INTERVAL '30 days'.
- Avoid wrapping columns in functions in WHERE, which can block index/partition pruning (e.g., prefer event_time >= DATE '2025-01-01' over DATE(event_time) >= ...).
- Use range filters that align to how data is stored/partitioned.
Select only what you need
- Avoid SELECT *. Pick the columns required for downstream steps.
- Wide rows cost memory and I/O, especially before joins and sorts.
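For example, a minimal sketch against the analytics.events table from this lesson (event_name is a hypothetical column used only for illustration):
-- Less efficient: every column is carried into later joins and sorts
SELECT *
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days';
-- Leaner: only the columns the downstream steps actually use
SELECT user_id, event_name, event_time
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days';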
Join efficiently
- Reduce the fact table before joining to dimensions.
- Join on keys with good selectivity and appropriate data types.
- Consider EXISTS for membership checks to avoid huge DISTINCTs.
Aggregate smartly
- Pre-aggregate to one row per entity before joining to other large sets when possible.
- Count distinct carefully; pre-aggregate or approximate if your engine supports it and precision is not critical.
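A sketch of the approximate option, assuming your engine provides such a function (the name below, APPROX_COUNT_DISTINCT, exists in engines like BigQuery and Snowflake; others use different names or may not offer it):
-- Exact, but can be expensive at very large scale
SELECT CAST(event_time AS DATE) AS d, COUNT(DISTINCT user_id) AS dau
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;
-- Approximate: often much cheaper when a small error margin is acceptable
SELECT CAST(event_time AS DATE) AS d, APPROX_COUNT_DISTINCT(user_id) AS approx_dau
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;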
Use window functions wisely
- Partition by the minimal keys necessary.
- Only include ORDER BY inside OVER() when truly required (it can force large sorts).
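For instance, to get each user's most recent event in the last 7 days, a minimal sketch: filter first so the window scans fewer rows, partition only by user_id, and keep ORDER BY inside OVER() only because "most recent" genuinely needs it:
WITH recent AS (
SELECT user_id, event_time
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT user_id, event_time AS last_event_time
FROM (
SELECT user_id, event_time,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) AS rn
FROM recent
) t
WHERE rn = 1;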
Sort and limit intentionally
- ORDER BY on big datasets is expensive; do it last and only if needed.
- LIMIT reduces returned rows, not always scanned rows—combine with selective filters and order on indexed/partition keys when possible.
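A small sketch of aggregate first, sort last: the GROUP BY shrinks the result to at most ~90 rows, so the final ORDER BY and LIMIT are cheap:
-- Top 10 days by event volume in the last 90 days
SELECT CAST(event_time AS DATE) AS d, COUNT(*) AS events
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY 1
ORDER BY events DESC
LIMIT 10;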
CTEs and subqueries
- CTEs can improve readability; engines may inline or materialize them. Check with EXPLAIN.
- Keep CTEs small and purposeful; avoid stacking many heavy CTEs if one can be simplified.
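As one engine-specific illustration, PostgreSQL 12+ lets you request inlining or materialization of a CTE explicitly; most warehouses make this choice for you, so treat this as a sketch and confirm the actual behavior with EXPLAIN:
-- PostgreSQL 12+ only: NOT MATERIALIZED asks the planner to inline the CTE
WITH recent AS NOT MATERIALIZED (
SELECT user_id, event_time
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT COUNT(*) AS recent_events
FROM recent;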
Read the plan (EXPLAIN) like a pro
- Look at estimated rows per step; a sudden blow-up in row counts usually signals a problem (for example, an unintended many-to-many join).
- Check join methods and key usage.
- Watch for large sorts or scans on full tables/partitions.
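A sketch of the workflow in PostgreSQL-style syntax (the keyword and output format vary by engine; EXPLAIN ANALYZE, where supported, also runs the query and reports actual row counts):
-- Inspect the plan without running the query
EXPLAIN
SELECT u.country, COUNT(*) AS events
FROM analytics.events e
JOIN analytics.users u ON e.user_id = u.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY u.country;
-- Then check: is the time filter pruning partitions or using an index?
-- Do estimated rows shrink after the WHERE, or explode after the JOIN?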
Worked examples
1) Time filter: avoid wrapping the column
Goal: last 30 days of events per day.
Less efficient:
SELECT DATE(event_time) AS d, COUNT(*)
FROM analytics.events
WHERE DATE(event_time) >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;
More performance-aware (filter is sargable):
SELECT CAST(event_time AS DATE) AS d, COUNT(*)
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1;
Key idea: The WHERE clause avoids wrapping the column, enabling better partition/index pruning. Casting for GROUP BY is usually fine.
2) Join after reducing the fact table
Goal: DAU with country from users for last 7 days.
Less efficient:
SELECT u.country, COUNT(DISTINCT e.user_id) AS dau
FROM analytics.events e
JOIN analytics.users u ON e.user_id = u.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY u.country;
More performance-aware (pre-filter and slim columns):
WITH recent_events AS (
SELECT user_id
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY user_id
)
SELECT u.country, COUNT(re.user_id) AS dau
FROM recent_events re
JOIN analytics.users u ON re.user_id = u.user_id
GROUP BY u.country;
Key idea: Shrink to one row per user before joining to the dimension table.
3) EXISTS instead of DISTINCT-heavy joins
Goal: Count signups who converted (made at least one order) per campaign last 14 days.
Less efficient (the join multiplies rows, forcing a costly DISTINCT):
SELECT s.campaign_id, COUNT(DISTINCT s.user_id) AS converted_users
FROM analytics.signups s
JOIN analytics.orders o ON o.user_id = s.user_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '14 days'
GROUP BY s.campaign_id;
More performance-aware (semi-join via EXISTS):
SELECT s.campaign_id, COUNT(*) AS converted_users
FROM (
SELECT DISTINCT user_id, campaign_id
FROM analytics.signups
) s
WHERE EXISTS (
SELECT 1
FROM analytics.orders o
WHERE o.user_id = s.user_id
AND o.created_at >= CURRENT_DATE - INTERVAL '14 days'
)
GROUP BY s.campaign_id;
Key idea: Use EXISTS to check membership without generating a large joined intermediate.
Exercises
Try each exercise before opening the solution.
Checklist before you run:
- Time filter is on the column, not wrapping the column with a function.
- Only required columns are selected before joins.
- Pre-aggregate to one row per entity when it reduces size safely.
- Use EXISTS for membership checks instead of DISTINCT-heavy joins.
Exercise 1 — Optimize DAU for last 30 days
Original query:
SELECT DATE(event_time) AS d, COUNT(DISTINCT user_id) AS dau
FROM analytics.events
WHERE DATE(event_time) >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1;
Task: Rewrite it to be more performance-aware while returning the same result.
Hints
- Make the WHERE clause sargable on event_time.
- Consider pre-aggregating to one row per user per day.
Solution
WITH recent AS (
SELECT user_id, CAST(event_time AS DATE) AS d
FROM analytics.events
WHERE event_time >= CURRENT_DATE - INTERVAL '30 days'
), per_day AS (
SELECT d, user_id
FROM recent
GROUP BY d, user_id
)
SELECT d, COUNT(*) AS dau
FROM per_day
GROUP BY d
ORDER BY d;
We filter on the raw column first, then collapse to one row per user per day before the final count.
Exercise 2 — Conversions by campaign with EXISTS
Original query:
SELECT s.campaign_id, COUNT(DISTINCT s.user_id) AS converted
FROM analytics.signups s
JOIN analytics.orders o ON o.user_id = s.user_id
GROUP BY s.campaign_id;
Task: Count signups that placed an order in the last 7 days, per campaign, using a more efficient pattern.
Hints
- Prefilter orders to last 7 days.
- Use EXISTS to avoid duplicate-creating joins.
Solution
WITH recent_orders AS (
SELECT DISTINCT user_id
FROM analytics.orders
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT s.campaign_id, COUNT(*) AS converted
FROM (
SELECT DISTINCT user_id, campaign_id
FROM analytics.signups
) s
WHERE EXISTS (
SELECT 1
FROM recent_orders ro
WHERE ro.user_id = s.user_id
)
GROUP BY s.campaign_id;
We reduce both sides first and use a semi-join pattern.
Common mistakes and self-check
- Wrapping columns in functions in WHERE. Self-check: Is your filter written as column OP constant?
- SELECT * before a join. Self-check: Do you only select columns used later?
- Counting distinct after a 1-to-many join. Self-check: Can you pre-aggregate to one row per entity first?
- Sorting unnecessarily. Self-check: Do you really need ORDER BY in the final output?
- Overusing CTEs that cause large intermediates. Self-check: Does EXPLAIN show materialization or large scans for the CTE?
Who this is for
- Product Analysts querying large event and transaction datasets.
- Anyone building dashboards, experiment readouts, and cohort analyses.
Prerequisites
- Comfort with SELECT, WHERE, JOIN, GROUP BY, and basic window functions.
- Basic understanding of your warehouse partitioning and table conventions.
Learning path
- Practice sargable filters on time-partitioned tables.
- Rewrite joins to pre-aggregate and/or use EXISTS where appropriate.
- Optimize window queries by reducing partitions and ORDER BY usage.
- Use EXPLAIN to confirm row counts and join methods.
- Apply patterns to a real dashboard or experiment query.
Practical projects
- Speed up a DAU/MAU dashboard query by 2× by pushing filters and pre-aggregating.
- Rewrite an A/B test metric query to avoid DISTINCT-heavy joins.
- Build a daily cohort table that runs under a set time budget (e.g., 2 minutes).
Mini challenge
Pick one frequently used query at your organization that runs slowly. Apply at least three techniques from this lesson (filter pushdown, pre-aggregation, EXISTS). Document the before/after runtime, row counts at each step, and final correctness check (same result set).
Next steps
- Make a personal checklist you run through before saving a query to shared dashboards.
- Practice reading EXPLAIN plans for three of your team's heaviest queries.
- Share optimized patterns with teammates to standardize faster queries.
Quick Test
Take the Quick Test to check your understanding.