Why this matters
As a Data Analyst, you will often need to filter results based on other results, compare a value to group statistics, or check if related records exist. Subqueries let you do this cleanly without creating many temporary tables. Knowing when to use uncorrelated vs correlated subqueries helps you write readable, efficient SQL for reports, dashboards, cohort analyses, anomaly checks, and quality assurance.
- Find products priced above the overall average price
- List customers who have no orders in the last 90 days
- Flag orders where line items do not sum to the order total
Concept explained simply
A subquery is a query inside another query. It can return a single value, a list of values, a table, or a boolean (when used with EXISTS).
- Uncorrelated subquery: Runs once, independent of the outer query. Think: a helper result you compute first, then use.
- Correlated subquery: Runs for each row of the outer query, because it references that row. Think: ask a question about “this row’s related rows.”
Mental model
Picture the outer query as a list you scan. For each row:
- Uncorrelated: you reuse the same precomputed answer (like a single average price).
- Correlated: you ask a row-specific question (e.g., “does this customer have an order in 2024?”).
Where subqueries can appear
- SELECT: scalar (single value) subqueries per outer row
- FROM: derived table (a subquery that acts like a table)
- WHERE / HAVING: predicates using IN, EXISTS, ANY/SOME, ALL, comparisons
Portable date extraction note
Use EXTRACT(YEAR FROM order_date)=2024 for ANSI SQL. Some systems also support YEAR(order_date)=2024.
Worked examples
1) Uncorrelated scalar subquery in WHERE
Goal: List products priced above the overall average price.
SELECT p.product_id, p.product_name, p.price
FROM products p
WHERE p.price > (
SELECT AVG(price) FROM products
);
Why it works: The inner AVG runs once. The result is compared for each product.
2) Correlated subquery with EXISTS
Goal: Customers who placed at least one order in 2024.
SELECT c.customer_id, c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND EXTRACT(YEAR FROM o.order_date) = 2024
);
Why it works: The inner query references c.customer_id (outer row), so it runs per customer.
3) Anti-join with NOT EXISTS (customers with no 2024 orders)
SELECT c.customer_id, c.customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND EXTRACT(YEAR FROM o.order_date) = 2024
);
Why it works: NOT EXISTS filters out customers who have a matching 2024 order.
4) Subquery in FROM (derived table)
Goal: Top 5 customers by 2024 spend using a derived table.
SELECT t.customer_id, t.total_spend_2024
FROM (
SELECT o.customer_id, SUM(o.total_amount) AS total_spend_2024
FROM orders o
WHERE EXTRACT(YEAR FROM o.order_date) = 2024
GROUP BY o.customer_id
) t
ORDER BY t.total_spend_2024 DESC
FETCH FIRST 5 ROWS ONLY;
Why it works: The inner query aggregates once; the outer query ranks and limits.
5) Scalar correlated subquery in SELECT
Goal: For each customer, show their most recent order date.
SELECT c.customer_id,
c.customer_name,
(
SELECT MAX(o.order_date)
FROM orders o
WHERE o.customer_id = c.customer_id
) AS last_order_date
FROM customers c;
Why it works: The inner query returns a single value per customer row.
When to use what
- Use EXISTS/NOT EXISTS for presence/absence checks. It short-circuits on first match and avoids duplicates.
- Use IN when comparing a value to a small list from a subquery.
- Use JOINs to bring columns from related tables. Use subqueries when you only need to filter or compute a single value.
- Use derived tables (FROM) to stage complex aggregations for clarity.
Performance tips
- Index join keys used in correlated subqueries (e.g., orders.customer_id).
- Prefer EXISTS to IN when the inner set can be large or non-unique.
- Ensure scalar subqueries really return one value; add LIMIT or aggregate if needed.
- Consider rewriting heavy correlated subqueries as JOIN + GROUP BY if they run slowly.
Exercises (do these now)
These mirror the exercises provided below. Try to solve without looking at the solutions. Then check your work.
Exercise 1 — Orders above average total (uncorrelated)
Return orders where total_amount is greater than the average total_amount of all orders.
- Expected columns: order_id, total_amount
- Constraint: Use a subquery in WHERE
Show a small hint
Compare total_amount to a single AVG() computed once in a subquery.
Exercise 2 — Customers with 2+ orders in 2024 (correlated)
Return customers who placed at least two orders in 2024.
- Expected columns: customer_id, customer_name
- Constraint: Use a correlated subquery (scalar or EXISTS) in WHERE
Show a small hint
A scalar subquery can compute COUNT(*) per customer_id for 2024.
Exercise 3 — Products never ordered (anti-join with NOT EXISTS)
Return products that have no matching rows in order_items.
- Expected columns: product_id, product_name
- Constraint: Use NOT EXISTS
Show a small hint
Correlate order_items.product_id to products.product_id.
Self-check checklist
- Your queries run without errors.
- Exercise 1 uses one AVG() subquery, not a join.
- Exercise 2 references the outer customer in the inner WHERE clause.
- Exercise 3 uses NOT EXISTS and correlates on product_id.
Common mistakes and how to self-check
- Mistake: Using a scalar subquery that returns multiple rows. Fix: Aggregate (e.g., MAX) or filter to one row.
- Mistake: IN with a large, duplicate set causing slowdowns. Fix: Switch to EXISTS or SELECT DISTINCT in the subquery.
- Mistake: Forgetting to correlate (missing join keys) so the subquery runs once and yields wrong logic. Fix: Confirm inner WHERE references outer columns.
- Mistake: Selecting columns from the inner subquery in WHERE. Fix: WHERE subqueries cannot output columns to the SELECT list; join instead.
- Mistake: Comparing NULLs incorrectly. Fix: Use IS NULL/IS NOT NULL or COALESCE where appropriate.
Quick self-review
- Can I explain why my subquery is correlated or uncorrelated?
- If the subquery runs per row, are the key columns indexed?
- Does the filter logic match the business question exactly?
Practical projects
- Churn detector: List customers with no orders in the last 90 days using NOT EXISTS; email them a win-back offer list.
- Price positioner: Flag products priced above category average (scalar subquery) and summarize counts per category.
- High-value cohort: Build a derived table for 2024 spend per customer, then rank and segment into tiers.
Who this is for
- Junior to intermediate Data Analysts who know basic SELECT/JOIN/GROUP BY.
- BI developers or data-savvy PMs who write SQL for analytics.
Prerequisites
- Comfort with SELECT, WHERE, JOIN, GROUP BY
- Basic understanding of indexes and primary/foreign keys
Learning path
- Refresh JOINs and GROUP BY
- Learn uncorrelated subqueries (scalar and IN)
- Learn correlated subqueries (EXISTS/NOT EXISTS)
- Practice derived tables in FROM
- Optimize and rewrite between subqueries and joins
Next steps
- Practice with real datasets (orders, products, customers).
- Try rewriting your subqueries as joins and compare results and performance.
- Take the Quick Test below to check mastery. Everyone can take it; saving progress requires login.
Mini challenge
Return products whose price is greater than the average price within their own category (category_id). Use a correlated subquery.
Show one possible answer
SELECT p.product_id, p.product_name, p.category_id, p.price
FROM products p
WHERE p.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category_id = p.category_id
);
Check: The inner subquery references p.category_id (correlated), and AVG runs per category.
Ready to test yourself? Open the Quick Test section. Estimated time: 5–7 minutes. Everyone can take it; only logged-in users get saved progress.