How to learn Subqueries Correlated Subqueries Uncorrelated Subqueries for SQL in Data Analyst for free

Why this matters

As a Data Analyst, you will often need to filter results based on other results, compare a value to group statistics, or check if related records exist. Subqueries let you do this cleanly without creating many temporary tables. Knowing when to use uncorrelated vs correlated subqueries helps you write readable, efficient SQL for reports, dashboards, cohort analyses, anomaly checks, and quality assurance.

Find products priced above the overall average price
List customers who have no orders in the last 90 days
Flag orders where line items do not sum to the order total

Concept explained simply

A subquery is a query inside another query. It can return a single value, a list of values, a table, or a boolean (when used with EXISTS).

Uncorrelated subquery: Runs once, independent of the outer query. Think: a helper result you compute first, then use.
Correlated subquery: Runs for each row of the outer query, because it references that row. Think: ask a question about “this row’s related rows.”

Mental model

Picture the outer query as a list you scan. For each row:

Uncorrelated: you reuse the same precomputed answer (like a single average price).
Correlated: you ask a row-specific question (e.g., “does this customer have an order in 2024?”).

Where subqueries can appear

SELECT: scalar (single value) subqueries per outer row
FROM: derived table (a subquery that acts like a table)
WHERE / HAVING: predicates using IN, EXISTS, ANY/SOME, ALL, comparisons

Portable date extraction note

Use EXTRACT(YEAR FROM order_date)=2024 for ANSI SQL. Some systems also support YEAR(order_date)=2024.

Worked examples

1) Uncorrelated scalar subquery in WHERE

Goal: List products priced above the overall average price.

SELECT p.product_id, p.product_name, p.price
FROM products p
WHERE p.price > (
  SELECT AVG(price) FROM products
);

Why it works: The inner AVG runs once. The result is compared for each product.

2) Correlated subquery with EXISTS

Goal: Customers who placed at least one order in 2024.

SELECT c.customer_id, c.customer_name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
    AND EXTRACT(YEAR FROM o.order_date) = 2024
);

Why it works: The inner query references c.customer_id (outer row), so it runs per customer.

3) Anti-join with NOT EXISTS (customers with no 2024 orders)

SELECT c.customer_id, c.customer_name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
    AND EXTRACT(YEAR FROM o.order_date) = 2024
);

Why it works: NOT EXISTS filters out customers who have a matching 2024 order.

4) Subquery in FROM (derived table)

Goal: Top 5 customers by 2024 spend using a derived table.

SELECT t.customer_id, t.total_spend_2024
FROM (
  SELECT o.customer_id, SUM(o.total_amount) AS total_spend_2024
  FROM orders o
  WHERE EXTRACT(YEAR FROM o.order_date) = 2024
  GROUP BY o.customer_id
) t
ORDER BY t.total_spend_2024 DESC
FETCH FIRST 5 ROWS ONLY;

Why it works: The inner query aggregates once; the outer query ranks and limits.

5) Scalar correlated subquery in SELECT

Goal: For each customer, show their most recent order date.

SELECT c.customer_id,
       c.customer_name,
       (
         SELECT MAX(o.order_date)
         FROM orders o
         WHERE o.customer_id = c.customer_id
       ) AS last_order_date
FROM customers c;

Why it works: The inner query returns a single value per customer row.

When to use what

Use EXISTS/NOT EXISTS for presence/absence checks. It short-circuits on first match and avoids duplicates.
Use IN when comparing a value to a small list from a subquery.
Use JOINs to bring columns from related tables. Use subqueries when you only need to filter or compute a single value.
Use derived tables (FROM) to stage complex aggregations for clarity.

Performance tips

Index join keys used in correlated subqueries (e.g., orders.customer_id).
Prefer EXISTS to IN when the inner set can be large or non-unique.
Ensure scalar subqueries really return one value; add LIMIT or aggregate if needed.
Consider rewriting heavy correlated subqueries as JOIN + GROUP BY if they run slowly.

Exercises (do these now)

These mirror the exercises provided below. Try to solve without looking at the solutions. Then check your work.

Exercise 1 — Orders above average total (uncorrelated)

Return orders where total_amount is greater than the average total_amount of all orders.

Expected columns: order_id, total_amount
Constraint: Use a subquery in WHERE

Show a small hint

Compare total_amount to a single AVG() computed once in a subquery.

Exercise 2 — Customers with 2+ orders in 2024 (correlated)

Return customers who placed at least two orders in 2024.

Expected columns: customer_id, customer_name
Constraint: Use a correlated subquery (scalar or EXISTS) in WHERE

Show a small hint

A scalar subquery can compute COUNT(*) per customer_id for 2024.

Exercise 3 — Products never ordered (anti-join with NOT EXISTS)

Return products that have no matching rows in order_items.

Expected columns: product_id, product_name
Constraint: Use NOT EXISTS

Show a small hint

Correlate order_items.product_id to products.product_id.

Self-check checklist

Your queries run without errors.
Exercise 1 uses one AVG() subquery, not a join.
Exercise 2 references the outer customer in the inner WHERE clause.
Exercise 3 uses NOT EXISTS and correlates on product_id.

Common mistakes and how to self-check

Mistake: Using a scalar subquery that returns multiple rows. Fix: Aggregate (e.g., MAX) or filter to one row.
Mistake: IN with a large, duplicate set causing slowdowns. Fix: Switch to EXISTS or SELECT DISTINCT in the subquery.
Mistake: Forgetting to correlate (missing join keys) so the subquery runs once and yields wrong logic. Fix: Confirm inner WHERE references outer columns.
Mistake: Selecting columns from the inner subquery in WHERE. Fix: WHERE subqueries cannot output columns to the SELECT list; join instead.
Mistake: Comparing NULLs incorrectly. Fix: Use IS NULL/IS NOT NULL or COALESCE where appropriate.

Quick self-review

Can I explain why my subquery is correlated or uncorrelated?
If the subquery runs per row, are the key columns indexed?
Does the filter logic match the business question exactly?

Practical projects

Churn detector: List customers with no orders in the last 90 days using NOT EXISTS; email them a win-back offer list.
Price positioner: Flag products priced above category average (scalar subquery) and summarize counts per category.
High-value cohort: Build a derived table for 2024 spend per customer, then rank and segment into tiers.

Who this is for

Junior to intermediate Data Analysts who know basic SELECT/JOIN/GROUP BY.
BI developers or data-savvy PMs who write SQL for analytics.

Prerequisites

Comfort with SELECT, WHERE, JOIN, GROUP BY
Basic understanding of indexes and primary/foreign keys

Learning path

Refresh JOINs and GROUP BY
Learn uncorrelated subqueries (scalar and IN)
Learn correlated subqueries (EXISTS/NOT EXISTS)
Practice derived tables in FROM
Optimize and rewrite between subqueries and joins

Next steps

Practice with real datasets (orders, products, customers).
Try rewriting your subqueries as joins and compare results and performance.
Take the Quick Test below to check mastery. Everyone can take it; saving progress requires login.

Mini challenge

Return products whose price is greater than the average price within their own category (category_id). Use a correlated subquery.

Show one possible answer

SELECT p.product_id, p.product_name, p.category_id, p.price
FROM products p
WHERE p.price > (
  SELECT AVG(p2.price)
  FROM products p2
  WHERE p2.category_id = p.category_id
);

Check: The inner subquery references p.category_id (correlated), and AVG runs per category.

Ready to test yourself? Open the Quick Test section. Estimated time: 5–7 minutes. Everyone can take it; only logged-in users get saved progress.

Menu

Subqueries Correlated Subqueries Uncorrelated Subqueries

Table of Contents

Why this matters

Concept explained simply

Mental model

Worked examples

1) Uncorrelated scalar subquery in WHERE

2) Correlated subquery with EXISTS

3) Anti-join with NOT EXISTS (customers with no 2024 orders)

4) Subquery in FROM (derived table)

5) Scalar correlated subquery in SELECT

When to use what

Performance tips

Exercises (do these now)

Exercise 1 — Orders above average total (uncorrelated)

Exercise 2 — Customers with 2+ orders in 2024 (correlated)

Exercise 3 — Products never ordered (anti-join with NOT EXISTS)

Self-check checklist

Common mistakes and how to self-check

Practical projects

Who this is for

Prerequisites

Learning path

Next steps

Mini challenge

Practice Exercises

Orders above average total (uncorrelated)

Instructions

Expected Output

Customers with 2+ orders in 2024 (correlated)

Products never ordered (NOT EXISTS)

Subqueries Correlated Subqueries Uncorrelated Subqueries — Quick Test

Have questions about Subqueries Correlated Subqueries Uncorrelated Subqueries?

AI Assistant