Topic Not Found

Why this matters

As a Data Visualization Engineer, you turn raw tables into clear metrics and charts. Aggregations, GROUP BY, and HAVING are how you compute totals, averages, rates, and thresholds for dashboards. You'll use them to build KPI tiles (revenue, orders), segment metrics (by product, by channel), and filter groups (show only high-performing categories).

Concept explained simply

Aggregation collapses many rows into one numeric summary: COUNT, SUM, AVG, MIN, MAX. GROUP BY forms buckets (groups) based on one or more columns. HAVING filters entire groups after aggregation. WHERE filters individual rows before grouping.

Mental model: buckets and labels

Imagine pouring rows into labeled buckets (labels are the GROUP BY columns). Inside each bucket, you compute a single number using an aggregate function. HAVING decides which buckets stay. WHERE decides which rows even reach the buckets.

Execution order (helpful for debugging)

FROM (and JOIN)
WHERE (filters rows)
GROUP BY (creates buckets)
HAVING (filters buckets)
SELECT (computes expressions)
ORDER BY (sorts result)

Quick reference

COUNT(*) counts rows (includes NULLs). COUNT(col) counts non-NULL values in col.
SUM/AVG ignore NULL values.
Every non-aggregated column in SELECT must appear in GROUP BY.
Use HAVING with aggregates (e.g., HAVING SUM(...) > 1000). WHERE cannot use aggregates.
Conditional aggregation: SUM(CASE WHEN condition THEN 1 ELSE 0 END) as metric.

Worked examples

Example 1 — Orders per customer

-- Count completed orders per customer
SELECT
  customer_id,
  COUNT(*) AS completed_orders
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY completed_orders DESC;

What it shows: WHERE filtered to completed rows first, then GROUP BY counted per customer.

Example 2 — Monthly revenue, show only months above a threshold

-- Dialect note: use your platform's month trunc. Examples:
-- Postgres: DATE_TRUNC('month', order_date)
-- BigQuery: DATE_TRUNC(order_date, MONTH)
-- SQLite: STRFTIME('%Y-%m-01', order_date)

SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(total_amount) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', order_date)
HAVING SUM(total_amount) >= 5000
ORDER BY month;

Key idea: HAVING filters the aggregated groups, not the raw rows.

Example 3 — Return rate by category

-- Compute returns / orders by category via conditional aggregation
SELECT
  p.category,
  COUNT(*) AS total_items,
  SUM(CASE WHEN o.status = 'returned' THEN 1 ELSE 0 END) AS returned_items,
  1.0 * SUM(CASE WHEN o.status = 'returned' THEN 1 ELSE 0 END) / COUNT(*) AS return_rate
FROM order_items oi
JOIN orders o   ON o.order_id = oi.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.category
HAVING 1.0 * SUM(CASE WHEN o.status = 'returned' THEN 1 ELSE 0 END) / COUNT(*) > 0.1
ORDER BY return_rate DESC;

Conditional aggregation lets you build metrics like rates without subqueries.

Who this is for

Data Visualization Engineers building dashboards and KPI tiles.
Analysts preparing grouped datasets for BI tools.
Anyone new to SQL who needs reliable summary metrics.

Prerequisites

Basic SELECT, WHERE, ORDER BY.
Comfort with simple JOINs (inner joins at minimum).

Learning path

Read the concepts and examples above.
Complete the exercises on realistic tables.
Take the Quick Test to confirm understanding. Note: everyone can take the test; only logged-in users have saved progress.
Apply the patterns to a small project (see Practical projects).

Exercises

Use this mini dataset for both exercises.

Sample tables

products

product_id | category
-----------+-----------
1          | Electronics
2          | Home
3          | Home
4          | Books

order_items

order_id | product_id | quantity | unit_price
---------+------------+----------+-----------
1        | 1          | 2        | 200
1        | 4          | 1        | 20
2        | 2          | 3        | 50
2        | 3          | 1        | 40
3        | 1          | 1        | 200
3        | 2          | 2        | 50
4        | 4          | 5        | 10
5        | 3          | 4        | 40

customers

customer_id | customer_name
------------+--------------
1           | Alice
2           | Bob
3           | Cara

orders

order_id | customer_id | order_date  | status     | total_amount
---------+-------------+-------------+------------+-------------
1        | 1           | 2024-01-03  | completed  | 420
2        | 2           | 2024-01-10  | completed  | 190
3        | 1           | 2024-02-01  | completed  | 300
4        | 3           | 2024-02-15  | canceled   | 50
5        | 2           | 2024-03-01  | completed  | 160

Exercise 1 — Revenue by category with HAVING

Compute revenue per category from order_items and products. Only show categories with revenue ≥ 500. Sort by revenue DESC.

Revenue = SUM(quantity * unit_price)
Join order_items to products
GROUP BY category, filter with HAVING

Hint

WHERE is not for aggregates; use HAVING SUM(...) >= 500.
Alias your SUM as revenue, but in HAVING prefer using the expression for portability.

Expected output

category     | revenue
-------------+--------
Electronics  | 600

Exercise 2 — Active customers with average order size

For each customer, count completed orders and compute the average completed order amount. Only show customers with at least 2 completed orders. Sort by completed_orders DESC, then avg_order_amount DESC.

Hints

Filter completed orders with WHERE before grouping.
AVG(total_amount) and COUNT(*) are both computed after grouping.

Expected output

customer_id | completed_orders | avg_order_amount
-----------+-------------------+-----------------
1          | 2                 | 360
2          | 2                 | 175

Exercise checklist

Grouped by the correct columns
Used HAVING (not WHERE) for aggregate filters
Verified no unintended double-counting from joins
Sorted the final results as requested

Common mistakes and self-check

Mistake: Using WHERE SUM(...). Fix: Use HAVING SUM(...).
Mistake: Selecting non-aggregated columns not in GROUP BY. Fix: Add them to GROUP BY or aggregate them.
Mistake: Double-counting after joins (e.g., joining orders to items and also to a many-to-many table). Fix: Aggregate to the correct grain before additional joins, or use DISTINCT carefully.
Mistake: COUNT(col) expecting to count NULLs. Fix: Use COUNT(*) for all rows.

Self-check routine

Describe the grain of your output (e.g., per category per month). Ensure GROUP BY matches this grain.
Print a small sample with SELECT ... LIMIT 5 to inspect duplications before aggregating.
Cross-check totals with an independent query (e.g., compute grand total separately).

Practical projects

Build a KPI summary: total revenue, orders, customers; plus revenue by category with a HAVING threshold.
Create a monthly dashboard dataset: revenue, orders, average order value by month, top categories.
Construct a return-rate report by category and by month using conditional aggregation.

Mini challenge

Using the sample tables, list categories where the number of items sold (quantity) is at least 6. Show category and items_sold, sorted DESC.

Show answer

SELECT
  p.category,
  SUM(oi.quantity) AS items_sold
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.category
HAVING SUM(oi.quantity) >= 6
ORDER BY items_sold DESC;

Items sold by category: Electronics = 3, Home = 10, Books = 6 — so Home and Books match ≥ 6.

Next steps

Practice with larger, real datasets: replicate your company’s KPI tiles with repeatable queries.
Learn to combine GROUP BY with window functions to add cumulative and ranking metrics.
Proceed to the Quick Test below to lock in the concepts. Everyone can take it; only logged-in users see saved progress.

Menu

Aggregations Group By Having

Table of Contents

Why this matters

Concept explained simply

Quick reference

Worked examples

Who this is for

Prerequisites

Learning path

Exercises

Exercise 1 — Revenue by category with HAVING

Exercise 2 — Active customers with average order size

Exercise checklist

Common mistakes and self-check

Practical projects

Mini challenge

Next steps

Practice Exercises

Revenue by category with HAVING

Instructions

Expected Output

Active customers with average order size

Aggregations Group By Having — Quick Test

Have questions about Aggregations Group By Having?

AI Assistant