Who this is for
- Business Analysts who need to summarize data (sales by customer, tickets by agent, revenue by month).
- Anyone comfortable with basic SELECT and WHERE who wants to produce clean, aggregated reports.
Prerequisites
- Know SELECT, FROM, WHERE, ORDER BY.
- Know basic aggregates: COUNT, SUM, AVG, MIN, MAX.
Learning path
- Before: SELECT basics, WHERE filtering, aggregate functions.
- Now: GROUP BY and HAVING.
- Next: Joins with grouped data, CASE in aggregates, window functions (moving averages), and report formatting.
Why this matters
Real analyst tasks depend on grouping:
- Monthly revenue by product category to track trends.
- Customer segmentation: total orders and average order value per customer.
- Operational dashboards: tickets closed per agent per week.
- Anomaly checks: days with unusually high order counts.
Concept explained simply
GROUP BY turns row-level data into summary rows. Aggregates (SUM, COUNT, AVG, MIN, MAX) collapse many rows into one per group.
HAVING filters groups after they are formed. WHERE filters rows before grouping.
Mental model: buckets, then filters
Imagine you drop each row into a bucket labeled by your GROUP BY columns (e.g., one bucket per customer_id). Then you total or count inside each bucket. After that, HAVING keeps or removes whole buckets based on the aggregated results.
- WHERE: “Which rows get into buckets?”
- GROUP BY: “What defines each bucket?”
- Aggregates: “What math inside each bucket?”
- HAVING: “Which buckets survive?”
Key syntax
SELECT group_col_1, group_col_2, AGG_FUNC(measure) AS metric
FROM table
WHERE row_filter
GROUP BY group_col_1, group_col_2
HAVING aggregate_filter
ORDER BY metric DESC;
- Every selected column must be either in GROUP BY or aggregated.
- HAVING can use aggregates (e.g., HAVING SUM(amount) > 1000). WHERE cannot.
- Filter raw rows (e.g., status = 'COMPLETED') in WHERE before grouping for correct results and performance.
Worked examples
Example 1: Total sales per customer and keep big spenders
Task: Show customers whose total completed sales in 2024 are at least 1,000.
SELECT customer_id,
SUM(amount) AS total_amount
FROM orders
WHERE status = 'COMPLETED'
AND created_at >= '2024-01-01'
AND created_at < '2025-01-01'
GROUP BY customer_id
HAVING SUM(amount) >= 1000
ORDER BY total_amount DESC;
Result idea: one row per customer_id that meets the threshold, highest spend first.
Example 2: Daily order counts excluding cancelled
Task: Count orders per calendar day for non-cancelled orders.
SELECT DATE(created_at) AS order_date,
COUNT(*) AS order_count
FROM orders
WHERE status <> 'CANCELLED'
GROUP BY DATE(created_at)
ORDER BY order_date;
Key point: We exclude cancelled orders in WHERE before grouping.
Example 3: Average order value by region (ignore null regions)
Task: Average order amount for known customer regions, show regions with AVG > 50.
SELECT region,
AVG(amount) AS avg_order_value
FROM orders
WHERE region IS NOT NULL
GROUP BY region
HAVING AVG(amount) > 50
ORDER BY avg_order_value DESC;
Tip: Use WHERE region IS NOT NULL to avoid a NULL group unless you intentionally want it.
Optional: Multi-column grouping
You can group by more than one column to get a finer grain.
SELECT customer_id, DATE(created_at) AS order_date, COUNT(*) AS orders
FROM orders
GROUP BY customer_id, DATE(created_at)
ORDER BY customer_id, order_date;
Practice: try it yourself
Use a table like orders(id, customer_id, amount, status, created_at, region).
- Exercise 1 (mirrors ex1): Total completed 2024 sales per customer, keep customers with total_amount >= 2000, ordered by total_amount DESC.
- Exercise 2 (mirrors ex2): Count orders per day excluding CANCELLED; show only days with at least 50 orders.
- Checklist before you run:
- Did you put non-aggregated selected columns in GROUP BY?
- Did you use WHERE for row filters and HAVING for aggregate filters?
- Is your date range correctly bounded (start inclusive, end exclusive)?
Common mistakes and self-check
- Selecting a column that is neither grouped nor aggregated. Self-check: Every non-aggregated select item should appear in GROUP BY.
- Using WHERE with aggregates. Self-check: If your filter includes SUM/COUNT/AVG, it must be in HAVING.
- Filtering after grouping when you meant before. Self-check: If the filter is about raw rows (status, date, region), it belongs in WHERE.
- Grouping by expressions inconsistently. Self-check: If you select DATE(created_at), group by the same expression DATE(created_at).
- Forgetting NULL handling. Self-check: Decide whether NULL should be excluded (WHERE col IS NOT NULL) or grouped as its own bucket.
Practical projects
- Customer leaderboard: Build a top-20 customers report showing total sales, average order value, and order count for the past quarter. Add a HAVING clause to keep only customers with at least 3 orders.
- Operations heatmap: Count tickets resolved per agent per weekday, highlighting agents with AVG resolution time below a threshold (requires a table with duration fields).
- Daily anomaly watch: For each day, count orders and flag days above a configurable cutoff using HAVING COUNT(*) >= threshold. Export as a clean table for dashboarding.
Mini challenge
Dataset assumption: orders(id, customer_id, amount, status, created_at), customers(id, region).
Task: Find regions with at least 10 completed orders in Q2 2024 and an average order amount of 75 or more. Show region, order_count, avg_amount; sort by avg_amount DESC.
Show one possible solution
SELECT c.region,
COUNT(*) AS order_count,
AVG(o.amount) AS avg_amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'COMPLETED'
AND o.created_at >= '2024-04-01'
AND o.created_at < '2024-07-01'
AND c.region IS NOT NULL
GROUP BY c.region
HAVING COUNT(*) >= 10
AND AVG(o.amount) >= 75
ORDER BY avg_amount DESC;
Next steps
- Try the exercises below, then take the Quick Test at the end of this page.
- Apply GROUP BY in a real dataset you use at work or study.
- Move on to joining grouped results with dimension tables (e.g., products, customers).
Progress note: The Quick Test is available to everyone. Only logged-in users will have their progress saved.