luvv to helpDiscover the Best Free Online Tools
Topic 7 of 12

Group By And Having

Learn Group By And Having for free with explanations, exercises, and a quick test (for Business Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Who this is for

  • Business Analysts who need to summarize data (sales by customer, tickets by agent, revenue by month).
  • Anyone comfortable with basic SELECT and WHERE who wants to produce clean, aggregated reports.

Prerequisites

  • Know SELECT, FROM, WHERE, ORDER BY.
  • Know basic aggregates: COUNT, SUM, AVG, MIN, MAX.

Learning path

  • Before: SELECT basics, WHERE filtering, aggregate functions.
  • Now: GROUP BY and HAVING.
  • Next: Joins with grouped data, CASE in aggregates, window functions (moving averages), and report formatting.

Why this matters

Real analyst tasks depend on grouping:

  • Monthly revenue by product category to track trends.
  • Customer segmentation: total orders and average order value per customer.
  • Operational dashboards: tickets closed per agent per week.
  • Anomaly checks: days with unusually high order counts.

Concept explained simply

GROUP BY turns row-level data into summary rows. Aggregates (SUM, COUNT, AVG, MIN, MAX) collapse many rows into one per group.

HAVING filters groups after they are formed. WHERE filters rows before grouping.

Mental model: buckets, then filters

Imagine you drop each row into a bucket labeled by your GROUP BY columns (e.g., one bucket per customer_id). Then you total or count inside each bucket. After that, HAVING keeps or removes whole buckets based on the aggregated results.

  • WHERE: “Which rows get into buckets?”
  • GROUP BY: “What defines each bucket?”
  • Aggregates: “What math inside each bucket?”
  • HAVING: “Which buckets survive?”

Key syntax

SELECT   group_col_1, group_col_2, AGG_FUNC(measure) AS metric
FROM     table
WHERE    row_filter
GROUP BY group_col_1, group_col_2
HAVING   aggregate_filter
ORDER BY metric DESC;
  • Every selected column must be either in GROUP BY or aggregated.
  • HAVING can use aggregates (e.g., HAVING SUM(amount) > 1000). WHERE cannot.
  • Filter raw rows (e.g., status = 'COMPLETED') in WHERE before grouping for correct results and performance.

Worked examples

Example 1: Total sales per customer and keep big spenders

Task: Show customers whose total completed sales in 2024 are at least 1,000.

SELECT   customer_id,
         SUM(amount) AS total_amount
FROM     orders
WHERE    status = 'COMPLETED'
  AND    created_at >= '2024-01-01'
  AND    created_at < '2025-01-01'
GROUP BY customer_id
HAVING   SUM(amount) >= 1000
ORDER BY total_amount DESC;

Result idea: one row per customer_id that meets the threshold, highest spend first.

Example 2: Daily order counts excluding cancelled

Task: Count orders per calendar day for non-cancelled orders.

SELECT   DATE(created_at) AS order_date,
         COUNT(*) AS order_count
FROM     orders
WHERE    status <> 'CANCELLED'
GROUP BY DATE(created_at)
ORDER BY order_date;

Key point: We exclude cancelled orders in WHERE before grouping.

Example 3: Average order value by region (ignore null regions)

Task: Average order amount for known customer regions, show regions with AVG > 50.

SELECT   region,
         AVG(amount) AS avg_order_value
FROM     orders
WHERE    region IS NOT NULL
GROUP BY region
HAVING   AVG(amount) > 50
ORDER BY avg_order_value DESC;

Tip: Use WHERE region IS NOT NULL to avoid a NULL group unless you intentionally want it.

Optional: Multi-column grouping

You can group by more than one column to get a finer grain.

SELECT   customer_id, DATE(created_at) AS order_date, COUNT(*) AS orders
FROM     orders
GROUP BY customer_id, DATE(created_at)
ORDER BY customer_id, order_date;

Practice: try it yourself

Use a table like orders(id, customer_id, amount, status, created_at, region).

  • Exercise 1 (mirrors ex1): Total completed 2024 sales per customer, keep customers with total_amount >= 2000, ordered by total_amount DESC.
  • Exercise 2 (mirrors ex2): Count orders per day excluding CANCELLED; show only days with at least 50 orders.
  • Checklist before you run:
    • Did you put non-aggregated selected columns in GROUP BY?
    • Did you use WHERE for row filters and HAVING for aggregate filters?
    • Is your date range correctly bounded (start inclusive, end exclusive)?

Common mistakes and self-check

  • Selecting a column that is neither grouped nor aggregated. Self-check: Every non-aggregated select item should appear in GROUP BY.
  • Using WHERE with aggregates. Self-check: If your filter includes SUM/COUNT/AVG, it must be in HAVING.
  • Filtering after grouping when you meant before. Self-check: If the filter is about raw rows (status, date, region), it belongs in WHERE.
  • Grouping by expressions inconsistently. Self-check: If you select DATE(created_at), group by the same expression DATE(created_at).
  • Forgetting NULL handling. Self-check: Decide whether NULL should be excluded (WHERE col IS NOT NULL) or grouped as its own bucket.

Practical projects

  • Customer leaderboard: Build a top-20 customers report showing total sales, average order value, and order count for the past quarter. Add a HAVING clause to keep only customers with at least 3 orders.
  • Operations heatmap: Count tickets resolved per agent per weekday, highlighting agents with AVG resolution time below a threshold (requires a table with duration fields).
  • Daily anomaly watch: For each day, count orders and flag days above a configurable cutoff using HAVING COUNT(*) >= threshold. Export as a clean table for dashboarding.

Mini challenge

Dataset assumption: orders(id, customer_id, amount, status, created_at), customers(id, region).

Task: Find regions with at least 10 completed orders in Q2 2024 and an average order amount of 75 or more. Show region, order_count, avg_amount; sort by avg_amount DESC.

Show one possible solution
SELECT   c.region,
         COUNT(*) AS order_count,
         AVG(o.amount) AS avg_amount
FROM     orders o
JOIN     customers c ON c.id = o.customer_id
WHERE    o.status = 'COMPLETED'
  AND    o.created_at >= '2024-04-01'
  AND    o.created_at < '2024-07-01'
  AND    c.region IS NOT NULL
GROUP BY c.region
HAVING   COUNT(*) >= 10
   AND   AVG(o.amount) >= 75
ORDER BY avg_amount DESC;

Next steps

  • Try the exercises below, then take the Quick Test at the end of this page.
  • Apply GROUP BY in a real dataset you use at work or study.
  • Move on to joining grouped results with dimension tables (e.g., products, customers).

Progress note: The Quick Test is available to everyone. Only logged-in users will have their progress saved.

Practice Exercises

2 exercises to complete

Instructions

Using orders(id, customer_id, amount, status, created_at):

  • Sum amount per customer for orders with status = 'COMPLETED'.
  • Limit to 2024 (from 2024-01-01 inclusive to 2025-01-01 exclusive).
  • Keep only customers with total_amount >= 2000.
  • Order by total_amount DESC.

Return columns: customer_id, total_amount.

Expected Output
Rows of customers meeting the threshold, e.g., customer_id | total_amount sorted from highest to lowest.

Group By And Having — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

6 questions70% to pass

Have questions about Group By And Having?

AI Assistant

Ask questions about this tool