Topic Not Found

Why this matters

As a BI Analyst, you will regularly summarize data to answer questions like: Which customers drive revenue? Which regions underperform? How many orders were delivered on time? Aggregations, GROUP BY, and HAVING let you turn raw rows into decision-ready metrics.

Build KPIs: revenue, average order value, conversion rates
Slice metrics by customer, product, region, or time
Filter groups to focus on material segments (e.g., customers with 5+ orders)

Concept explained simply

Aggregate functions (COUNT, SUM, AVG, MIN, MAX) reduce many rows to a single value. GROUP BY partitions rows into buckets and applies aggregates per bucket. WHERE filters rows before grouping; HAVING filters the grouped results after aggregates are calculated.

Mental model: boxes and labels

1) You first pick rows (WHERE). 2) You put remaining rows into labeled boxes (GROUP BY columns). 3) For each box, you compute summaries (aggregates). 4) You keep or discard boxes (HAVING). 5) You print the final columns (SELECT) and sort (ORDER BY).

Key syntax you’ll use

SELECT group_col, AGG_FUNC(expr) AS metric
FROM table
WHERE prefilter
GROUP BY group_col
HAVING AGG_FUNC(expr) condition
ORDER BY metric DESC;

Common aggregate functions

COUNT(*) — counts rows (includes NULLs)
COUNT(col) — counts non-NULL values in col
COUNT(DISTINCT col) — counts unique non-NULL values
SUM(col), AVG(col), MIN(col), MAX(col)

Worked examples

1) Customers with at least 5 delivered orders in 2024

SELECT customer_id,
       COUNT(*) AS orders_count
FROM orders
WHERE status = 'delivered'
  AND order_date >= DATE '2024-01-01'
  AND order_date < DATE '2025-01-01'
GROUP BY customer_id
HAVING COUNT(*) >= 5
ORDER BY orders_count DESC;

Why it works: WHERE limits to 2024 delivered orders; GROUP BY makes one group per customer; HAVING keeps only groups meeting the threshold.

2) Monthly revenue in 2024

SELECT EXTRACT(YEAR FROM order_date) AS yr,
       EXTRACT(MONTH FROM order_date) AS mo,
       SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= DATE '2024-01-01'
  AND order_date < DATE '2025-01-01'
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
ORDER BY yr, mo;

Tip: Use your database’s date functions if they differ; the idea is to group by month and sum.

3) Shipping methods with slow delivery

SELECT shipping_method,
       AVG(delivery_days) AS avg_days
FROM shipments
WHERE shipped_date >= DATE '2024-01-01'
  AND shipped_date < DATE '2025-01-01'
GROUP BY shipping_method
HAVING AVG(delivery_days) > 5
ORDER BY avg_days DESC;

Use HAVING when the condition involves an aggregate (AVG in this case).

4) Region x order-size buckets with minimum volume

SELECT COALESCE(c.region, 'Unknown') AS region,
       CASE
         WHEN o.total_amount < 50 THEN 'low'
         WHEN o.total_amount < 200 THEN 'medium'
         ELSE 'high'
       END AS order_size_bucket,
       COUNT(*) AS orders
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'delivered'
GROUP BY COALESCE(c.region, 'Unknown'),
         CASE WHEN o.total_amount < 50 THEN 'low'
              WHEN o.total_amount < 200 THEN 'medium'
              ELSE 'high' END
HAVING COUNT(*) >= 100
ORDER BY region, order_size_bucket;

Bucket with CASE, handle NULLs with COALESCE, then filter groups by volume.

Practice checklist

Filter raw rows with WHERE before grouping
Only columns in SELECT are either grouped or aggregated
Use HAVING for aggregate-based filters
Guard against join duplicates (pre-aggregate or DISTINCT if needed)
Handle NULLs explicitly (COALESCE, COUNT vs COUNT(col))

Exercises

These mirror the interactive exercises below. Try to write queries first; use hints and solutions only if needed.

Exercise 1: 2024 revenue per customer with threshold

Tables: orders(order_id, customer_id, status, total_amount, order_date)

Sum revenue per customer for delivered orders in 2024
Keep only customers with revenue ≥ 1000
Sort by revenue desc

Hint

Filter in WHERE, group by customer_id, HAVING SUM(total_amount) ≥ 1000.

Exercise 2: Distinct buyers per product in Q2 2024

Tables: orders(order_id, customer_id, order_date), order_items(order_id, product_id, quantity, price)

Join orders to order_items
Count distinct customers per product for April–June 2024
Keep products with ≥ 50 unique buyers; sort by that count desc

Hint

COUNT(DISTINCT o.customer_id) in SELECT and HAVING; group by product_id.

Exercise 3: Region-month AOV with minimum orders

Tables: customers(customer_id, region), orders(order_id, customer_id, status, total_amount, order_date)

Join customers to orders
For 2024 delivered orders, compute average order value per region and month
Keep only region-month groups with ≥ 200 orders
Sort by region asc, month asc

Hint

Group by region and EXTRACT(YEAR/MONTH). Use COUNT(*) in HAVING.

Common mistakes and self-check

Using WHERE with aggregates instead of HAVING. Fix: move aggregate condition to HAVING.
Selecting non-aggregated, non-grouped columns. Fix: add to GROUP BY or aggregate it.
Join duplicates inflating counts/sums. Fix: pre-aggregate the many-side before joining or COUNT(DISTINCT key) carefully.
COUNT(col) vs COUNT(*). Remember COUNT(col) ignores NULLs.
Grouping by expressions but selecting aliases. Many databases require repeating the expression in GROUP BY.
Date grouping mismatches (e.g., grouping by day but labeling as month). Ensure the grouping expression matches the label.

Quick self-audit

Can you explain why each column appears in SELECT (grouped or aggregated)?
Does WHERE reduce rows before grouping as intended?
Could joins multiply rows? If yes, have you guarded against it?
Do your HAVING conditions reference aggregates only?

Practical projects

Cohort revenue summary: monthly revenue and order count per acquisition cohort, hide cohorts with < 100 orders
Product performance board: distinct buyers, revenue, and average discount per product category, show categories with revenue ≥ 10,000
On-time delivery: percent on-time per warehouse and month, highlight warehouses with < 90% on-time

Who this is for

Aspiring or current BI Analysts who need reliable, sliceable metrics
Data-savvy PMs/Analysts consolidating reports from SQL sources

Prerequisites

Basic SELECT, WHERE, ORDER BY
Joins (INNER/LEFT)
Familiarity with dates and simple CASE

Learning path

Refresh filtering and joins on a small dataset
Practice simple GROUP BY with one column
Add HAVING to enforce thresholds
Scale up: multi-column groups, CASE buckets, date groups
Battle-test: handle joins and DISTINCT to avoid double counting

Next steps

Apply these patterns to your reporting tables and compare against existing dashboards
Prepare snippets for common metrics (monthly revenue, active customers, AOV)

Quick Test

Take the quick test to check your understanding. Available to everyone; if you log in, your progress will be saved.

Mini challenge

Build a query that returns product categories with revenue ≥ 10,000 and a refund rate < 5% in 2024. Show: category, orders_count, revenue, refund_rate. Hint: compute refund_rate as SUM(refunded_amount)/SUM(total_amount) and filter with HAVING.

Menu

Aggregations Group By Having

Table of Contents

Why this matters

Concept explained simply

Key syntax you’ll use

Worked examples

Practice checklist

Exercises

Common mistakes and self-check

Practical projects

Who this is for

Prerequisites

Learning path

Next steps

Quick Test

Mini challenge

Practice Exercises

2024 revenue per customer with threshold

Instructions

Expected Output

Distinct buyers per product in Q2 2024

Region-month AOV with minimum orders

Aggregations Group By Having — Quick Test

Have questions about Aggregations Group By Having?

AI Assistant