Why this matters
As a BI Analyst, you will regularly summarize data to answer questions like: Which customers drive revenue? Which regions underperform? How many orders were delivered on time? Aggregations, GROUP BY, and HAVING let you turn raw rows into decision-ready metrics.
- Build KPIs: revenue, average order value, conversion rates
- Slice metrics by customer, product, region, or time
- Filter groups to focus on material segments (e.g., customers with 5+ orders)
Concept explained simply
Aggregate functions (COUNT, SUM, AVG, MIN, MAX) reduce many rows to a single value. GROUP BY partitions rows into buckets and applies aggregates per bucket. WHERE filters rows before grouping; HAVING filters the grouped results after aggregates are calculated.
Mental model: boxes and labels
1) You first pick rows (WHERE). 2) You put remaining rows into labeled boxes (GROUP BY columns). 3) For each box, you compute summaries (aggregates). 4) You keep or discard boxes (HAVING). 5) You print the final columns (SELECT) and sort (ORDER BY).
Key syntax you’ll use
SELECT group_col, AGG_FUNC(expr) AS metric
FROM table
WHERE prefilter
GROUP BY group_col
HAVING AGG_FUNC(expr) condition
ORDER BY metric DESC;
Common aggregate functions
- COUNT(*) — counts rows (includes NULLs)
- COUNT(col) — counts non-NULL values in col
- COUNT(DISTINCT col) — counts unique non-NULL values
- SUM(col), AVG(col), MIN(col), MAX(col)
Worked examples
1) Customers with at least 5 delivered orders in 2024
SELECT customer_id,
COUNT(*) AS orders_count
FROM orders
WHERE status = 'delivered'
AND order_date >= DATE '2024-01-01'
AND order_date < DATE '2025-01-01'
GROUP BY customer_id
HAVING COUNT(*) >= 5
ORDER BY orders_count DESC;
Why it works: WHERE limits to 2024 delivered orders; GROUP BY makes one group per customer; HAVING keeps only groups meeting the threshold.
2) Monthly revenue in 2024
SELECT EXTRACT(YEAR FROM order_date) AS yr,
EXTRACT(MONTH FROM order_date) AS mo,
SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= DATE '2024-01-01'
AND order_date < DATE '2025-01-01'
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
ORDER BY yr, mo;
Tip: Use your database’s date functions if they differ; the idea is to group by month and sum.
3) Shipping methods with slow delivery
SELECT shipping_method,
AVG(delivery_days) AS avg_days
FROM shipments
WHERE shipped_date >= DATE '2024-01-01'
AND shipped_date < DATE '2025-01-01'
GROUP BY shipping_method
HAVING AVG(delivery_days) > 5
ORDER BY avg_days DESC;
Use HAVING when the condition involves an aggregate (AVG in this case).
4) Region x order-size buckets with minimum volume
SELECT COALESCE(c.region, 'Unknown') AS region,
CASE
WHEN o.total_amount < 50 THEN 'low'
WHEN o.total_amount < 200 THEN 'medium'
ELSE 'high'
END AS order_size_bucket,
COUNT(*) AS orders
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'delivered'
GROUP BY COALESCE(c.region, 'Unknown'),
CASE WHEN o.total_amount < 50 THEN 'low'
WHEN o.total_amount < 200 THEN 'medium'
ELSE 'high' END
HAVING COUNT(*) >= 100
ORDER BY region, order_size_bucket;
Bucket with CASE, handle NULLs with COALESCE, then filter groups by volume.
Practice checklist
- Filter raw rows with WHERE before grouping
- Only columns in SELECT are either grouped or aggregated
- Use HAVING for aggregate-based filters
- Guard against join duplicates (pre-aggregate or DISTINCT if needed)
- Handle NULLs explicitly (COALESCE, COUNT vs COUNT(col))
Exercises
These mirror the interactive exercises below. Try to write queries first; use hints and solutions only if needed.
Exercise 1: 2024 revenue per customer with threshold
Tables: orders(order_id, customer_id, status, total_amount, order_date)
- Sum revenue per customer for delivered orders in 2024
- Keep only customers with revenue ≥ 1000
- Sort by revenue desc
Hint
Filter in WHERE, group by customer_id, HAVING SUM(total_amount) ≥ 1000.
Exercise 2: Distinct buyers per product in Q2 2024
Tables: orders(order_id, customer_id, order_date), order_items(order_id, product_id, quantity, price)
- Join orders to order_items
- Count distinct customers per product for April–June 2024
- Keep products with ≥ 50 unique buyers; sort by that count desc
Hint
COUNT(DISTINCT o.customer_id) in SELECT and HAVING; group by product_id.
Exercise 3: Region-month AOV with minimum orders
Tables: customers(customer_id, region), orders(order_id, customer_id, status, total_amount, order_date)
- Join customers to orders
- For 2024 delivered orders, compute average order value per region and month
- Keep only region-month groups with ≥ 200 orders
- Sort by region asc, month asc
Hint
Group by region and EXTRACT(YEAR/MONTH). Use COUNT(*) in HAVING.
Common mistakes and self-check
- Using WHERE with aggregates instead of HAVING. Fix: move aggregate condition to HAVING.
- Selecting non-aggregated, non-grouped columns. Fix: add to GROUP BY or aggregate it.
- Join duplicates inflating counts/sums. Fix: pre-aggregate the many-side before joining or COUNT(DISTINCT key) carefully.
- COUNT(col) vs COUNT(*). Remember COUNT(col) ignores NULLs.
- Grouping by expressions but selecting aliases. Many databases require repeating the expression in GROUP BY.
- Date grouping mismatches (e.g., grouping by day but labeling as month). Ensure the grouping expression matches the label.
Quick self-audit
- Can you explain why each column appears in SELECT (grouped or aggregated)?
- Does WHERE reduce rows before grouping as intended?
- Could joins multiply rows? If yes, have you guarded against it?
- Do your HAVING conditions reference aggregates only?
Practical projects
- Cohort revenue summary: monthly revenue and order count per acquisition cohort, hide cohorts with < 100 orders
- Product performance board: distinct buyers, revenue, and average discount per product category, show categories with revenue ≥ 10,000
- On-time delivery: percent on-time per warehouse and month, highlight warehouses with < 90% on-time
Who this is for
- Aspiring or current BI Analysts who need reliable, sliceable metrics
- Data-savvy PMs/Analysts consolidating reports from SQL sources
Prerequisites
- Basic SELECT, WHERE, ORDER BY
- Joins (INNER/LEFT)
- Familiarity with dates and simple CASE
Learning path
- Refresh filtering and joins on a small dataset
- Practice simple GROUP BY with one column
- Add HAVING to enforce thresholds
- Scale up: multi-column groups, CASE buckets, date groups
- Battle-test: handle joins and DISTINCT to avoid double counting
Next steps
- Apply these patterns to your reporting tables and compare against existing dashboards
- Prepare snippets for common metrics (monthly revenue, active customers, AOV)
Quick Test
Take the quick test to check your understanding. Available to everyone; if you log in, your progress will be saved.
Mini challenge
Build a query that returns product categories with revenue ≥ 10,000 and a refund rate < 5% in 2024. Show: category, orders_count, revenue, refund_rate. Hint: compute refund_rate as SUM(refunded_amount)/SUM(total_amount) and filter with HAVING.