Why this matters
As a Data Visualization Engineer, you turn raw tables into clear metrics and charts. Aggregations, GROUP BY, and HAVING are how you compute totals, averages, rates, and thresholds for dashboards. You'll use them to build KPI tiles (revenue, orders), segment metrics (by product, by channel), and filter groups (show only high-performing categories).
Concept explained simply
Aggregation collapses many rows into one numeric summary: COUNT, SUM, AVG, MIN, MAX. GROUP BY forms buckets (groups) based on one or more columns. HAVING filters entire groups after aggregation. WHERE filters individual rows before grouping.
Mental model: buckets and labels
Imagine pouring rows into labeled buckets (labels are the GROUP BY columns). Inside each bucket, you compute a single number using an aggregate function. HAVING decides which buckets stay. WHERE decides which rows even reach the buckets.
Execution order (helpful for debugging)
- FROM (and JOIN)
- WHERE (filters rows)
- GROUP BY (creates buckets)
- HAVING (filters buckets)
- SELECT (computes expressions)
- ORDER BY (sorts result)
Quick reference
- COUNT(*) counts rows (includes NULLs). COUNT(col) counts non-NULL values in col.
- SUM/AVG ignore NULL values.
- Every non-aggregated column in SELECT must appear in GROUP BY.
- Use HAVING with aggregates (e.g., HAVING SUM(...) > 1000). WHERE cannot use aggregates.
- Conditional aggregation: SUM(CASE WHEN condition THEN 1 ELSE 0 END) as metric.
Worked examples
Example 1 — Orders per customer
-- Count completed orders per customer
SELECT
customer_id,
COUNT(*) AS completed_orders
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY completed_orders DESC;
What it shows: WHERE filtered to completed rows first, then GROUP BY counted per customer.
Example 2 — Monthly revenue, show only months above a threshold
-- Dialect note: use your platform's month trunc. Examples:
-- Postgres: DATE_TRUNC('month', order_date)
-- BigQuery: DATE_TRUNC(order_date, MONTH)
-- SQLite: STRFTIME('%Y-%m-01', order_date)
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', order_date)
HAVING SUM(total_amount) >= 5000
ORDER BY month;
Key idea: HAVING filters the aggregated groups, not the raw rows.
Example 3 — Return rate by category
-- Compute returns / orders by category via conditional aggregation
SELECT
p.category,
COUNT(*) AS total_items,
SUM(CASE WHEN o.status = 'returned' THEN 1 ELSE 0 END) AS returned_items,
1.0 * SUM(CASE WHEN o.status = 'returned' THEN 1 ELSE 0 END) / COUNT(*) AS return_rate
FROM order_items oi
JOIN orders o ON o.order_id = oi.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.category
HAVING 1.0 * SUM(CASE WHEN o.status = 'returned' THEN 1 ELSE 0 END) / COUNT(*) > 0.1
ORDER BY return_rate DESC;
Conditional aggregation lets you build metrics like rates without subqueries.
Who this is for
- Data Visualization Engineers building dashboards and KPI tiles.
- Analysts preparing grouped datasets for BI tools.
- Anyone new to SQL who needs reliable summary metrics.
Prerequisites
- Basic SELECT, WHERE, ORDER BY.
- Comfort with simple JOINs (inner joins at minimum).
Learning path
- Read the concepts and examples above.
- Complete the exercises on realistic tables.
- Take the Quick Test to confirm understanding. Note: everyone can take the test; only logged-in users have saved progress.
- Apply the patterns to a small project (see Practical projects).
Exercises
Use this mini dataset for both exercises.
Sample tables
products
product_id | category
-----------+-----------
1 | Electronics
2 | Home
3 | Home
4 | Books
order_items
order_id | product_id | quantity | unit_price
---------+------------+----------+-----------
1 | 1 | 2 | 200
1 | 4 | 1 | 20
2 | 2 | 3 | 50
2 | 3 | 1 | 40
3 | 1 | 1 | 200
3 | 2 | 2 | 50
4 | 4 | 5 | 10
5 | 3 | 4 | 40
customers
customer_id | customer_name
------------+--------------
1 | Alice
2 | Bob
3 | Cara
orders
order_id | customer_id | order_date | status | total_amount
---------+-------------+-------------+------------+-------------
1 | 1 | 2024-01-03 | completed | 420
2 | 2 | 2024-01-10 | completed | 190
3 | 1 | 2024-02-01 | completed | 300
4 | 3 | 2024-02-15 | canceled | 50
5 | 2 | 2024-03-01 | completed | 160
Exercise 1 — Revenue by category with HAVING
Compute revenue per category from order_items and products. Only show categories with revenue ≥ 500. Sort by revenue DESC.
- Revenue = SUM(quantity * unit_price)
- Join order_items to products
- GROUP BY category, filter with HAVING
Hint
- WHERE is not for aggregates; use HAVING SUM(...) >= 500.
- Alias your SUM as revenue, but in HAVING prefer using the expression for portability.
Expected output
category | revenue
-------------+--------
Electronics | 600
Exercise 2 — Active customers with average order size
For each customer, count completed orders and compute the average completed order amount. Only show customers with at least 2 completed orders. Sort by completed_orders DESC, then avg_order_amount DESC.
Hints
- Filter completed orders with WHERE before grouping.
- AVG(total_amount) and COUNT(*) are both computed after grouping.
Expected output
customer_id | completed_orders | avg_order_amount
-----------+-------------------+-----------------
1 | 2 | 360
2 | 2 | 175
Exercise checklist
- Grouped by the correct columns
- Used HAVING (not WHERE) for aggregate filters
- Verified no unintended double-counting from joins
- Sorted the final results as requested
Common mistakes and self-check
- Mistake: Using WHERE SUM(...). Fix: Use HAVING SUM(...).
- Mistake: Selecting non-aggregated columns not in GROUP BY. Fix: Add them to GROUP BY or aggregate them.
- Mistake: Double-counting after joins (e.g., joining orders to items and also to a many-to-many table). Fix: Aggregate to the correct grain before additional joins, or use DISTINCT carefully.
- Mistake: COUNT(col) expecting to count NULLs. Fix: Use COUNT(*) for all rows.
Self-check routine
- Describe the grain of your output (e.g., per category per month). Ensure GROUP BY matches this grain.
- Print a small sample with SELECT ... LIMIT 5 to inspect duplications before aggregating.
- Cross-check totals with an independent query (e.g., compute grand total separately).
Practical projects
- Build a KPI summary: total revenue, orders, customers; plus revenue by category with a HAVING threshold.
- Create a monthly dashboard dataset: revenue, orders, average order value by month, top categories.
- Construct a return-rate report by category and by month using conditional aggregation.
Mini challenge
Using the sample tables, list categories where the number of items sold (quantity) is at least 6. Show category and items_sold, sorted DESC.
Show answer
SELECT
p.category,
SUM(oi.quantity) AS items_sold
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.category
HAVING SUM(oi.quantity) >= 6
ORDER BY items_sold DESC;
Items sold by category: Electronics = 3, Home = 10, Books = 6 — so Home and Books match ≥ 6.
Next steps
- Practice with larger, real datasets: replicate your company’s KPI tiles with repeatable queries.
- Learn to combine GROUP BY with window functions to add cumulative and ranking metrics.
- Proceed to the Quick Test below to lock in the concepts. Everyone can take it; only logged-in users see saved progress.