Why this matters
As a Data Analyst, you constantly summarize data to answer questions like: Which products bring the most revenue? How many users were active per day? Which channels perform best by region? Aggregations (SUM, COUNT, AVG, MIN, MAX) with GROUP BY and HAVING are the core tools to turn raw rows into decision-ready metrics.
- Marketing: Compare weekly sign-ups by campaign and hide low-volume noise using HAVING.
- Product: Count daily active users (DAU) by platform and alert when any platform falls below a threshold.
- Finance: Sum revenue by customer and flag accounts over budget.
Concept explained simply
Aggregations collapse many rows into a single value: e.g., SUM of revenue, COUNT of orders. GROUP BY splits your table into buckets (groups), then the aggregation runs inside each bucket. HAVING filters after the aggregation, so you can keep or remove whole groups based on their aggregated values.
Mental model
Imagine you pour rows into labeled bins (GROUP BY). In each bin, you compress rows into summary numbers (aggregates). After compressing, you can toss out bins that don't meet your rules (HAVING). WHERE happens before grouping; HAVING happens after grouping.
Order of execution (simplified)
- FROM (and JOIN)
- WHERE (filters rows)
- GROUP BY (creates groups)
- HAVING (filters groups)
- SELECT (choose/compute output columns)
- ORDER BY (sorts results)
Worked examples
Example 1 — Orders per customer
Goal: Count orders for each customer.
-- Table: orders(id, customer_id, order_date, amount)
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC;
Why: Quickly spot high-activity customers.
Example 2 — Revenue per product with HAVING
Goal: Show products with total revenue over 1,000.
-- Table: order_items(order_id, product_id, qty, price)
SELECT
product_id,
SUM(qty * price) AS revenue
FROM order_items
GROUP BY product_id
HAVING SUM(qty * price) > 1000
ORDER BY revenue DESC;
Key point: HAVING filters groups using aggregated values.
Example 3 — Monthly active users (MAU) with threshold
Goal: Count distinct active users per month and show only months with at least 100 users.
-- Table: events(user_id, event_timestamp)
SELECT
DATE_TRUNC('month', event_timestamp) AS month,
COUNT(DISTINCT user_id) AS mau
FROM events
GROUP BY DATE_TRUNC('month', event_timestamp)
HAVING COUNT(DISTINCT user_id) >= 100
ORDER BY month;
Note: DATE_TRUNC function name may vary slightly by SQL dialect.
Useful patterns
- Pre-filter with WHERE to speed up grouping
SELECT category, SUM(amount) AS revenue FROM sales WHERE sale_date >= DATE '2025-01-01' GROUP BY category; - Multiple aggregates
SELECT campaign, COUNT(*) AS clicks, COUNT(DISTINCT user_id) AS unique_users, AVG(conversion_rate) AS avg_cr FROM campaign_stats GROUP BY campaign; - Filter groups by two conditions
SELECT store_id, SUM(revenue) AS rev, COUNT(*) AS txns FROM receipts GROUP BY store_id HAVING SUM(revenue) > 5000 AND COUNT(*) >= 50;
Who this is for
- New and practicing Data Analysts who need to summarize datasets reliably.
- Anyone preparing for SQL interviews with practical aggregation tasks.
Prerequisites
- Basic SELECT, FROM, WHERE, ORDER BY.
- Comfort reading simple schemas and recognizing numeric vs text fields.
Learning path
- Aggregations, GROUP BY, HAVING (this lesson)
- Joins (combine dimensions with facts)
- Window functions (rank, moving averages)
- CTEs and subqueries (structure complex analyses)
Exercises you can run
Use the sample schemas shown in each task. Think before you run. Then compare with the solution.
Exercise 1 — Big and loyal customers
Schema: orders(id, customer_id, order_date, amount)
Task: For each customer, compute total_spend and order_count. Return only customers with order_count >= 3 and total_spend > 500. Sort by total_spend desc.
Hint
Think SUM(amount) and COUNT(*). WHERE cannot check totals; use HAVING.
Expected output shape
Columns: customer_id, total_spend (numeric), order_count (integer). One row per qualifying customer.
Show solution
SELECT
customer_id,
SUM(amount) AS total_spend,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 3 AND SUM(amount) > 500
ORDER BY total_spend DESC;
Exercise 2 — Daily active users by country
Schema: events(user_id, event_date, country)
Task: For each event_date and country, compute dau = COUNT(DISTINCT user_id). Return only groups with dau >= 100. Sort by event_date, country.
Hint
GROUP BY both columns. Use HAVING to enforce dau threshold.
Expected output shape
Columns: event_date, country, dau (integer). One row per (date, country) with dau ≥ 100.
Show solution
SELECT
event_date,
country,
COUNT(DISTINCT user_id) AS dau
FROM events
GROUP BY event_date, country
HAVING COUNT(DISTINCT user_id) >= 100
ORDER BY event_date, country;
Common mistakes and how to self-check
- Using WHERE for aggregated filters. Fix: move to HAVING (e.g., HAVING SUM(amount) > 1000).
- Selecting non-aggregated columns not in GROUP BY. Fix: every selected column must be aggregated or listed in GROUP BY.
- COUNT(column) vs COUNT(*). COUNT(column) ignores NULLs; COUNT(*) counts rows. Choose intentionally.
- Grouping by formatted expressions inconsistently. Fix: group by the exact expression you select (e.g., DATE_TRUNC(...) in both SELECT and GROUP BY).
- Double-counting joins. Fix: aggregate on the correct granularity or deduplicate before grouping.
Self-check quick list
Practical projects
- Store performance dashboard: Aggregate weekly revenue, transactions, and average order value by store_id. Use HAVING to hide low-traffic stores.
- Product portfolio health: Revenue, units, and return rate by category and product_id. Filter categories with at least 50 orders.
- Engagement monitor: DAU/WAU/MAU by platform. Flag platforms with DAU under a threshold using HAVING.
Mini challenge
Schema: sales(order_id, product_id, category, quantity, price, sale_date)
- Task A: Show revenue per category and filter categories with revenue >= 10,000.
- Task B: Within each category, show products with revenue >= 20% of the category’s average product revenue (hint: two-step approach: aggregate to product, then compare to category average in HAVING or a CTE).
Reveal a possible approach
WITH product_rev AS (
SELECT category, product_id, SUM(quantity * price) AS revenue
FROM sales
GROUP BY category, product_id
), cat_avg AS (
SELECT category, AVG(revenue) AS avg_rev
FROM product_rev
GROUP BY category
)
SELECT p.category, p.product_id, p.revenue
FROM product_rev p
JOIN cat_avg c ON c.category = p.category
WHERE p.revenue >= 0.2 * c.avg_rev
ORDER BY p.category, p.revenue DESC;
Next steps
- Practice on a real dataset (e.g., your company’s fact tables) while tracking time-to-run and verifying counts.
- Learn window functions to rank within groups and compute running totals.
- Combine GROUP BY with JOINs to bring in dimensions like region or channel.
Ready to check yourself?
The Quick Test below is available to everyone. Only logged-in users get saved progress and stats.