Aggregations Group By Having

Learn Aggregations Group By Having for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

As a Data Analyst, you constantly summarize data to answer questions like: Which products bring the most revenue? How many users were active per day? Which channels perform best by region? Aggregations (SUM, COUNT, AVG, MIN, MAX) with GROUP BY and HAVING are the core tools to turn raw rows into decision-ready metrics.

Marketing: Compare weekly sign-ups by campaign and hide low-volume noise using HAVING.
Product: Count daily active users (DAU) by platform and alert when any platform falls below a threshold.
Finance: Sum revenue by customer and flag accounts over budget.

Concept explained simply

Aggregations collapse many rows into a single value: e.g., SUM of revenue, COUNT of orders. GROUP BY splits your table into buckets (groups), then the aggregation runs inside each bucket. HAVING filters after the aggregation, so you can keep or remove whole groups based on their aggregated values.

Mental model

Imagine you pour rows into labeled bins (GROUP BY). In each bin, you compress rows into summary numbers (aggregates). After compressing, you can toss out bins that don't meet your rules (HAVING). WHERE happens before grouping; HAVING happens after grouping.

Order of execution (simplified)

FROM (and JOIN)
WHERE (filters rows)
GROUP BY (creates groups)
HAVING (filters groups)
SELECT (choose/compute output columns)
ORDER BY (sorts results)

Worked examples

Example 1 — Orders per customer

Goal: Count orders for each customer.

-- Table: orders(id, customer_id, order_date, amount)
SELECT
  customer_id,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC;

Why: Quickly spot high-activity customers.

Example 2 — Revenue per product with HAVING

Goal: Show products with total revenue over 1,000.

-- Table: order_items(order_id, product_id, qty, price)
SELECT
  product_id,
  SUM(qty * price) AS revenue
FROM order_items
GROUP BY product_id
HAVING SUM(qty * price) > 1000
ORDER BY revenue DESC;

Key point: HAVING filters groups using aggregated values.

Example 3 — Monthly active users (MAU) with threshold

Goal: Count distinct active users per month and show only months with at least 100 users.

-- Table: events(user_id, event_timestamp)
SELECT
  DATE_TRUNC('month', event_timestamp) AS month,
  COUNT(DISTINCT user_id) AS mau
FROM events
GROUP BY DATE_TRUNC('month', event_timestamp)
HAVING COUNT(DISTINCT user_id) >= 100
ORDER BY month;

Note: DATE_TRUNC function name may vary slightly by SQL dialect.

Useful patterns

Pre-filter with WHERE to speed up grouping

SELECT category, SUM(amount) AS revenue
FROM sales
WHERE sale_date >= DATE '2025-01-01'
GROUP BY category;

Multiple aggregates

SELECT
  campaign,
  COUNT(*) AS clicks,
  COUNT(DISTINCT user_id) AS unique_users,
  AVG(conversion_rate) AS avg_cr
FROM campaign_stats
GROUP BY campaign;

Filter groups by two conditions

SELECT store_id, SUM(revenue) AS rev, COUNT(*) AS txns
FROM receipts
GROUP BY store_id
HAVING SUM(revenue) > 5000 AND COUNT(*) >= 50;

Who this is for

New and practicing Data Analysts who need to summarize datasets reliably.
Anyone preparing for SQL interviews with practical aggregation tasks.

Prerequisites

Basic SELECT, FROM, WHERE, ORDER BY.
Comfort reading simple schemas and recognizing numeric vs text fields.

Learning path

Aggregations, GROUP BY, HAVING (this lesson)
Joins (combine dimensions with facts)
Window functions (rank, moving averages)
CTEs and subqueries (structure complex analyses)

Exercises you can run

Use the sample schemas shown in each task. Think before you run. Then compare with the solution.

Exercise 1 — Big and loyal customers

Schema: orders(id, customer_id, order_date, amount)

Task: For each customer, compute total_spend and order_count. Return only customers with order_count >= 3 and total_spend > 500. Sort by total_spend desc.

I used GROUP BY customer_id.
I filtered by aggregated values using HAVING.
I returned columns customer_id, total_spend, order_count.

Hint

Think SUM(amount) and COUNT(*). WHERE cannot check totals; use HAVING.

Expected output shape

Columns: customer_id, total_spend (numeric), order_count (integer). One row per qualifying customer.

Show solution

SELECT
  customer_id,
  SUM(amount) AS total_spend,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 3 AND SUM(amount) > 500
ORDER BY total_spend DESC;

Exercise 2 — Daily active users by country

Schema: events(user_id, event_date, country)

Task: For each event_date and country, compute dau = COUNT(DISTINCT user_id). Return only groups with dau >= 100. Sort by event_date, country.

I grouped by event_date and country.
I used COUNT(DISTINCT user_id).
I filtered groups using HAVING.

Hint

GROUP BY both columns. Use HAVING to enforce dau threshold.

Expected output shape

Columns: event_date, country, dau (integer). One row per (date, country) with dau ≥ 100.

Show solution

SELECT
  event_date,
  country,
  COUNT(DISTINCT user_id) AS dau
FROM events
GROUP BY event_date, country
HAVING COUNT(DISTINCT user_id) >= 100
ORDER BY event_date, country;

Common mistakes and how to self-check

Using WHERE for aggregated filters. Fix: move to HAVING (e.g., HAVING SUM(amount) > 1000).
Selecting non-aggregated columns not in GROUP BY. Fix: every selected column must be aggregated or listed in GROUP BY.
COUNT(column) vs COUNT(*). COUNT(column) ignores NULLs; COUNT(*) counts rows. Choose intentionally.
Grouping by formatted expressions inconsistently. Fix: group by the exact expression you select (e.g., DATE_TRUNC(...) in both SELECT and GROUP BY).
Double-counting joins. Fix: aggregate on the correct granularity or deduplicate before grouping.

Self-check quick list

Do all SELECT columns appear in GROUP BY or use an aggregate?
Are group filters in HAVING, not WHERE?
Did I pre-filter rows in WHERE to speed up the query?
Did I handle NULLs correctly in COUNT/AVG?

Practical projects

Store performance dashboard: Aggregate weekly revenue, transactions, and average order value by store_id. Use HAVING to hide low-traffic stores.
Product portfolio health: Revenue, units, and return rate by category and product_id. Filter categories with at least 50 orders.
Engagement monitor: DAU/WAU/MAU by platform. Flag platforms with DAU under a threshold using HAVING.

Mini challenge

Schema: sales(order_id, product_id, category, quantity, price, sale_date)

Task A: Show revenue per category and filter categories with revenue >= 10,000.
Task B: Within each category, show products with revenue >= 20% of the category’s average product revenue (hint: two-step approach: aggregate to product, then compare to category average in HAVING or a CTE).

Reveal a possible approach

WITH product_rev AS (
  SELECT category, product_id, SUM(quantity * price) AS revenue
  FROM sales
  GROUP BY category, product_id
), cat_avg AS (
  SELECT category, AVG(revenue) AS avg_rev
  FROM product_rev
  GROUP BY category
)
SELECT p.category, p.product_id, p.revenue
FROM product_rev p
JOIN cat_avg c ON c.category = p.category
WHERE p.revenue >= 0.2 * c.avg_rev
ORDER BY p.category, p.revenue DESC;

Next steps

Practice on a real dataset (e.g., your company’s fact tables) while tracking time-to-run and verifying counts.
Learn window functions to rank within groups and compute running totals.
Combine GROUP BY with JOINs to bring in dimensions like region or channel.

Ready to check yourself?

The Quick Test below is available to everyone. Only logged-in users get saved progress and stats.

Practice Exercises

2 exercises to complete

Instructions

Schema: orders(id, customer_id, order_date, amount)

Task: For each customer, compute total_spend and order_count. Return only customers with order_count >= 3 and total_spend > 500. Sort by total_spend desc.

Expected Output

Columns: customer_id, total_spend (numeric), order_count (integer). One row per qualifying customer.