Who this is for
Data Analysts who run reports, build dashboards, and need precise group-level metrics like running totals, percent-of-category, and first/last records per group.
Prerequisites
- Comfort with SELECT, WHERE, ORDER BY
- GROUP BY and aggregate functions (SUM, COUNT, AVG)
- Basic date functions (e.g., DATE_TRUNC)
Why this matters
In real analytics work, you often need results within groups, not just at the whole-table level. Examples:
- Compute each product’s share within its category per month
- Find the first purchase per customer
- Create running totals and moving averages by store
- Deduplicate rows, keeping the latest per user
Partitioning lets you do all of that directly in SQL, without exporting to spreadsheets.
Real-world scenario
You’re asked: “What percent of each category’s revenue did Product X capture this month?” A GROUP BY alone can’t attach category totals to each product row. Window functions with PARTITION BY can.
Concept explained simply
Partitioning has two common meanings in SQL work:
- Analytics partitioning (window functions): PARTITION BY splits your result set into groups for calculations that stay at the row level (no row loss). Think: "+compute within each mini-table".
- Storage partitioning (table partitioning): Physically splits a large table into chunks (e.g., by date) to make queries faster and maintenance easier. You still query the table as one logical object.
Mental model
Imagine your data is sliced into stacks—one stack per group (e.g., per customer or per month+category). Window functions calculate statistics on each stack and write the result back on every row in that stack. GROUP BY, by contrast, compresses a stack down to one row.
Worked examples
Example 1: Percent of category per month (window SUM with PARTITION BY)
-- Goal: for each month and category, show product revenue and product share of category
WITH monthly AS (
SELECT
DATE_TRUNC('month', sale_date) AS sale_month,
category,
product_id,
SUM(revenue) AS product_revenue
FROM sales
GROUP BY 1,2,3
)
SELECT
sale_month,
category,
product_id,
product_revenue,
product_revenue
/ NULLIF(SUM(product_revenue) OVER (PARTITION BY sale_month, category), 0) AS category_share
FROM monthly
ORDER BY sale_month, category, product_id;
Why it works: SUM(...) OVER(PARTITION BY sale_month, category) gives the category total for each row. We divide product value by that total.
Example 2: First purchase per customer (ROW_NUMBER)
-- Keep the earliest order per customer
WITH ranked AS (
SELECT
customer_id,
order_id,
order_date,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date ASC, order_id ASC
) AS rn
FROM orders
)
SELECT *
FROM ranked
WHERE rn = 1;
Why it works: ROW_NUMBER restarts for each customer (the partition), ordering by earliest date.
Example 3: Running total by store (frame clause)
SELECT
store_id,
sale_date,
daily_sales,
SUM(daily_sales) OVER (
PARTITION BY store_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM store_sales
ORDER BY store_id, sale_date;
Why it works: The frame defines a cumulative sum from the start of each store’s partition to the current row.
Example 4 (awareness): Table partitioning by date
-- Conceptual example (syntax varies by database)
-- Postgres-style declarative partitioning
CREATE TABLE events (
event_id BIGINT,
event_time TIMESTAMPTZ,
user_id BIGINT
) PARTITION BY RANGE (event_time);
-- Partitions (monthly)
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Why it matters: Large tables scan faster when the query touches only relevant partitions (e.g., WHERE event_time >= '2024-02-01').
Common mistakes and self-check
- Confusing GROUP BY with PARTITION BY: If rows disappear, you used GROUP BY; window functions keep row granularity.
- Forgetting ORDER BY in window calculations that depend on sequence: Running totals and first/last need ORDER BY.
- Using default window frames carelessly: Defaults can be RANGE-based and include ties. Prefer explicit ROWS frames for numeric stability.
- Dividing by a window total without NULLIF: Prevent divide-by-zero with NULLIF(total, 0).
- Partitioning by too many columns: Over-partitioning creates tiny groups and can be slow. Partition only by what defines the group.
Self-check prompts
- Can I explain the difference between GROUP BY and PARTITION BY in one sentence?
- Does my running total query specify ORDER BY and a clear frame?
- If I need the first/last row, am I using ROW_NUMBER or RANK with a tie-breaker?
- Did I sanity-check totals by re-aggregating results to the partition level?
Exercises (do these before the test)
These mirror the exercises below so you can practice in your SQL environment. Keep row-level results; avoid GROUP BY in the final SELECT unless stated.
Exercise 1: Category share by month
Table: sales(sale_date, category, product_id, revenue). Task: For each month and category, show product_id, revenue, and product share of its category.
- Step 1: Aggregate to monthly product revenue.
- Step 2: Use a window SUM over (month, category) to compute the denominator.
- Step 3: Divide product revenue by the window total, guarding for zero.
Hint
DATE_TRUNC to month, then SUM(revenue) OVER (PARTITION BY month, category).
Exercise 2: First purchase per customer
Table: orders(order_id, customer_id, order_date, amount). Task: Return one row per customer for their first purchase (earliest order_date). If ties, pick the smallest order_id.
- Use ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date ASC, order_id ASC).
- Filter rn = 1.
Hint
ROW_NUMBER restarts per customer when you use PARTITION BY customer_id.
- Checklist before you move on:
- Your query keeps row-level detail (no unintended aggregation)
- You used PARTITION BY correctly to scope calculations
- You explicitly defined ORDER BY for sequence-dependent windows
Practical projects
- Customer lifecycle panel: For each customer, compute first purchase date, last purchase date, purchase count, total revenue, and average days between purchases using window functions.
- Store performance dashboard table: Daily sales, 7-day moving average, running total, and percent change vs previous day by store.
- Large events table playbook: Propose a date-based table partitioning scheme with monthly partitions and an archiving rule. Include expected query filters to benefit from pruning.
Learning path
Next steps
- Re-implement the worked examples on your data warehouse.
- Do the two exercises and check results with small test subsets.
- Then take the quick test below. Everyone can take it; if you log in, your progress is saved.
Mini challenge
Table: subscriptions(user_id, plan, start_date, end_date). Produce, for each user, the active plan on their most recent start_date and the number of plans they have had. Return user_id, latest_plan, plan_count.
Hint
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY start_date DESC) to get latest_plan, and COUNT(*) OVER (PARTITION BY user_id) for plan_count.
Ready for the quick test?
Take the Quick Test now. Everyone can try it for free; logging in will save your progress.