luvv to helpDiscover the Best Free Online Tools
Topic 29 of 31

Partitioning Basics

Learn Partitioning Basics for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 19, 2025 | Updated: December 19, 2025

Who this is for

Data Analysts who run reports, build dashboards, and need precise group-level metrics like running totals, percent-of-category, and first/last records per group.

Prerequisites

  • Comfort with SELECT, WHERE, ORDER BY
  • GROUP BY and aggregate functions (SUM, COUNT, AVG)
  • Basic date functions (e.g., DATE_TRUNC)

Why this matters

In real analytics work, you often need results within groups, not just at the whole-table level. Examples:

  • Compute each product’s share within its category per month
  • Find the first purchase per customer
  • Create running totals and moving averages by store
  • Deduplicate rows, keeping the latest per user

Partitioning lets you do all of that directly in SQL, without exporting to spreadsheets.

Real-world scenario

You’re asked: “What percent of each category’s revenue did Product X capture this month?” A GROUP BY alone can’t attach category totals to each product row. Window functions with PARTITION BY can.

Concept explained simply

Partitioning has two common meanings in SQL work:

  • Analytics partitioning (window functions): PARTITION BY splits your result set into groups for calculations that stay at the row level (no row loss). Think: "+compute within each mini-table".
  • Storage partitioning (table partitioning): Physically splits a large table into chunks (e.g., by date) to make queries faster and maintenance easier. You still query the table as one logical object.

Mental model

Imagine your data is sliced into stacks—one stack per group (e.g., per customer or per month+category). Window functions calculate statistics on each stack and write the result back on every row in that stack. GROUP BY, by contrast, compresses a stack down to one row.

Worked examples

Example 1: Percent of category per month (window SUM with PARTITION BY)
-- Goal: for each month and category, show product revenue and product share of category
WITH monthly AS (
  SELECT
    DATE_TRUNC('month', sale_date) AS sale_month,
    category,
    product_id,
    SUM(revenue) AS product_revenue
  FROM sales
  GROUP BY 1,2,3
)
SELECT
  sale_month,
  category,
  product_id,
  product_revenue,
  product_revenue
    / NULLIF(SUM(product_revenue) OVER (PARTITION BY sale_month, category), 0) AS category_share
FROM monthly
ORDER BY sale_month, category, product_id;

Why it works: SUM(...) OVER(PARTITION BY sale_month, category) gives the category total for each row. We divide product value by that total.

Example 2: First purchase per customer (ROW_NUMBER)
-- Keep the earliest order per customer
WITH ranked AS (
  SELECT
    customer_id,
    order_id,
    order_date,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY order_date ASC, order_id ASC
    ) AS rn
  FROM orders
)
SELECT *
FROM ranked
WHERE rn = 1;

Why it works: ROW_NUMBER restarts for each customer (the partition), ordering by earliest date.

Example 3: Running total by store (frame clause)
SELECT
  store_id,
  sale_date,
  daily_sales,
  SUM(daily_sales) OVER (
    PARTITION BY store_id
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM store_sales
ORDER BY store_id, sale_date;

Why it works: The frame defines a cumulative sum from the start of each store’s partition to the current row.

Example 4 (awareness): Table partitioning by date
-- Conceptual example (syntax varies by database)
-- Postgres-style declarative partitioning
CREATE TABLE events (
  event_id BIGINT,
  event_time TIMESTAMPTZ,
  user_id BIGINT
) PARTITION BY RANGE (event_time);

-- Partitions (monthly)
CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

Why it matters: Large tables scan faster when the query touches only relevant partitions (e.g., WHERE event_time >= '2024-02-01').

Common mistakes and self-check

  • Confusing GROUP BY with PARTITION BY: If rows disappear, you used GROUP BY; window functions keep row granularity.
  • Forgetting ORDER BY in window calculations that depend on sequence: Running totals and first/last need ORDER BY.
  • Using default window frames carelessly: Defaults can be RANGE-based and include ties. Prefer explicit ROWS frames for numeric stability.
  • Dividing by a window total without NULLIF: Prevent divide-by-zero with NULLIF(total, 0).
  • Partitioning by too many columns: Over-partitioning creates tiny groups and can be slow. Partition only by what defines the group.
Self-check prompts
  • Can I explain the difference between GROUP BY and PARTITION BY in one sentence?
  • Does my running total query specify ORDER BY and a clear frame?
  • If I need the first/last row, am I using ROW_NUMBER or RANK with a tie-breaker?
  • Did I sanity-check totals by re-aggregating results to the partition level?

Exercises (do these before the test)

These mirror the exercises below so you can practice in your SQL environment. Keep row-level results; avoid GROUP BY in the final SELECT unless stated.

Exercise 1: Category share by month

Table: sales(sale_date, category, product_id, revenue). Task: For each month and category, show product_id, revenue, and product share of its category.

  • Step 1: Aggregate to monthly product revenue.
  • Step 2: Use a window SUM over (month, category) to compute the denominator.
  • Step 3: Divide product revenue by the window total, guarding for zero.
Hint

DATE_TRUNC to month, then SUM(revenue) OVER (PARTITION BY month, category).

Exercise 2: First purchase per customer

Table: orders(order_id, customer_id, order_date, amount). Task: Return one row per customer for their first purchase (earliest order_date). If ties, pick the smallest order_id.

  • Use ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date ASC, order_id ASC).
  • Filter rn = 1.
Hint

ROW_NUMBER restarts per customer when you use PARTITION BY customer_id.

  • Checklist before you move on:
    • Your query keeps row-level detail (no unintended aggregation)
    • You used PARTITION BY correctly to scope calculations
    • You explicitly defined ORDER BY for sequence-dependent windows

Practical projects

  • Customer lifecycle panel: For each customer, compute first purchase date, last purchase date, purchase count, total revenue, and average days between purchases using window functions.
  • Store performance dashboard table: Daily sales, 7-day moving average, running total, and percent change vs previous day by store.
  • Large events table playbook: Propose a date-based table partitioning scheme with monthly partitions and an archiving rule. Include expected query filters to benefit from pruning.

Learning path

1) Solidify GROUP BY aggregates
2) Learn window functions: PARTITION BY, ORDER BY, frames (ROWS vs RANGE)
3) Apply ranking functions: ROW_NUMBER, RANK, DENSE_RANK
4) Apply aggregate windows: SUM, COUNT, AVG, MIN, MAX
5) Learn storage partitioning basics (RANGE/LIST/HASH; time-based)
6) Combine both in real metrics and deduplication tasks

Next steps

  • Re-implement the worked examples on your data warehouse.
  • Do the two exercises and check results with small test subsets.
  • Then take the quick test below. Everyone can take it; if you log in, your progress is saved.

Mini challenge

Table: subscriptions(user_id, plan, start_date, end_date). Produce, for each user, the active plan on their most recent start_date and the number of plans they have had. Return user_id, latest_plan, plan_count.

Hint

ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY start_date DESC) to get latest_plan, and COUNT(*) OVER (PARTITION BY user_id) for plan_count.

Ready for the quick test?

Take the Quick Test now. Everyone can try it for free; logging in will save your progress.

Practice Exercises

2 exercises to complete

Instructions

Table: sales(sale_date, category, product_id, revenue).

Task: For each month and category, show product_id, monthly product revenue, and the product’s share of that category in that month.

  • Aggregate to monthly product revenue.
  • Compute category total with SUM(...) OVER (PARTITION BY month, category).
  • Divide product revenue by the category total. Protect against divide-by-zero.
Expected Output
Columns: sale_month, category, product_id, product_revenue, category_share (0–1). Example rows: 2024-01-01, Beverages, P01, 1200, 0.60 2024-01-01, Beverages, P02, 800, 0.40 2024-01-01, Snacks, P09, 500, 1.00

Partitioning Basics — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

10 questions70% to pass

Have questions about Partitioning Basics?

AI Assistant

Ask questions about this tool