How to learn Groupby Aggregations Multiple Aggregations Transform Filter for Python pandas in Data Analyst for free

Why this matters

As a Data Analyst, you often need to summarize large datasets by groups: customers, regions, channels, cohorts. pandas groupby + agg/transform/filter lets you compute KPIs, normalize values within groups, and keep only groups that meet business rules.

Marketing: Calculate conversion rate by campaign and country, then keep campaigns with enough impressions.
Product: Identify features with high average rating and at least N reviews.
Finance: Report revenue, orders, and unique customers by month and segment.

Concept explained simply

Think of groupby as arranging rows into labeled bins. Then you choose what to do with each bin:

agg: collapse each bin to a smaller summary (count, sum, mean, custom).
transform: compute a per-group value but return the original shape, aligned to each row (good for percentages, z-scores).
filter: keep or drop entire groups based on a condition.

Mental model: split → apply → combine. groupby creates the split, your function is apply, and pandas recombines results.

Key patterns you’ll use

Multiple aggregations per column with clear names using named aggregations.
Different aggregations for different columns in one pass.
transform to compute per-group totals/means that you can use per row.
filter to keep groups that meet business thresholds (volume, quality).

Syntax quick reference

import pandas as pd

# Named aggregations
(df.groupby(['colA','colB'])
   .agg(total=('value','sum'), avg=('value','mean'), n=('id','count'))
   .reset_index())

# Multiple aggs on same column
(df.groupby('group')['value']
   .agg(total='sum', avg='mean', q90=lambda s: s.quantile(0.9)))

# transform: returns same length as original
s = df.groupby('group')['value'].transform('sum')

# filter: keep whole groups that satisfy condition
filtered = df.groupby('group').filter(lambda g: g['value'].sum() > 100)

Worked examples

Example 1: Business summary with multiple aggregations

Goal: By region and category, compute total revenue, average order value, number of orders, and unique customers.

Show code and output

import pandas as pd

df = pd.DataFrame({
    'region': ['North','North','South','South','South'],
    'category': ['A','A','A','B','B'],
    'amount': [100,200,50,300,150],
    'order_id': [1,2,3,4,5],
    'customer_id': [10,10,20,30,30]
})

summary = (df.groupby(['region','category'])
             .agg(total_amount=('amount','sum'),
                  avg_amount=('amount','mean'),
                  orders=('order_id','count'),
                  unique_customers=('customer_id','nunique'))
             .reset_index()
             .sort_values(['region','category']))
print(summary)

Expected shape: one row per region-category with the four metrics.

Example 2: Within-group normalization using transform

Goal: Compute percent of region total and a per-user z-score for order amount.

Show code and output

# Percent of region total
region_total = df.groupby('region')['amount'].transform('sum')
df['pct_region'] = df['amount'] / region_total

# Z-score by customer (ddof=0 for population std)
df['amount_z_by_customer'] = df.groupby('customer_id')['amount'] \
                             .transform(lambda s: (s - s.mean()) / s.std(ddof=0))

print(df[['region','customer_id','amount','pct_region','amount_z_by_customer']])

Expected: pct_region sums to 1 within each region; z-scores are centered around 0 per customer.

Example 3: Keep only strong products (group filter)

Goal: Keep products with at least 3 ratings and average rating ≥ 4.0.

Show code and output

df2 = pd.DataFrame({
    'product': ['P1','P1','P1','P2','P2','P3'],
    'rating':  [4.5, 4.2, 3.9, 4.8, 4.9, 3.0]
})

strong = df2.groupby('product').filter(
    lambda g: (len(g) >= 3) and (g['rating'].mean() >= 4.0)
)
print(strong)

Expected: Only rows for products that meet both conditions remain.

Exercises you can run

Mirror of the exercises below. Run them in your own environment and compare with the expected outputs.

Exercise 1: Multi-aggregation with named outputs.
Exercise 2: transform for within-group metrics.
Exercise 3: filter to enforce business thresholds.

Common mistakes and self-check

Mistake: Using agg when you need per-row aligned values. Self-check: If you need the same number of rows as input, use transform.
Mistake: Forgetting reset_index() after groupby.agg, then being confused by index. Self-check: Do you need grouping keys as columns? Use as_index=False or reset_index().
Mistake: filter condition returns a Series. Self-check: Your function must return a single True/False per group.
Mistake: Mixing row-wise operations inside agg. Self-check: agg reduces; avoid referencing outside the grouped columns in reductions.
Mistake: Dividing by the global total instead of group total. Self-check: Compute totals via transform per group.

Who this is for

Data Analysts and BI professionals summarizing datasets by segments.
Anyone preparing dashboards, cohort analyses, or A/B test reports.

Prerequisites

Basic pandas: DataFrame creation, column selection, filtering.
Python basics: functions and lambdas.
Comfort with descriptive statistics (mean, count, sum).

Learning path

Review groupby basics: what forms the group keys and how indexes behave.
Practice named aggregations with multiple metrics per group.
Use transform to compute per-group totals/means and create ratio columns.
Apply filter to enforce volume or quality thresholds on groups.
Combine: filter → agg → transform for robust reporting.

Practical projects

E-commerce KPI board: By country and device, compute revenue, AOV, orders, unique buyers; add each order’s percent of country total.
Product quality screen: Keep products with ≥ 20 reviews and avg rating ≥ 4.2; rank by revenue.
Sales team report: For each rep, calculate monthly totals and a z-score of deal size within rep; highlight outliers.

Mini challenge

Given orders(user_id, region, amount, channel), create:

A summary per region-channel with total_amount, avg_amount, orders, unique_users.
A column on the original DataFrame with percent of channel total within the same region.
Filter to keep only region-channel pairs with ≥ 100 orders and total_amount ≥ 50,000.

Tip: Use a multi-key groupby for both agg and transform.

Next steps

Rerun your last team report using named aggregations for clarity and fewer post-processing steps.
Add at least one transform-based metric (e.g., percent-of-group) to your next dashboard.
Use filter to enforce a minimum volume for any KPI you present.

Quick Test

Take the quick test below to check your understanding. Available to everyone; only logged-in users get saved progress.

Menu

Groupby Aggregations Multiple Aggregations Transform Filter

Table of Contents

Why this matters

Concept explained simply

Key patterns you’ll use

Worked examples

Example 1: Business summary with multiple aggregations

Example 2: Within-group normalization using transform

Example 3: Keep only strong products (group filter)

Exercises you can run

Common mistakes and self-check

Who this is for

Prerequisites

Learning path

Practical projects

Mini challenge

Next steps

Quick Test

Practice Exercises

Multi-aggregation with named outputs

Instructions

Expected Output

transform for percent-of-group

Filter groups by volume and quality

Groupby Aggregations Multiple Aggregations Transform Filter — Quick Test

Have questions about Groupby Aggregations Multiple Aggregations Transform Filter?

AI Assistant