How to learn binning with pd.cut and pd.qcut in Python pandas: a free guide for data analysts

Who this is for

Data Analysts and learners who want to transform continuous variables (ages, income, scores) into meaningful categories using pandas pd.cut and pd.qcut.

Prerequisites

Basic Python and pandas (Series/DataFrame basics)
Comfort with filtering, value_counts, and simple aggregations
pandas installed in your environment

Learning path

Understand when to bin data and why it helps analysis
Learn pd.cut for equal-width or custom edges
Learn pd.qcut for equal-frequency (quantile) bins
Handle labels, boundaries, missing values, and duplicates
Apply bins to analysis and visualization

Why this matters

Customer analytics: create age groups, tenure buckets, or spend tiers to compare behavior.
Reporting: simplify continuous metrics into clear segments (low/medium/high) for dashboards.
Model features: engineer categorical features from continuous variables.

Real tasks you might do

Group customers into quartiles by monthly spend to target campaigns.
Bucket delivery times into on-time, slightly late, and very late for SLA monitoring.
Convert exam scores to grade bands for education reports.

Concept explained simply

Binning turns a continuous number into a category by asking: “Which interval does this number fall into?”

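pd.cut: you choose the cut points (bin edges), or pass a number of bins and pandas builds equal-width intervals. Good for business-defined bands such as [0, 50) and [50, 100). Left-closed bands like these need right=False; by default pd.cut builds right-closed intervals such as (0, 50].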
pd.qcut: pandas chooses cut points so each bin has (about) the same number of rows. Good for quartiles/deciles.

Mental model

Imagine laying a ruler along the number line. With pd.cut you mark your own tick marks. With pd.qcut the data itself decides where the tick marks go, so each section holds a similar number of rows.
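To see the difference, here is a minimal sketch on a small, made-up right-skewed Series (the values are illustrative, not from a real dataset). Two equal-width bins from pd.cut put almost every row in the first bin, while pd.qcut splits at the median so both bins hold half the rows.

```python
import pandas as pd

# Made-up right-skewed values: most are small, one is large.
values = pd.Series([1, 2, 2, 3, 3, 4, 5, 6, 8, 40])

# pd.cut: 2 equal-width bins over the range 1..40 (split at 20.5).
by_width = pd.cut(values, bins=2)

# pd.qcut: 2 equal-frequency bins (split at the median, 3.5).
by_count = pd.qcut(values, q=2)

print(by_width.value_counts(sort=False))  # 9 rows vs 1 row
print(by_count.value_counts(sort=False))  # 5 rows vs 5 rows
```

value_counts(sort=False) lists the bins in interval order instead of sorting them by count.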

Key functions and parameters

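pd.cut(x, bins, right=True, include_lowest=False, labels=None, precision=3)
- bins: int (that many equal-width bins spanning the range of x) or a list of edges
- right: whether intervals include their right edge (default True gives (a, b]; False gives [a, b))
- include_lowest: whether the first interval also includes its left edge
- labels: list of names, or False to return integer bin codes (the default None returns Interval categories)
- precision: number of digits used when building and displaying interval edges
pd.qcut(x, q, labels=None, duplicates='raise')
- q: int (e.g., 4 for quartiles, 10 for deciles) or list of quantiles between 0 and 1
- labels: as in pd.cut; a list must contain one name per bin
- duplicates: the default 'raise' errors on non-unique quantile edges; 'drop' removes them and returns fewer bins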
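A short sketch of labels=False, plus retbins=True (a parameter both functions accept but not listed above), which returns the computed edges alongside the bins. The Series is made up for illustration.

```python
import pandas as pd

x = pd.Series([3, 7, 12, 18, 25])

# labels=False returns integer bin codes (0, 1, 2, ...) instead of Interval labels.
codes = pd.cut(x, bins=[0, 10, 20, 30], labels=False)
print(codes.tolist())  # [0, 0, 1, 1, 2]

# retbins=True also returns the edges that were actually used.
quartiles, edges = pd.qcut(x, q=4, retbins=True)
print(edges)  # quartile edges: 3, 7, 12, 18, 25
```

retbins=True is most useful when pandas picks the edges for you (bins as an int, or any qcut call).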

Worked examples

1) Equal-width bins with pd.cut

import pandas as pd
import numpy as np

ages = pd.Series([5, 17, 18, 29, 35, 49, 50, 72, np.nan])
# Define edges and labels
edges = [0, 18, 35, 50, float('inf')]
labels = ['0–17', '18–34', '35–49', '50+']

age_group = pd.cut(ages, bins=edges, right=False, labels=labels)
print(age_group)
print(age_group.value_counts(dropna=False))

We used right=False so each interval includes its left edge and excludes its right edge ([a, b)), which puts 18 in 18–34 and 50 in 50+. The missing age stays NaN, and value_counts(dropna=False) counts it.

2) Automatic equal-width from number of bins

import pandas as pd

scores = pd.Series([32, 45, 58, 63, 71, 79, 84, 92])
# 4 equal-width bins between min and max
binned = pd.cut(scores, bins=4)
print(binned.cat.categories)
print(binned.value_counts())

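pd.cut split the 32–92 range into 4 intervals of equal width (15 points each). pandas also widens the range slightly (by 0.1%) so the minimum score falls inside the first right-closed interval. Note that value_counts() sorts by count; use value_counts(sort=False) to list bins in interval order.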
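To check exactly which edges pandas chose, you can inspect the categories; for unlabeled bins they form a pandas IntervalIndex. A sketch reusing the scores from this example:

```python
import pandas as pd

scores = pd.Series([32, 45, 58, 63, 71, 79, 84, 92])
binned = pd.cut(scores, bins=4)

# Without labels, the categories are an IntervalIndex you can inspect.
intervals = binned.cat.categories
print(intervals.left)    # lower edges; the first sits just below 32
print(intervals.right)   # upper edges: 47, 62, 77, 92
print(intervals.closed)  # 'right', i.e. intervals look like (a, b]

# Counts in interval order rather than by frequency.
print(binned.value_counts(sort=False))
```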

3) Equal-frequency bins with pd.qcut (quartiles)

import pandas as pd

sales = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120])
quartile = pd.qcut(sales, q=4, labels=['Q1','Q2','Q3','Q4'])
print(quartile)
print(quartile.value_counts())

With 12 values and 4 bins, each quartile holds 3 values, so the counts are balanced.
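q also accepts a list of quantile cut points, which is handy for uneven tiers such as "bottom half, next 25%, top 25%". A sketch on the same sales values; the tier names are illustrative:

```python
import pandas as pd

sales = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120])

# Quantile cut points between 0 and 1: median (0.5) and upper quartile (0.75).
tier = pd.qcut(sales, q=[0, 0.5, 0.75, 1.0], labels=['Low', 'Mid', 'Top'])
print(tier.value_counts(sort=False))  # Low 6, Mid 3, Top 3
```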

Edge cases and tips

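NaN stays NaN after binning. To label missing values, add the category first and then fill, e.g. binned.cat.add_categories('Unknown').fillna('Unknown'); filling a categorical with a value that is not already a category raises an error.
The number of labels must equal the number of intervals; 4 edges define 3 intervals.
pd.qcut raises a ValueError ("Bin edges must be unique...") when many identical values produce duplicate quantile edges; set duplicates='drop' to proceed with fewer bins, and pass fewer labels (or none).
Boundary inclusion: the default right=True builds (a, b] intervals, so a value equal to the lowest edge becomes NaN unless include_lowest=True; right=False builds [a, b) intervals, so a value equal to the highest edge becomes NaN.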
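The duplicate-edge and NaN tips can be sketched as follows. Both Series are made-up examples chosen to trigger the behavior (the repeated zeros produce identical quantile edges):

```python
import numpy as np
import pandas as pd

# Made-up spend data: the repeated zeros make several quartile edges equal 0.
spend = pd.Series([0, 0, 0, 0, 0, 0, 5, 10, 20, 40])

try:
    pd.qcut(spend, q=4)
    qcut_raised = False
except ValueError as err:
    qcut_raised = True
    print("qcut failed:", err)

# duplicates='drop' removes the repeated edges, so fewer than 4 bins remain.
spend_bins = pd.qcut(spend, q=4, duplicates='drop')
print(spend_bins.value_counts(sort=False))  # 2 bins: 7 rows and 3 rows

# NaN stays NaN; to label it, add the category before filling.
ages = pd.Series([5, 40, np.nan])
groups = pd.cut(ages, bins=[0, 18, 65], labels=['minor', 'adult'])
groups = groups.cat.add_categories('unknown').fillna('unknown')
print(groups.tolist())  # ['minor', 'adult', 'unknown']
```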

How to use bins in analysis

Group and aggregate: df.groupby('bin')['metric'].mean()
Distribution checks: bin_col.value_counts(normalize=True)
Visualization: bar charts of counts per bin, or color-encoding in scatter plots
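A minimal sketch of the group-and-aggregate and distribution-check patterns, on a small hypothetical customer table (the column names age, spend, and age_band are assumptions for illustration):

```python
import pandas as pd

# Hypothetical customer table for illustration.
df = pd.DataFrame({
    'age': [22, 25, 31, 38, 45, 52, 61, 70],
    'spend': [120, 80, 200, 150, 300, 260, 90, 110],
})

# Left-closed age bands: [18, 30), [30, 45), [45, 60), [60, 100).
df['age_band'] = pd.cut(
    df['age'],
    bins=[18, 30, 45, 60, 100],
    right=False,
    labels=['18-29', '30-44', '45-59', '60+'],
)

# Group and aggregate; observed=False keeps every band, even empty ones.
avg_spend = df.groupby('age_band', observed=False)['spend'].mean()
print(avg_spend)

# Distribution check: share of rows per band, in band order.
share = df['age_band'].value_counts(normalize=True, sort=False)
print(share)
```

Because age_band is an ordered categorical, both results come back in band order, which is usually what a bar chart should follow.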

Practice: follow along

Step 1: Create a Series of continuous values (e.g., ages or sales).

Step 2: Bin with pd.cut using custom edges and friendly labels.

Step 3: Bin with pd.qcut into quartiles and compare value_counts.

Step 4: Try different right/include_lowest settings; see where boundary values land.
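For step 4, a sketch that bins three boundary values with the same edges under three settings. The values and labels are illustrative; the category codes make NaN visible as -1:

```python
import pandas as pd

vals = pd.Series([0, 10, 20])
edges = [0, 10, 20]
labels = ['low', 'high']

# Default right=True: (0, 10], (10, 20]. 0 is on the open left edge -> NaN.
default_bins = pd.cut(vals, edges, labels=labels)

# include_lowest=True: the first interval also includes 0.
lowest_bins = pd.cut(vals, edges, labels=labels, include_lowest=True)

# right=False: [0, 10), [10, 20). Now 20 is on the open right edge -> NaN.
left_closed_bins = pd.cut(vals, edges, labels=labels, right=False)

for name, binned in [('default', default_bins),
                     ('include_lowest', lowest_bins),
                     ('right=False', left_closed_bins)]:
    # Category codes: 0 = low, 1 = high, -1 = NaN.
    print(name, binned.cat.codes.tolist())
```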

Exercises

Complete the tasks below, then check your results against the self-check checklist.

Exercise 1 — Age groups with pd.cut

Recreate the age-group labels from worked example 1 for a small list of ages, using left-closed, right-open intervals.

Data: [5, 17, 18, 29, 35, 49, 50, 72, NaN]
Edges: [0, 18, 35, 50, inf]
Labels: ['0–17', '18–34', '35–49', '50+']
Use right=False

Use pd.cut(..., bins=edges, labels=labels, right=False). Remember that NaN stays NaN.

Exercise 2 — Quartiles with pd.qcut

Bin the following incomes into quartiles labeled Q1–Q4.

Data: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
Hint: pd.qcut(incomes, 4, labels=[...])

Value counts for each label should be equal when there are no duplicates causing edge collisions.

Self-check checklist

You verified boundary placement with right=False vs right=True.
Labels length matched number of intervals.
value_counts shows balanced counts for qcut on evenly spaced data.
You noted NaN values remain unbinned unless filled.

Common mistakes and how to spot them

Labels mismatch: ValueError about labels length. Fix by matching labels to number of intervals.
Wrong boundary inclusion: Values equal to an edge fall into the unexpected bin. Fix by adjusting right and include_lowest.
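Using qcut on small or highly repetitive data: tied values can produce duplicate quantile edges, so pandas raises an error or, with duplicates='drop', returns fewer bins than requested. Use duplicates='drop' or ask for fewer bins.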
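Forgetting ordered categories: pd.cut and pd.qcut return ordered categoricals by default, but converting the bins to plain strings (e.g., astype(str)) drops that order, so sorting becomes alphabetical. Keep the categorical dtype, and if you build categories yourself, make them ordered (cat.as_ordered()).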
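The ordering and labels-mismatch mistakes can be reproduced in a few lines; the amounts and band names below are hypothetical:

```python
import pandas as pd

amounts = pd.Series([5, 120, 40, 900])
labels = ['Small', 'Medium', 'Large']
bands = pd.cut(amounts, bins=[0, 50, 500, 1000], labels=labels)

print(bands.cat.ordered)                         # True: cut returns an ordered categorical
print(bands.sort_values().tolist())              # ['Small', 'Small', 'Medium', 'Large']
print(bands.astype(str).sort_values().tolist())  # ['Large', 'Medium', 'Small', 'Small']

# Labels mismatch: 4 edges define 3 intervals, so 2 labels are rejected.
try:
    pd.cut(amounts, bins=[0, 50, 500, 1000], labels=['Small', 'Large'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("labels mismatch:", err)
```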

Practical projects

Customer spend tiers: Create deciles with qcut for monthly spend. Compare churn rates by tier.
Delivery performance: Bin delivery minutes into [0, 10), [10, 30), [30, 60), [60, inf) and chart on-time vs late proportions.

Mini challenge

You have transaction amounts: [2, 5, 9, 15, 20, 26, 33, 47, 51, 68, 72, 90, 105, 130].

Create 5 equal-frequency bins with qcut (labels Q1–Q5). If duplicate edges cause an error, set duplicates='drop' and note the final number of bins.
Create custom business bins: [0, 20, 50, 100, inf] with labels ['Micro','Small','Medium','Large'] using cut with right=False.
Report the counts per bin for both methods, plus one insight you observe.

Need a nudge?

Start with pd.Series(data). Use value_counts() with sort=False to keep label order.

Next steps

Combine bins with groupby to compute KPIs per band.
Try deciles (q=10) and compare uplift in segmentation analyses.
Inspect binned.cat.categories (an IntervalIndex for unlabeled bins, with .left, .right, and .closed) to check interval boundaries when debugging.
