Binning with cut and qcut

Learn binning with pd.cut and pd.qcut for free, with explanations, exercises, and a quick test (for Data Analysts).

Published: December 20, 2025 | Updated: December 20, 2025

Who this is for

Data Analysts and learners who want to transform continuous variables (ages, income, scores) into meaningful categories using pandas pd.cut and pd.qcut.

Prerequisites

  • Basic Python and pandas (Series/DataFrame basics)
  • Comfort with filtering, value_counts, and simple aggregations
  • Installed pandas in your environment

Learning path

  1. Understand when to bin data and why it helps analysis
  2. Learn pd.cut for equal-width or custom edges
  3. Learn pd.qcut for equal-frequency (quantile) bins
  4. Handle labels, boundaries, missing values, and duplicates
  5. Apply bins to analysis and visualization

Why this matters

  • Customer analytics: create age groups, tenure buckets, or spend tiers to compare behavior.
  • Reporting: simplify continuous metrics into clear segments (low/medium/high) for dashboards.
  • Model features: engineer categorical features from continuous variables.
Real tasks you might do
  • Group customers into quartiles by monthly spend to target campaigns.
  • Bucket delivery times into on-time, slightly late, very late for SLA monitoring.
  • Convert exam scores to grade bands for education reports.

Concept explained simply

Binning turns a continuous number into a category by asking: “Which interval does this number fall into?”

  • pd.cut: you choose the cut points (bins). Good for business-defined bands, like [0, 50), [50, 100).
  • pd.qcut: pandas chooses cut points so each bin has (about) the same number of rows. Good for quartiles/deciles.

Mental model

Imagine laying a ruler (the number line). With pd.cut you mark your own tick marks. With pd.qcut the data itself decides where the tick marks go so each section holds a similar amount of data.
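To make the ruler picture concrete, here is a minimal sketch that bins the same values both ways. The data is hypothetical and deliberately skewed, so the difference between equal-width and equal-frequency bins is easy to see.

```python
import pandas as pd

# Hypothetical skewed data: most values are small, one is large.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# pd.cut with bins=2: two equal-width intervals across the value range.
by_width = pd.cut(values, bins=2)

# pd.qcut with q=2: two intervals split at the median.
by_count = pd.qcut(values, q=2)

# sort_index() lists the counts in interval order (lowest bin first).
width_counts = by_width.value_counts().sort_index().tolist()
count_counts = by_count.value_counts().sort_index().tolist()

print(width_counts)  # [7, 1] -> the outlier sits alone in the upper half
print(count_counts)  # [4, 4] -> each bin holds half of the rows
```

With equal-width bins the single large value gets a bin of its own, while the quantile bins stay balanced regardless of how the values are spread.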

Key functions and parameters

  • pd.cut(x, bins, right=True, labels=None, precision=3, include_lowest=False)
    • bins: int (equal-width) or list of edges
    • right: whether intervals include the right edge; True gives (a, b], False gives [a, b)
    • include_lowest: with right=True, also include the first interval’s left edge
    • labels: list of names, or False to return integer bin codes; the default None uses interval notation
  • pd.qcut(x, q, labels=None, duplicates='raise')
    • q: int (e.g., 4 for quartiles) or list of quantiles (0 to 1)
    • duplicates: 'raise' (default) fails on non-unique quantile edges; 'drop' removes the duplicates
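The boundary parameters are easiest to understand by binning values that sit exactly on the edges. The sketch below uses a made-up three-value Series and labels=False so that the bin codes show directly where each value lands.

```python
import pandas as pd

x = pd.Series([0, 5, 10])
edges = [0, 5, 10]

# Default right=True: intervals are (0, 5] and (5, 10], so 0 falls outside.
default_codes = pd.cut(x, bins=edges, labels=False)

# include_lowest=True closes the first interval on the left: [0, 5].
lowest_codes = pd.cut(x, bins=edges, labels=False, include_lowest=True)

# right=False: intervals are [0, 5) and [5, 10), so 10 falls outside.
left_codes = pd.cut(x, bins=edges, labels=False, right=False)

print(default_codes.tolist())  # first value is NaN
print(lowest_codes.tolist())   # [0, 0, 1]
print(left_codes.tolist())     # last value is NaN
```

Values that fall outside every interval become NaN rather than raising an error, so it is worth checking for unexpected missing values after binning.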

Worked examples

1) Custom edges and labels with pd.cut

import pandas as pd
import numpy as np

ages = pd.Series([5, 17, 18, 29, 35, 49, 50, 72, np.nan])
# Define edges and labels
edges = [0, 18, 35, 50, float('inf')]
labels = ['0–17', '18–34', '35–49', '50+']

age_group = pd.cut(ages, bins=edges, right=False, labels=labels)
print(age_group)
print(age_group.value_counts(dropna=False))

We used right=False to include the left edge and exclude the right, so 18 goes to 18–34 and 50 goes to 50+.

2) Automatic equal-width from number of bins

import pandas as pd

scores = pd.Series([32, 45, 58, 63, 71, 79, 84, 92])
# 4 equal-width bins between min and max
binned = pd.cut(scores, bins=4)
print(binned.cat.categories)
print(binned.value_counts())

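pd.cut computed 4 equal-width intervals (each about 15 wide) spanning the min and max of scores. The lowest edge is placed slightly below the minimum so that 32 is included.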
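If you want to see the edges pandas chose, retbins=True returns them alongside the binned values. This is a small sketch on the same scores; the variable names are only illustrative.

```python
import pandas as pd

scores = pd.Series([32, 45, 58, 63, 71, 79, 84, 92])

# retbins=True also returns the array of computed bin edges.
binned, edges = pd.cut(scores, bins=4, retbins=True)

# 4 bins need 5 edges; the interior edges are (92 - 32) / 4 = 15 apart.
print(edges)

# Counts per interval, listed from the lowest bin to the highest.
counts = binned.value_counts().sort_index().tolist()
print(counts)  # [2, 1, 2, 3]
```

Equal-width bins do not promise equal counts: here the top interval holds three scores and the second holds only one.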

3) Equal-frequency bins with pd.qcut (quartiles)

import pandas as pd

sales = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120])
quartile = pd.qcut(sales, q=4, labels=['Q1','Q2','Q3','Q4'])
print(quartile)
print(quartile.value_counts())

Each quartile has 3 values, so counts are balanced.

Edge cases and tips
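  • NaN stays NaN after binning; handle with fillna if needed. The fill value must already be one of the categories, so add it first if it is a new label.
  • Labels length must equal number of intervals; if you pass 4 edges, you get 3 intervals.
  • pd.qcut may fail when many identical values cause duplicate quantile edges; set duplicates='drop' to proceed with fewer bins. If you also pass labels, their count must match the reduced number of bins.
  • Boundary inclusion: the default right=True gives (a, b] intervals, and right=False gives [a, b). With right=True, include_lowest=True also includes the very first left edge.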
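Two of these edge cases are sketched below: tied values that break pd.qcut, and filling NaN bins with a placeholder label. The data and the "unknown" and "minor"/"adult" labels are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical spend data with many ties at zero.
spend = pd.Series([0, 0, 0, 0, 0, 0, 10, 20, 30, 40])

# Several quartile edges coincide at 0, so the default duplicates='raise' fails.
try:
    pd.qcut(spend, q=4)
    raised = False
except ValueError as err:
    raised = True
    print("qcut failed:", err)

# duplicates='drop' removes the repeated edges, leaving fewer bins than requested.
tiers = pd.qcut(spend, q=4, duplicates="drop")
n_bins = len(tiers.cat.categories)
print(n_bins)  # 2

# NaN stays NaN after binning.
ages = pd.Series([5, 40, np.nan])
age_group = pd.cut(ages, bins=[0, 18, float("inf")], right=False,
                   labels=["minor", "adult"])

# A new label must be registered as a category before fillna can use it.
filled = age_group.cat.add_categories(["unknown"]).fillna("unknown")
print(filled.tolist())  # ['minor', 'adult', 'unknown']
```

Whether a missing value deserves its own bin or should be dropped depends on the analysis; the placeholder approach simply keeps those rows visible in counts.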

How to use bins in analysis

  • Group and aggregate: df.groupby('bin')['metric'].mean()
  • Distribution checks: bin_col.value_counts(normalize=True)
  • Visualization: bar charts of counts per bin, or color-encoding in scatter plots
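The first two patterns can be sketched on a small DataFrame. The column names and values here are made up for illustration, and the labels use plain hyphens.

```python
import pandas as pd

# Hypothetical customer data; column names are illustrative.
df = pd.DataFrame({
    "age": [16, 22, 31, 38, 45, 52, 67],
    "spend": [20, 35, 50, 65, 80, 60, 40],
})

df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 35, 50, float("inf")],
    right=False,
    labels=["0-17", "18-34", "35-49", "50+"],
)

# Group and aggregate: observed=False keeps every bin, even empty ones.
mean_spend = df.groupby("age_group", observed=False)["spend"].mean()
print(mean_spend)

# Distribution check: share of rows per bin, listed in bin order.
share = df["age_group"].value_counts(normalize=True).sort_index()
print(share)
```

Passing observed explicitly avoids relying on the groupby default for categorical keys, which has differed between pandas versions.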

Practice: follow along

Step 1: Create a Series of continuous values (e.g., ages or sales).
Step 2: Bin with pd.cut using custom edges and friendly labels.
Step 3: Bin with pd.qcut into quartiles and compare value_counts.
Step 4: Try different right/include_lowest settings; see where boundary values land.

Exercises

Complete the tasks below, then check your results against the hints and the expected output.

Exercise 1 — Age groups with pd.cut

Recreate the AgeGroup labels for a small list of ages using left-closed, right-open intervals.

  • Data: [5, 17, 18, 29, 35, 49, 50, 72, NaN]
  • Edges: [0, 18, 35, 50, inf]
  • Labels: ['0–17', '18–34', '35–49', '50+']
  • Use right=False
Hint

Use pd.cut(..., bins=edges, labels=labels, right=False). Remember that NaN stays NaN.

Exercise 2 — Quartiles with pd.qcut

Bin the following incomes into quartiles labeled Q1–Q4.

  • Data: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
  • Hint: pd.qcut(incomes, 4, labels=[...])
Hint

Value counts for each label should be equal when there are no duplicates causing edge collisions.

Self-check checklist

  • You verified boundary placement with right=False vs right=True.
  • Labels length matched number of intervals.
  • value_counts shows balanced counts for qcut on evenly spaced data.
  • You noted NaN values remain unbinned unless filled.

Common mistakes and how to spot them

  • Labels mismatch: ValueError about labels length. Fix by matching labels to number of intervals.
  • Wrong boundary inclusion: Values equal to an edge fall into the unexpected bin. Fix by adjusting right and include_lowest.
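  • Using qcut on small or heavily tied data: quantile edges can coincide, which raises a ValueError about non-unique bin edges. Use duplicates='drop' or fewer bins.
  • Losing category order: pd.cut and pd.qcut return ordered categoricals by default, but bins converted to strings sort alphabetically. Keep the categorical dtype, check cat.ordered, and use cat.as_ordered() if it is False.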
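A short sketch of how two of these mistakes show up in practice, assuming a recent pandas version where pd.cut returns an ordered categorical by default. The scores and band names are hypothetical.

```python
import pandas as pd

scores = pd.Series([85, 55, 70])
edges = [0, 60, 80, 100]  # 4 edges -> 3 intervals

# Labels mismatch: two labels for three intervals raises ValueError.
try:
    pd.cut(scores, bins=edges, labels=["low", "high"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# Correct number of labels.
bands = pd.cut(scores, bins=edges, labels=["low", "mid", "high"])
print(bands.cat.ordered)  # True

# Sorting the categorical follows bin order ...
logical = bands.sort_values().tolist()
# ... while sorting plain strings is alphabetical.
alphabetical = bands.astype(str).sort_values().tolist()

print(logical)       # ['low', 'mid', 'high']
print(alphabetical)  # ['high', 'low', 'mid']
```

If a chart or table shows bins in alphabetical order, a string conversion somewhere upstream is a likely cause.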

Practical projects

  • Customer spend tiers: Create deciles with qcut for monthly spend. Compare churn rates by tier.
  • Delivery performance: Bin delivery minutes into [0, 10), [10, 30), [30, 60), [60, inf) and chart on-time vs late proportions.

Mini challenge

You have transaction amounts: [2, 5, 9, 15, 20, 26, 33, 47, 51, 68, 72, 90, 105, 130].

  • Create 5 equal-frequency bins with qcut (labels Q1–Q5). If duplicates cause issues, allow duplicates='drop' and note the final number of bins.
  • Create custom business bins: [0, 20, 50, 100, inf] with labels ['Micro','Small','Medium','Large'] using cut with right=False.
  • Report counts per bin for both methods and 1 insight you observe.
Need a nudge?

Start with pd.Series(data). Use value_counts() with sort=False to keep label order.

Next steps

  • Combine bins with groupby to compute KPIs per band.
  • Try deciles (q=10) and compare uplift in segmentation analyses.
  • Use pd.IntervalIndex to introspect interval boundaries when debugging.
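For the debugging step, the categories of an unlabeled pd.cut result are a pd.IntervalIndex, which exposes the edges directly. A minimal sketch with illustrative edges:

```python
import pandas as pd

scores = pd.Series([32, 45, 58, 63, 71, 79, 84, 92])
binned = pd.cut(scores, bins=[0, 50, 75, 100])

# Without labels, the categories are a pd.IntervalIndex.
intervals = binned.cat.categories
print(type(intervals).__name__)  # IntervalIndex

lefts = intervals.left.tolist()
rights = intervals.right.tolist()
print(lefts)             # [0, 50, 75]
print(rights)            # [50, 75, 100]
print(intervals.closed)  # right

# Which interval holds the boundary value 50?
holds_50 = intervals.contains(50).tolist()
print(holds_50)  # [True, False, False]
```

Checking closed and contains on a boundary value is a quick way to confirm where edge values will land before binning a full dataset.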

Practice Exercises

Instructions

Given ages = [5, 17, 18, 29, 35, 49, 50, 72, None], bin into labels ['0–17','18–34','35–49','50+'] using edges [0, 18, 35, 50, inf]. Use right=False. Return the list of labels (NaN as 'NaN').

Expected Output
['0–17','0–17','18–34','18–34','35–49','35–49','50+','50+','NaN']
