luvv to helpDiscover the Best Free Online Tools
Topic 23 of 30

Value Counts Crosstab

Learn Value Counts Crosstab for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

As a Data Analyst, you will constantly answer questions like: What are the top customer segments? How often do users churn by plan? Which regions return products more? Two pandas tools make this fast and reliable: value_counts (frequency of a single column) and crosstab (contingency table across two or more variables). Mastering them means you can create clear summaries, percent distributions, and quick sanity checks in seconds.

  • Quickly find the top N categories and their share.
  • Compare distributions across segments (e.g., churn by region, returns by channel).
  • Produce totals and row/column percentages without custom code.

Concept explained simply

value_counts counts how many times each category appears in a Series. crosstab creates a table (rows vs columns) tallying counts for each combination of categories. Both can show percentages and include totals.

Mental model
  • value_counts: Imagine a tally sheet for one question: How many of each answer?
  • crosstab: Imagine a 2D grid: rows are one question (e.g., Region), columns another (e.g., Returned), and cells hold counts. Add margins for totals and normalize for percentages.

Quick syntax cheat-sheet

# 1) value_counts basics
s.value_counts()                       # counts, sorted by count desc
s.value_counts(normalize=True)         # share (proportions) sum to 1
s.value_counts(dropna=False)           # include NaN as a category
s.value_counts(sort=False)             # keep original order (no sorting)
s.value_counts().head(5)               # top 5 categories

# 2) crosstab basics
pd.crosstab(df['row_col'], df['col_col'])

# Add totals (margins)
pd.crosstab(df['A'], df['B'], margins=True, margins_name='Total')

# Percentages
pd.crosstab(df['A'], df['B'], normalize='index')   # row-wise percent
pd.crosstab(df['A'], df['B'], normalize='columns') # column-wise percent
pd.crosstab(df['A'], df['B'], normalize='all')     # table-wise percent

# Keep missing combinations
pd.crosstab(df['A'], df['B'], dropna=False)

# With bins (e.g., age groups)
pd.crosstab(pd.cut(df['age'], bins=[0,25,40,60,100], include_lowest=True), df['B'])

Worked examples

Example 1 — Top products with shares

import pandas as pd
products = pd.Series(['A','B','A','C','B','A','B','B','C','A', None])
counts = products.value_counts(dropna=False)
shares = products.value_counts(normalize=True, dropna=False)
print(counts)
print(shares.round(3))
What to notice
  • Including NaN exposes missing data rate.
  • normalize=True returns proportions useful for easy charting and comparisons.

Example 2 — Returns by channel with totals and row percentages

df = pd.DataFrame({
    'channel': ['Online','Online','Store','Store','Online','Store','Online','Online','Store','Store'],
    'returned': ['Yes','No','No','Yes','No','No','No','Yes','No','Yes']
})

ct = pd.crosstab(df['channel'], df['returned'], margins=True, margins_name='Total')
row_pct = pd.crosstab(df['channel'], df['returned'], normalize='index')
print(ct)
print((row_pct*100).round(1))
What to notice
  • margins adds a Total row/column for quick sanity checks.
  • normalize='index' shows per-channel shares (each row sums to 1).

Example 3 — Age-band vs churn, including missing and friendly percentages

df = pd.DataFrame({
    'age': [22, 37, 41, 59, None, 26, 63, 44, 38, 23],
    'churned': ['No','No','Yes','No','Yes','No','Yes','Yes','No','No']
})

age_band = pd.cut(df['age'], bins=[0,25,40,60,100], include_lowest=True)
ct = pd.crosstab(age_band, df['churned'], dropna=False)
row_pct = pd.crosstab(age_band, df['churned'], normalize='index', dropna=False)
print(ct)
print((row_pct*100).round(1).astype(str) + '%')
What to notice
  • Binning creates interpretable groups.
  • dropna=False keeps the (NaN, ...) band if any, helpful to audit missing ages.
  • Row percentages simplify comparison across age bands.

Exercises

Use the sample dataset below for both exercises. You can copy-paste it into a Python session.

import pandas as pd

df = pd.DataFrame({
    'order_id': [1,2,3,4,5,6,7,8,9,10,11,12],
    'region': ['North','South','East','West','North','East','West','South','North','West','East','South'],
    'channel': ['Online','Store','Store','Online','Online','Online','Store','Store','Online','Store','Online','Online'],
    'returned': ['No','No','Yes','No','Yes','No','No','Yes','No','Yes','No','No'],
    'category': ['A','B','A','C','A','B','C','A','A','C','B',None]
})

Exercise 1 — Top categories and their share

Task: Show category counts including missing values, plus percentage share. Display only the top 3 non-missing categories by count. Round shares to 1 decimal place (%).

  • Hint: Use value_counts twice (counts and normalize). Consider dropna.
  • Goal: A small table or Series with counts and percent for A, B, C.
Show a tiny nudge

Compute counts = df['category'].value_counts(dropna=False) and shares = ... normalize=True.

Exercise 2 — Returns by region (row percentages + totals)

Task: Build a crosstab of region vs returned with totals. Then produce row percentages (each region sums to 100%). Round to 1 decimal place.

  • Hint: margins=True for totals, normalize='index' for row percentages.
Show a tiny nudge

ct = pd.crosstab(df['region'], df['returned'], margins=True). Then another crosstab with normalize='index'.

Practice checklist

  • Can I include NaN in value_counts when needed?
  • Can I switch between counts and percentages quickly?
  • Can I add margins and read totals correctly?
  • Do my row/column percentages sum to ~1 (or 100%)?
  • Did I round for readability without losing meaning?

Common mistakes

  • Forgetting normalize='index' vs 'columns': If you want row-wise shares, use 'index'. Columns for column-wise shares.
  • Dropping NaN unintentionally: Use dropna=False to keep missing categories visible.
  • Comparing counts across groups with different sizes: Use percentages for fair comparison.
  • Reading margins incorrectly: The Total row/column summarizes counts; when using normalize, margins summarize proportions accordingly—double-check interpretation.
  • Sorting confusion: value_counts sorts by count by default. If you need alphabetical order, use sort=False then sort_index().
Self-check
  • Do my totals equal the dataset size?
  • Do row percentages sum to 100% (allow tiny rounding errors)?
  • Did I explicitly decide whether to include NaN?
  • Is the chosen normalization aligned with the question asked?

Practical projects

  • E-commerce snapshot: Use value_counts to list top 10 product categories and their shares; use crosstab to compare returns by shipping method with margins and row percentages.
  • Customer support: crosstab ticket severity by channel; produce both counts and row percentages; highlight which channel has the highest share of critical tickets.
  • Marketing campaign: value_counts on campaign source; crosstab source by conversion flag; export both as CSV after rounding percentages.

Who this is for

  • Aspiring and practicing Data Analysts needing fast categorical summaries.
  • Anyone using pandas to explore and report distributions.

Prerequisites

  • Basic Python (variables, lists, functions).
  • pandas fundamentals: Series, DataFrame, basic selection.

Learning path

  1. Refresh Series and DataFrame basics.
  2. Practice value_counts on multiple categorical columns; include dropna.
  3. Build crosstab tables; add margins; try different normalize modes.
  4. Format outputs (rounding, sorting) for stakeholder-friendly summaries.
  5. Apply to a real dataset and validate totals.

Mini challenge

Given a dataset of signups with columns plan (Basic/Pro/Team), country, and churn (Yes/No):

  • Find plan distribution with value_counts including NaN share.
  • Create crosstab of country vs churn with row percentages and totals.
  • In one sentence, interpret which country shows the highest churn share.

Note: The quick test is available to everyone. Only logged-in users will have their progress saved.

Next steps

  • Re-run the examples on your own data.
  • Complete the exercises above, then take the Quick Test below.
  • Move on to grouping and aggregation to build deeper analyses.

Practice Exercises

2 exercises to complete

Instructions

Using the provided df, compute category counts including missing values and percentage share. Show only the top 3 non-missing categories by count, with percent rounded to 1 decimal place.

Expected Output
A small table or two aligned Series showing A, B, C counts and approximately their % share; NaN shown in counts but not among the top 3 display.

Value Counts Crosstab — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Value Counts Crosstab?

AI Assistant

Ask questions about this tool