Why this matters
As a Data Analyst, you will constantly answer questions like: What are the top customer segments? How does churn vary by plan? Which regions return products most often? Two pandas tools make this fast and reliable: value_counts (frequencies of a single column) and crosstab (a contingency table across two or more variables). Mastering them lets you produce clear summaries, percentage distributions, and quick sanity checks in seconds.
- Quickly find the top N categories and their share.
- Compare distributions across segments (e.g., churn by region, returns by channel).
- Produce totals and row/column percentages without custom code.
Concept explained simply
value_counts counts how many times each category appears in a Series. crosstab creates a table (rows vs columns) tallying counts for each combination of categories. Both can show percentages and include totals.
Mental model
- value_counts: Imagine a tally sheet for one question: How many of each answer?
- crosstab: Imagine a 2D grid: rows are one question (e.g., Region), columns another (e.g., Returned), and cells hold counts. Add margins for totals and normalize for percentages.
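A minimal sketch of both mental models on toy data (the column names are made up for illustration):
import pandas as pd
# One question, one tally sheet: counts per answer
answers = pd.Series(['Yes', 'No', 'Yes', 'Yes'])
print(answers.value_counts())  # Yes: 3, No: 1
# Two questions, one grid: counts per (region, returned) combination
df = pd.DataFrame({'region': ['North', 'North', 'South', 'South'],
                   'returned': ['Yes', 'No', 'No', 'No']})
print(pd.crosstab(df['region'], df['returned']))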
Quick syntax cheat-sheet
# 1) value_counts basics
s.value_counts() # counts, sorted by count desc
s.value_counts(normalize=True) # share (proportions) sum to 1
s.value_counts(dropna=False) # include NaN as a category
s.value_counts(sort=False) # keep original order (no sorting)
s.value_counts().head(5) # top 5 categories
# 2) crosstab basics
pd.crosstab(df['row_col'], df['col_col'])
# Add totals (margins)
pd.crosstab(df['A'], df['B'], margins=True, margins_name='Total')
# Percentages
pd.crosstab(df['A'], df['B'], normalize='index') # row-wise percent
pd.crosstab(df['A'], df['B'], normalize='columns') # column-wise percent
pd.crosstab(df['A'], df['B'], normalize='all') # table-wise percent
# Keep category combinations with zero counts (note: rows with NaN keys are always excluded)
pd.crosstab(df['A'], df['B'], dropna=False)
# With bins (e.g., age groups)
pd.crosstab(pd.cut(df['age'], bins=[0,25,40,60,100], include_lowest=True), df['B'])
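The snippets above assume an existing Series s and DataFrame df. A minimal, made-up setup you can paste first to try them (use A and B wherever 'row_col'/'col_col' appear):
import pandas as pd
s = pd.Series(['x', 'y', 'x', 'x', None])
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a2', 'a1'],
    'B': ['b1', 'b2', 'b1', 'b1', 'b2'],
    'age': [22, 35, 41, 58, 64],
})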
Worked examples
Example 1 — Top products with shares
import pandas as pd
products = pd.Series(['A','B','A','C','B','A','B','B','C','A', None])
counts = products.value_counts(dropna=False)
shares = products.value_counts(normalize=True, dropna=False)
print(counts)
print(shares.round(3))
What to notice
- Including NaN exposes missing data rate.
- normalize=True returns proportions useful for easy charting and comparisons.
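If you want counts and percentages side by side, one option is to concatenate the two Series from above into a small table (the column labels 'count' and 'pct' are just a choice):
summary = pd.concat([counts, (shares * 100).round(1)], axis=1, keys=['count', 'pct'])
print(summary)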
Example 2 — Returns by channel with totals and row percentages
df = pd.DataFrame({
'channel': ['Online','Online','Store','Store','Online','Store','Online','Online','Store','Store'],
'returned': ['Yes','No','No','Yes','No','No','No','Yes','No','Yes']
})
ct = pd.crosstab(df['channel'], df['returned'], margins=True, margins_name='Total')
row_pct = pd.crosstab(df['channel'], df['returned'], normalize='index')
print(ct)
print((row_pct*100).round(1))
What to notice
- margins adds a Total row/column for quick sanity checks.
- normalize='index' shows per-channel shares (each row sums to 1).
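To answer "which channel returns most?" directly, you can sort one column of the row-percentage table (continuing the example above):
return_rate = (row_pct['Yes'] * 100).round(1).sort_values(ascending=False)
print(return_rate)  # per-channel return rate in %, highest first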
Example 3 — Age-band vs churn, including missing and friendly percentages
df = pd.DataFrame({
'age': [22, 37, 41, 59, None, 26, 63, 44, 38, 23],
'churned': ['No','No','Yes','No','Yes','No','Yes','Yes','No','No']
})
age_band = pd.cut(df['age'], bins=[0,25,40,60,100], include_lowest=True)
ct = pd.crosstab(age_band, df['churned'], dropna=False)
row_pct = pd.crosstab(age_band, df['churned'], normalize='index', dropna=False)
print(ct)
print((row_pct*100).round(1).astype(str) + '%')
What to notice
- Binning creates interpretable groups.
- dropna=False keeps empty age bands (zero-count categories) visible. Note that crosstab always drops rows whose key is NaN, so the missing age does not appear as its own band; see the sketch below for how to surface it.
- Row percentages simplify comparison across age bands.
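Because crosstab excludes NaN keys, auditing missing ages needs an explicit placeholder. One option is to add a 'Missing' category before tabulating (the label is arbitrary; continuing Example 3):
age_band_audit = age_band.cat.add_categories('Missing').fillna('Missing')
print(pd.crosstab(age_band_audit, df['churned']))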
Exercises
Use the sample dataset below for both exercises. You can copy-paste it into a Python session.
import pandas as pd
df = pd.DataFrame({
'order_id': [1,2,3,4,5,6,7,8,9,10,11,12],
'region': ['North','South','East','West','North','East','West','South','North','West','East','South'],
'channel': ['Online','Store','Store','Online','Online','Online','Store','Store','Online','Store','Online','Online'],
'returned': ['No','No','Yes','No','Yes','No','No','Yes','No','Yes','No','No'],
'category': ['A','B','A','C','A','B','C','A','A','C','B',None]
})
Exercise 1 — Top categories and their share
Task: Show category counts including missing values, plus percentage share. Display only the top 3 non-missing categories by count. Round shares to 1 decimal place (%).
- Hint: Use value_counts twice (counts and normalize). Consider dropna.
- Goal: A small table or Series with counts and percent for A, B, C.
Tiny nudge: compute counts = df['category'].value_counts(dropna=False) and shares = df['category'].value_counts(normalize=True), then take the top 3 non-missing categories by count.
Exercise 2 — Returns by region (row percentages + totals)
Task: Build a crosstab of region vs returned with totals. Then produce row percentages (each region sums to 100%). Round to 1 decimal place.
- Hint: margins=True for totals, normalize='index' for row percentages.
Tiny nudge: ct = pd.crosstab(df['region'], df['returned'], margins=True); then a second crosstab with normalize='index'.
Practice checklist
- Can I include NaN in value_counts when needed?
- Can I switch between counts and percentages quickly?
- Can I add margins and read totals correctly?
- Do my row/column percentages sum to ~1 (or 100%)?
- Did I round for readability without losing meaning?
Common mistakes
- Mixing up normalize='index' and 'columns': use 'index' for row-wise shares and 'columns' for column-wise shares (see the sketch after this list).
- Dropping NaN unintentionally: Use dropna=False to keep missing categories visible.
- Comparing counts across groups with different sizes: Use percentages for fair comparison.
- Reading margins incorrectly: the Total row/column summarizes counts; when combined with normalize, the margins are normalized as well, so double-check the interpretation.
- Sorting confusion: value_counts sorts by count descending by default. For alphabetical order, call .sort_index() on the result.
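A quick sketch of the normalize directions and the alphabetical-sort pattern from the list above (toy data):
import pandas as pd
demo = pd.DataFrame({'A': ['x', 'x', 'y'], 'B': ['p', 'q', 'q']})
print(pd.crosstab(demo['A'], demo['B'], normalize='index'))    # each row sums to 1
print(pd.crosstab(demo['A'], demo['B'], normalize='columns'))  # each column sums to 1
vals = pd.Series(['b', 'a', 'b', 'c'])
print(vals.value_counts())               # sorted by count, descending
print(vals.value_counts().sort_index())  # alphabetical by category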
Self-check
- Do my totals equal the dataset size?
- Do row percentages sum to 100% (allow tiny rounding errors)?
- Did I explicitly decide whether to include NaN?
- Is the chosen normalization aligned with the question asked?
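The first two checks can be scripted. A sketch, assuming ct, row_pct, and df are still the objects from Example 2 (built with margins_name='Total'):
assert ct.loc['Total', 'Total'] == len(df)           # grand total equals dataset size
assert (row_pct.sum(axis=1) - 1).abs().max() < 1e-9  # each row of shares sums to 1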
Practical projects
- E-commerce snapshot: Use value_counts to list top 10 product categories and their shares; use crosstab to compare returns by shipping method with margins and row percentages.
- Customer support: crosstab ticket severity by channel; produce both counts and row percentages; highlight which channel has the highest share of critical tickets.
- Marketing campaign: value_counts on campaign source; crosstab source by conversion flag; export both as CSV after rounding percentages.
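For the export step in the marketing project, a minimal sketch (the source and converted column names and file names are assumptions about your data):
counts = df['source'].value_counts()  # 'source' / 'converted' are placeholder column names
ct_pct = (pd.crosstab(df['source'], df['converted'], normalize='index') * 100).round(1)
counts.to_csv('source_counts.csv')
ct_pct.to_csv('source_by_conversion_pct.csv')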
Who this is for
- Aspiring and practicing Data Analysts needing fast categorical summaries.
- Anyone using pandas to explore and report distributions.
Prerequisites
- Basic Python (variables, lists, functions).
- pandas fundamentals: Series, DataFrame, basic selection.
Learning path
- Refresh Series and DataFrame basics.
- Practice value_counts on multiple categorical columns; include dropna.
- Build crosstab tables; add margins; try different normalize modes.
- Format outputs (rounding, sorting) for stakeholder-friendly summaries.
- Apply to a real dataset and validate totals.
Mini challenge
Given a dataset of signups with columns plan (Basic/Pro/Team), country, and churn (Yes/No):
- Find plan distribution with value_counts including NaN share.
- Create crosstab of country vs churn with row percentages and totals.
- In one sentence, interpret which country shows the highest churn share.
Next steps
- Re-run the examples on your own data.
- Complete the exercises above, then take the Quick Test below.
- Move on to grouping and aggregation to build deeper analyses.