luvv to helpDiscover the Best Free Online Tools
Topic 19 of 30

Working with MultiIndex

Learn Working with MultiIndex for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

Real datasets often have multiple keys: region → store → product, or year → quarter → month. pandas MultiIndex lets you model these hierarchies cleanly, making it easier to slice, summarize, and reshape complex data without awkward string parsing or repeated columns.

  • Sales analysis: drill from country to city to store with one index.
  • Experiment tracking: model run → metric → variant in tidy form.
  • Finance: time series by asset class → ticker → field (Open/Close).

Who this is for

Data Analysts and learners comfortable with pandas basics (DataFrame, Series, indexing, groupby, pivot_table) who need reliable patterns for hierarchical data.

Prerequisites

  • Python and pandas installed.
  • Know DataFrame/Series, .loc/.iloc, groupby, pivot_table.
  • Basic understanding of tidy vs wide data.

Concept explained simply

A MultiIndex is a hierarchical index: instead of one label per row (or column), you have a tuple of labels across levels. Think of it like a folder path: (Region, Store, Product). You can select at any level or combination of levels.

Mental model

Imagine a tree: the top level splits into branches (e.g., Region). Each branch splits into sub-branches (Store), and then leaves (Product). Selecting with .loc navigates the path. Reshaping (stack/unstack) moves a level between rows and columns.

Mini cheat-sheet
  • Create: set_index([...], append=...), MultiIndex.from_product, from_tuples
  • Select: .loc[(level1, level2), ...], .loc[pd.IndexSlice[…]]
  • Slice ranges: pd.IndexSlice for slices across levels
  • Cross-section: .xs(key, level=..., axis=0/1)
  • Rearrange: .swaplevel, .reorder_levels, .sort_index
  • Reshape: .stack(level=...), .unstack(level=...)
  • Clean-up: .droplevel, .rename_axis, .reset_index

Core operations you'll use

  • Creating indices: set_index, groupby(...).sum() (creates MultiIndex), pivot_table (often MultiIndex columns)
  • Label-based selection: .loc with tuples; .xs for quick cross-sections
  • Partial slicing: use slices and IndexSlice for ranges within levels
  • Reordering levels: swaplevel, reorder_levels
  • Sorting: sort_index(level=..., ascending=...)
  • Reshaping: stack/unstack to move a level from rows to columns and back
  • Tidying: droplevel, reset_index, rename_axis to keep outputs readable

Worked examples

Example 1 — Build and slice a MultiIndex

import pandas as pd

sales = pd.DataFrame({
    'Region': ['North','North','North','South','South','South'],
    'Store':  ['A','A','B','A','B','B'],
    'Product':['Tea','Coffee','Tea','Tea','Coffee','Tea'],
    'Revenue':[120, 200, 150, 90, 160, 130]
})

# Create a MultiIndex on rows
smi = sales.set_index(['Region','Store','Product']).sort_index()
print(smi)

# Select all items for Region='North'
print(smi.loc['North'])

# Select specific tuple: ('South','B','Tea')
print(smi.loc[('South','B','Tea')])

# Partial slice across a level using IndexSlice
idx = pd.IndexSlice
print(smi.loc[idx['North', :, 'Tea'], :])

Tip: If you get KeyError, verify sort order only matters for .loc slices with ranges; label lookups by exact tuple don't require sorting.

Example 2 — MultiIndex columns via pivot_table

import pandas as pd

# Monthly metrics by Region and Product
metrics = pd.DataFrame({
    'Region':['North','North','South','South']*2,
    'Product':['Tea','Coffee','Tea','Coffee']*2,
    'Month':[1,1,1,1,2,2,2,2],
    'Revenue':[120,200,90,160,130,210,95,155],
    'Units':[30,50,20,40,33,52,21,39]
})

pt = metrics.pivot_table(
    index='Month',
    columns=['Region','Product'],
    values=['Revenue','Units'],
    aggfunc='sum'
)

# Columns are now a MultiIndex with top level metric then (Region, Product)
print(pt.columns)

# Select Revenue for all regions/products in Month 2
print(pt.loc[2, 'Revenue'])

# Select all metrics for Region='North' only
print(pt.loc[:, pd.IndexSlice[:, 'North', :]])

When columns are MultiIndex, you can still use .loc with IndexSlice on the column axis.

Example 3 — stack/unstack and level control

# Continue from pt in Example 2
# Move 'Region' from columns down to the rows
pt2 = pt.stack(level='Region')  # now index is (Month, Region)

# Bring 'Product' down as well
pt3 = pt2.stack(level='Product')  # index is (Month, Region, Product)

# If you only want Revenue in tidy rows
rev_tidy = pt['Revenue'].stack(['Region','Product']).rename('Revenue').reset_index()
print(rev_tidy.head())

# Reorder and sort levels on a MultiIndex index
reordered = rev_tidy.set_index(['Product','Region','Month']).sort_index()
print(reordered.index.names)

# Swap Region and Product levels
swapped = reordered.swaplevel('Region','Product').sort_index()

# Drop a level after aggregation
agg = swapped.groupby(level=['Region']).sum(numeric_only=True)
clean = agg.droplevel(0, axis=1) if isinstance(agg.columns, pd.MultiIndex) else agg
print(clean)

Rule of thumb: after reshaping, rename_axis and reset_index to keep outputs readable.

Exercises you can try now

These mirror the exercises below the lesson. Do them in a notebook or Python REPL.

  1. Create a MultiIndex from ['Region','Store','Product'], slice all Tea in North, then get a cross-section at Store='B'.
Self-check checklist
  • I can set a MultiIndex with set_index([...]).
  • I can use .loc with tuples and IndexSlice.
  • I can use .xs(level=...) to pull a cross-section.
  • I can swap and sort levels without losing track of names.
  • I can stack/unstack and restore tidy data with reset_index.

Common mistakes and how to self-check

  • Forgetting level names: unclear outputs. Fix with rename_axis and verify df.index.names / df.columns.names.
  • KeyErrors on partial slices: remember that range-like slices across a level require a sorted index. Use sort_index() before slicing.
  • Confusing rows vs columns: after pivot, your MultiIndex might be on columns. Use axis awareness: .loc[rows, cols].
  • Over-nesting: too many levels make data hard to read. Consider resetting a level to columns when needed.
  • Unstable ordering after swaplevel: always follow with sort_index to get predictable order.
Quick self-audit
  • Can I select all (Region='North') regardless of other levels? Try df.xs('North', level='Region').
  • Can I select a range on Month without errors? Sort first, then df.loc[pd.IndexSlice[1:3, :, :], :].
  • After unstack → stack → reset_index, do I recover the same row count?

Practical projects

  • Store performance dashboard: Build a MultiIndex DataFrame (Region, Store, Department). Implement functions to drill down and summarize KPIs.
  • Subscription cohort table: Pivot to MultiIndex columns (Metric → CohortMonth) and practice stack/unstack to produce tidy outputs for charts.
  • Market basket by hierarchy: Use (Category, Subcategory, SKU) levels to compute pairwise lift and export clean tables with reset_index.

Learning path

  1. Review indexing basics: .loc/.iloc and boolean masks.
  2. Create and name MultiIndex levels with set_index and from_product.
  3. Selection patterns: tuple keys, IndexSlice, .xs.
  4. Reshaping: pivot_table, stack/unstack, tidy outputs.
  5. Level management: swaplevel, reorder_levels, sort_index, droplevel, rename_axis.
  6. Apply to a real dataset and validate outputs against expected counts.

Next steps

  • Practice combining MultiIndex with groupby transforms and rolling windows.
  • Add data validation: assert expected index names and levels before running analysis.
  • Automate clean-up helpers (small utility functions) to standardize outputs.

Mini challenge

Given a DataFrame indexed by (Year, Quarter, Product) with metrics as MultiIndex columns (Metric → Region), compute the QoQ percent change for Revenue for Region='North' only, and return a tidy DataFrame with columns [Year, Quarter, Product, QoQ_Rev_North].

Ready for the quick test?

The quick test below is available to everyone. If you log in, your progress will be saved.

Practice Exercises

1 exercises to complete

Instructions

1) Create a DataFrame with columns Region, Store, Product, Revenue. 2) Set a MultiIndex on (Region, Store, Product) sorted by all levels. 3) Select all rows for Region='North' and Product='Tea' (any Store). 4) Take a cross-section for Store='B' across all Regions and Products. 5) Return the result as tidy data with reset_index.

import pandas as pd

data = pd.DataFrame({
    'Region': ['North','North','North','South','South','South'],
    'Store':  ['A','A','B','A','B','B'],
    'Product':['Tea','Coffee','Tea','Tea','Coffee','Tea'],
    'Revenue':[120, 200, 150, 90, 160, 130]
})
Expected Output
Two printed tables: 1) All North/Tea rows (any Store), showing Revenue. 2) Cross-section at Store='B' across Regions/Products, then reset_index to columns: Region, Product, Revenue.

Working with MultiIndex — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Working with MultiIndex?

AI Assistant

Ask questions about this tool