Why this matters
As a Data Analyst, you constantly reshape data to fit analysis or reporting needs. You might receive wide spreadsheets (one column per month) but need a long format to model trends, or you may need a summary pivot for quick insights. Mastering melt, pivot, pivot_table, stack, and unstack makes these transformations fast, readable, and reliable.
- Real tasks: turning monthly columns into a single timeline; summarizing revenue by segment; preparing data for charts; converting nested multi-index results back to flat tables.
- Outcome: cleaner pipelines, fewer manual steps, and consistent results you can reproduce.
Who this is for
- Aspiring and junior Data Analysts who use pandas.
- Anyone preparing datasets for dashboards, modeling, or reporting.
Prerequisites
- Basic Python (lists, dicts, functions).
- pandas essentials: DataFrame/Series, indexing, groupby basics.
- Comfort with reading data and inspecting .head()/.info().
Concept explained simply
Think of your data as Lego bricks. Reshaping changes how bricks are arranged without changing the bricks themselves.
- melt: turn many columns into two columns: a single "variable" column and a single "value" column.
- pivot: turn a long table back into a wide table (needs unique index-column pairs).
- pivot_table: like pivot, but can aggregate when duplicates exist (sum, mean, etc.).
- stack: push columns down into the row index (longer, MultiIndex rows).
- unstack: pull an index level up into columns (wider).
Mental model
Imagine a grid:
- Rows (index) describe "who/what" (e.g., customer)
- Columns describe "measurement dimensions" (e.g., month)
- Cells contain values (e.g., sales)
Reshaping moves information between index, columns, and values:
- melt moves multiple columns into two standardized columns.
- pivot(pivot_table) moves one categorical column to column headers.
- stack/unstack shifts a level between the index and columns.
Tip: choose the tool
- When columns represent categories (Jan, Feb, Mar) but you want one timeline column: use melt.
- When your long table has one row per index/column pair: use pivot.
- When duplicates exist and you need an aggregate: use pivot_table.
- When working with MultiIndex and need tidy/untidy flips: use stack/unstack.
Worked examples
Example 1: melt (wide to long)
import pandas as pd
sales = pd.DataFrame({
'store': ['A', 'B', 'C'],
'Jan': [10, 12, 9],
'Feb': [8, 11, 10],
'Mar': [11, 7, 12]
})
long = sales.melt(id_vars='store', var_name='month', value_name='sales')
print(long.sort_values(['store','month']).reset_index(drop=True))
Result:
store month sales
0 A Jan 10
1 A Feb 8
2 A Mar 11
3 B Jan 12
4 B Feb 11
5 B Mar 7
6 C Jan 9
7 C Feb 10
8 C Mar 12
Why it works
id_vars stay as-is; every non-id column becomes a row with month and corresponding sales.
Example 2: pivot (long to wide)
long_sorted = long.sort_values(['store','month'])
wide = long_sorted.pivot(index='store', columns='month', values='sales')
wide = wide.reset_index() # optional: turn index back to a column
print(wide)
Result:
month store Feb Jan Mar
0 A 8 10 11
1 B 11 12 7
2 C 10 9 12
Common pitfall
pivot requires that each (index, column) pair maps to a single value. If duplicates exist, use pivot_table with an aggregation.
Example 3: pivot_table (handles duplicates)
df = pd.DataFrame({
'region': ['East','East','West','West','West'],
'product': ['A','A','A','A','B'],
'quarter': ['Q1','Q1','Q1','Q2','Q2'],
'revenue': [100, 50, 90, 130, 60]
})
pt = pd.pivot_table(
df,
index=['region','product'],
columns='quarter',
values='revenue',
aggfunc='sum',
fill_value=0
)
print(pt)
Result:
quarter Q1 Q2
region product
East A 150 0
West A 90 130
B 0 60
Why pivot_table
East-A had two Q1 entries (100 and 50). pivot_table sums them into 150.
Example 4: stack and unstack (MultiIndex reshaping)
# Using pt from the previous example
stacked = pt.stack() # columns (quarter) become a new index level
print(stacked.head())
unstacked = stacked.unstack('quarter') # bring quarter back to columns
print(unstacked.equals(pt)) # True
Result (first lines of stacked):
region product quarter
East A Q1 150
West A Q1 90
Q2 130
B Q2 60
Name: revenue, dtype: int64
Mental shortcut
stack moves columns down into the index; unstack moves an index level up into columns. They are inverses when no missing data prevents perfect symmetry.
How to apply in real workflows
- Inspect your data: is a category encoded as multiple columns? (e.g., months, channels)
- Decide the target shape: long for modeling/plots; wide for dashboards/exports.
- Pick the tool: melt for wide-to-long; pivot for long-to-wide (unique pairs); pivot_table for aggregation; stack/unstack for MultiIndex flips.
- Validate: check shapes, indexes, and totals before/after.
- Document: keep transformations in clear, small steps with comments.
Exercises (practice here, then run in your IDE)
These match the graded Exercises below. Use the checklist to self-verify.
Exercise 1 — Melt then Pivot (rebuild the original)
import pandas as pd
df = pd.DataFrame({
'store': ['A','B'],
'W1': [10,12],
'W2': [8,9],
'W3': [11,7]
})
# 1) Melt to long with columns: store, week, sales
# 2) Sort by store, week
# 3) Pivot back to wide (columns = week), values = sales
# 4) Make it look like the original (store first, then W1,W2,W3)
- [ ] Long shape should be 6 rows Ă— 3 cols
- [ ] Weeks appear as W1, W2, W3
- [ ] Final wide equals the original values
Exercise 2 — Pivot table + Stack
import pandas as pd
df = pd.DataFrame({
'region': ['East','East','West','West'],
'product': ['A','A','A','B'],
'quarter': ['Q1','Q2','Q1','Q2'],
'revenue': [100,120,90,130],
'units': [5,6,4,7]
})
# 1) Build a pivot_table of revenue with index=['region','product'], columns='quarter'
# Use aggfunc='sum', fill_value=0
# 2) Stack quarters to get a long DataFrame with column name 'revenue'
# 3) Reset index to get columns: region, product, quarter, revenue
- [ ] Pivot table has Q1 and Q2 columns
- [ ] West–A has Q1 = 90
- [ ] Stacked long has 6 rows
Note: The Quick Test below is available to everyone; if you log in, your progress is saved.
Common mistakes and self-checks
- Using pivot when duplicates exist. Fix: switch to pivot_table with an aggfunc.
- Forgetting var_name/value_name in melt. Fix: name them explicitly for clarity.
- Column order surprises after pivot. Fix: reindex columns or sort_index(axis=1).
- Losing index after stack/unstack. Fix: reset_index() when needed.
- Unexpected NaNs after unstack. Fix: fill_value=0 or .fillna(...) based on context.
Quick self-audit
- Totals equal before/after reshape (sum of sales, revenue).
- Number of unique categories matches the number of columns/levels created.
- Index/column names are set correctly (pt.index.names, pt.columns.name).
Practical projects
- Sales timeline: convert 12 monthly columns into a long table; create a pivot_table summary by region and product.
- Marketing channels: melt weekly spend columns; pivot to compare channels by quarter.
- Inventory heatmap: build a pivot_table (items vs. warehouse, values=stock), then export for a heatmap chart.
Learning path
- Review indexing and MultiIndex basics.
- Master melt and pivot on a toy dataset.
- Handle duplicates with pivot_table.
- Practice stack/unstack on MultiIndex outputs.
- Combine steps into a reusable function for your team.
Next steps
- Answer the Quick Test to confirm understanding.
- Refactor a recent analysis to use melt/pivot_table for clarity.
- Create a personal snippet library of common reshape patterns.
Mini challenge
You get a DataFrame with columns: customer, Q1_sales, Q1_cost, Q2_sales, Q2_cost. Convert to long format with columns: customer, quarter, metric, value. Then build a pivot_table with index=customer, columns=[quarter, metric]. Ensure no NaNs (fill with 0), and sort columns by quarter then metric.