Why this matters
As a Data Analyst, you constantly need clean, ordered views of data to spot trends, generate leaderboards, pick top performers, or prepare inputs for dashboards and models. Sorting and ranking in pandas power tasks like:
- Creating weekly top-10 products by revenue.
- Ordering customers by churn risk before outreach.
- Ranking campaigns within channels for optimization.
- Preparing neatly ordered tables for stakeholders and BI tools.
Concept explained simply
Sorting = arranging rows by rules (like A → Z, high → low). Ranking = assigning positions (1st, 2nd, 3rd) with tie handling. Both are about order: sorting changes row order; ranking adds a new column that captures order.
Mental model
- Sort: think of stacking rows from smallest to largest based on one or many columns. You can choose ascending/descending per column and where missing values go.
- Rank: think of giving out medals. Decide how ties are handled: share a place (average), pack tightly with no gaps (dense), or reserve positions (min/max).
Key functions you will use
DataFrame.sort_values(by, ascending=True, na_position='last', kind='quicksort', key=None)DataFrame.sort_index(axis=0, ascending=True)Series.sort_values(...)Series.nlargest(n)andSeries.nsmallest(n); for DataFrame usedf.nlargest(n, 'col')Series.rank(method='average', ascending=True, pct=False, na_option='keep')- Within-group ranking:
df.groupby('group')['value'].rank(...)
When to use what?
- Need overall order? Use
sort_values. - Need order by index labels (e.g., dates as index)? Use
sort_index. - Need top/bottom N quickly? Use
nlargest/nsmallest. - Need positions as a column (keep original order)? Use
rank.
Worked examples
Example 1 — Multi-column sort (different directions)
import pandas as pd
sales = pd.DataFrame({
'region': ['East','West','East','North','West','East', None],
'customer': ['A','B','C','D','E','F','G'],
'revenue': [1200, 3000, 3000, 900, 2000, 1500, 500],
'date': pd.to_datetime(['2024-01-03','2024-02-01','2024-01-07','2024-01-01','2024-02-03','2024-01-05','2024-01-04'])
})
out = sales.sort_values(
by=['region','revenue','date'],
ascending=[True, False, True],
na_position='last'
)
print(out[['region','customer','revenue','date']])
Interpretation: within each region, highest revenue first; ties broken by earliest date. Rows with missing region go last.
Example 2 — Top N overall and per group
products = pd.DataFrame({
'category': ['Snacks','Snacks','Drinks','Drinks','Drinks','Snacks'],
'product': ['Bar','Chips','Cola','Juice','Water','Nuts'],
'revenue': [4200, 8000, 5000, 4500, 3000, 8200]
})
# Overall top 3
print(products.nlargest(3, 'revenue'))
# Top 2 per category
top2_per_cat = (
products.groupby('category', group_keys=False)
.apply(lambda g: g.nlargest(2, 'revenue'))
)
print(top2_per_cat)
Tip: nlargest avoids a full sort and is usually faster for picking top N.
Example 3 — Ranking with different tie rules
scores = pd.DataFrame({
'name': ['Ana','Bo','Cy','Di','Ed'],
'score': [92, 88, 92, 70, 88]
})
scores['rank_average'] = scores['score'].rank(method='average', ascending=False)
scores['rank_dense'] = scores['score'].rank(method='dense', ascending=False)
scores['rank_min'] = scores['score'].rank(method='min', ascending=False)
print(scores)
- average: ties get the mean of tied positions.
- dense: ties share a rank and next rank is sequential (no gaps).
- min: ties get the lowest rank in the tie group.
Example 4 — Sorting index and custom order
import pandas as pd
monthly = pd.DataFrame({
'store': ['A','B','A','B','A','B'],
'month': ['Feb','Jan','Mar','Mar','Jan','Feb'],
'sales': [110, 90, 150, 140, 100, 120]
})
# Custom month order via Categorical + sort_values
month_order = pd.CategoricalDtype(['Jan','Feb','Mar'], ordered=True)
monthly['month'] = monthly['month'].astype(month_order)
print(monthly.sort_values(['store','month']))
# Sort by index after setting a MultiIndex
mi = monthly.set_index(['store','month']).sort_index()
print(mi)
Use categoricals or the key= parameter to define custom sort orders.
Try it: Exercises
Work through the two tasks below. Use the hints if you get stuck. Your quick test is at the end. Note: the test is available to everyone; only logged-in users will have progress saved.
-
Salesboard order: Sort by
regionascending, thenrevenuedescending, thendateascending. Put missing regions last.Hint 1
Use
sort_valueswithbyas a list andascending=[True, False, True].Hint 2
Use
na_position='last'to push missing regions to the bottom. -
Team rankings: Compute a dense rank of players by
pointswithin eachteam(highest first). Also compute an overall percent rank.Hint 1
Use
groupby('team')['points'].rank(method='dense', ascending=False).Hint 2
For overall percent rank, use
df['points'].rank(pct=True, ascending=True/False).
- [ ] I used
nlargestornsmallestwhen I only needed top/bottom N. - [ ] I chose tie-handling method consciously (
dense,average, etc.). - [ ] I avoided
inplace=Trueunless necessary. - [ ] I verified where NaNs end up.
Common mistakes and self-check
- Forgetting per-column sort directions: Use a list for
ascendingmatching thebylist length. - Sorting the whole dataset just to get top N: Prefer
nlargest/nsmallest. - Unexpected tie handling in ranks: Explicitly set
method(e.g.,dense) and document it. - Random order after equal values: Use a stable sort with
kind='mergesort'or add a secondary sort key. - Sorting inside loops: Vectorize: group once, then apply.
- Mixed-type columns (e.g., numbers as strings): Convert dtypes before sorting.
Self-check quick audit
- Did I specify where NaNs go (
na_position)? - Is my rank definition reproducible and documented?
- Did I avoid chained sorting by combining columns into one
sort_valuescall?
Practical projects
- Top-5 products dashboard slice: From a transactions table, compute revenue by product, pick top 5 per month with
groupby+nlargest, and export a tidy table for a BI chart. - Customer leaderboard: Rank customers by LTV overall and within each region using
rank(method='dense'). Produce a table with both ranks side by side. - Campaign shortlists: For each channel, rank campaigns by CTR; deliver top 3 with stable tie-breaking using
kind='mergesort'plus a secondary key (e.g., spend).
Mini challenge
Create a DataFrame with columns: team, player, points, assists. Task: return top 2 players per team by points, breaking ties by higher assists, and add a dense rank per team. Bonus: show overall percent rank by points.
Who this is for
- Aspiring and junior Data Analysts who need clean, ordered outputs.
- Professionals preparing for interviews or analytics case studies.
Prerequisites
- Basic Python and pandas (DataFrame creation, selecting columns,
groupbybasics). - Comfort with data types (numeric, datetime, categorical).
Learning path
- Before: Data cleaning and type casting in pandas.
- This lesson: Sorting and ranking with stable, documented rules.
- Next: Grouped aggregations, window functions, and joins to enrich ranked tables.
Next steps
- Refactor your recent analysis to use
nlargestwhere you previously sorted everything. - Add explicit rank columns with documented tie methods so stakeholders know how ties were treated.
- Run the quick test below. Note: anyone can take it; only logged-in users will have results saved.