How to learn Sorting and Ranking for Python pandas in Data Analyst for free

Why this matters

As a Data Analyst, you constantly need clean, ordered views of data to spot trends, generate leaderboards, pick top performers, or prepare inputs for dashboards and models. Sorting and ranking in pandas power tasks like:

Creating weekly top-10 products by revenue.
Ordering customers by churn risk before outreach.
Ranking campaigns within channels for optimization.
Preparing neatly ordered tables for stakeholders and BI tools.

Concept explained simply

Sorting = arranging rows by rules (like A → Z, high → low). Ranking = assigning positions (1st, 2nd, 3rd) with tie handling. Both are about order: sorting changes row order; ranking adds a new column that captures order.

Mental model

Sort: think of stacking rows from smallest to largest based on one or many columns. You can choose ascending/descending per column and where missing values go.
Rank: think of giving out medals. Decide how ties are handled: share a place (average), pack tightly with no gaps (dense), or reserve positions (min/max).

Key functions you will use

DataFrame.sort_values(by, ascending=True, na_position='last', kind='quicksort', key=None)
DataFrame.sort_index(axis=0, ascending=True)
Series.sort_values(...)
Series.nlargest(n) and Series.nsmallest(n); for DataFrame use df.nlargest(n, 'col')
Series.rank(method='average', ascending=True, pct=False, na_option='keep')
Within-group ranking: df.groupby('group')['value'].rank(...)

When to use what?

Need overall order? Use sort_values.
Need order by index labels (e.g., dates as index)? Use sort_index.
Need top/bottom N quickly? Use nlargest/nsmallest.
Need positions as a column (keep original order)? Use rank.

Worked examples

Example 1 — Multi-column sort (different directions)

import pandas as pd

sales = pd.DataFrame({
    'region': ['East','West','East','North','West','East', None],
    'customer': ['A','B','C','D','E','F','G'],
    'revenue': [1200, 3000, 3000, 900, 2000, 1500, 500],
    'date': pd.to_datetime(['2024-01-03','2024-02-01','2024-01-07','2024-01-01','2024-02-03','2024-01-05','2024-01-04'])
})

out = sales.sort_values(
    by=['region','revenue','date'],
    ascending=[True, False, True],
    na_position='last'
)
print(out[['region','customer','revenue','date']])

Interpretation: within each region, highest revenue first; ties broken by earliest date. Rows with missing region go last.

Example 2 — Top N overall and per group

products = pd.DataFrame({
    'category': ['Snacks','Snacks','Drinks','Drinks','Drinks','Snacks'],
    'product': ['Bar','Chips','Cola','Juice','Water','Nuts'],
    'revenue': [4200, 8000, 5000, 4500, 3000, 8200]
})

# Overall top 3
print(products.nlargest(3, 'revenue'))

# Top 2 per category
top2_per_cat = (
    products.groupby('category', group_keys=False)
            .apply(lambda g: g.nlargest(2, 'revenue'))
)
print(top2_per_cat)

Tip: nlargest avoids a full sort and is usually faster for picking top N.

Example 3 — Ranking with different tie rules

scores = pd.DataFrame({
    'name': ['Ana','Bo','Cy','Di','Ed'],
    'score': [92, 88, 92, 70, 88]
})

scores['rank_average'] = scores['score'].rank(method='average', ascending=False)
scores['rank_dense']   = scores['score'].rank(method='dense',   ascending=False)
scores['rank_min']     = scores['score'].rank(method='min',     ascending=False)
print(scores)

average: ties get the mean of tied positions.
dense: ties share a rank and next rank is sequential (no gaps).
min: ties get the lowest rank in the tie group.

Example 4 — Sorting index and custom order

import pandas as pd

monthly = pd.DataFrame({
    'store': ['A','B','A','B','A','B'],
    'month': ['Feb','Jan','Mar','Mar','Jan','Feb'],
    'sales': [110, 90, 150, 140, 100, 120]
})

# Custom month order via Categorical + sort_values
month_order = pd.CategoricalDtype(['Jan','Feb','Mar'], ordered=True)
monthly['month'] = monthly['month'].astype(month_order)
print(monthly.sort_values(['store','month']))

# Sort by index after setting a MultiIndex
mi = monthly.set_index(['store','month']).sort_index()
print(mi)

Use categoricals or the key= parameter to define custom sort orders.

Try it: Exercises

Work through the two tasks below. Use the hints if you get stuck. Your quick test is at the end. Note: the test is available to everyone; only logged-in users will have progress saved.

Salesboard order: Sort by region ascending, then revenue descending, then date ascending. Put missing regions last.
Hint 1
Use sort_values with by as a list and ascending=[True, False, True].

Hint 2
Use na_position='last' to push missing regions to the bottom.
Team rankings: Compute a dense rank of players by points within each team (highest first). Also compute an overall percent rank.
Hint 1
Use groupby('team')['points'].rank(method='dense', ascending=False).

Hint 2
For overall percent rank, use df['points'].rank(pct=True, ascending=True/False).

[ ] I used nlargest or nsmallest when I only needed top/bottom N.
[ ] I chose tie-handling method consciously (dense, average, etc.).
[ ] I avoided inplace=True unless necessary.
[ ] I verified where NaNs end up.

Common mistakes and self-check

Forgetting per-column sort directions: Use a list for ascending matching the by list length.
Sorting the whole dataset just to get top N: Prefer nlargest/nsmallest.
Unexpected tie handling in ranks: Explicitly set method (e.g., dense) and document it.
Random order after equal values: Use a stable sort with kind='mergesort' or add a secondary sort key.
Sorting inside loops: Vectorize: group once, then apply.
Mixed-type columns (e.g., numbers as strings): Convert dtypes before sorting.

Self-check quick audit

Did I specify where NaNs go (na_position)?
Is my rank definition reproducible and documented?
Did I avoid chained sorting by combining columns into one sort_values call?

Practical projects

Top-5 products dashboard slice: From a transactions table, compute revenue by product, pick top 5 per month with groupby + nlargest, and export a tidy table for a BI chart.
Customer leaderboard: Rank customers by LTV overall and within each region using rank(method='dense'). Produce a table with both ranks side by side.
Campaign shortlists: For each channel, rank campaigns by CTR; deliver top 3 with stable tie-breaking using kind='mergesort' plus a secondary key (e.g., spend).

Mini challenge

Create a DataFrame with columns: team, player, points, assists. Task: return top 2 players per team by points, breaking ties by higher assists, and add a dense rank per team. Bonus: show overall percent rank by points.

Who this is for

Aspiring and junior Data Analysts who need clean, ordered outputs.
Professionals preparing for interviews or analytics case studies.

Prerequisites

Basic Python and pandas (DataFrame creation, selecting columns, groupby basics).
Comfort with data types (numeric, datetime, categorical).

Learning path

Before: Data cleaning and type casting in pandas.
This lesson: Sorting and ranking with stable, documented rules.
Next: Grouped aggregations, window functions, and joins to enrich ranked tables.

Next steps

Refactor your recent analysis to use nlargest where you previously sorted everything.
Add explicit rank columns with documented tie methods so stakeholders know how ties were treated.
Run the quick test below. Note: anyone can take it; only logged-in users will have results saved.

Menu

Sorting and Ranking

Table of Contents

Why this matters

Concept explained simply

Mental model

Key functions you will use

Worked examples

Try it: Exercises

Common mistakes and self-check

Practical projects

Mini challenge

Who this is for

Prerequisites

Learning path

Next steps

Practice Exercises

Sort a sales table for a leaderboard

Instructions

Expected Output

Rank players within team and overall

Sorting and Ranking — Quick Test

Have questions about Sorting and Ranking?

AI Assistant