luvv to helpDiscover the Best Free Online Tools
Topic 8 of 30

Sorting and Ranking

Learn Sorting and Ranking for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

As a Data Analyst, you constantly need clean, ordered views of data to spot trends, generate leaderboards, pick top performers, or prepare inputs for dashboards and models. Sorting and ranking in pandas power tasks like:

  • Creating weekly top-10 products by revenue.
  • Ordering customers by churn risk before outreach.
  • Ranking campaigns within channels for optimization.
  • Preparing neatly ordered tables for stakeholders and BI tools.

Concept explained simply

Sorting = arranging rows by rules (like A → Z, high → low). Ranking = assigning positions (1st, 2nd, 3rd) with tie handling. Both are about order: sorting changes row order; ranking adds a new column that captures order.

Mental model

  • Sort: think of stacking rows from smallest to largest based on one or many columns. You can choose ascending/descending per column and where missing values go.
  • Rank: think of giving out medals. Decide how ties are handled: share a place (average), pack tightly with no gaps (dense), or reserve positions (min/max).

Key functions you will use

  • DataFrame.sort_values(by, ascending=True, na_position='last', kind='quicksort', key=None)
  • DataFrame.sort_index(axis=0, ascending=True)
  • Series.sort_values(...)
  • Series.nlargest(n) and Series.nsmallest(n); for DataFrame use df.nlargest(n, 'col')
  • Series.rank(method='average', ascending=True, pct=False, na_option='keep')
  • Within-group ranking: df.groupby('group')['value'].rank(...)
When to use what?
  • Need overall order? Use sort_values.
  • Need order by index labels (e.g., dates as index)? Use sort_index.
  • Need top/bottom N quickly? Use nlargest/nsmallest.
  • Need positions as a column (keep original order)? Use rank.

Worked examples

Example 1 — Multi-column sort (different directions)
import pandas as pd

sales = pd.DataFrame({
    'region': ['East','West','East','North','West','East', None],
    'customer': ['A','B','C','D','E','F','G'],
    'revenue': [1200, 3000, 3000, 900, 2000, 1500, 500],
    'date': pd.to_datetime(['2024-01-03','2024-02-01','2024-01-07','2024-01-01','2024-02-03','2024-01-05','2024-01-04'])
})

out = sales.sort_values(
    by=['region','revenue','date'],
    ascending=[True, False, True],
    na_position='last'
)
print(out[['region','customer','revenue','date']])

Interpretation: within each region, highest revenue first; ties broken by earliest date. Rows with missing region go last.

Example 2 — Top N overall and per group
products = pd.DataFrame({
    'category': ['Snacks','Snacks','Drinks','Drinks','Drinks','Snacks'],
    'product': ['Bar','Chips','Cola','Juice','Water','Nuts'],
    'revenue': [4200, 8000, 5000, 4500, 3000, 8200]
})

# Overall top 3
print(products.nlargest(3, 'revenue'))

# Top 2 per category
top2_per_cat = (
    products.groupby('category', group_keys=False)
            .apply(lambda g: g.nlargest(2, 'revenue'))
)
print(top2_per_cat)

Tip: nlargest avoids a full sort and is usually faster for picking top N.

Example 3 — Ranking with different tie rules
scores = pd.DataFrame({
    'name': ['Ana','Bo','Cy','Di','Ed'],
    'score': [92, 88, 92, 70, 88]
})

scores['rank_average'] = scores['score'].rank(method='average', ascending=False)
scores['rank_dense']   = scores['score'].rank(method='dense',   ascending=False)
scores['rank_min']     = scores['score'].rank(method='min',     ascending=False)
print(scores)
  • average: ties get the mean of tied positions.
  • dense: ties share a rank and next rank is sequential (no gaps).
  • min: ties get the lowest rank in the tie group.
Example 4 — Sorting index and custom order
import pandas as pd

monthly = pd.DataFrame({
    'store': ['A','B','A','B','A','B'],
    'month': ['Feb','Jan','Mar','Mar','Jan','Feb'],
    'sales': [110, 90, 150, 140, 100, 120]
})

# Custom month order via Categorical + sort_values
month_order = pd.CategoricalDtype(['Jan','Feb','Mar'], ordered=True)
monthly['month'] = monthly['month'].astype(month_order)
print(monthly.sort_values(['store','month']))

# Sort by index after setting a MultiIndex
mi = monthly.set_index(['store','month']).sort_index()
print(mi)

Use categoricals or the key= parameter to define custom sort orders.

Try it: Exercises

Work through the two tasks below. Use the hints if you get stuck. Your quick test is at the end. Note: the test is available to everyone; only logged-in users will have progress saved.

  1. Salesboard order: Sort by region ascending, then revenue descending, then date ascending. Put missing regions last.
    Hint 1

    Use sort_values with by as a list and ascending=[True, False, True].

    Hint 2

    Use na_position='last' to push missing regions to the bottom.

  2. Team rankings: Compute a dense rank of players by points within each team (highest first). Also compute an overall percent rank.
    Hint 1

    Use groupby('team')['points'].rank(method='dense', ascending=False).

    Hint 2

    For overall percent rank, use df['points'].rank(pct=True, ascending=True/False).

  • [ ] I used nlargest or nsmallest when I only needed top/bottom N.
  • [ ] I chose tie-handling method consciously (dense, average, etc.).
  • [ ] I avoided inplace=True unless necessary.
  • [ ] I verified where NaNs end up.

Common mistakes and self-check

  • Forgetting per-column sort directions: Use a list for ascending matching the by list length.
  • Sorting the whole dataset just to get top N: Prefer nlargest/nsmallest.
  • Unexpected tie handling in ranks: Explicitly set method (e.g., dense) and document it.
  • Random order after equal values: Use a stable sort with kind='mergesort' or add a secondary sort key.
  • Sorting inside loops: Vectorize: group once, then apply.
  • Mixed-type columns (e.g., numbers as strings): Convert dtypes before sorting.
Self-check quick audit
  • Did I specify where NaNs go (na_position)?
  • Is my rank definition reproducible and documented?
  • Did I avoid chained sorting by combining columns into one sort_values call?

Practical projects

  1. Top-5 products dashboard slice: From a transactions table, compute revenue by product, pick top 5 per month with groupby + nlargest, and export a tidy table for a BI chart.
  2. Customer leaderboard: Rank customers by LTV overall and within each region using rank(method='dense'). Produce a table with both ranks side by side.
  3. Campaign shortlists: For each channel, rank campaigns by CTR; deliver top 3 with stable tie-breaking using kind='mergesort' plus a secondary key (e.g., spend).

Mini challenge

Create a DataFrame with columns: team, player, points, assists. Task: return top 2 players per team by points, breaking ties by higher assists, and add a dense rank per team. Bonus: show overall percent rank by points.

Who this is for

  • Aspiring and junior Data Analysts who need clean, ordered outputs.
  • Professionals preparing for interviews or analytics case studies.

Prerequisites

  • Basic Python and pandas (DataFrame creation, selecting columns, groupby basics).
  • Comfort with data types (numeric, datetime, categorical).

Learning path

  • Before: Data cleaning and type casting in pandas.
  • This lesson: Sorting and ranking with stable, documented rules.
  • Next: Grouped aggregations, window functions, and joins to enrich ranked tables.

Next steps

  • Refactor your recent analysis to use nlargest where you previously sorted everything.
  • Add explicit rank columns with documented tie methods so stakeholders know how ties were treated.
  • Run the quick test below. Note: anyone can take it; only logged-in users will have results saved.

Practice Exercises

2 exercises to complete

Instructions

Given a DataFrame sales with columns region, customer, revenue, and date (datetime), sort it by:

  • region ascending
  • revenue descending
  • date ascending

Place rows with missing region at the end. Print only these columns in the result.

Expected Output
Rows grouped by region alphabetically; within each region, higher revenue first; ties show earlier dates first; any rows with region == NaN appear last.

Sorting and Ranking — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

10 questions70% to pass

Have questions about Sorting and Ranking?

AI Assistant

Ask questions about this tool