Who this is for
- Beginner to intermediate data analysts who want reliable summaries beyond the mean.
- Anyone preparing to build dashboards, monitor SLAs, or describe distributions with quartiles and percentiles.
Prerequisites
- Basic stats: mean, median, sorting data.
- Comfort with simple spreadsheets or a programming language (optional).
Why this matters
Percentiles and quantiles describe how values are spread. They’re vital when averages hide extremes.
- Product analytics: Report median session duration and the 90th percentile to show long sessions.
- Data quality: Use interquartile range (IQR) to spot outliers before modeling.
- Operations: Track p95/p99 latency to ensure most users get fast responses.
- Stakeholder reports: Box plots use quartiles (Q1, median, Q3) and IQR.
Concept explained simply
Sort your data from smallest to largest. The p-th percentile is the value below which p% of observations fall. Quantile is the general term for such cut points: percentiles divide the sorted data into 100 equal parts, and quartiles divide it into four equal parts: Q1 (25%), Q2 (median, 50%), Q3 (75%).
Mental model
Imagine people standing in a line sorted by height. The 25th percentile marks the person who is taller than a quarter of the line. The 50th percentile is the person in the middle. Quartiles are simply the marks at 25%, 50%, and 75% along the line.
How to compute percentiles and quartiles
First, sort your data: x(1) ≤ x(2) ≤ ... ≤ x(n).
Two common methods
- Nearest-rank (simple): for 0 < p ≤ 100, rank r = ceil(p/100 × n). Percentile = x(r), always an observed value.
- Linear interpolation (Type 7): r = 1 + (p/100) × (n − 1). If r is not an integer, interpolate between x(floor r) and x(ceil r). This is what the spreadsheet function PERCENTILE.INC computes, and it is the default in NumPy and R.
Tip: Always state the method used, especially in reports.
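The nearest-rank rule above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `percentile_nearest_rank` is hypothetical, and the sketch assumes 0 < p ≤ 100 so that the rank is at least 1.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the nearest-rank percentile: x(r) with r = ceil(p/100 * n)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    ordered = sorted(values)                # always sort first
    r = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[r - 1]                   # convert to a 0-based index

# Unsorted input is fine because the function sorts internally.
print(percentile_nearest_rank([7, 3, 15, 5, 3, 13, 9, 11], 90))  # 15
```

Because the result is always one of the observed values, this method never returns an in-between number.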
Interpolation step-by-step (Type 7)
- Compute r = 1 + (p/100) × (n − 1).
- If r is integer, percentile = x(r).
- Else, let k = floor(r), d = r − k. Percentile = x(k) + d × [x(k+1) − x(k)].
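The three steps above translate almost line for line into code. The sketch below is one possible rendering, with a hypothetical function name `percentile_type7`; the variables `r`, `k`, and `d` mirror the symbols used in the steps.

```python
import math

def percentile_type7(values, p):
    """Return the Type 7 percentile using r = 1 + p/100 * (n - 1)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must satisfy 0 <= p <= 100")
    ordered = sorted(values)
    r = 1 + p * (len(ordered) - 1) / 100   # 1-based, possibly fractional rank
    k = math.floor(r)                      # rank of the lower neighbour
    d = r - k                              # fractional distance past x(k)
    if d == 0:
        return ordered[k - 1]              # r is an integer: take x(r) directly
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])

data = [2, 4, 6, 8, 10, 12, 14]
print(percentile_type7(data, 25))  # 5.0
print(percentile_type7(data, 75))  # 11.0
```

The `d == 0` branch also covers p = 100, where r equals n and there is no upper neighbour to interpolate toward.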
Quartiles and IQR
- Q1 = 25th percentile, Q2 = 50th (median), Q3 = 75th.
- IQR = Q3 − Q1.
- Outlier rule of thumb: values < Q1 − 1.5×IQR or > Q3 + 1.5×IQR.
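As a rough sketch, the IQR rule can be applied with Python's standard `statistics.quantiles` function, whose `method="inclusive"` option corresponds to Type 7. The helper name `iqr_outliers` and the parameter `k` are hypothetical; the sample is the latency data from Example 3 in this lesson.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (q1, q3, iqr, lower, upper, flagged) using Type 7 quartiles."""
    # n=4 asks for the three quartile cut points; "inclusive" matches Type 7.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return q1, q3, iqr, lower, upper, flagged

latency_ms = [110, 120, 126, 128, 130, 131, 132, 180, 220, 600]
q1, q3, iqr, lower, upper, flagged = iqr_outliers(latency_ms)
print(q1, q3, iqr)    # 126.5 168.0 41.5
print(lower, upper)   # 64.25 230.25
print(flagged)        # [600]
```

The flagged list is a set of candidates to investigate, not a verdict that the values are wrong.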
Tool hints (no code required)
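- Spreadsheets: PERCENTILE.INC accepts percentiles from 0% to 100% inclusive and matches Type 7. PERCENTILE.EXC excludes the endpoints and uses a different rank formula, so its results can differ. The same applies to QUARTILE.INC / QUARTILE.EXC.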
- SQL: percentile_disc (discrete) and percentile_cont (continuous).
- Python: numpy.percentile with the method argument (called interpolation in older NumPy versions). The default, "linear", corresponds to Type 7. State your choice in reports.
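For readers who do want to try this in code, the short sketch below shows why the method matters. It uses Python's standard `statistics.quantiles`, whose `inclusive` and `exclusive` options are broadly analogous to the spreadsheet INC and EXC variants; the dataset is the one from Example 1.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]

# "inclusive" interpolates at rank r = 1 + p * (n - 1), i.e. Type 7.
inclusive = statistics.quantiles(data, n=4, method="inclusive")

# "exclusive" (the module's default) interpolates at rank r = p * (n + 1).
exclusive = statistics.quantiles(data, n=4, method="exclusive")

print(inclusive)  # [5.0, 8.0, 11.0]
print(exclusive)  # [4.0, 8.0, 12.0]
```

The same seven numbers give an IQR of 6 under one method and 8 under the other, which is why a report should name the method it used.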
Worked examples
Example 1: Quartiles on a small dataset
Data: 2, 4, 6, 8, 10, 12, 14 (n = 7, already sorted)
- Median (Q2): middle is x(4) = 8.
- Q1 (25th, Type 7): r = 1 + 0.25 × (7−1) = 2.5 → between x(2)=4 and x(3)=6. Q1 = 4 + 0.5×(6−4) = 5.
- Q3 (75th, Type 7): r = 1 + 0.75 × 6 = 5.5 → between x(5)=10 and x(6)=12. Q3 = 10 + 0.5×(12−10) = 11.
- IQR = 11 − 5 = 6.
- Outlier bounds: [Q1 − 1.5×IQR, Q3 + 1.5×IQR] = [5 − 9, 11 + 9] = [−4, 20]. No outliers.
Example 2: 90th percentile, two methods
Data: 3, 3, 5, 7, 9, 11, 13, 15 (n = 8)
- Nearest-rank: r = ceil(0.90×8) = ceil(7.2) = 8 → p90 = x(8) = 15.
- Type 7: r = 1 + 0.90×(8−1) = 1 + 6.3 = 7.3. Between x(7)=13 and x(8)=15: p90 = 13 + 0.3×(15−13) = 13.6.
Interpretation: p90 is 15 (discrete) or 13.6 (continuous). Both are valid; choose and state the method.
Example 3: p95 latency interpretation
API latency (ms): 110, 120, 126, 128, 130, 131, 132, 180, 220, 600 (n = 10)
- Type 7 p95: r = 1 + 0.95×(10−1) = 9.55 → between x(9)=220 and x(10)=600 → 220 + 0.55×(600−220) = 220 + 0.55×380 = 429 ms.
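- Interpretation: roughly 95% of requests are expected to finish within about 429 ms; about 5% are slower. With only 10 samples, p95 is interpolated between the two largest values, so the single 600 ms request heavily influences it.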
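To see how strongly the tail drives p95, the sketch below recomputes the median and p95 with and without the 600 ms request. The helper name `p50_p95` is hypothetical; it relies on `statistics.quantiles` with `method="inclusive"`, which corresponds to Type 7.

```python
import statistics

def p50_p95(values):
    """Return (p50, p95) using Type 7 interpolation."""
    # n=20 yields cut points at 5%, 10%, ..., 95%:
    # index 9 is the 50% cut, index 18 is the 95% cut.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return cuts[9], cuts[18]

latency_ms = [110, 120, 126, 128, 130, 131, 132, 180, 220, 600]

print(p50_p95(latency_ms))       # (130.5, 429.0)
print(p50_p95(latency_ms[:-1]))  # (130.0, 204.0), 600 ms request dropped
```

Dropping one request moves the median by half a millisecond but cuts p95 by more than half, so a p95 spike on a small sample deserves a look at the individual slow requests.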
Practice exercises
Do these before the quick test. The exercises below mirror what you’ll submit.
- Exercise 1 (Quartiles & IQR): Data: 5, 6, 7, 7, 8, 10, 11, 12, 14, 20.
Tasks:
- Compute Q1, median (Q2), Q3 using Type 7.
- Compute IQR and outlier bounds.
- List any outliers.
- Exercise 2 (p90 vs p95): Data (daily sessions): 1, 2, 2, 3, 4, 5, 6, 10, 12, 18, 30.
Tasks:
- Compute p90 and p95 using nearest-rank and Type 7.
- Explain in 1–2 sentences how the tail affects these percentiles.
Self-check checklist
- I sorted the data before computing anything.
- I stated which percentile method I used.
- For non-integer ranks, I interpolated correctly.
- My IQR is non-negative, and my outlier bounds make sense given the data spread (the lower bound can be negative).
- My interpretation mentions what proportion of observations are below the percentile.
Common mistakes and how to self-check
- Not sorting data first. Fix: Always sort; write “sorted:” in your notes.
- Mixing methods. Fix: Pick one method per report and state it.
- Assuming a percentile must be an observed data point. Fix: Continuous methods can yield in-between values.
- Misreading p95. Fix: It is not the average of the top 5%; it is the threshold that leaves 5% of observations above it.
- Using IQR outliers as “bad data.” Fix: They’re flags, not proofs. Investigate causes.
Practical projects
- Create a box plot summary: Compute Q1, median, Q3, IQR for weekly sales and write 3 sentences of insight.
- Latency dashboard draft: Track p50, p90, p95 for response times for two weeks; add a brief note when p95 spikes.
- Customer order size: Compare quartiles this month vs last month; state whether the spread widened (IQR change).
Mini challenge
You have revenue per order: 12, 13, 13, 15, 18, 21, 25, 40, 95. Compute p75 and p90 (Type 7). In one sentence, explain whether the distribution is skewed and why.
Learning path
- Before: Mean/median, variance, standard deviation.
- Now: Percentiles, quartiles, IQR, outlier rules, method selection.
- Next: Distributions (normal vs skewed), box plots, robust metrics in A/B tests.
Next steps
- Recalculate your last report’s quartiles with a clearly stated method.
- Add p90 or p95 to any performance or duration metric you track.
- Document your team’s default percentile method to ensure consistency.
Quick Test