Why this matters
As a data analyst, you rarely report the mean alone; you also need to show how spread out the data is. Dispersion metrics tell stakeholders whether a result is stable or volatile, whether outliers are distorting insights, and how risky a decision might be.
- Product analytics: Compare variability of daily active users before vs. after a feature launch.
- Operations: Check if delivery times are consistent (low standard deviation) or unpredictable (high standard deviation).
- Finance: Flag unusual transactions using IQR and the 1.5×IQR outlier rule.
Who this is for
- Beginner to intermediate analysts who know basic averages and need solid dispersion skills.
- Anyone preparing for data interviews or building analysis reports/dashboards.
Prerequisites
- Basic arithmetic and averages (mean/median).
- Comfort with sorted data and percentiles.
- Optional: Familiarity with spreadsheets or a programming tool; not required for this lesson.
Concept explained simply
Dispersion describes how spread out values are around the center.
- Variance (s^2): Average of squared distances from the mean (uses n−1 for a sample).
- Standard deviation (s): The square root of variance; same units as the data.
- IQR (Interquartile Range): Middle 50% spread = Q3 − Q1. Robust to outliers.
Mental model
Imagine a rubber band around your data points on a number line:
- Standard deviation = how tight or loose the band is around the mean.
- IQR = how wide the middle chunk (between Q1 and Q3) is. Outliers sit outside that middle chunk, so they barely change the IQR, whereas they do loosen the standard-deviation band.
Key rules you’ll use
- Sample variance: s^2 = sum((x − mean)^2) / (n − 1)
- Standard deviation: s = sqrt(s^2)
- IQR = Q3 − Q1; Outlier rule: values < Q1 − 1.5×IQR or > Q3 + 1.5×IQR
- Add a constant (e.g., +5): dispersion does not change (s and IQR unchanged)
- Multiply by a constant (e.g., ×3): s and IQR scale by the absolute value of that constant; variance scales by its square (×9 here)
- Empirical rule (approximate, for near-normal data): ~68% of values within 1s of the mean, ~95% within 2s, ~99.7% within 3s
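The rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (sample_variance, sample_std, quartiles, iqr, outliers) are made up for this sketch, the quartile function follows this lesson's median-of-halves rule, and it assumes at least two values. The data is the small sample from Example 1.

```python
from math import sqrt
from statistics import median


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance; same units as the data."""
    return sqrt(sample_variance(values))


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    For odd n the overall median is left out of both halves.
    """
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])


def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1


def outliers(values):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = quartiles(values)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in values if x < low or x > high]


data = [5, 7, 7, 8, 9]
shifted = [x + 5 for x in data]   # add a constant
scaled = [x * 3 for x in data]    # multiply by a constant

print(round(sample_std(data), 3), iqr(data))        # 1.483 2.5
print(round(sample_std(shifted), 3), iqr(shifted))  # 1.483 2.5 (unchanged)
print(round(sample_std(scaled), 3), iqr(scaled))    # 4.45 7.5 (both tripled)
print(outliers(data))                               # []
```

Shifting leaves both measures untouched, while scaling by 3 triples both, which matches the two transformation rules.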
Worked examples
Example 1 — Compute s and IQR for a small sample
Data: 5, 7, 7, 8, 9 (n=5)
- Mean = (5+7+7+8+9)/5 = 36/5 = 7.2
- Squared deviations: (−2.2)^2=4.84, (−0.2)^2=0.04, 0.04, 0.64, 3.24; Sum = 8.8
- Variance (sample): s^2 = 8.8 / (5−1) = 2.2
- Std dev: s = sqrt(2.2) ≈ 1.483
- IQR: The data is already sorted; median = 7. Excluding the median, Q1 = median of (5, 7) = 6 and Q3 = median of (8, 9) = 8.5, so IQR = 8.5 − 6 = 2.5
Answer: s ≈ 1.483, IQR = 2.5
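If you want to check the arithmetic, the same steps can be mirrored in Python. This is only a sketch for verification; the variable names are illustrative, and the last line cross-checks the hand calculation against the standard library's statistics.stdev, which also divides by n − 1.

```python
from math import sqrt
from statistics import stdev

data = [5, 7, 7, 8, 9]
n = len(data)

mean = sum(data) / n                             # 36 / 5
squared_devs = [(x - mean) ** 2 for x in data]   # one squared distance per value
sum_sq = sum(squared_devs)                       # about 8.8
variance = sum_sq / (n - 1)                      # sample variance, about 2.2
s = sqrt(variance)

print(round(mean, 1), round(sum_sq, 2), round(variance, 2), round(s, 3))
# 7.2 8.8 2.2 1.483

# Cross-check with the standard library's sample standard deviation
print(round(stdev(data), 3))
# 1.483
```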
Example 2 — IQR resists outliers
Data: 10, 11, 12, 12, 13, 14, 60
- Q1 = 11; Q3 = 14; IQR = 3
- Std dev (sample) ≈ 18.2 (large because of 60)
Interpretation: The middle 50% is tight (IQR=3), but a single extreme value blows up the standard deviation. Report both when outliers exist.
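A short Python sketch makes the contrast concrete. It applies the 1.5×IQR rule to this dataset and recomputes s without the flagged value. The helper name quartiles is an assumption of this sketch (it follows this lesson's median-of-halves rule), and dropping the outlier here is purely to show its influence, not a recommendation to delete data.

```python
from statistics import median, stdev


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded for odd n)."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])


data = [10, 11, 12, 12, 13, 14, 60]

q1, q3 = quartiles(data)
spread = q3 - q1
low_fence = q1 - 1.5 * spread
high_fence = q3 + 1.5 * spread

flagged = [x for x in data if x < low_fence or x > high_fence]
kept = [x for x in data if low_fence <= x <= high_fence]

print(q1, q3, spread)           # 11 14 3
print(low_fence, high_fence)    # 6.5 18.5
print(flagged)                  # [60]
print(round(stdev(data), 1))    # 18.2 with the outlier
print(round(stdev(kept), 2))    # 1.41 without it
```

One value moves s from about 1.41 to about 18.2, while the quartiles stay at 11 and 14.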
Example 3 — Same mean, different spread
A: 8, 9, 10, 11, 12; B: 0, 5, 10, 15, 20
- Means are equal (10).
- A: s ≈ 1.581, IQR = 3
- B: s ≈ 7.906, IQR = 15
Conclusion: A is tightly clustered; B is much more variable.
How to calculate quickly
- Sort your data.
- Compute mean once; reuse it for squared deviations.
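- Use one quartile method consistently. This lesson uses the median-of-halves rule, which it labels Tukey: for even n, split the sorted data into two equal halves and take the median of each; for odd n, exclude the overall median, then take the median of each half. (Tukey’s original hinges keep the median in both halves for odd n, so a tool that implements hinges can report different quartiles.)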
- Decide sample vs population. Most analysis data is a sample from a larger population, so divide by (n−1); divide by n only when the data covers the entire population.
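The sample vs population choice is easy to see in Python's standard library, which exposes both divisors: statistics.stdev divides by n − 1 and statistics.pstdev divides by n. A small sketch using the Example 1 data:

```python
from statistics import pstdev, stdev

data = [5, 7, 7, 8, 9]

# Sample standard deviation: sum of squares 8.8 divided by n - 1 = 4
print(round(stdev(data), 3))    # 1.483

# Population standard deviation: the same 8.8 divided by n = 5
print(round(pstdev(data), 3))   # 1.327
```

The gap between the two shrinks as n grows, but for small samples it is large enough to change a reported figure, so state which one you used.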
Common mistakes and how to self-check
- Using n instead of n−1 for a sample. Self-check: Did you sample from a larger population? Use n−1.
- Mixing quartile methods. Self-check: State the method (here: Tukey). Recompute if a tool uses a different rule.
- Forgetting units. Self-check: s is in original units; variance is in squared units.
- Letting outliers hide behind the mean. Self-check: Report IQR and s together; scan for outliers with 1.5×IQR rule.
- Comparing spreads on different scales. Self-check: If the scales or units differ, raw s and IQR values are not directly comparable; use a standardized measure such as the coefficient of variation (covered later).
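To see why mixing quartile methods matters, the sketch below compares this lesson's median-of-halves rule with the two interpolation methods offered by Python's statistics.quantiles (Python 3.8+). The helper names iqr_halves and iqr_stdlib are made up for this illustration, and the second dataset (1 through 8) is an arbitrary example, not data from the lesson.

```python
from statistics import median, quantiles


def iqr_halves(values):
    """IQR using medians of the lower and upper halves (median excluded for odd n)."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[-half:]) - median(xs[:half])


def iqr_stdlib(values, method):
    """IQR from statistics.quantiles with the given method ('exclusive' or 'inclusive')."""
    q1, _, q3 = quantiles(values, n=4, method=method)
    return q3 - q1


for data in ([5, 7, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8]):
    print(iqr_halves(data), iqr_stdlib(data, "exclusive"), iqr_stdlib(data, "inclusive"))
# 2.5 2.5 1.0
# 4.0 4.5 3.5
```

The same data can yield three different IQRs, so a mismatch with a tool's output is often a method difference rather than an arithmetic error.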
Exercises (practice here, solutions below)
- Exercise 1: Data = [4, 5, 7, 7, 9, 10, 10, 13]. Compute the sample standard deviation and the IQR. Round s to 2 decimals.
Hints
- Mean is 65/8.
- Use Tukey quartiles: split into halves of 4 each.
- Exercise 2: A dataset has standard deviation 12 and some IQR (unknown). What happens to the standard deviation and IQR if: (a) you add 5 to every value? (b) you multiply every value by 3?
Hints
- Shifts don’t change spread.
- Scaling multiplies distances.
Checklist before checking solutions
- Did you clearly choose sample vs population?
- Did you state quartile method?
- Did you round only at the end?
Practical projects
- Customer wait times: Collect 50 wait times (minutes). Report mean, median, s, IQR, and identify outliers using 1.5×IQR. Recommend a stability target.
- A/B feature rollout: Compare daily revenue variability for 14 days before vs 14 days after launch. Interpret whether variability increased or decreased.
- Inventory demand: For 8 SKUs, compute weekly s and IQR. Flag SKUs with high variability for safety stock review.
Learning path
- Before this: mean/median/mode; sorting and percentiles.
- This lesson: variance, standard deviation, IQR, outlier rule.
- Next: z-scores, boxplots, coefficient of variation, and comparing variability across groups.
Mini challenge
You have daily deliveries (in minutes late): 0, 3, 4, 5, 6, 7, 30.
- Compute IQR and standard deviation (sample). Are there outliers? What would you report to operations: IQR or s first, and why?
Suggested approach
- Sort (already sorted), find Q1, Q3, IQR.
- Apply 1.5×IQR rule for outliers.
- Compute s; compare robustness.
Next steps
- Practice on your own data; report both s and IQR.
- When outliers exist, lead with median and IQR, then discuss s for completeness.
- Move on to z-scores and boxplots to communicate spread visually and in standardized units.