Who this is for
Data analysts who already summarize data with the mean, median, and standard deviation and want to describe the shape of a distribution clearly when reporting insights.
Prerequisites
- Comfort with mean, median, variance, and standard deviation.
- Basic ability to compute with a calculator or spreadsheet.
- Ability to read histograms/box plots.
Why this matters
In real analysis work, distributions often aren't symmetric. Long tails and outliers can distort averages. Skewness and kurtosis help you:
- Flag when the mean is pulled by outliers (e.g., purchase amounts, time-on-site).
- Compare risk in metrics (heavy tails = more extreme events).
- Choose the right summary and model (transformations, robust stats).
- Write trustworthy narratives, e.g. "Traffic time is right-skewed; the median better represents the typical user."
Concept explained simply
Skewness (asymmetry)
Skewness tells you which tail is longer.
- Positive skew (right-skew): long right tail. Mean > median.
- Negative skew (left-skew): long left tail. Mean < median.
- Near zero: roughly symmetric.
Kurtosis (tail weight)
Kurtosis tells you how heavy the tails are relative to the center. It's not mainly about the peak. We often use excess kurtosis (kurtosis minus 3, so a normal distribution scores 0):
- Excess > 0: heavy tails (more extreme values than normal).
- Excess = 0: tail weight like a normal distribution.
- Excess < 0: light tails (bounded or uniform-like).
Mental model
- Skewness = direction of the tail.
- Kurtosis = appetite for extremes. Heavy tails mean more surprises.
Quick formulas and definitions
Let $n$ be the sample size, $\bar{x}$ the mean, and $s$ the sample standard deviation.
- Sample skewness (Fisher-Pearson): $g_1 = \frac{n}{(n-1)(n-2)} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}$
- Excess kurtosis (Fisher): $g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} - \frac{3(n-1)^2}{(n-2)(n-3)}$
- Quick skewness estimate (Pearson's second coefficient): 3*(mean - median)/s
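The formulas above can be sketched in plain Python using only the standard library. This is a minimal illustration, not the implementation any particular spreadsheet or library uses. The function names are hypothetical, and the code assumes a list of numbers that are not all equal (otherwise s = 0 and both measures are undefined).

```python
import statistics


def _mean_and_sd(values):
    # Sample mean and sample standard deviation (n - 1 denominator).
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("all values are equal; skewness and kurtosis are undefined")
    return mean, s


def sample_skewness_g1(values):
    """g1 = n / ((n-1)(n-2)) * sum((x - mean)^3) / s^3; needs n >= 3."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean, s = _mean_and_sd(values)
    return n / ((n - 1) * (n - 2)) * sum((x - mean) ** 3 for x in values) / s ** 3


def sample_excess_kurtosis_g2(values):
    """g2 as defined above (Fisher, bias-corrected); needs n >= 4."""
    n = len(values)
    if n < 4:
        raise ValueError("excess kurtosis needs at least 4 values")
    mean, s = _mean_and_sd(values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum((x - mean) ** 4 for x in values) / s ** 4 - correction


def pearson_second_skewness(values):
    """Quick estimate: 3 * (mean - median) / s."""
    mean, s = _mean_and_sd(values)
    return 3 * (mean - statistics.median(values)) / s


if __name__ == "__main__":
    sessions = [2, 3, 4, 5, 100]  # data from the web session example below
    print(round(sample_skewness_g1(sessions), 2))         # 2.23
    print(round(sample_excess_kurtosis_g2(sessions), 2))  # 4.99
    print(round(pearson_second_skewness(sessions), 2))    # 1.31
```

Running the same functions on the other worked examples reproduces their values (g1 = 0 and g2 = -1.2 for [1, 2, 3, 4, 5]), which makes this a handy way to check hand calculations.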
Rules of thumb (rough): |skewness| < 0.5 is low skew, 0.5 to 1 is moderate, and > 1 is high. For excess kurtosis, values > 0 indicate heavier tails than normal.
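A sketch of these rough thresholds as labeling helpers, for example to annotate a diagnostics table automatically. The function names are hypothetical, and treating exactly 0.5 and exactly 1 as "moderate" is an assumption, since the rule of thumb does not say which side the boundaries fall on.

```python
def describe_skewness(skewness):
    """Return (magnitude, direction) using the rough thresholds.

    Assumption: exactly 0.5 and exactly 1 are both treated as "moderate".
    """
    size = abs(skewness)
    if size < 0.5:
        magnitude = "low"
    elif size <= 1:
        magnitude = "moderate"
    else:
        magnitude = "high"
    if skewness > 0:
        direction = "right"
    elif skewness < 0:
        direction = "left"
    else:
        direction = "none"
    return magnitude, direction


def describe_excess_kurtosis(excess_kurtosis):
    """Compare tail weight with a normal distribution (excess kurtosis 0)."""
    if excess_kurtosis > 0:
        return "heavier tails than normal"
    if excess_kurtosis < 0:
        return "lighter tails than normal"
    return "normal-like tails"


if __name__ == "__main__":
    print(describe_skewness(2.23))         # ('high', 'right')
    print(describe_skewness(-0.32))        # ('low', 'left')
    print(describe_excess_kurtosis(-1.2))  # lighter tails than normal
```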
Worked examples
Example 1: Web session duration with an outlier
Data (minutes): [2, 3, 4, 5, 100]
- Mean = 22.8; Median = 4; s approx 43.17
- Pearson skewness = 3*(22.8 - 4)/43.17 approx 1.31 (positive)
- Exact sample skewness g1 approx 2.23 (strongly right-skewed)
- Excess kurtosis g2 approx 4.99: the single value of 100 makes the tails much heavier than normal
Interpretation: Use the median to describe typical sessions; report skewness and note the outlier effect.
Example 2: Symmetric small set
Data: [1, 2, 3, 4, 5]
- Mean = 3; Median = 3; s^2 = 2.5; skewness g1 = 0 (symmetric)
- Excess kurtosis g2 approx -1.2 (light tails vs normal)
Interpretation: Averages are reliable; extremes are unlikely.
Example 3: Quality scores with mild left skew
Data: [5, 5, 6, 7, 8, 9, 10, 10, 10]
- Mean approx 7.78; Median = 8; s approx 2.11
- Pearson skewness = 3*(7.78 - 8)/2.11 approx -0.32 (mild left-skew)
- Excess kurtosis g2 approx -1.79 (tails clearly lighter than normal)
Interpretation: A few low scores pull the mean below the median.
How to compute in practice
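- Spreadsheet (Excel/Sheets): SKEW (sample) or SKEW.P (population) for skewness; KURT for sample excess kurtosis (check your tool's definition). Prefer the sample functions unless you truly have the full population.
- Python (pandas): series.skew() returns sample skewness and series.kurt() (alias series.kurtosis()) returns sample excess kurtosis; neither takes a fisher argument. In SciPy, scipy.stats.skew(x, bias=False) and scipy.stats.kurtosis(x, fisher=True, bias=False) give the same bias-corrected values.
- SQL: aggregates can't be nested, so compute AVG(x) first in a CTE or subquery, then AVG(POWER(x - mean, k)) for k = 2, 3, 4. These are population moments; apply the sample corrections if you need g1/g2, and watch numerical stability with large values.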
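The SQL approach above yields population (divide-by-n) central moments m2, m3, m4 plus a row count n. A minimal sketch, assuming your query returns n = COUNT(*) and m_k = AVG(POWER(x - mean, k)) with the mean taken from a CTE, of converting those aggregates into the sample g1 and g2 used in this lesson. The function name is hypothetical; the conversion is algebraically equivalent to the g1/g2 formulas, since s^2 = n*m2/(n-1).

```python
import math


def sample_shape_from_moments(n, m2, m3, m4):
    """Turn population central moments into sample skewness g1 and excess kurtosis g2.

    m2, m3, m4 are the divide-by-n averages of (x - mean)^2, ^3, ^4,
    i.e. what AVG(POWER(x - mean, k)) returns in SQL.
    """
    if n < 4:
        raise ValueError("need at least 4 rows for excess kurtosis")
    if m2 <= 0:
        raise ValueError("m2 must be positive (values cannot all be equal)")
    g1 = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2 ** 2 - 3 * (n - 1))
    return g1, g2


if __name__ == "__main__":
    # Stand-in for the query result: compute the same aggregates in Python.
    values = [2, 3, 4, 5, 100]
    n = len(values)
    mean = sum(values) / n
    m2, m3, m4 = (sum((x - mean) ** k for x in values) / n for k in (2, 3, 4))
    g1, g2 = sample_shape_from_moments(n, m2, m3, m4)
    print(round(g1, 2), round(g2, 2))  # 2.23 4.99
```

Keeping the correction step outside SQL means the query stays a simple two-pass aggregate, and the same helper works whatever database produced the moments.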
Always pair numbers with visuals (histogram or box plot) to confirm shape.
Common mistakes and self-check
- Confusing peak height with kurtosis. Self-check: if more extreme values show up in the tails, kurtosis should rise even when the peak looks the same.
- Using the mean when the distribution is skewed. Self-check: compare mean and median; if they are far apart, also report the median and skewness.
- Ignoring units and outliers. Self-check: winsorize, or compute skewness and kurtosis with and without the outliers, to see how much they drive the result.
- Mixing up sample and population formulas. Self-check: verify that your tool's function returns sample skewness and excess kurtosis.
- Over-interpreting small samples (n < 20). Self-check: Add a caution and complement with plots.
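For the outlier self-check, a minimal sketch that compares skewness on the original data, a winsorized copy, and a copy with the outlier dropped. The helper names are hypothetical, and this uses a simple count-based winsorizing (the k smallest and k largest values are clamped to their nearest inner neighbors) rather than percentile cutoffs, which some tools use instead.

```python
import statistics


def winsorize(values, k=1):
    """Clamp the k smallest values up to the (k+1)-th smallest and the k largest down to the (k+1)-th largest."""
    ordered = sorted(values)
    if len(ordered) <= 2 * k:
        raise ValueError("need more than 2*k values to winsorize")
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(x, low), high) for x in values]


def sample_skewness(values):
    """Bias-adjusted sample skewness g1, written with population moments m2 and m3."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return (n * (n - 1)) ** 0.5 / (n - 2) * m3 / m2 ** 1.5


if __name__ == "__main__":
    sessions = [2, 3, 4, 5, 100]
    trimmed = winsorize(sessions, k=1)           # [3, 3, 4, 5, 5]
    dropped = [x for x in sessions if x != 100]  # outlier removed
    for label, data in [("original", sessions), ("winsorized", trimmed), ("dropped", dropped)]:
        print(label, round(sample_skewness(data), 2))
```

On this tiny session dataset, skewness falls from about 2.23 to 0 in both variants, a clear sign that one value drives the whole asymmetry.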
Exercises
Use a calculator or spreadsheet.
Exercise 1: Direction and magnitude of skewness
Data: [4, 5, 5, 6, 6, 7, 8, 30]
- Compute mean, median, and sample standard deviation s.
- Estimate Pearson's skewness = 3*(mean - median)/s.
- State the direction and whether skew is low, moderate, or high.
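To check your hand calculation for Exercise 1, a small sketch that returns each intermediate step (mean, median, s, and the coefficient) rather than only the final number. The function name is hypothetical; it uses the sample standard deviation, as the exercise asks.

```python
import statistics


def pearson_skewness_steps(values):
    """Return each intermediate quantity for Pearson's second skewness coefficient."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "median": median,
        "s": s,
        "pearson": 3 * (mean - median) / s,
    }


if __name__ == "__main__":
    for name, value in pearson_skewness_steps([4, 5, 5, 6, 6, 7, 8, 30]).items():
        print(f"{name}: {value:.3f}")
```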
Exercise 2: Which dataset has heavier tails?
Dataset A: [10, 10, 11, 11, 12, 12, 13, 13, 14, 14]
Dataset B: [7, 8, 10, 10, 12, 12, 14, 16, 30, 45]
- Visually inspect or sketch box plots.
- Compare the spread of extremes relative to the middle (e.g., range vs IQR).
- Decide which has higher excess kurtosis and justify briefly.
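One way to make "spread of extremes relative to the middle" concrete is the ratio range / IQR. This is a rough proxy for tail weight, not kurtosis itself, and it is only a fair comparison between datasets of similar size. The sketch assumes Python 3.8+ for statistics.quantiles, whose default "exclusive" method may give slightly different quartiles than your spreadsheet. The function name is hypothetical.

```python
import statistics


def range_to_iqr_ratio(values):
    """Range divided by IQR: larger values mean extremes sit far from the middle 50%."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; ratio is undefined")
    return (max(values) - min(values)) / iqr


if __name__ == "__main__":
    dataset_a = [10, 10, 11, 11, 12, 12, 13, 13, 14, 14]
    dataset_b = [7, 8, 10, 10, 12, 12, 14, 16, 30, 45]
    print("A:", range_to_iqr_ratio(dataset_a))
    print("B:", range_to_iqr_ratio(dataset_b))
```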
Self-check checklist
- I can state what positive/negative skewness means.
- I can distinguish heavy tails from the "sharp peak" myth.
- I can compute Pearson's skewness and interpret its magnitude.
- I can explain why median may be better when data are skewed.
- I can justify conclusions with a plot or quantiles.
Practical projects
- E-commerce basket analysis: Compute skewness and kurtosis of order values by channel. Write 3 bullet insights and a one-line recommendation (e.g., use median in dashboards).
- Operations timing: For task completion times, compare weekday vs weekend skewness and kurtosis. Note any heavy-tail risk periods.
- Quality control: For defect counts per batch, track monthly skewness/kurtosis and alert when tails get heavier (more extreme batches).
Learning path
- Before: Central tendency (mean/median), variability (variance/std), outliers.
- Now: Skewness (direction) and kurtosis (tail weight).
- Next: Normality checks, transformations (log), robust statistics (median/IQR), and hypothesis testing that respects non-normality.
Next steps
- Add skewness/kurtosis columns to your KPI diagnostics sheet.
- Set a rule: if |skewness| ≥ 0.5 or excess kurtosis > 0, always report the median and IQR.
- Create a reusable spreadsheet tab that calculates these measures from a pasted column.
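The reporting rule can be sketched as a small helper, reading it as "|skewness| ≥ 0.5 or excess kurtosis > 0 means report the median and IQR". The function name is hypothetical, and skewness and kurtosis are assumed to be computed beforehand (e.g. with SKEW/KURT); quartiles come from statistics.quantiles with its default "exclusive" method, which may differ slightly from spreadsheet quartile functions.

```python
import statistics


def choose_summary(values, skewness, excess_kurtosis):
    """Apply the reporting rule: median/IQR when |skewness| >= 0.5 or excess kurtosis > 0."""
    if abs(skewness) >= 0.5 or excess_kurtosis > 0:
        q1, _, q3 = statistics.quantiles(values, n=4)
        return {"report": "median and IQR", "median": statistics.median(values), "iqr": q3 - q1}
    return {"report": "mean and SD", "mean": statistics.fmean(values), "sd": statistics.stdev(values)}


if __name__ == "__main__":
    # Skewness and kurtosis are assumed to come from your own calculation.
    print(choose_summary([2, 3, 4, 5, 100], skewness=2.23, excess_kurtosis=4.99))
    print(choose_summary([1, 2, 3, 4, 5], skewness=0.0, excess_kurtosis=-1.2))
```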
Mini challenge
Pick one metric you track (e.g., session duration, ticket age). Compute skewness and kurtosis for last month and this month. Write a 2-sentence update: What changed? What should stakeholders watch?