Descriptive Statistics

Learn Descriptive Statistics for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

As a Data Scientist, you will constantly summarize data before modeling or making decisions. Descriptive statistics help you check data quality, choose the right models, communicate insights clearly, and set baselines for experiments.

  • Real tasks: sanity-check feature distributions, compare A/B variant metrics, summarize user behavior, detect outliers before training.
  • Hiring screens often include quick questions on mean, median, variance, quartiles, z-scores, and interpreting plots.

Concept explained simply

Descriptive statistics turn raw numbers into a quick story about center, spread, and shape.

  • Center: where the data tends to be (mean, median, mode).
  • Spread: how variable it is (range, interquartile range, variance, standard deviation).
  • Shape: is it symmetric or skewed, and are there outliers? (Check with percentiles, box plots, and histograms.)

Mental model

Imagine your data as a line of pebbles on a ruler:

  • Median is the middle pebble.
  • Mean is the balance point if the ruler were a seesaw.
  • IQR (Q3−Q1) measures the middle 50% width of pebbles.
  • Standard deviation measures the typical distance from the mean (the square root of the average squared distance).
  • Outliers are pebbles far from the rest, beyond the “fences.”
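The seesaw picture can be checked numerically: distances to the left and right of the mean cancel out, which is what "balance point" means. A minimal Python sketch, assuming a made-up list of pebble positions (`pebbles` and `net_torque` are illustrative names, not part of the lesson):

```python
from statistics import mean, median

# Hypothetical pebble positions on the ruler (illustrative values only).
pebbles = [1, 2, 2, 3, 12]

center = mean(pebbles)    # balance point of the seesaw
middle = median(pebbles)  # the middle pebble

# Signed distances from the mean cancel out, so the seesaw balances.
net_torque = sum(x - center for x in pebbles)

print(center, middle, net_torque)
```

The far pebble at 12 pulls the balance point up to 4, while the middle pebble stays at 2.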

Core concepts and quick rules

  • Mean: average. Sensitive to outliers.
  • Median: middle value. Robust to outliers/skew.
  • Mode: most frequent value. Useful for discrete/categorical-like numeric data.
  • Range: max − min. Very sensitive to outliers.
  • Variance (sample): sum of squared distances from the mean, divided by n−1.
  • Standard deviation (sample): sqrt(variance). Same units as data.
  • Quartiles (Q1, Q3): 25th and 75th percentiles. IQR = Q3 − Q1.
  • Outlier rule of thumb: values < Q1 − 1.5×IQR or > Q3 + 1.5×IQR.
  • Z-score: (x − mean) / sd. Tells how many standard deviations x is from the mean.
  • Report pairs: symmetric data → mean & sd; skewed/outliers → median & IQR.
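To make the n−1 rule concrete, here is a small Python sketch that computes the variance both ways from the definition and cross-checks the results against the standard library. The data values are illustrative and not taken from the lesson.

```python
import math
from statistics import pvariance, variance

# Illustrative sample (not from the lesson).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared distances

sample_var = sum_sq / (n - 1)       # sample variance: divide by n - 1
population_var = sum_sq / n         # population variance: divide by n
sample_sd = math.sqrt(sample_var)   # back in the same units as the data

# Cross-check against the standard library.
assert math.isclose(sample_var, variance(data))
assert math.isclose(population_var, pvariance(data))

print(round(sample_var, 3), population_var, round(sample_sd, 3))
```

Dividing by n−1 always gives a slightly larger value than dividing by n; the gap shrinks as the sample grows.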

Worked examples

Example 1 — Core summaries

Data: 9, 10, 10, 11, 12, 14, 15, 20

  • Mean = (sum)/8 = 101/8 = 12.625
  • Median = average of 4th and 5th = (11+12)/2 = 11.5
  • Mode = 10
  • Range = 20 − 9 = 11
  • Q1 = median of the lower half (9, 10, 10, 11) = (10+10)/2 = 10
  • Q3 = median of the upper half (12, 14, 15, 20) = (14+15)/2 = 14.5
  • IQR = 14.5 − 10 = 4.5
  • Sample sd = sqrt(91.875/7) = sqrt(13.125) ≈ 3.62
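If you want to check the hand calculations in Example 1, the following Python sketch reproduces them. The helper `median_of_halves` is a hypothetical name; for an odd number of values it leaves the middle value out of both halves, which is one common convention and an assumption here.

```python
from statistics import mean, median, mode, stdev

def median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Needs at least two values. With an odd count, the middle value is
    excluded from both halves (an assumed convention; others exist).
    """
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s[len(s) - half:])

data = [9, 10, 10, 11, 12, 14, 15, 20]
q1, q3 = median_of_halves(data)

summary = {
    "mean": mean(data),
    "median": median(data),
    "mode": mode(data),
    "range": max(data) - min(data),
    "q1": q1,
    "q3": q3,
    "iqr": q3 - q1,
    "sample_sd": round(stdev(data), 2),  # stdev uses n - 1
}
print(summary)
```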

Example 2 — Outlier detection with IQR

Data: 5, 6, 6, 7, 7, 8, 12, 30

  • Q1 = median of the lower half (5, 6, 6, 7) = 6; Q3 = median of the upper half (7, 8, 12, 30) = 10 → IQR = 4
  • Fences: lower = 6 − 1.5×4 = 0; upper = 10 + 1.5×4 = 16
  • Outliers: values < 0 or > 16 → 30 is an outlier
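The fence calculation in Example 2 can be sketched as a small function. `iqr_fences` is a hypothetical helper; it assumes median-of-halves quartiles and the 1.5×IQR multiplier used in this lesson.

```python
from statistics import median

def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using median-of-halves quartiles."""
    s = sorted(values)
    half = len(s) // 2
    q1 = median(s[:half])
    q3 = median(s[len(s) - half:])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Anything strictly outside the fences is flagged.
    outliers = [x for x in s if x < lower or x > upper]
    return lower, upper, outliers

lower, upper, outliers = iqr_fences([5, 6, 6, 7, 7, 8, 12, 30])
print(lower, upper, outliers)
```

The value 12 is well above the rest but still inside the upper fence of 16, so only 30 is flagged.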

Example 3 — Pick the right summaries

Scenario: Monthly incomes in a city are right-skewed with a few very high earners. Mean is pulled up by outliers.

  • Use: median & IQR for typical value and spread.
  • A box plot will highlight skew and outliers clearly.
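A quick way to see the pull of high earners is to compare mean and median on a small, made-up income list. The numbers below are hypothetical and only illustrate the effect.

```python
from statistics import mean, median

# Hypothetical monthly incomes: most are modest, two are very high.
incomes = [2100, 2400, 2500, 2700, 2900, 3100, 3300, 3600, 15000, 40000]

avg = mean(incomes)
mid = median(incomes)
above_avg = sum(x > avg for x in incomes)  # how many people earn more than the mean

print("mean:", avg)
print("median:", mid)
print("earners above the mean:", above_avg, "of", len(incomes))
```

Here the mean sits above eight of the ten incomes, so it does not describe a typical earner; the median does.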

How to compute quickly

  1. Sort your data once. Many stats (median, quartiles, IQR, outliers) follow directly from the sorted list.
  2. Pick center and spread based on shape: skewed → median & IQR; symmetric → mean & sd.
  3. Use IQR fences for a quick outlier check before modeling.
  4. Standardize with z-scores to compare across different scales.
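Step 4 can be sketched as follows. The two feature lists are hypothetical, and `z_scores` is an illustrative helper that uses the sample sd (n−1) as defined in this lesson.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample sd."""
    m, sd = mean(values), stdev(values)
    return [(x - m) / sd for x in values]

# Two hypothetical features on very different scales.
session_minutes = [3, 5, 8, 12, 22]
yearly_spend = [300, 500, 800, 1200, 2200]

z_minutes = z_scores(session_minutes)
z_spend = z_scores(yearly_spend)

# yearly_spend is session_minutes times 100, so the z-scores coincide.
print([round(z, 2) for z in z_minutes])
print([round(z, 2) for z in z_spend])
```

After standardizing, a value of 22 minutes and a value of 2200 in spend are equally unusual within their own feature, even though the raw numbers differ by a factor of 100.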

Exercises (practice here, then open solutions)

These mirror the tasks in the Exercises section below (ex1–ex3). Try them first; then open solutions.

  • ex1: Compute mean, median, mode, range, Q1, Q3, IQR, and sample sd.
  • ex2: Use IQR to find outliers and state the fences.
  • ex3: Choose the best center and spread for a skewed scenario, and explain why.

Self-check checklist

  • You can decide between mean/sd vs median/IQR based on distribution shape.
  • You can compute quartiles and IQR from a sorted list.
  • You can apply the 1.5×IQR rule to flag outliers.
  • You can compute and interpret a z-score.

Common mistakes and how to self-check

  • Using mean with strong skew/outliers. Self-check: compare mean vs median; if far apart, prefer median & IQR.
  • Forgetting n−1 in sample variance/sd. Self-check: confirm denominator for sample-based estimates.
  • Mixing quartile conventions. Self-check: be explicit (median-of-halves/Tukey) and stay consistent within an analysis.
  • Calling any extreme value an “outlier” without context. Self-check: compute IQR fences and also consider domain knowledge.
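The quartile-convention mistake is easy to reproduce. The sketch below compares the median-of-halves rule used in this lesson with the two interpolation methods of Python's `statistics.quantiles`, on the data from Example 1. It only illustrates that conventions can disagree; it does not recommend one.

```python
from statistics import median, quantiles

data = sorted([9, 10, 10, 11, 12, 14, 15, 20])
half = len(data) // 2

# Median-of-halves (the convention used in the worked examples).
halves = (median(data[:half]), median(data[len(data) - half:]))

# Two interpolation methods offered by statistics.quantiles.
q_excl = quantiles(data, n=4, method="exclusive")
q_incl = quantiles(data, n=4, method="inclusive")

print("median-of-halves Q1, Q3:", halves)
print("exclusive        Q1, Q3:", q_excl[0], q_excl[2])
print("inclusive        Q1, Q3:", q_incl[0], q_incl[2])
```

All three agree on Q1 = 10 here but give different Q3 values (14.5, 14.75, 14.25), which also shifts the IQR and the outlier fences.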

Practical projects

  • Product metrics snapshot: summarize daily active users for the last 30 days (median, IQR, outliers) and write 3 bullet insights.
  • Experiment pre-check: for an A/B dataset, compute baseline mean & sd and check for skew/outliers; suggest data transformations if needed.
  • Feature audit: pick 5 numeric features from a public dataset; for each, report center, spread, outliers, and which summary pair is appropriate.

Who this is for

  • Aspiring and early-career Data Scientists who need strong data summarization skills.
  • Analysts and ML engineers validating data before modeling.

Prerequisites

  • Comfort with basic arithmetic and order of operations.
  • Knowing how to sort data and count observations.

Learning path

  1. Descriptive Statistics (this page)
  2. Probability basics (random variables, distributions)
  3. Sampling and Central Limit Theorem
  4. Confidence intervals and Hypothesis testing
  5. Effect sizes and Power

Exercises — detailed prompts and solutions

Exercise ex1 — Compute core summaries

Data: 12, 15, 14, 10, 9, 10, 11, 20

Tasks: mean, median, mode(s), range, Q1, Q3, IQR, sample sd (2 decimals).

Try it, then compare with the solution below.

Exercise ex2 — Detect outliers with IQR

Data: 5, 6, 6, 7, 7, 8, 12, 30. Find Q1, Q3, IQR, outlier fences, and list outliers.

Exercise ex3 — Pick the right summary

Scenario: A highly right-skewed distribution of household incomes. What center and spread would you report, and why?

Mini challenge

You receive 1, 1, 2, 2, 2, 3, 12 as a feature vector. Without a calculator, decide quickly: should you report mean & sd or median & IQR to describe it to stakeholders? Justify in 1–2 sentences.

Next steps

  • Apply these summaries to a dataset you care about (product, sports, finance).
  • Move to Probability basics to understand uncertainty behind these summaries.

Practice Exercises

Instructions

Given the data: 12, 15, 14, 10, 9, 10, 11, 20
Compute: mean, median, mode(s), range, Q1, Q3, IQR, and sample standard deviation (2 decimals).
Expected Output
Mean = 12.625; Median = 11.5; Mode = 10; Range = 11; Q1 = 10; Q3 = 14.5; IQR = 4.5; Sample SD ≈ 3.62
