
Descriptive Statistics

Learn Descriptive Statistics for Data Analysts for free: roadmap, examples, subskills, and a skill exam.

Published: December 19, 2025 | Updated: December 19, 2025

What is Descriptive Statistics for Data Analysts?

Descriptive statistics summarize data so you can understand what is typical, how spread out values are, and whether there are unusual patterns. As a Data Analyst, you will use them to profile datasets, validate assumptions, and communicate insights clearly to stakeholders before any modeling.

  • Central tendency: mean, median, mode
  • Dispersion: variance, standard deviation, range, IQR
  • Distribution shape: skewness, kurtosis
  • Position: percentiles and quantiles
  • Tabulations: frequency tables, cross tabs
  • Estimation basics: sampling, standard error, confidence intervals, effect size
Typical analyst tasks enabled
  • Quality-check a new dataset (spot outliers, input errors)
  • Describe user behavior (e.g., median session length vs. average)
  • Segment and compare groups (e.g., conversion by channel)
  • Provide ranges and uncertainty (e.g., 95% CI for average order value)

Who this is for

  • Aspiring or junior Data Analysts who need a strong foundation
  • Business analysts transitioning to quantitative work
  • Data-savvy PMs who need reliable summaries and comparisons

Prerequisites

  • Comfort with arithmetic and ratios
  • Basic spreadsheets (sum, average, sort/filter)
  • Optional but helpful: Intro SQL or Python

Learning path

  1. Describe the center: mean, median, mode; when to use each.
  2. Measure spread: variance, standard deviation, IQR; outlier impact.
  3. Understand position: percentiles, quantiles, ranks.
  4. Tabulate: frequency tables and cross tabs for categories.
  5. Shape: skewness and kurtosis to assess tails and asymmetry.
  6. Sampling and uncertainty: sampling basics, standard error, confidence intervals.
  7. Effect size: quantify practical differences beyond p-values.
  8. Interpret and communicate: write clear, decision-ready summaries.
Milestones checklist
  • Compute mean/median/mode and explain when to prefer median
  • Calculate SD and IQR; identify outliers using IQR rule
  • Read and build a frequency table and cross tab with row/column percentages
  • Explain right vs left skew and what it implies
  • Construct a 95% CI for a mean and interpret it plainly
  • Report a simple effect size (e.g., Cohen’s d) with context

Worked examples

Use this small dataset of daily orders for examples:

orders = [2, 3, 3, 4, 6, 8, 50]
Example 1 — Mean vs. median with an outlier

Mean = (2+3+3+4+6+8+50)/7 = 10.86; Median = 4. Outlier 50 inflates the mean; median better represents a typical day.
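To check the figures in this example, here is a minimal Python sketch using only the standard library's statistics module. The variable names are illustrative, and dropping the value 50 is only a way to show sensitivity, not a recommendation to delete outliers.

```python
import statistics

orders = [2, 3, 3, 4, 6, 8, 50]

mean_orders = statistics.mean(orders)      # sum divided by count
median_orders = statistics.median(orders)  # middle value of the sorted list
mode_orders = statistics.mode(orders)      # most frequent value

print(f"mean={mean_orders:.2f}, median={median_orders}, mode={mode_orders}")

# Recompute without the outlier to see how far each measure moves.
without_outlier = [x for x in orders if x != 50]
mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)
print(f"without 50: mean={mean_trimmed:.2f}, median={median_trimmed}")
```

The mean drops from about 10.86 to about 4.33 once the outlier is removed, while the median only moves from 4 to 3.5.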

Example 2 — Variance, SD, and IQR
  1. Sorted: [2,3,3,4,6,8,50]
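  2. Q1=3, Q2=4, Q3=8 → IQR=5 (quartiles taken as medians of the lower and upper halves; inclusive interpolation such as QUARTILE.INC gives Q3=7 and IQR=4)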
  3. Sample variance (s²) ≈ 302.14; SD (s) ≈ 17.38

IQR shows the middle spread (robust to outliers); SD shows large overall spread due to the outlier.
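The same quantities can be computed with Python's statistics module (Python 3.8 or later for quantiles). This is a sketch, not the only valid approach: quartile conventions differ between tools, so both of the module's methods are shown. The "exclusive" method happens to reproduce Q1=3 and Q3=8 for this dataset, while "inclusive" interpolates the way spreadsheet .INC functions do.

```python
import statistics

orders = [2, 3, 3, 4, 6, 8, 50]

# Sample formulas: both divide by n - 1.
sample_var = statistics.variance(orders)
sample_sd = statistics.stdev(orders)

# Quartiles under two common conventions.
q1_exc, q2_exc, q3_exc = statistics.quantiles(orders, n=4, method="exclusive")
q1_inc, q2_inc, q3_inc = statistics.quantiles(orders, n=4, method="inclusive")

iqr_exc = q3_exc - q1_exc
iqr_inc = q3_inc - q1_inc

print(f"sample variance={sample_var:.2f}, sample SD={sample_sd:.2f}")
print(f"exclusive: Q1={q1_exc}, Q3={q3_exc}, IQR={iqr_exc}")
print(f"inclusive: Q1={q1_inc}, Q3={q3_inc}, IQR={iqr_inc}")
```

Whichever convention you pick, state it in your report so that others can reproduce the IQR.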

Example 3 — 90th percentile (P90)

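For 7 values, the inclusive method puts P90 at position 1 + 0.9 × (7 − 1) = 6.4, that is, 40% of the way from the 6th value (8) to the 7th (50): 8 + 0.4 × 42 = 24.8. Interpretation: about 90% of days have ≤ ~25 orders. With so few data points, this estimate is driven almost entirely by the single outlier.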
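The interpolation step can be written out explicitly. The helper below, percentile_inc, is a hypothetical name for a sketch of the inclusive linear-interpolation method, which to my understanding is the convention behind spreadsheet PERCENTILE.INC; other tools may use different conventions and return different values on small samples.

```python
def percentile_inc(values, p):
    """Percentile by linear interpolation between neighbouring sorted values.

    p is a fraction between 0 and 1. The 0-based position is p * (n - 1),
    which corresponds to the 1-based position 1 + p * (n - 1).
    """
    data = sorted(values)
    pos = p * (len(data) - 1)
    lower = int(pos)            # index of the value at or below the position
    frac = pos - lower          # how far to move toward the next value
    if lower + 1 >= len(data):  # p == 1 lands exactly on the maximum
        return float(data[-1])
    return data[lower] + frac * (data[lower + 1] - data[lower])

orders = [2, 3, 3, 4, 6, 8, 50]
p90 = percentile_inc(orders, 0.90)
p50 = percentile_inc(orders, 0.50)
print(f"P50={p50:.1f}, P90={p90:.1f}")
```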

Example 4 — Frequency table (categorizing order sizes)
Bins: Small (≤3), Medium (4–8), Large (>8)
Small: 3 values (2,3,3)
Medium: 3 values (4,6,8)
Large: 1 value (50)

Percentages: Small 42.9%, Medium 42.9%, Large 14.3%.
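A frequency table like this one can be sketched with collections.Counter. The size_bin function is an illustrative helper that encodes the bin edges above; adjust the cut-offs for your own data.

```python
from collections import Counter

orders = [2, 3, 3, 4, 6, 8, 50]

def size_bin(n):
    """Map an order count to the bins Small (<=3), Medium (4-8), Large (>8)."""
    if n <= 3:
        return "Small"
    if n <= 8:
        return "Medium"
    return "Large"

counts = Counter(size_bin(n) for n in orders)
total = sum(counts.values())

# Print counts and percentages in a fixed, readable bin order.
for label in ("Small", "Medium", "Large"):
    pct = 100 * counts[label] / total
    print(f"{label:<6} {counts[label]:>2} {pct:5.1f}%")
```

Because of rounding, the printed percentages sum to 100.1%, which is worth a footnote in a report.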

Example 5 — Cross tab and conditional percentages
Channel vs Purchase (counts)

Channel   No (0)   Yes (1)
Email     60       40
Ads       75       25

Row percentages:

Email: No 60%, Yes 40%
Ads:   No 75%, Yes 25%

Email outperforms Ads in conversion (relative to each channel’s traffic).
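The difference between row and column percentages is easy to see in code. The sketch below stores the cross tab as a nested dictionary (an illustrative choice, not a required structure) and computes both views from the same counts.

```python
crosstab = {
    "Email": {"No": 60, "Yes": 40},
    "Ads": {"No": 75, "Yes": 25},
}

def row_percentages(table):
    """Each cell as a percentage of its row total (conditions on channel)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result

def column_percentages(table):
    """Each cell as a percentage of its column total (conditions on outcome)."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }

row_pct = row_percentages(crosstab)
col_pct = column_percentages(crosstab)
print(f"Conversion rate, Email: {row_pct['Email']['Yes']:.1f}%")
print(f"Share of purchases from Email: {col_pct['Email']['Yes']:.1f}%")
```

Row percentages answer "what share of each channel's visitors purchased?", while column percentages answer "what share of all purchases came from each channel?". The two are not interchangeable.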

Drills and exercises

  • [ ] Compute mean, median, and mode for three recent metrics (e.g., daily signups). Note differences.
  • [ ] Calculate IQR and flag outliers using 1.5×IQR rule.
  • [ ] Build a frequency table for a categorical field (e.g., device type).
  • [ ] Create a 2×2 cross tab (e.g., new vs. returning by converted vs. not) with row and column percentages.
  • [ ] Identify skew direction in a numeric field by comparing mean vs. median and using a histogram.
  • [ ] Construct a 95% confidence interval for a mean from a sample and interpret it plainly.
  • [ ] Compute a simple effect size (Cohen’s d) between two groups.
Mini tasks you can do in a spreadsheet
  • Use QUARTILE.INC for Q1/Q3 and compute IQR
  • Use PERCENTILE.INC for P90
  • Use COUNTIF/COUNTIFS to build frequency tables
  • Use STDEV.S for sample standard deviation
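
For readers who prefer Python to a spreadsheet, the 1.5×IQR drill can be sketched as follows. The function name iqr_outliers and its parameter k are illustrative. The "inclusive" quantile method is used because it should line up with the .INC spreadsheet functions listed above.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

orders = [2, 3, 3, 4, 6, 8, 50]
flagged = iqr_outliers(orders)
print(flagged)
```

A flagged value is a prompt to investigate, not proof of an error; a day with 50 orders may be a real promotion rather than a data-entry mistake.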

Common mistakes and how to fix them

  • Using the mean with skewed data: Prefer median or winsorized mean; report both.
  • Confusing population vs. sample formulas: Use sample SD/variance (n−1) for samples.
  • Ignoring outliers: Always check IQR and visualize; investigate data quality or report robust stats.
  • Reading cross tabs without conditioning: Always specify row or column percentages and why.
  • Misinterpreting CIs: A 95% CI means the procedure captures the true mean in about 95% of repeated samples; it does not mean there is a 95% probability that the true mean lies in your specific interval.
  • Overstating small differences: Add effect size and context (practical significance) alongside p-values.
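
To make the confidence-interval point concrete, here is a hedged sketch of a 95% CI for a mean. The sample values are made up for illustration, and the critical value 2.262 is the two-sided 95% t value for 9 degrees of freedom, hard-coded here as an assumption; for other sample sizes, look up the matching t value or use a statistics library.

```python
import math
import statistics

def mean_ci(values, critical_value):
    """Return (mean, lower, upper) using mean +/- critical_value * standard error."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    margin = critical_value * standard_error
    return mean, mean - margin, mean + margin

# Hypothetical order values for 10 sampled orders (illustrative data only).
sample = [52, 48, 61, 55, 47, 58, 50, 63, 49, 57]

T_CRIT_DF9 = 2.262  # two-sided 95% t critical value for n - 1 = 9
mean, lower, upper = mean_ci(sample, T_CRIT_DF9)
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

A plain-language reading: "Our best estimate of the average is 54, and intervals built this way capture the true average about 95% of the time."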
Debugging tips
  • If SD is huge, check for unit mix-ups or extreme outliers.
  • If percentiles look off, confirm sorting method and inclusive vs. exclusive function versions.
  • If cross-tab totals don’t match, verify filters and missing values handling.

Mini project: Customer Order Insights

Goal: Summarize orders to guide operations and marketing.

  1. Clean: Remove obvious errors (negative orders). Document any removals.
  2. Center and spread: Report mean, median, SD, and IQR for daily orders.
  3. Percentiles: Provide P50, P75, P90, P95 to inform staffing thresholds.
  4. Cross tab: By device (Desktop/Mobile) and conversion (Yes/No), give row percentages.
  5. Uncertainty: 95% CI for average order value from a 30-day sample.
  6. Effect size: Compare average order value between new vs. returning users (Cohen’s d) and discuss practical impact.
  7. Deliverable: A one-page brief with a chart (histogram or box plot) and plain-language insights.
What good looks like
  • Clear distinction between robust and non-robust metrics
  • Explicit handling of outliers and missing data
  • Row/column percentage labels on cross tabs
  • Precise CI interpretation and a concise recommendation
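
For the effect-size step of the project, one common formulation of Cohen's d divides the difference in means by the pooled sample standard deviation. The sketch below uses that pooled-SD form; the two groups are invented illustrative numbers, not real results.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical average order values per user (illustrative data only).
returning_users = [60, 64, 58, 62, 66]
new_users = [54, 58, 52, 56, 60]

d = cohens_d(returning_users, new_users)
print(f"Cohen's d = {d:.2f}")
```

Report d together with the raw difference in currency units, since a large d on a tiny sample like this one says little about practical impact.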

More practical project ideas

  • E-commerce: Daily revenue distribution with staffing recommendations from percentiles
  • Marketing: Channel × device cross tab with conversion rates and effect sizes vs. baseline
  • Product: Feature usage percentiles and skewness to identify power-user features

Subskills

This skill includes the following subskills. Explore each to practice focused capabilities:

  • Central Tendency: Mean, Median, Mode
  • Dispersion: Variance, Standard Deviation, IQR
  • Percentiles and Quantiles
  • Frequency Tables
  • Cross Tabulation
  • Skewness and Kurtosis
  • Confidence Intervals Basics
  • Sampling Basics
  • Standard Error Basics
  • Effect Size Basics
  • Practical Interpretation

Next steps

  • Re-run descriptive stats on 2–3 different datasets to build speed and intuition.
  • Practice clear, one-paragraph interpretations for non-technical readers.
  • When comfortable, move to inferential techniques (hypothesis tests, regression) while keeping robust descriptive summaries in your workflow.

Descriptive Statistics — Skill Exam

This exam checks your ability to summarize data, choose appropriate measures, and communicate insights. You can take it for free. Anyone can attempt the exam; if you are logged in, your progress and results will be saved so you can resume or review later.

Tips: Round intermediate calculations sensibly (e.g., 2 decimal places). When multiple answers apply, select all that are correct.

15 questions | 70% to pass
