How to learn Descriptive Statistics for Data Analysts for free

What is Descriptive Statistics for Data Analysts?

Descriptive statistics summarize data so you can understand what is typical, how spread out values are, and whether there are unusual patterns. As a Data Analyst, you will use them to profile datasets, validate assumptions, and communicate insights clearly to stakeholders before any modeling.

Central tendency: mean, median, mode
Dispersion: variance, standard deviation, range, IQR
Distribution shape: skewness, kurtosis
Position: percentiles and quantiles
Tabulations: frequency tables, cross tabs
Estimation basics: sampling, standard error, confidence intervals, effect size

Typical analyst tasks enabled

Quality-check a new dataset (spot outliers, input errors)
Describe user behavior (e.g., median session length vs. average)
Segment and compare groups (e.g., conversion by channel)
Provide ranges and uncertainty (e.g., 95% CI for average order value)

Who this is for

Aspiring or junior Data Analysts who need a strong foundation
Business analysts transitioning to quantitative work
Data-savvy PMs who need reliable summaries and comparisons

Prerequisites

Comfort with arithmetic and ratios
Basic spreadsheets (sum, average, sort/filter)
Optional but helpful: Intro SQL or Python

Learning path

Describe the center: mean, median, mode; when to use each.
Measure spread: variance, standard deviation, IQR; outlier impact.
Understand position: percentiles, quantiles, ranks.
Tabulate: frequency tables and cross tabs for categories.
Shape: skewness and kurtosis to assess tails and asymmetry.
Sampling and uncertainty: sampling basics, standard error, confidence intervals.
Effect size: quantify practical differences beyond p-values.
Interpret and communicate: write clear, decision-ready summaries.

Milestones checklist

Compute mean/median/mode and explain when to prefer median
Calculate SD and IQR; identify outliers using the 1.5×IQR rule
Read and build a frequency table and cross tab with row/column percentages
Explain right vs left skew and what it implies
Construct a 95% CI for a mean and interpret it plainly
Report a simple effect size (e.g., Cohen’s d) with context
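
The IQR rule from the checklist can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the helper name iqr_outliers and the sample list are assumptions made for the example, and the quartiles come from the standard library's default ("exclusive") method, so other quartile conventions may move the fences slightly.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Illustrative daily order counts (hypothetical data).
orders = [2, 3, 3, 4, 6, 8, 50]
flagged = iqr_outliers(orders)
print("flagged as outliers:", flagged)
```

A flagged value is a prompt to investigate, not an instruction to delete: it may be a data-entry error or a genuinely unusual day.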

Worked examples

Use this small dataset of daily orders for examples:

orders = [2, 3, 3, 4, 6, 8, 50]

Example 1 — Mean vs. median with an outlier

Mean = (2+3+3+4+6+8+50)/7 = 76/7 ≈ 10.86; Median = 4. The outlier 50 inflates the mean; the median better represents a typical day.
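
Example 1 can be reproduced with Python's standard statistics module. The sketch below also drops the outlier (an illustrative step, not part of the example itself) to show how much each statistic moves.

```python
from statistics import mean, median

orders = [2, 3, 3, 4, 6, 8, 50]

avg = mean(orders)    # 76 / 7
mid = median(orders)  # middle value of the sorted list

print(f"mean   = {avg:.2f}")
print(f"median = {mid}")

# Remove the outlier to compare how sensitive each statistic is.
without_outlier = [x for x in orders if x != 50]
avg_trimmed = mean(without_outlier)
mid_trimmed = median(without_outlier)

print(f"mean without 50   = {avg_trimmed:.2f}")
print(f"median without 50 = {mid_trimmed}")
```

The mean drops by more than half once the outlier is removed, while the median barely changes. That stability is what "robust" means in this guide.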

Example 2 — Variance, SD, and IQR

Sorted: [2,3,3,4,6,8,50]
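Q1=3, Q2=4, Q3=8 → IQR=5 (quartiles taken as the medians of the lower and upper halves, excluding the overall median; interpolating functions such as QUARTILE.INC give Q3=7 and IQR=4)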
Sample variance (s²) ≈ 302.14; SD (s) ≈ 17.38

IQR shows the middle spread (robust to outliers); SD shows large overall spread due to the outlier.
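
The spread figures for this dataset can be checked with the standard library. In this sketch, variance and stdev use the sample (n − 1) formulas, and the quartiles use the module's default "exclusive" method, which for these seven values coincides with the median-of-halves quartiles (an observation about this dataset, not a general equivalence).

```python
from statistics import quantiles, stdev, variance

orders = [2, 3, 3, 4, 6, 8, 50]

s2 = variance(orders)  # sample variance, divides by n - 1
s = stdev(orders)      # sample standard deviation, sqrt of s2

# Three cut points that split the data into four groups.
q1, q2, q3 = quantiles(orders, n=4, method="exclusive")
iqr = q3 - q1

print(f"sample variance = {s2:.2f}")  # 302.14
print(f"sample SD       = {s:.2f}")   # 17.38
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```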

Example 3 — 90th percentile (P90)

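For 7 values, the inclusive interpolation method (the one PERCENTILE.INC uses) places P90 at position 1 + 0.9 × (7 − 1) = 6.4, which is 40% of the way from the 6th value (8) to the 7th (50): 8 + 0.4 × 42 = 24.8. Interpretation: about 90% of days have ≤ ~25 orders. With only seven values, this estimate is driven almost entirely by the single outlier, so treat it as rough.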
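
The interpolation can be written out by hand to make the arithmetic visible. This is a sketch of the inclusive linear-interpolation convention (1-indexed position = 1 + p × (n − 1)); the function name percentile_inclusive is an assumption for illustration, and other percentile conventions will give different answers on small samples.

```python
def percentile_inclusive(values, p):
    """Percentile by linear interpolation between closest ranks.

    p is a fraction between 0 and 1. The 0-indexed fractional
    position is p * (n - 1).
    """
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    position = p * (len(data) - 1)
    lower = int(position)        # index of the value just below
    fraction = position - lower  # how far to move toward the next value
    if lower + 1 >= len(data):   # p == 1 lands on the maximum
        return float(data[-1])
    return data[lower] + fraction * (data[lower + 1] - data[lower])


orders = [2, 3, 3, 4, 6, 8, 50]
p90 = percentile_inclusive(orders, 0.90)
print(f"P90 = {p90:.1f}")  # 24.8
```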

Example 4 — Frequency table (categorizing order sizes)

Bins: Small (≤3), Medium (4–8), Large (>8)
Small: 3 values (2,3,3)
Medium: 3 values (4,6,8)
Large: 1 value (50)

Percentages: Small 42.9%, Medium 42.9%, Large 14.3%.
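
The same frequency table can be built programmatically. The sketch below uses collections.Counter; the helper name size_bin is an assumption for illustration, and the bin edges are the ones given in Example 4.

```python
from collections import Counter

orders = [2, 3, 3, 4, 6, 8, 50]


def size_bin(n):
    """Map an order count to the bins used in Example 4."""
    if n <= 3:
        return "Small"
    if n <= 8:
        return "Medium"
    return "Large"


counts = Counter(size_bin(n) for n in orders)
total = sum(counts.values())

# Print count and share of total for each bin, in a fixed order.
for label in ("Small", "Medium", "Large"):
    share = 100 * counts[label] / total
    print(f"{label:<6} {counts[label]:>2} {share:5.1f}%")
```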

Example 5 — Cross tab and conditional percentages

Channel vs Purchase (1=yes,0=no)
Rows: Channel [Email, Ads]
Cols: Purchase [0,1]
Email: [60, 40]
Ads:   [75, 25]

Row percentages:

Email: No 60%, Yes 40%
Ads:   No 75%, Yes 25%

Email converts at 40% versus 25% for Ads, so Email outperforms Ads in conversion (relative to each channel’s traffic).
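
Row and column percentages answer different questions, which a short sketch makes concrete. The nested-dictionary layout and the helper names row_percentages and column_percentages are assumptions for illustration; the counts are the ones from Example 5.

```python
crosstab = {
    "Email": {"No": 60, "Yes": 40},
    "Ads": {"No": 75, "Yes": 25},
}


def row_percentages(table):
    """Share of each outcome within a channel (rows sum to 100)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Share of each channel within an outcome (columns sum to 100)."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


row_pct = row_percentages(crosstab)
col_pct = column_percentages(crosstab)
print(f"Email conversion rate: {row_pct['Email']['Yes']:.1f}%")
print(f"Share of purchases from Email: {col_pct['Email']['Yes']:.1f}%")
```

Row percentages give each channel's conversion rate; column percentages give each channel's share of all purchases. Labeling which one a table shows avoids mixing the two up.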

Drills and exercises

[ ] Compute mean, median, and mode for three recent metrics (e.g., daily signups). Note differences.
[ ] Calculate IQR and flag outliers using the 1.5×IQR rule.
[ ] Build a frequency table for a categorical field (e.g., device type).
[ ] Create a 2×2 cross tab (e.g., new vs. returning by converted vs. not) with row and column percentages.
[ ] Identify skew direction in a numeric field by comparing mean vs. median and using a histogram.
[ ] Construct a 95% confidence interval for a mean from a sample and interpret it plainly.
[ ] Compute a simple effect size (Cohen’s d) between two groups.
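
For the skew drill, comparing mean and median gives a quick direction check, and a moment-based coefficient puts a number on it. The sketch below computes g1 = m3 / m2^1.5 from population moments, which is one common definition among several; the function name skewness is an assumption, and statistical packages may apply a small-sample correction that changes the value slightly.

```python
from statistics import mean, median


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


orders = [2, 3, 3, 4, 6, 8, 50]
g1 = skewness(orders)

# Mean above median and g1 > 0 both point to a right (positive) skew.
print(f"mean={mean(orders):.2f}, median={median(orders)}, g1={g1:.2f}")
```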

Mini tasks you can do in a spreadsheet

Use QUARTILE.INC for Q1/Q3 and compute IQR
Use PERCENTILE.INC for P90
Use COUNTIF/COUNTIFS to build frequency tables
Use STDEV.S for sample standard deviation
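
If you work in Python instead of a spreadsheet, the standard library offers close counterparts. To my understanding, the "inclusive" method of statistics.quantiles follows the same interpolation convention as QUARTILE.INC and PERCENTILE.INC, and stdev matches STDEV.S; treat that mapping as an assumption and verify it on your own data. The sketch also shows how the "exclusive" method shifts Q3 on a small sample.

```python
from statistics import quantiles, stdev

orders = [2, 3, 3, 4, 6, 8, 50]

# Quartiles under the two interpolation conventions.
inclusive = quantiles(orders, n=4, method="inclusive")
exclusive = quantiles(orders, n=4, method="exclusive")
print("inclusive Q1, Q2, Q3:", inclusive)  # [3.0, 4.0, 7.0]
print("exclusive Q1, Q2, Q3:", exclusive)  # [3.0, 4.0, 8.0]

# n=10 returns nine decile cut points; the last one is P90.
p90 = quantiles(orders, n=10, method="inclusive")[-1]
print(f"P90 (inclusive) = {p90:.1f}")  # 24.8

# Sample standard deviation (n - 1 in the denominator).
print(f"sample SD = {stdev(orders):.2f}")  # 17.38
```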

Common mistakes and how to fix them

Using the mean with skewed data: Prefer the median or a winsorized mean, and report it alongside the plain mean.
Confusing population vs. sample formulas: Use sample SD/variance (n−1) for samples.
Ignoring outliers: Always check IQR and visualize; investigate data quality or report robust stats.
Reading cross tabs without conditioning: Always specify row or column percentages and why.
Misinterpreting CIs: A 95% CI means the procedure captures the true mean in 95% of repeated samples; it does not mean there is a 95% chance that the true mean lies in your specific interval.
Overstating small differences: Add effect size and context (practical significance) alongside p-values.
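
As a concrete effect size, Cohen's d divides the difference in group means by a pooled standard deviation. The sketch below uses the common pooled-variance form; the function name cohens_d and both sample lists are hypothetical and exist only to show the calculation.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical order values for two user groups.
returning_users = [52.0, 48.0, 55.0, 60.0, 50.0]
new_users = [45.0, 47.0, 43.0, 50.0, 40.0]

d = cohens_d(returning_users, new_users)
print(f"Cohen's d = {d:.2f}")
```

The sign of d only reflects the order of the arguments. Conventional labels such as "small" or "large" are rough guides, so pair the number with what the difference means in business terms.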

Debugging tips

If SD is huge, check for unit mix-ups or extreme outliers.
If percentiles look off, confirm sorting method and inclusive vs. exclusive function versions.
If cross-tab totals don’t match, verify filters and missing-value handling.

Mini project: Customer Order Insights

Goal: Summarize orders to guide operations and marketing.

Clean: Remove obvious errors (negative orders). Document any removals.
Center and spread: Report mean, median, SD, and IQR for daily orders.
Percentiles: Provide P50, P75, P90, P95 to inform staffing thresholds.
Cross tab: By device (Desktop/Mobile) and conversion (Yes/No), give row percentages.
Uncertainty: 95% CI for average order value from a 30-day sample.
Effect size: Compare average order value between new vs. returning users (Cohen’s d) and discuss practical impact.
Deliverable: A one-page brief with a chart (histogram or box plot) and plain-language insights.
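
For the uncertainty step, a confidence interval for a mean is the sample mean plus or minus a critical value times the standard error. The sketch below is one way to compute it under stated assumptions: the function name mean_confidence_interval and the sample values are hypothetical, the default critical value comes from a normal approximation, and for small samples you would pass a t critical value instead (about 2.262 for 10 observations, about 2.045 for a 30-day sample).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95, critical=None):
    """Return (low, high) for the mean of `sample`.

    If `critical` is None, a normal (z) critical value is used, which
    is an approximation best suited to larger samples.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    if critical is None:
        critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical * standard_error
    return center - margin, center + margin


# Hypothetical order values (10 observations, so df = 9).
aov_sample = [42.0, 55.5, 38.0, 61.0, 47.5, 52.0, 44.0, 58.5, 49.0, 50.5]

t_low, t_high = mean_confidence_interval(aov_sample, critical=2.262)
z_low, z_high = mean_confidence_interval(aov_sample)

print(f"95% CI (t, df=9):      {t_low:.2f} to {t_high:.2f}")
print(f"95% CI (normal approx): {z_low:.2f} to {z_high:.2f}")
```

The t-based interval is wider because it accounts for the extra uncertainty in estimating the standard deviation from a small sample.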

What good looks like

Clear distinction between robust and non-robust metrics
Explicit handling of outliers and missing data
Row/column percentage labels on cross tabs
Precise CI interpretation and a concise recommendation

More practical project ideas

E-commerce: Daily revenue distribution with staffing recommendations from percentiles
Marketing: Channel × device cross tab with conversion rates and effect sizes vs. baseline
Product: Feature usage percentiles and skewness to identify power-user features

Subskills

This skill includes the following subskills. Explore each to practice focused capabilities:

Central Tendency: Mean, Median, Mode
Dispersion: Variance, Standard Deviation, IQR
Percentiles and Quantiles
Frequency Tables
Cross Tabulation
Skewness and Kurtosis
Confidence Intervals Basics
Sampling Basics
Standard Error Basics
Effect Size Basics
Practical Interpretation

Next steps

Re-run descriptive stats on 2–3 different datasets to build speed and intuition.
Practice clear, one-paragraph interpretations for non-technical readers.
When comfortable, move to inferential techniques (hypothesis tests, regression) while keeping robust descriptive summaries in your workflow.
