
Statistics

Learn Statistics for Data Scientists for free: roadmap, examples, subskills, and a skill exam.

Published: January 1, 2026 | Updated: January 1, 2026

Why Statistics matters for a Data Scientist

Statistics turns raw data into trustworthy decisions. As a Data Scientist, you will design experiments, estimate uncertainty, test hypotheses, build predictive models, and communicate risk. Statistics helps you avoid false wins, quantify impact, and select models that generalize—critical for A/B tests, product metrics, forecasting, and machine learning validation.

Typical tasks this skill unlocks
  • Design and analyze A/B/n tests with power and sample size.
  • Estimate metrics and build confidence intervals that stakeholders can trust.
  • Choose and validate models (regression, time series) with correct assumptions.
  • Handle small samples with resampling (bootstrap) or Bayesian methods.
  • Control false discoveries when testing many metrics or segments.

What you will be able to do

  • Summarize data with robust descriptive statistics and visual checks.
  • Use sampling, distributions, and the central limit theorem to reason about uncertainty.
  • Build confidence intervals and interpret p-values correctly.
  • Run t-tests, proportion tests, chi-square tests, and understand power and effect size.
  • Fit and diagnose basic regression models.
  • Use simple Bayesian updates for rates and proportions.
  • Work with time series trends, seasonality, and stationarity.
  • Check statistical assumptions and avoid false discoveries across many comparisons.

Practical roadmap

  1. Describe: Understand variables, distributions, mean/median/variance, quantiles, outliers; plot histograms/boxplots; compute z-scores (see the sketch after this list).
  2. Sample & reason: Random vs biased sampling; law of large numbers; central limit theorem; common distributions (Normal, t, Binomial, Poisson).
  3. Estimate: Standard error; confidence intervals for means and proportions; bootstrap for non-normal data.
  4. Test: Formulate H0/H1; choose tests (t, z for proportions, chi-square); interpret p-values; understand power, Type I/II errors.
  5. Model: Linear regression for prediction and inference; residual diagnostics; basic regularization awareness.
  6. Bayes basics: Priors for proportions (Beta); posterior update; credible intervals; compare to frequentist CI.
  7. Time series: Trend/seasonality decomposition; stationarity checks; simple forecasting baselines; autocorrelation intuition.
  8. Multiple testing: Why it inflates false positives; control FDR with Benjamini–Hochberg; preregister metrics mindset.
  9. Communicate: Report estimates with uncertainty, assumptions made, and practical conclusions.
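
A minimal sketch for step 1 of the roadmap; the sample values below are made up, and any one-dimensional numeric column works the same way:

import numpy as np
import pandas as pd

# Hypothetical response-time measurements (ms); the values are illustrative only
x = pd.Series([120, 135, 128, 410, 142, 131, 125, 139, 150, 2200])

print(x.describe())  # count, mean, std, quartiles
print('median:', x.median(), 'IQR:', x.quantile(0.75) - x.quantile(0.25))

# z-scores flag points far from the mean; with heavy outliers, robust cutoffs (e.g., the IQR rule) are safer
z = (x - x.mean()) / x.std()
print('potential outliers (|z| > 2):', x[np.abs(z) > 2].tolist())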

Worked examples

1) A/B test on conversion rate (proportions z-test)

Scenario: Variant B shows 6.0% conversion vs 5.2% for control A. Is it significant at 5%?

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

conv_A, n_A = 260, 5000  # 5.2%
conv_B, n_B = 300, 5000  # 6.0%
count = np.array([conv_B, conv_A])
nobs = np.array([n_B, n_A])

z_stat, p_val = proportions_ztest(count, nobs, alternative='larger')
print(z_stat, p_val)

# 95% CI for difference using normal approx
pA, pB = conv_A/n_A, conv_B/n_B
diff = pB - pA
se = np.sqrt(pA*(1-pA)/n_A + pB*(1-pB)/n_B)
ci_low, ci_high = diff - 1.96*se, diff + 1.96*se
print(diff, (ci_low, ci_high))

Interpretation: If the p-value is below 0.05 and the CI for the difference excludes 0, you have evidence that B outperforms A, with the plausible size of the lift given by the CI.

Try it: compute required sample size

Target detectable lift: +0.6pp (5.2% to 5.8%), alpha=0.05, power=0.8. Use a power calculator or approximate with normal formula:

from scipy.stats import norm

p1 = 0.052   # baseline conversion rate
p2 = 0.058   # target rate after the lift
alpha = 0.05
power = 0.80
z_alpha = norm.ppf(1 - alpha/2)   # two-sided critical value
z_beta = norm.ppf(power)
se_part = p1*(1-p1) + p2*(1-p2)   # sum of per-group variances
n_per_group = se_part * (z_alpha + z_beta)**2 / (p2 - p1)**2
print(round(n_per_group))
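
As a cross-check on the approximation above, statsmodels provides a power calculator; a minimal sketch using Cohen's h as the effect size for two proportions:

from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Effect size (Cohen's h) for 5.8% vs 5.2%
es = proportion_effectsize(0.058, 0.052)

# Per-group n for alpha = 0.05 (two-sided) and 80% power
n = NormalIndPower().solve_power(effect_size=es, alpha=0.05, power=0.80, alternative='two-sided')
print(round(n))

Expect the two numbers to be close but not identical, since the arcsine effect-size transform differs from the plain normal formula.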

2) Linear regression for pricing

Predict price from size and location rating; check assumptions and interpret coefficients.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Fake data
df = pd.DataFrame({
    'price': [210, 220, 250, 260, 275, 300, 320, 350, 360, 390],
    'size_m2': [45, 48, 52, 55, 58, 60, 65, 70, 72, 80],
    'loc_rating': [3.1, 3.0, 3.2, 3.4, 3.5, 3.6, 3.8, 4.0, 4.1, 4.3]
})
X = sm.add_constant(df[['size_m2', 'loc_rating']])
model = sm.OLS(df['price'], X).fit()
print(model.summary())

# Residual diagnostics
resid = model.resid
print('Mean residual ~ 0:', resid.mean())
# Homoskedasticity proxy: correlation between |residuals| and fitted values
print(np.corrcoef(np.abs(resid), model.fittedvalues)[0, 1])

Interpret: Each coefficient is the marginal effect of that feature holding the others fixed. Check linearity (residuals vs fitted), normality (QQ plot), and influential points (Cook's distance) before drawing inferences.

Try it: add an interaction

Add size_m2 * loc_rating and see if fit improves (lower AIC, significant coefficient).
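
A minimal sketch of this try-it, continuing from the regression example above (df, sm, and model are reused; the column name size_x_loc is just illustrative):

# Interaction term: does the effect of size depend on the location rating?
df['size_x_loc'] = df['size_m2'] * df['loc_rating']
X_int = sm.add_constant(df[['size_m2', 'loc_rating', 'size_x_loc']])
model_int = sm.OLS(df['price'], X_int).fit()

# Lower AIC and a significant interaction coefficient suggest the richer model helps
print('AIC without interaction:', model.aic)
print('AIC with interaction:', model_int.aic)
print('Interaction p-value:', model_int.pvalues['size_x_loc'])

With only ten fake rows and strongly correlated features the estimates will be noisy; the goal here is the mechanics, not the numbers.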

3) Bootstrap CI for the median

When data are skewed or heavy-tailed, bootstrap the median's CI.

import numpy as np
rng = np.random.default_rng(7)
data = rng.lognormal(mean=1.5, sigma=0.8, size=100)

B = 5000
boot_meds = []
for _ in range(B):
    sample = rng.choice(data, size=len(data), replace=True)
    boot_meds.append(np.median(sample))

ci = (np.percentile(boot_meds, 2.5), np.percentile(boot_meds, 97.5))
print('Median:', np.median(data), '95% CI:', ci)

Report: median and 95% bootstrap CI. State that CI is from resampling.

4) Bayesian update for a conversion rate

Prior belief: Beta(1,1) (uniform). Observe 60 conversions out of 1000. Posterior is Beta(1+60, 1+940).

from scipy.stats import beta
alpha_post, beta_post = 1+60, 1+940
mean = alpha_post / (alpha_post + beta_post)
ci = beta.ppf([0.025, 0.975], alpha_post, beta_post)
print('Posterior mean:', mean, '95% credible interval:', ci)

Compare to frequentist CI for a proportion; they will be similar for weak priors and moderate n.
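
For comparison, a minimal frequentist (Wald) interval on the same 60-out-of-1000 data; with a flat prior and this sample size the two intervals should nearly coincide:

import numpy as np

p_hat = 60 / 1000
se = np.sqrt(p_hat * (1 - p_hat) / 1000)   # standard error of the sample proportion
print('Frequentist 95% CI:', (p_hat - 1.96 * se, p_hat + 1.96 * se))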

5) Time series quickstart: trend, seasonality, stationarity

import pandas as pd
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

# Simulated monthly data
dates = pd.date_range('2022-01-01', periods=36, freq='M')
trend = np.linspace(100, 140, 36)
season = 10*np.sin(2*np.pi*np.arange(36)/12)
noise = np.random.default_rng(0).normal(0, 3, 36)
series = pd.Series(trend + season + noise, index=dates)

result = seasonal_decompose(series, model='additive', period=12)
# result.trend, result.seasonal, result.resid are available

adf_stat, pvalue, *_ = adfuller(series.dropna())
print('ADF p-value:', pvalue)

If p-value is high, the series may be non-stationary; difference the series and test again.
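
A minimal continuation of the snippet above: take the first difference, drop the leading NaN, and rerun the ADF test.

# First differencing removes a linear trend; re-check stationarity on the differenced series
diff_series = series.diff().dropna()
adf_stat_d, pvalue_d, *_ = adfuller(diff_series)
print('ADF p-value after differencing:', pvalue_d)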

6) Benjamini–Hochberg (FDR) for many metrics

Suppose you ran 10 significance tests and got these p-values:

import numpy as np
p = np.array([0.001, 0.004, 0.012, 0.019, 0.041, 0.052, 0.12, 0.23, 0.31, 0.77])
alpha = 0.05
m = len(p)
order = np.argsort(p)
p_sorted = p[order]
thresholds = alpha * (np.arange(1, m+1)/m)
# Find largest k with p_(k) <= threshold_k
k = np.where(p_sorted <= thresholds)[0]
cut_index = k.max() if len(k) else -1
significant_mask_sorted = np.zeros(m, dtype=bool)
if cut_index >= 0:
    significant_mask_sorted[:cut_index+1] = True
# Map back to original order
significant_mask = np.zeros(m, dtype=bool)
significant_mask[order] = significant_mask_sorted
print('Significant flags:', significant_mask)

This controls the expected false discovery rate at 5% across all ten tests.
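
The same correction is available in statsmodels and is a handy cross-check on the manual loop above (reusing the p array):

from statsmodels.stats.multitest import multipletests

# method='fdr_bh' applies the Benjamini-Hochberg step-up procedure
reject, p_adjusted, _, _ = multipletests(p, alpha=0.05, method='fdr_bh')
print('Significant flags:', reject)
print('BH-adjusted p-values:', p_adjusted)

Both approaches should flag the same tests.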

Common mistakes and how to debug

  • Peeking in A/B tests: Stopping early on a significant result inflates false positives (see the simulation after this list). Fix: use a fixed sample plan or a sequential method designed for peeking.
  • Misreading p-values: p=0.04 is not a 96% chance the effect is real. Correct: it is the probability of observing data at least as extreme as yours, assuming there is no real effect.
  • Ignoring assumptions: t-tests and OLS need approximate normality of errors and independence. Fix: inspect residuals; use non-parametric or robust methods if violated.
  • Multiple comparisons: Testing many segments inflates false positives. Fix: pre-register metrics; control FDR with BH.
  • Overfitting regression: Too many features relative to n. Fix: cross-validate; simplify model; regularize.
  • Confusing correlation with causation: Observational differences are not causal. Fix: use experiments or causal methods.
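
To see why the first bullet matters, here is a small simulation (the parameters are made up): both variants share the same true conversion rate, yet checking for significance after every batch of users pushes the false-positive rate far above the nominal 5%.

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
n_sims, n_looks, batch, p_true = 1000, 10, 500, 0.05
false_positives = 0

for _ in range(n_sims):
    a = rng.binomial(1, p_true, n_looks * batch)   # control, true rate 5%
    b = rng.binomial(1, p_true, n_looks * batch)   # variant, same true rate
    for look in range(1, n_looks + 1):
        n = look * batch
        _, p_val = proportions_ztest([b[:n].sum(), a[:n].sum()], [n, n])
        if p_val < 0.05:          # stop at the first "significant" peek
            false_positives += 1
            break

print('False-positive rate with peeking:', false_positives / n_sims)  # well above 0.05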

Mini project: Ship a trustworthy A/B test report

  1. Define metrics: Primary conversion rate; secondary metrics (e.g., revenue per user, click-through). State hypotheses and alpha. Decide on FDR control for secondary metrics.
  2. Plan sample size: Pick a minimum detectable effect and 80% power; compute per-group n.
  3. Collect data: Ensure random assignment, logging of exposure, conversions, and timestamps.
  4. Analyze: For the primary metric, run a two-proportion z-test and compute a 95% CI. For secondary metrics, apply BH FDR.
  5. Diagnostics: Check balance (user counts, baseline rate); a sample-ratio-mismatch check is sketched after this list. Look for novelty effects or time trends.
  6. Report: Summarize effect sizes with uncertainty, decisions, limitations, and recommendations.
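
For the balance check in step 5, a simple sample-ratio-mismatch (SRM) test compares the observed group counts against the planned 50/50 split (the counts below are made up):

from scipy.stats import chisquare

observed = [50440, 49390]                 # users actually assigned to A and B
expected = [sum(observed) / 2] * 2        # planned even split
stat, p_val = chisquare(observed, f_exp=expected)
print('SRM p-value:', p_val)              # a tiny p-value suggests broken randomization
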
Deliverables
  • Notebook/script with computations and plots.
  • One-page summary: objective, method, main result with CI, guardrail metrics, decision, next steps.

Subskills

  • Descriptive Statistics: Summaries (mean, median, variance, quantiles), outliers, and shape of distributions.
  • Sampling And Distributions: Random sampling, CLT, and common distributions (Normal, t, Binomial, Poisson).
  • Estimation And Confidence Intervals: Standard error, CIs for means/proportions, bootstrap.
  • Hypothesis Testing: t-tests, z-tests for proportions, chi-square tests, p-values, power.
  • Regression Basics: Linear regression, interpretation, diagnostics, simple regularization awareness.
  • Bayesian Basics: Priors, likelihood, Beta-Binomial updates, credible intervals.
  • Time Series Basics: Trend, seasonality, stationarity, simple forecasting baselines.
  • Statistical Assumptions And Diagnostics: Residual checks, influence, robustness.
  • Multiple Testing And False Discovery Awareness: FDR control with Benjamini–Hochberg.

Who this is for

  • Aspiring and junior Data Scientists who need solid inference skills for experiments and modeling.
  • Analysts and ML engineers who want to quantify uncertainty and make defensible decisions.

Prerequisites

  • Comfort with basic algebra and functions.
  • Python basics (lists, arrays) or R basics; ability to run notebooks or scripts.
  • Familiarity with data frames (pandas or similar) is helpful.

Learning path

  1. Start with Descriptive Statistics and Sampling And Distributions.
  2. Move to Estimation And Confidence Intervals and Hypothesis Testing.
  3. Practice Regression Basics and Statistical Assumptions And Diagnostics.
  4. Add Bayesian Basics and Time Series Basics.
  5. Finish with Multiple Testing And False Discovery Awareness and a capstone A/B analysis.

Practical projects

  • Analyze a funnel: compute stage-wise rates with CIs; identify the biggest drop with uncertainty.
  • Marketing uplift: test email versions; size the test; run BH over multiple segments.
  • Retention forecast: decompose weekly active users; build a naive seasonal forecast; evaluate error.
  • Pricing model: regress price on features; validate assumptions; communicate elasticities.

Next steps

  • Complete the subskills below in order.
  • Do the mini project and share your one-page report with a peer.
  • Take the skill exam to check readiness. Anyone can take it; logged-in users get saved progress.

Statistics — Skill Exam

This exam checks practical understanding of statistics for Data Scientists: estimation, testing, regression, Bayesian basics, time series, and multiple testing. Score 70% or higher to pass. Anyone can take the exam; if you are logged in, your progress and results will be saved automatically.

How it works

  • Single-choice and multi-select questions; some numeric answers.
  • No trick questions; show your reasoning where helpful.
  • You can retake the exam. Use the recommendations after submitting.

13 questions | 70% to pass
