How to learn Estimation and Confidence Intervals (Statistics for Data Scientists) for free

Who this is for

Aspiring and practicing Data Scientists who run A/B tests, report metrics, or compare models.
Analysts/engineers who need defensible estimates and margins of error for stakeholders.
Anyone who wants to quantify uncertainty, not just give a single number.

Prerequisites

Basic algebra and comfort with fractions/percentages.
Familiarity with mean, proportion, standard deviation, and sampling.
Very light probability: independence and the basic idea of a normal distribution.

Why this matters

In Data Science, you rarely know the true population value. You estimate it and must communicate how uncertain you are. Confidence intervals (CIs) let you:

Report conversion rate with a margin of error after an A/B test.
Share average session time or latency with uncertainty to product/engineering.
Compare two models' accuracies and state whether the difference is meaningful.
Plan sample size to hit a target precision before running an experiment.

Concept explained simply

A point estimate (like a sample mean or proportion) is your best single guess of a true but unknown parameter. Because samples vary, your estimate varies too. The standard error (SE) measures how much estimates vary across hypothetical repeated samples.

A confidence interval is a procedure that builds a plausible range for the true parameter. A 95% CI means: if you repeated the whole sampling-and-interval-building process many times, about 95% of those intervals would contain the true value. For the one interval you computed, the 95% describes the reliability of the method, not a 95% chance that “this exact interval” contains the truth.
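The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the true mean, true standard deviation, trial count, and seed are made-up assumptions, and t* ≈ 2.064 is the df=24 value used elsewhere in this guide. It draws many samples from a known population, builds a t-interval from each, and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

def coverage_of_t_interval(true_mean=8.0, true_sd=1.5, n=25, t_star=2.064,
                           trials=2000, seed=0):
    """Fraction of simulated 95% t-intervals that contain true_mean.

    All population values here are hypothetical; t_star is the 95%
    critical value for df = n - 1 = 24.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One hypothetical sample from a normal population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        lo, hi = x_bar - t_star * se, x_bar + t_star * se
        if lo <= true_mean <= hi:
            hits += 1
    return hits / trials

coverage = coverage_of_t_interval()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behavior of the procedure.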

Mental model

Imagine throwing darts (your estimates) at a hidden bullseye (the true value). The SE is how spread out your darts land. A CI is a circle you draw around each dart. Higher confidence draws a bigger circle; more data draws a tighter circle.

Key reference values

95% z* ≈ 1.96
For unknown σ, especially with small or moderate n, use t* with df = n−1 (e.g., for 95% and df=24, t* ≈ 2.064)
SE(mean) with unknown σ: s/√n
SE(proportion): √[ p̂(1−p̂)/n ]
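As a quick sketch of where z* comes from, the two-sided critical value is the standard normal quantile at 1 − α/2. The helper name z_star below is an illustrative choice, not part of any library. The Python standard library has no t distribution, so t* still has to come from a table or a statistics package (for example, scipy.stats.t.ppf, if SciPy is available).

```python
from statistics import NormalDist

def z_star(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Higher confidence means a larger critical value, hence a wider interval.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```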

How to build common intervals

1) Mean (σ unknown) — t-interval

CI: x̄ ± t* · s/√n. Use t* from a t-table with df = n−1. Works well when the data are roughly symmetric or, by the CLT, when n is moderate to large.

2) Proportion — z-interval

CI: p̂ ± z* · √[ p̂(1−p̂)/n ]. Good for reasonably large n and p̂ not too close to 0 or 1. For small n or extreme p̂, Wilson or Agresti–Coull intervals are more stable.
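To see why Wilson is recommended for small n or extreme p̂, the sketch below compares the plain z-interval (often called the Wald interval) with the Wilson score interval on a made-up sample of 1 success in 20 trials. The function names and the data are assumptions for illustration.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Plain z-interval: p_hat ± z* · sqrt(p_hat(1 − p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical small sample: 1 success in 20 trials (p_hat = 0.05).
wald = wald_interval(1, 20)
wilson = wilson_interval(1, 20)
print("Wald:  ", tuple(round(v, 3) for v in wald))
print("Wilson:", tuple(round(v, 3) for v in wilson))
```

With this sample the z-interval's lower bound falls below 0, which is impossible for a proportion, while the Wilson interval stays inside [0, 1].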

3) Difference in means (independent samples)

CI: (x̄1 − x̄2) ± t* · √( s1²/n1 + s2²/n2 ). Use Welch's t* with approximate df when variances are unequal (typical in practice).

4) Difference in proportions

CI: (p̂1 − p̂2) ± z* · √( p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2 ).

5) Bootstrap (for medians or complex stats)

Resample your data with replacement many times, compute the statistic each time, and take percentile bounds (e.g., 2.5th and 97.5th percentiles for 95% CI). Great when formulas are hard or assumptions are shaky.
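A minimal sketch of the percentile bootstrap, assuming a simple nearest-rank rule for picking percentiles. The function name, the synthetic skewed data, and the seeds are all illustrative assumptions.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.median, n_resamples=1000,
                            confidence=0.95, seed=0):
    """Percentile bootstrap CI for any statistic of a one-dimensional sample."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the results.
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - confidence
    # Nearest-rank positions of the lower and upper percentiles.
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return boot_stats[lo_idx], boot_stats[hi_idx]

# Hypothetical right-skewed data: 200 synthetic session times (seconds).
data_rng = random.Random(42)
session_times = [data_rng.expovariate(1 / 30) for _ in range(200)]

lo, hi = bootstrap_percentile_ci(session_times)
print(f"Sample median: {statistics.median(session_times):.1f} s")
print(f"Bootstrap 95% CI for the median: [{lo:.1f}, {hi:.1f}] s")
```

Passing a different function as stat (for example, a trimmed mean) reuses the same resampling loop for other statistics.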

Assumptions checklist (quick)

Independence: samples are independent (or properly paired when required).
Sampling: representative of the population or randomized assignment in experiments.
Mean with t: data roughly symmetric or n sufficiently large.
Proportions: n·p̂ and n·(1−p̂) reasonably large; otherwise consider Wilson/Agresti–Coull or bootstrap.

What if assumptions fail?

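Use robust methods or transformations for skewed data (e.g., log-transform times, build the interval, then back-transform). Note that the back-transformed interval describes the geometric mean, not the arithmetic mean.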
Use bootstrap intervals for medians or skewed distributions.
For clustered/correlated data, use appropriate hierarchical or cluster-robust methods.

Worked examples

Example 1: Mean API latency (t-interval)

Sample of n=25 requests: x̄=8.2 ms, s=1.5 ms. 95% CI.

SE = 1.5/√25 = 0.3
t* (df=24) ≈ 2.064
MOE = 2.064·0.3 ≈ 0.619
CI = 8.2 ± 0.619 → [7.58, 8.82] ms

Interpretation: If we repeat this many times, about 95% of such intervals would contain the true mean latency.

Example 2: Conversion rate CI (proportion)

Product page: 120 signups among 500 visitors → p̂=0.24. 95% CI using z-interval.

SE = √[0.24·0.76/500] ≈ 0.0191
MOE = 1.96·0.0191 ≈ 0.037
CI = 0.24 ± 0.037 → [0.203, 0.277]

Interpretation: The true conversion rate is plausibly 20.3%–27.7%.

Example 3: A/B difference in proportions

A: 150/1000=0.15, B: 180/1000=0.18. Estimate difference p̂B − p̂A = 0.03.

SE = √[0.15·0.85/1000 + 0.18·0.82/1000] ≈ 0.0166
MOE = 1.96·0.0166 ≈ 0.033
CI = 0.03 ± 0.033 → [−0.003, 0.063]

Interpretation: Interval includes 0 → not a clear lift at 95% confidence.
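The arithmetic in Example 3 can be reproduced with a few lines of code. This is a sketch with an illustrative function name; it uses the rounded z* = 1.96 from the reference values, and its unrounded intermediate values differ slightly from the hand calculation.

```python
import math

def diff_proportions_interval(x_a, n_a, x_b, n_b, z_star=1.96):
    """z-interval for p_B − p_A from two independent samples."""
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    moe = z_star * se
    return diff, diff - moe, diff + moe

# Example 3 data: A converts 150/1000, B converts 180/1000.
diff, lo, hi = diff_proportions_interval(150, 1000, 180, 1000)
print(f"diff = {diff:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
print("Interval includes 0:", lo <= 0 <= hi)
```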

Example 4: Difference in means (Welch)

Control (n1=40): x̄1=5.2, s1=1.8; Variant (n2=37): x̄2=4.6, s2=1.5. Difference x̄1−x̄2 = 0.6.

SE = √(1.8²/40 + 1.5²/37) ≈ 0.377
df ≈ 74 → t* ≈ 1.99
MOE ≈ 1.99·0.377 ≈ 0.75
CI = 0.6 ± 0.75 → [−0.15, 1.35]

Interpretation: The interval includes 0, so the difference is not conclusive at 95% confidence.
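The "approximate df" in Example 4 comes from the Welch–Satterthwaite formula. The sketch below (the function name is illustrative) computes the SE and df from the summary statistics, then reuses the t* ≈ 1.99 quoted in the example, since the standard library has no t quantile function.

```python
import math

def welch_se_and_df(s1, n1, s2, n2):
    """Standard error and Welch–Satterthwaite df for a difference in means."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Example 4 summary statistics.
se, df = welch_se_and_df(s1=1.8, n1=40, s2=1.5, n2=37)
t_star = 1.99  # taken from the example (df ≈ 74), not computed here
diff = 5.2 - 4.6
lo, hi = diff - t_star * se, diff + t_star * se
print(f"SE = {se:.3f}, df = {df:.1f}")
print(f"95% CI = [{lo:.2f}, {hi:.2f}]")
```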

Choosing the right interval

If you estimate a mean with unknown σ: t-interval (Welch for two means).
If you estimate a proportion or difference in proportions: z-interval; consider Wilson/Agresti–Coull for small n/extremes.
If the statistic is a median or complex metric: bootstrap percentile CI.
If data are paired (before/after): use paired t-interval or bootstrap differences.
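For the paired case, the interval is a one-sample t-interval on the per-unit differences. The sketch below uses made-up before/after measurements for 10 users, with t* ≈ 2.262 (the 95% value for df = 9).

```python
import math
import statistics

# Hypothetical paired measurements (same 10 users, before and after a change).
before = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.9, 11.7, 12.4, 10.0]
after = [11.3, 9.9, 10.6, 9.7, 12.1, 9.4, 10.2, 11.0, 11.8, 9.6]

# Reduce each pair to a single difference, then treat the differences as one sample.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.fmean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)

t_star = 2.262  # 95% critical value for df = n - 1 = 9
lo, hi = d_bar - t_star * se, d_bar + t_star * se
print(f"Mean difference = {d_bar:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Treating the two columns as independent samples would ignore the pairing and typically give a wider interval than necessary.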

Planning sample size

Mean (σ known or approximated, e.g., from a pilot sample): n ≈ (z*·σ / MOE)²
Proportion: n ≈ (z*² · p*(1−p*)) / MOE²; use p*=0.5 for conservative planning if unsure.
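Both planning formulas can be sketched as small helpers that round up to the next whole observation. The function names and the planning inputs (a 3-percentage-point margin; σ = 60 with a margin of 10) are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, moe, confidence=0.95):
    """Sample size so a mean's margin of error is about moe (sigma assumed)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / moe) ** 2)

def n_for_proportion(moe, p_star=0.5, confidence=0.95):
    """Sample size for a proportion; p_star = 0.5 is the conservative choice."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_star * (1 - p_star) / moe**2)

# Hypothetical planning targets.
print(n_for_proportion(moe=0.03))      # ±3 percentage points, p* = 0.5
print(n_for_mean(sigma=60, moe=10))    # ±10 units with sigma ≈ 60
```

Because MOE appears squared in the denominator, halving the target margin roughly quadruples the required n.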

Exercises

Work through these exercises, then review your answers against the checklist that follows.

Exercise 1: Compute a 95% CI for the average API latency with n=36, sample mean 220 ms, sample sd 60 ms. State the interval and a one-sentence interpretation.
Exercise 2: A/B test. A: 62/520 converted. B: 81/540 converted. Build a 95% CI for p̂B − p̂A and say whether it indicates a significant lift.

Checklist before you compute:
- Have I selected the right interval (t for mean, z for proportion)?
- Did I compute SE correctly?
- Did I use the right critical value (t* or z*)?
- Is my interpretation about the method's reliability, not probability of this one interval?

Common mistakes and how to self-check

Saying “there is a 95% probability the true value is in this interval.” Self-check: Rephrase to “If we repeated this many times, 95% of intervals would contain the truth.”
Using z instead of t for small n and unknown σ. Self-check: Is σ known? If not, use t.
Ignoring dependence/paired data. Self-check: Are observations matched or measured twice? Use paired methods.
Reporting many decimals. Self-check: Round to decision-friendly precision (e.g., percentage points).
Overlooking small-sample proportion issues. Self-check: If n is small or p̂ near 0/1, prefer Wilson/Agresti–Coull or bootstrap.

Practical projects

Experiment dashboard: For each ongoing A/B test, show p̂, CI, and whether 0 is included for the lift.
Latency report: Weekly mean latency with t-intervals; include a trend of CI widths as n changes.
Model evaluation: After cross-validation, report accuracy with a CI (bootstrap across folds) and compare two models with a CI on the difference.

Learning path

Before this: Sampling and the Central Limit Theorem basics.
This step: Point estimates, SE, t/z intervals, difference intervals, bootstrap.
Next: Hypothesis testing and p-values, then power analysis and sample size planning; later, regression coefficients and their CIs.

Mini challenge

You measured median time-on-task for a new feature with n=200 sessions. You suspect strong skew. Build a bootstrap 95% CI for the median (1,000 resamples). Write one sentence explaining why bootstrap was appropriate here.

Hint

Skewed distributions and non-linear statistics (like the median) often violate simple normal-based approximations.

Next steps

Apply CI reporting to your current KPIs: add ranges, not just point estimates.
Set a target margin of error for your next experiment and back-solve for n.
Explore Wilson intervals for proportions and bootstrap for medians to handle edge cases well.
