Estimation And Confidence Intervals

Learn estimation and confidence intervals for free with explanations, exercises, and a quick test (for data scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Who this is for

  • Aspiring and practicing Data Scientists who run A/B tests, report metrics, or compare models.
  • Analysts/engineers who need defensible estimates and margins of error for stakeholders.
  • Anyone who wants to quantify uncertainty, not just give a single number.

Prerequisites

  • Basic algebra and comfort with fractions/percentages.
  • Familiarity with mean, proportion, standard deviation, and sampling.
  • Very light probability: independence and the basic idea of a normal distribution.

Why this matters

In Data Science, you rarely know the true population value. You estimate it and must communicate how uncertain you are. Confidence intervals (CIs) let you:

  • Report conversion rate with a margin of error after an A/B test.
  • Share average session time or latency with uncertainty to product/engineering.
  • Compare two models' accuracies and state whether the difference is meaningful.
  • Plan sample size to hit a target precision before running an experiment.

Concept explained simply

A point estimate (like a sample mean or proportion) is your best single guess of a true but unknown parameter. Because samples vary, your estimate varies too. The standard error (SE) measures how much estimates vary across hypothetical repeated samples.

A confidence interval is a procedure that builds a plausible range for the true parameter. A 95% CI means: if you repeated the whole sampling-and-interval-building process many times, about 95% of those intervals would contain the true value. For the one interval you computed, think of it as the output of a method with 95% reliability, not as a 95% chance that “this exact interval” contains the truth.
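The "repeat the process many times" idea can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed data with a made-up true mean and standard deviation, and it hard-codes t* ≈ 2.064 for df = 24 rather than computing it.

```python
import random
import statistics

def coverage_of_t_interval(true_mean=8.2, true_sd=1.5, n=25, t_star=2.064,
                           repeats=4000, seed=0):
    """Fraction of simulated 95% t-intervals that contain the true mean.

    true_mean and true_sd are hypothetical values chosen for the simulation;
    t_star is the tabulated 95% critical value for df = n - 1 = 24.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one sample and build its interval: mean ± t* · s/√n
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if mean - t_star * se <= true_mean <= mean + t_star * se:
            hits += 1
    return hits / repeats

coverage = coverage_of_t_interval()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run hit rate of the method.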

Mental model

Imagine throwing darts (your estimates) at a hidden bullseye (the true value). The SE is how spread out your darts land. A CI is a circle you draw around each dart. Higher confidence draws a bigger circle; more data draws a tighter circle.

Key reference values
  • 95% z* ≈ 1.96
  • For unknown σ, use t* with df = n−1 (e.g., df=24, t* ≈ 2.064); t* approaches z* as n grows
  • SE(mean) with unknown σ: s/√n
  • SE(proportion): √[ p̂(1−p̂)/n ]
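These reference values can be reproduced with Python's standard library. This is a minimal sketch: the helper names are made up for illustration, and only z* is computed here because the standard library has no t distribution (t* would come from a table or a statistics package).

```python
from math import sqrt
from statistics import NormalDist

def z_star(confidence=0.95):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def se_mean(s, n):
    """Standard error of a sample mean when sigma is unknown: s / sqrt(n)."""
    return s / sqrt(n)

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return sqrt(p_hat * (1 - p_hat) / n)

print(f"z* for 95%: {z_star(0.95):.3f}")
print(f"SE(mean) for s=1.5, n=25: {se_mean(1.5, 25):.3f}")
print(f"SE(proportion) for p_hat=0.24, n=500: {se_proportion(0.24, 500):.4f}")
```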

How to build common intervals

1) Mean (σ unknown) — t-interval

CI: x̄ ± t* · s/√n. Use t* from a t-table with df = n−1. Works well when the data are roughly symmetric, or when n is moderate to large so that the CLT makes the sample mean approximately normal.

2) Proportion — z-interval

CI: p̂ ± z* · √[ p̂(1−p̂)/n ]. Good for reasonably large n and p̂ not too close to 0 or 1. For small n or extreme p̂, Wilson or Agresti–Coull intervals are more stable.

3) Difference in means (independent samples)

CI: (x̄1 − x̄2) ± t* · √( s1²/n1 + s2²/n2 ). Use Welch's t* with approximate df when variances are unequal (typical in practice).

4) Difference in proportions

CI: (p̂1 − p̂2) ± z* · √( p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2 ).

5) Bootstrap (for medians or complex stats)

Resample your data with replacement many times, compute the statistic each time, and take percentile bounds (e.g., 2.5th and 97.5th percentiles for 95% CI). Great when formulas are hard or assumptions are shaky.
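A percentile bootstrap for the median might look like the sketch below. The data are synthetic (a skewed lognormal sample generated only for illustration), the function name is hypothetical, and the percentile bounds are taken by simple index lookup, which is an approximation.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.median, resamples=1000,
                            confidence=0.95, seed=0):
    """Approximate percentile bootstrap CI for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the results
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(resamples))
    alpha = 1 - confidence
    lo_idx = round((alpha / 2) * resamples)
    hi_idx = round((1 - alpha / 2) * resamples) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]

# Synthetic, right-skewed "time" data (an assumption for this example)
data_rng = random.Random(42)
times = [data_rng.lognormvariate(3.0, 0.8) for _ in range(200)]

lo, hi = bootstrap_percentile_ci(times)
print(f"Sample median: {statistics.median(times):.1f}")
print(f"Bootstrap 95% CI for the median: [{lo:.1f}, {hi:.1f}]")
```

Passing a different `stat` function (for example a trimmed mean) reuses the same resampling loop, which is the main appeal when no simple formula exists.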

Assumptions checklist (quick)
  • Independence: samples are independent (or properly paired when required).
  • Sampling: representative of the population or randomized assignment in experiments.
  • Mean with t: data roughly symmetric or n sufficiently large.
  • Proportions: n·p̂ and n·(1−p̂) reasonably large; otherwise consider Wilson/Agresti–Coull or bootstrap.
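
What if assumptions fail?
  • Use robust methods or transformations for skewed data (e.g., log-transform times, build the interval, then back-transform; the back-transformed interval then describes the geometric mean rather than the arithmetic mean).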
  • Use bootstrap intervals for medians or skewed distributions.
  • For clustered/correlated data, use appropriate hierarchical or cluster-robust methods.

Worked examples

Example 1: Mean API latency (t-interval)

Sample of n=25 requests: x̄=8.2 ms, s=1.5 ms. 95% CI.

  • SE = 1.5/√25 = 0.3
  • t* (df=24) ≈ 2.064
  • Margin of error (MOE) = 2.064·0.3 ≈ 0.619
  • CI = 8.2 ± 0.619 → [7.58, 8.82] ms

Interpretation: If we repeat this many times, about 95% of such intervals would contain the true mean latency.
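The arithmetic in Example 1 can be reproduced in a few lines. This is a sketch with a hypothetical helper name; t* is passed in as the tabulated value 2.064 rather than computed.

```python
from math import sqrt

def t_interval(mean, s, n, t_star):
    """Return (lower, upper) for mean ± t* · s/√n."""
    se = s / sqrt(n)          # standard error of the mean
    moe = t_star * se         # margin of error
    return mean - moe, mean + moe

# Values from Example 1; t_star is the table value for df = 24
lo, hi = t_interval(mean=8.2, s=1.5, n=25, t_star=2.064)
print(f"95% CI: [{lo:.2f}, {hi:.2f}] ms")
```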

Example 2: Conversion rate CI (proportion)

Product page: 120 signups among 500 visitors → p̂=0.24. 95% CI using z-interval.

  • SE = √[0.24·0.76/500] ≈ 0.0191
  • MOE = 1.96·0.0191 ≈ 0.037
  • CI = 0.24 ± 0.037 → [0.203, 0.277]

Interpretation: The true conversion rate is plausibly 20.3%–27.7%.
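For comparison, the same counts can be run through both the plain z-interval (often called the Wald interval) and the Wilson interval. The function names below are illustrative, and the Wilson formula is the standard score interval, not something specific to this lesson.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def wald_interval(successes, n, z=Z95):
    """Plain z-interval: p_hat ± z · sqrt(p_hat(1 − p_hat)/n)."""
    p = successes / n
    moe = z * sqrt(p * (1 - p) / n)
    return p - moe, p + moe

def wilson_interval(successes, n, z=Z95):
    """Wilson score interval; stays inside [0, 1] even for small n."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

for label, (x, n) in {"Example 2": (120, 500), "small sample": (1, 10)}.items():
    w_lo, w_hi = wald_interval(x, n)
    s_lo, s_hi = wilson_interval(x, n)
    print(f"{label}: Wald [{w_lo:.3f}, {w_hi:.3f}]  Wilson [{s_lo:.3f}, {s_hi:.3f}]")
```

With 120 of 500 the two intervals nearly agree. With the hypothetical 1 of 10, the plain z-interval dips below zero while Wilson does not, which is why Wilson is suggested for small n or extreme p̂.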

Example 3: A/B difference in proportions

A: 150/1000=0.15, B: 180/1000=0.18. Estimate difference p̂B − p̂A = 0.03.

  • SE = √[0.15·0.85/1000 + 0.18·0.82/1000] ≈ 0.0166
  • MOE = 1.96·0.0166 ≈ 0.033
  • CI = 0.03 ± 0.033 → [−0.003, 0.063]

Interpretation: Interval includes 0 → not a clear lift at 95% confidence.
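A small helper for this kind of A/B comparison could look like the following sketch. The function name and signature are assumptions made for illustration; it uses the unpooled standard error from the formula above.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_interval(x_a, n_a, x_b, n_b, confidence=0.95):
    """Unpooled z-interval for p_B − p_A from conversion counts."""
    p_a, p_b = x_a / n_a, x_b / n_b
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Counts from Example 3
lo, hi = diff_proportions_interval(150, 1000, 180, 1000)
includes_zero = lo <= 0 <= hi
print(f"95% CI for the lift: [{lo:.3f}, {hi:.3f}]; includes 0: {includes_zero}")
```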

Example 4: Difference in means (Welch)

Control (n1=40): x̄1=5.2, s1=1.8; Variant (n2=37): x̄2=4.6, s2=1.5. Difference x̄1−x̄2 = 0.6.

  • SE = √(1.8²/40 + 1.5²/37) ≈ 0.377
  • df ≈ 74 → t* ≈ 1.99
  • MOE ≈ 1.99·0.377 ≈ 0.75
  • CI = 0.6 ± 0.75 → [−0.15, 1.35]

Interpretation: The interval includes 0, so the difference is not conclusive at 95% confidence.
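The df ≈ 74 in Example 4 comes from the Welch–Satterthwaite approximation. The sketch below computes it from the summary statistics; the function name is illustrative, and t* ≈ 1.99 is hard-coded from the example rather than computed.

```python
from math import sqrt

def welch_df(s1, n1, s2, n2):
    """Welch–Satterthwaite approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Summary statistics from Example 4
x1, s1, n1 = 5.2, 1.8, 40
x2, s2, n2 = 4.6, 1.5, 37

df = welch_df(s1, n1, s2, n2)
se = sqrt(s1**2 / n1 + s2**2 / n2)
t_star = 1.99  # table value for df near 74, as used in the example
diff = x1 - x2
lo, hi = diff - t_star * se, diff + t_star * se
print(f"df ≈ {df:.1f}, SE ≈ {se:.3f}, 95% CI ≈ [{lo:.2f}, {hi:.2f}]")
```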

Choosing the right interval

  • If you estimate a mean with unknown σ: t-interval (Welch for two means).
  • If you estimate a proportion or difference in proportions: z-interval; consider Wilson/Agresti–Coull for small n/extremes.
  • If the statistic is a median or complex metric: bootstrap percentile CI.
  • If data are paired (before/after): use paired t-interval or bootstrap differences.
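For the paired case, the interval is built on the per-unit differences rather than on the two groups separately. The sketch below uses made-up before/after measurements for 10 units and the tabulated t* ≈ 2.262 for df = 9; all names and values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

# Hypothetical paired measurements (same 10 units measured twice)
before = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 12.0, 11.8, 12.9]
after = [11.6, 11.1, 12.2, 12.5, 11.2, 12.1, 12.6, 11.9, 11.1, 12.4]

# Work with the within-pair differences, then apply an ordinary t-interval
diffs = [b - a for b, a in zip(before, after)]
mean_diff = fmean(diffs)
se = stdev(diffs) / sqrt(len(diffs))
t_star = 2.262  # table value for df = 9 at 95% confidence

lo, hi = mean_diff - t_star * se, mean_diff + t_star * se
print(f"Mean difference: {mean_diff:.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

Treating these two columns as independent samples would ignore the pairing and typically give a wider interval, because unit-to-unit variation would no longer cancel out.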

Planning sample size
  • Mean (σ known or approximated from prior data): n ≈ (z*·σ / MOE)²
  • Proportion: n ≈ (z*² · p*(1−p*)) / MOE²; use p*=0.5 for conservative planning if unsure.
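Both planning formulas translate directly into code. In this sketch the function names and the example inputs (σ = 60 ms with a 10 ms target, and a 2 percentage point target for a proportion) are hypothetical; results are rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def _z(confidence):
    """Two-sided standard normal critical value."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, moe, confidence=0.95):
    """n ≈ (z*·σ / MOE)², rounded up."""
    return ceil((_z(confidence) * sigma / moe) ** 2)

def n_for_proportion(moe, p_star=0.5, confidence=0.95):
    """n ≈ z*²·p*(1 − p*) / MOE², rounded up; p* = 0.5 is the conservative choice."""
    return ceil(_z(confidence) ** 2 * p_star * (1 - p_star) / moe ** 2)

print("Mean, sigma=60, MOE=10:", n_for_mean(sigma=60, moe=10))
print("Proportion, MOE=0.02, p*=0.5:", n_for_proportion(moe=0.02))
print("Proportion, MOE=0.02, p*=0.15:", n_for_proportion(moe=0.02, p_star=0.15))
```

Because n scales with 1/MOE², halving the target margin of error roughly quadruples the required sample size.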

Exercises

These mirror the exercises below so you can practice and then check against the solutions.

  1. Exercise 1: Compute a 95% CI for the average API latency with n=36, sample mean 220 ms, sample sd 60 ms. State the interval and a one-sentence interpretation.
  2. Exercise 2: A/B test. A: 62/520 converted. B: 81/540 converted. Build a 95% CI for the difference in conversion rates pB − pA (estimated by p̂B − p̂A) and say whether it indicates a significant lift.

Checklist before you compute:

  • Have I selected the right interval (t for mean, z for proportion)?
  • Did I compute SE correctly?
  • Did I use the right critical value (t* or z*)?
  • Is my interpretation about the method's reliability, not probability of this one interval?

Common mistakes and how to self-check

  • Saying “there is a 95% probability the true value is in this interval.” Self-check: Rephrase to “If we repeated this many times, 95% of intervals would contain the truth.”
  • Using z instead of t for small n and unknown σ. Self-check: Is σ known? If not, use t.
  • Ignoring dependence/paired data. Self-check: Are observations matched or measured twice? Use paired methods.
  • Reporting many decimals. Self-check: Round to decision-friendly precision (e.g., percentage points).
  • Overlooking small-sample proportion issues. Self-check: If n is small or p̂ near 0/1, prefer Wilson/Agresti–Coull or bootstrap.

Practical projects

  • Experiment dashboard: For each ongoing A/B test, show p̂, CI, and whether 0 is included for the lift.
  • Latency report: Weekly mean latency with t-intervals; include a trend of CI widths as n changes.
  • Model evaluation: After cross-validation, report accuracy with a CI (bootstrap across folds) and compare two models with a CI on the difference.

Learning path

  • Before this: Sampling and the Central Limit Theorem basics.
  • This step: Point estimates, SE, t/z intervals, difference intervals, bootstrap.
  • Next: Hypothesis testing and p-values, then power analysis and sample size planning; later, regression coefficients and their CIs.

Mini challenge

You measured median time-on-task for a new feature with n=200 sessions. You suspect strong skew. Build a bootstrap 95% CI for the median (1,000 resamples). Write one sentence explaining why bootstrap was appropriate here.

Hint

Skewed distributions and non-linear statistics (like the median) often violate simple normal-based approximations.

Next steps

  • Apply CI reporting to your current KPIs: add ranges, not just point estimates.
  • Set a target margin of error for your next experiment and back-solve for n.
  • Explore Wilson intervals for proportions and bootstrap for medians to handle edge cases well.

Practice Exercises

Instructions

You observed n=36 requests with sample mean 220 ms and sample sd 60 ms. Compute the 95% CI for the true mean latency. Use a t-interval.

Report the interval (lower, upper) and one sentence interpreting it for a stakeholder.

Expected Output
Approximate 95% CI around (199.7 ms, 240.3 ms); interpretation describes long-run reliability of the method and plausible mean latency range.
