Why this matters
As a Data Scientist, you rely on samples to make decisions about products, users, and systems. The Law of Large Numbers (LLN) explains why sample averages settle near the true average as data grows. The Central Limit Theorem (CLT) tells you how the average behaves and lets you quantify uncertainty with confidence intervals and hypothesis tests. You will use these ideas to:
- Size and interpret A/B tests (conversion rates, uplift, minimum sample size).
- Build confidence intervals for metrics (CTR, average session time, revenue per user).
- Estimate risks and KPIs via simulation (Monte Carlo) and report error bars.
- Sanity-check model evaluation metrics and aggregated dashboards.
Concept explained simply
- Law of Large Numbers (LLN): If you repeat the same random process many times under the same conditions, the sample average will get closer to the true mean. Bigger n → more stable averages.
- Central Limit Theorem (CLT): For many independent, similar draws with finite variance, the average (properly standardized) is approximately normal, even if individual data points are not. This powers confidence intervals and p-values.
Mental model
Think of LLN as gravity for averages: extreme up-and-down swings get pulled toward the true mean as you add more observations.
Think of CLT as a smoothing machine: add many small, independent pieces of randomness, and their average looks bell-shaped. That bell shape lets you use z-scores and familiar probability rules.
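To see the "gravity" at work, here is a minimal Python sketch (NumPy assumed available; the seed and sample sizes are illustrative) that tracks the running mean of simulated fair-coin flips:

```python
import numpy as np

rng = np.random.default_rng(42)               # seed is arbitrary, for reproducibility
flips = rng.integers(0, 2, size=10_000)       # fair-coin draws (0 or 1), true mean 0.5
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

# LLN in action: the running mean settles toward 0.5 as n grows
for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}: running mean = {running_mean[n - 1]:.4f}")
```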
Assumptions and when they can break
- Independence: Observations should not strongly influence each other (watch out for time series, seasonality, and clustered users).
- Identically distributed: Same data-generating process (avoid mixing very different groups unless modeled).
- Finite variance: Heavy-tailed data (e.g., Cauchy, extreme outliers) can break CLT/LLN in practice.
- Sample size: “Large n” depends on distribution shape; skewed/heavy-tailed data needs more samples.
Practical safeguards
- Inspect distributions: histograms, quantiles, and outlier checks.
- Stabilize variance: transform skewed data (e.g., log for revenue > 0), or use robust summaries (median, trimmed mean).
- Account for dependence: cluster-robust SEs, block bootstraps, or aggregate at the independent unit (e.g., per-user).
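As a concrete illustration of the last two safeguards, here is a minimal pandas sketch; the column names (`user_id`, `revenue`) and the values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical event-level data: one row per session, users repeat across rows
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "revenue": [5.0, 12.0, 3.5, 0.0, 7.0, 150.0],  # note the heavy-tailed outlier
})

# Aggregate at the independent unit: one observation per user
per_user = events.groupby("user_id")["revenue"].sum()

# Stabilize variance on skewed, non-negative values with a log transform
log_revenue = np.log1p(per_user)              # log(1 + x) tolerates zeros
print(per_user.describe())
print(log_revenue.mean())
```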
Worked examples
Example 1 — A/B test: proportion CI with CLT
Scenario: Variant B gets n = 4000 visitors; 480 convert → p̂ = 0.12. Approximate 95% CI for the true conversion rate p.
- SE = sqrt( p̂(1 − p̂) / n ) = sqrt(0.12 × 0.88 / 4000) ≈ sqrt(0.0000264) ≈ 0.00514.
- 95% CI ≈ p̂ ± 1.96 × SE = 0.12 ± 1.96 × 0.00514 ≈ 0.12 ± 0.0101.
- Result: [0.1099, 0.1301] (about 11.0% to 13.0%).
Interpretation: If repeated many times, 95% of such intervals would contain the true p.
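The same calculation in Python (standard library only), using the numbers from the scenario:

```python
import math

n, conversions = 4000, 480
p_hat = conversions / n                        # 0.12
se = math.sqrt(p_hat * (1 - p_hat) / n)        # ≈ 0.00514
z = 1.96                                       # 97.5th percentile of the standard normal
lo, hi = p_hat - z * se, p_hat + z * se
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")         # ≈ [0.1099, 0.1301]
```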
Example 2 — Average time-on-site: mean CI with CLT
Scenario: n = 50 sessions; sample mean m = 5.4 minutes; sample sd s = 2.0.
- SE(m) ≈ s / sqrt(n) = 2.0 / 7.071 ≈ 0.283.
- 95% CI ≈ 5.4 ± 1.96 × 0.283 ≈ 5.4 ± 0.554 → [4.85, 5.95] minutes.
Note: With small n and unknown variance, a t-interval is more accurate, but the CLT-based z-interval is often close at n ≈ 50 (compared in the sketch below).
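A small sketch of that comparison, assuming SciPy is available; the t critical value for df = 49 is about 2.01 versus 1.96 for z:

```python
import math
from scipy import stats

n, m, s = 50, 5.4, 2.0
se = s / math.sqrt(n)                          # ≈ 0.283

z_crit = stats.norm.ppf(0.975)                 # ≈ 1.96
t_crit = stats.t.ppf(0.975, df=n - 1)          # ≈ 2.01 for df = 49
print(f"z-interval: {m} ± {z_crit * se:.3f}")
print(f"t-interval: {m} ± {t_crit * se:.3f}")  # slightly wider, better small-n coverage
```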
Example 3 — Sum of uniforms looks normal (CLT)
Scenario: S = U1 + ... + U12 with Ui ~ Uniform(0,1) i.i.d.
- Mean(S) = 12 × 0.5 = 6.
- Var(S) = 12 × 1/12 = 1 → SD(S) = 1.
- Approximate P(4.5 ≤ S ≤ 7.5) via Normal(6, 1): z-lower = −1.5, z-upper = 1.5 → ≈ 0.9332 − 0.0668 = 0.8664 (about 86.6%).
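A quick Monte Carlo check of the approximation (NumPy assumed; the simulation size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(0, 1, size=(200_000, 12)).sum(axis=1)  # 200k sums of 12 uniforms

print(S.mean(), S.std())                       # ≈ 6.0 and ≈ 1.0, matching the theory
print(np.mean((S >= 4.5) & (S <= 7.5)))        # ≈ 0.866, close to the normal approximation
```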
Exercises
Try these, then open the solutions to verify. The same exercises appear in the Exercises section for progress tracking.
- ex1 — Proportion CI: A feature was shown to n = 2500 users; 520 clicked. Compute p̂, the standard error, and a 95% CI.
- ex2 — Sample size for margin of error: You want a 95% CI for a proportion with ±2 percentage points margin (±0.02), worst-case variance. What minimum n is needed?
- ex3 — Probability the sample mean is close: For data with sd = 3.6 and n = 36, what is P(|x̄ − μ| ≤ 0.6) using CLT?
Solutions
Brief answers (full steps are in the Exercise solutions below):
- ex1: p̂ = 0.208; SE ≈ 0.0081; 95% CI ≈ [0.192, 0.224].
- ex2: n ≈ 2401.
- ex3: SE = 0.6; P(|Z| ≤ 1) ≈ 0.6827 (≈ 68%).
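If you want to verify these numbers yourself, here is a short script (SciPy assumed for the normal CDF):

```python
import math
from scipy import stats

# ex1: proportion CI
p_hat = 520 / 2500
se = math.sqrt(p_hat * (1 - p_hat) / 2500)
print(p_hat, se, (p_hat - 1.96 * se, p_hat + 1.96 * se))

# ex2: worst-case (p = 0.5) sample size for margin E = 0.02
print(math.ceil((1.96 / 0.02) ** 2 * 0.25))    # 2401

# ex3: SE = 3.6 / sqrt(36) = 0.6, so |x̄ − μ| ≤ 0.6 means |Z| ≤ 1
print(stats.norm.cdf(1) - stats.norm.cdf(-1))  # ≈ 0.6827
```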
Self-check checklist
- I can explain LLN vs CLT in one sentence each.
- I can compute SE for a proportion and for a mean.
- I know when to use a t-interval (small n, unknown σ) vs z-interval.
- I sanity-check independence and consider clustering/time effects.
- I can estimate sample size from a desired margin of error.
Common mistakes and how to catch them
- Assuming normality for raw data instead of the sample mean. CLT applies to averages/sums, not necessarily to individual observations.
- Ignoring dependence (e.g., the same user appears multiple times). Fix by aggregating per user or using clustered SEs.
- Using z-intervals with very small n and unknown σ. Prefer t-intervals or bootstrapping.
- For heavy-tailed metrics (revenue), using the mean without checks. Consider log-transform, winsorizing, or robust statistics.
- Misreading confidence intervals as probability statements about a fixed parameter. CI describes the procedure’s long-run coverage.
Quick self-audit
- Plot your metric. If it’s highly skewed, consider transforms before CI/CLT use.
- Check unit of independence (user/session/day). Aggregate accordingly.
- Document assumptions and any robustness checks (trimmed mean, sensitivity analysis).
Practical projects
- A/B test simulation: simulate conversions for two variants at different sample sizes, plot the distribution of p̂ and its CI coverage.
- CLT in action: repeatedly sample averages from a skewed distribution (e.g., lognormal) for n = 5, 30, 200; visualize histograms of x̄ (a starter sketch follows this list).
- Robust KPIs: compare mean vs median (and trimmed mean) for heavy-tailed revenue; show how CI widths and stability differ.
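A minimal, text-only starting point for the CLT project, assuming NumPy; swap the print statements for histograms to visualize:

```python
import numpy as np

rng = np.random.default_rng(7)

# Skewed source distribution: lognormal (heavy right tail)
for n in (5, 30, 200):
    means = rng.lognormal(mean=0.0, sigma=1.0, size=(20_000, n)).mean(axis=1)
    # As n grows, the distribution of x̄ tightens and loses its skew (CLT at work)
    print(f"n={n:>3}: sd of x̄ = {means.std():.3f}, "
          f"mean − median = {means.mean() - np.median(means):.3f}")
```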
Who this is for
Data Scientist and ML/AI practitioners who analyze experiments, monitor metrics, or build data-informed features, and need reliable uncertainty estimates.
Prerequisites
- Basic probability (random variables, mean, variance).
- Notions of independence and identically distributed samples.
- Comfort with arithmetic, square roots, and z-scores.
Learning path
- Review LLN and CLT concepts and assumptions.
- Practice computing SE and CIs for means and proportions.
- Learn when to use t-distribution and robust/bootstrapped intervals.
- Apply to A/B testing, dashboards, and simulations.
- Perform sensitivity checks for skew, outliers, and clustering.
Mini challenge
Your daily revenue per user is heavy-tailed. You need a 95% CI for mean revenue. What’s your plan?
Possible plan
- Inspect distribution; consider log-transform or winsorize top 1%.
- Aggregate at user-level to improve independence.
- Use a t-interval on the transformed mean (then back-transform carefully) or bootstrap a CI for the mean (see the sketch after this list).
- Compare with a trimmed-mean CI as a robustness check.
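A sketch of the bootstrap step in the plan, assuming NumPy and SciPy; the simulated lognormal data stands in for real revenue:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)

# Hypothetical heavy-tailed per-user revenue (a lognormal stands in for real data)
revenue = rng.lognormal(mean=1.0, sigma=1.5, size=2_000)

# Percentile bootstrap: resample users with replacement, recompute the mean each time
boot_means = np.array([
    rng.choice(revenue, size=revenue.size, replace=True).mean()
    for _ in range(5_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for mean revenue: [{lo:.2f}, {hi:.2f}]")

# Robustness check: a 10%-trimmed mean is far less sensitive to the extreme tail
print(f"trimmed mean: {stats.trim_mean(revenue, 0.1):.2f} vs raw mean: {revenue.mean():.2f}")
```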
Next steps
- Practice on your team’s metrics: compute SE and 95% CI for a key KPI weekly.
- Design a small A/B test and pre-compute the required n for the desired margin of error.
- Implement a bootstrap CI for a skewed metric and compare to CLT-based CI.
Quick Test
The quick test below checks your understanding.