Why this matters
Confidence intervals (CIs) tell you the plausible range of the true effect in an A/B test. As a Data Analyst, you will be asked to:
- Decide if a test is significant and practically meaningful.
- Translate intervals into business impact (e.g., conversion points, revenue per user).
- Communicate uncertainty clearly to stakeholders.
- Set appropriate guardrails (e.g., do not harm signup rate by more than X pp).
Concept explained simply
A 95% confidence interval is a range built from your sample that will contain the true effect in 95% of repeated, identical experiments. It is about the procedure’s long-run performance, not the probability of the true effect being in this specific range.
Mental model
Think of a CI as a fishing net you throw around the unknown effect. A wider net (wide CI) means you’re less sure; a tighter net (narrow CI) means better precision. More data shrinks the net, but slowly: halving the width takes about four times the sample.
Key points you’ll use daily
- If the CI for the difference (B − A) excludes 0, it’s statistically significant at that confidence level.
- Absolute vs relative: report both. Example: +0.6 percentage points (pp) absolute = +12% relative if baseline is 5.0%.
- Precision matters: narrow CIs enable confident decisions; wide CIs suggest “need more data.”
- Overlap trap: overlapping single-variant CIs do not necessarily mean the difference is insignificant. Always compute the CI for the difference.
- Two-sided vs one-sided: most product decisions use two-sided CIs unless you pre-specify a one-sided risk direction.
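The overlap trap is easier to see with numbers. The sketch below uses hypothetical counts (not from any real test) and the normal-approximation formulas; the function names are illustrative only.

```python
from math import sqrt

Z95 = 1.96  # approximate z value for a 95% two-sided interval


def proportion_ci(conversions, visitors, z=Z95):
    """Normal-approximation CI for a single conversion rate."""
    p = conversions / visitors
    margin = z * sqrt(p * (1 - p) / visitors)
    return p - margin, p + margin


def difference_ci(conv_a, n_a, conv_b, n_b, z=Z95):
    """Normal-approximation CI for the difference in conversion rate (B - A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical counts chosen to illustrate the trap: A = 5.0%, B = 5.6%.
ci_a = proportion_ci(1000, 20000)
ci_b = proportion_ci(1120, 20000)
ci_diff = difference_ci(1000, 20000, 1120, 20000)

# The single-variant intervals overlap if A's upper bound passes B's lower bound.
intervals_overlap = ci_a[1] > ci_b[0]
difference_excludes_zero = ci_diff[0] > 0 or ci_diff[1] < 0

print(f"A:     [{ci_a[0]:.4f}, {ci_a[1]:.4f}]")
print(f"B:     [{ci_b[0]:.4f}, {ci_b[1]:.4f}]")
print(f"B - A: [{ci_diff[0]:.4f}, {ci_diff[1]:.4f}]")
print(f"overlap: {intervals_overlap}, difference excludes 0: {difference_excludes_zero}")
```

With these counts the two single-variant intervals overlap slightly, yet the interval for B − A sits entirely above 0. Only the CI for the difference answers the significance question.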
How the CI is built (simple formulas you can use)
For conversion rate p (proportion): CI ≈ p ± z * sqrt(p(1 − p)/n)
For difference of two proportions (B − A): CI ≈ (pB − pA) ± z * sqrt( pA(1 − pA)/nA + pB(1 − pB)/nB )
For mean x̄ with sample SD s: CI ≈ x̄ ± z * s/√n (use t for small n)
For difference of means (B − A): CI ≈ (x̄B − x̄A) ± z * sqrt( sA²/nA + sB²/nB )
z is approximately 1.96 for 95%, 1.645 for 90%, and 2.576 for 99% (two-sided).
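If you want to see where the z values come from, they are quantiles of the standard normal distribution. A minimal sketch using Python's standard library (the helper name `z_for_confidence` is an assumption made for illustration):

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical z value for a confidence level such as 0.95."""
    alpha = 1 - level
    # Split alpha evenly between the two tails.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```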
Worked examples
Example 1: Conversion rate difference
Variant A: 10,000 visitors, 500 conversions (5.0%). Variant B: 10,200 visitors, 571 conversions (5.6%).
- Point estimate: B − A = 0.056 − 0.050 = +0.006 = +0.6 pp.
- SE ≈ sqrt(0.05*0.95/10000 + 0.056*0.944/10200) ≈ 0.00315.
- 95% margin ≈ 1.96 * 0.00315 ≈ 0.00618 (0.618 pp).
- 95% CI for diff: 0.006 ± 0.00618 → [−0.00018, 0.01218] ≈ [−0.02 pp, +1.22 pp].
Interpretation: The effect could be slightly negative or as high as +1.2 pp. Not significant at 95% (CI includes 0). Decision: Need more data or accept uncertainty.
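The same calculation can be sketched in a few lines of Python. This is one possible way to script it, not a required tool; it uses the unrounded rate 571/10,200, so the last digit can differ slightly from the hand calculation.

```python
from math import sqrt

# Counts from Example 1.
n_a, conv_a = 10_000, 500
n_b, conv_b = 10_200, 571

p_a = conv_a / n_a
p_b = conv_b / n_b
diff = p_b - p_a              # absolute effect, as a proportion
relative_lift = diff / p_a    # relative effect versus baseline A

# Standard error of the difference of two independent proportions.
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
margin = 1.96 * se            # 95% two-sided, normal approximation
lower, upper = diff - margin, diff + margin

significant = lower > 0 or upper < 0

print(f"diff = {diff * 100:+.2f} pp ({relative_lift:+.1%} relative)")
print(f"95% CI = [{lower * 100:+.2f} pp, {upper * 100:+.2f} pp]")
print(f"significant at 95%: {significant}")
```

Reporting the absolute difference in pp alongside the relative lift covers both units stakeholders tend to ask for.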
Example 2: Average order value (difference of means)
A: n=1200, mean=$48, SD=$30. B: n=1210, mean=$50, SD=$30.
- Diff = $2.
- SE ≈ sqrt(30²/1200 + 30²/1210) ≈ 1.22.
- 95% margin ≈ 1.96 * 1.22 ≈ $2.39.
- 95% CI for diff: $2 ± $2.39 → [−$0.39, +$4.39].
Interpretation: Not significant; could be slightly worse or up to $4.39 better.
Example 3: Events per user (mean rate)
A: n=5000, mean=3.2, SD=2.8. B: n=5000, mean=3.5, SD=2.9.
- Diff = 0.3.
- SE ≈ sqrt(2.8²/5000 + 2.9²/5000) ≈ 0.057.
- 95% margin ≈ 1.96 * 0.057 ≈ 0.112.
- 95% CI: 0.3 ± 0.112 → [0.188, 0.412].
Interpretation: Significant improvement. The 95% CI puts the increase between +0.19 and +0.41 events per user.
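Examples 2 and 3 follow the same difference-of-means recipe, so one small helper covers both. A minimal sketch, assuming large samples so that z rather than t is appropriate; the function name is illustrative. Because it keeps the unrounded standard error, the bounds can differ from the hand-rounded values by about a cent.

```python
from math import sqrt


def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Normal-approximation CI for the difference of means (B - A)."""
    diff = mean_b - mean_a
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return diff - z * se, diff + z * se


# Example 2: average order value in dollars.
aov_ci = mean_difference_ci(48, 30, 1200, 50, 30, 1210)
# Example 3: events per user.
events_ci = mean_difference_ci(3.2, 2.8, 5000, 3.5, 2.9, 5000)

for name, (lower, upper) in [("AOV ($)", aov_ci), ("events/user", events_ci)]:
    significant = lower > 0 or upper < 0
    print(f"{name}: [{lower:+.2f}, {upper:+.2f}] significant={significant}")
```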
How to interpret CIs in practice
- State the metric and unit, e.g., difference in conversion rate (pp), difference in revenue/user ($), difference in sessions/user.
- Quote the CI and level, e.g., “95% CI for B − A is [+0.2, +0.4] events per user.”
- Check statistical significance. Does the CI exclude 0?
- Check practical significance. Compare the lower bound to your minimum detectable effect (MDE) or business threshold.
- Translate to business impact. Multiply the lower bound by traffic and by value per event to estimate a conservative upside.
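The checklist above can be sketched as a short script. It starts from the Example 3 interval; the threshold, traffic, and value-per-event figures are hypothetical placeholders to be replaced with your own.

```python
def assess(lower, upper, threshold):
    """Classify a CI for B - A against zero and a practical threshold."""
    statistically_significant = lower > 0 or upper < 0
    # Practical significance here means even the lower bound clears the threshold.
    practically_significant = lower >= threshold
    return statistically_significant, practically_significant


# 95% CI for B - A from Example 3 (events per user).
lower, upper = 0.188, 0.412

# Hypothetical business inputs, for illustration only.
threshold = 0.15          # smallest lift worth shipping, in events per user
monthly_users = 100_000
value_per_event = 0.50    # dollars

stat_sig, practical = assess(lower, upper, threshold)
conservative_upside = lower * monthly_users * value_per_event
optimistic_upside = upper * monthly_users * value_per_event

print(f"statistically significant: {stat_sig}")
print(f"practically significant:   {practical}")
print(f"monthly upside: ${conservative_upside:,.0f} to ${optimistic_upside:,.0f}")
```

Leading with the conservative figure (lower bound) keeps the business case honest about uncertainty.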
Picking confidence levels (90% / 95% / 99%)
- 95% is a balanced default for product decisions.
- 90% is less conservative (narrower CI) when speed matters and risk is lower.
- 99% is more conservative (wider CI) for high-stakes changes.
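To see how the level changes the interval, the sketch below recomputes the Example 1 difference at all three levels. It assumes the same normal approximation and uses unrounded rates.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 1.
p_a, n_a = 500 / 10_000, 10_000
p_b, n_b = 571 / 10_200, 10_200

diff = p_b - p_a
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

results = {}
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # two-sided critical value
    lower, upper = diff - z * se, diff + z * se
    results[level] = (lower, upper)
    excludes_zero = lower > 0 or upper < 0
    print(f"{level:.0%}: [{lower * 100:+.2f} pp, {upper * 100:+.2f} pp] "
          f"excludes 0: {excludes_zero}")
```

With these counts the 90% interval excludes 0 while the 95% and 99% intervals do not, which is why the level should be fixed before looking at results.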
Common mistakes and self-check
- Misreading probability: “There is a 95% probability the true effect is in this interval” is incorrect. Instead: “This method captures the true effect 95% of the time.”
- Using single-variant CIs to infer difference: Always compute CI for B − A.
- Ignoring practical significance: A significant +0.1 pp may be too small to matter.
- Cherry-picking sides post-hoc: Choose one- or two-sided before the test.
- Unit mismatch: Reporting relative when stakeholders need absolute (or vice versa). Provide both.
Self-check before sharing results
- Did I state the metric (proportion/mean), CI level, and the exact interval?
- Did I compute the CI for the difference and check if it excludes 0?
- Did I compare the lower bound to our MDE/threshold?
- Did I include absolute and relative effects?
- Did I highlight assumptions (independent samples, sample size adequate)?
Exercises (practice now)
These mirror the interactive exercises below. Do them to lock in the skill.
- Exercise 1: Compute a 95% CI for difference in conversion rate and decide if it’s significant and practically meaningful.
- Exercise 2: Interpret a reported CI for average order value and make a decision with a given threshold.
- Compute/interpret a CI for proportions.
- Compute/interpret a CI for means.
- Decide significance (excludes 0?).
- Decide practical significance (lower bound vs threshold).
Who this is for
Data Analysts and people running or supporting A/B tests who need to interpret results accurately and communicate uncertainty to product and business stakeholders.
Prerequisites
- Basic probability and averages.
- Understanding of A/B test setup (control vs variant).
- Comfort with percentages, proportion, and standard deviation (helpful).
Learning path
- Interpret single-variant CIs (proportion, mean).
- Interpret CIs for differences (B − A).
- Connect CIs to decision thresholds (MDE, guardrails).
- Report both statistical and practical significance.
Practical projects
- CI Calculator in a Spreadsheet: Build sheets that compute 95% CIs for a single proportion and for the difference of two proportions and means. Validate with small test cases.
- Past Test Re-Analysis: Take a historical A/B test. Recompute the CI for B − A and write a 3-sentence decision note focusing on the lower bound and business threshold.
- A/A Simulation (optional): Simulate two samples from the same proportion (e.g., 5%) 100 times and count how often the 95% CI excludes 0. Expect about 5% false positives.
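For the A/A simulation, a spreadsheet works, but a short script is one convenient alternative. The sketch below is a starting point under stated assumptions: a 5% true rate, 2,000 users per arm, and 1,000 runs instead of 100 so the false-positive share is less noisy. All parameter names and defaults are illustrative.

```python
import random
from math import sqrt


def diff_ci_excludes_zero(conv_a, n_a, conv_b, n_b, z=1.96):
    """True if the normal-approximation CI for B - A does not contain 0."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se > 0 or diff + z * se < 0


def simulate_aa(true_rate=0.05, n_per_arm=2_000, runs=1_000, seed=42):
    """Share of A/A runs whose 95% CI for the difference excludes 0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    false_positives = 0
    for _ in range(runs):
        # Both arms draw conversions from the same true rate.
        conv_a = sum(rng.random() < true_rate for _ in range(n_per_arm))
        conv_b = sum(rng.random() < true_rate for _ in range(n_per_arm))
        if diff_ci_excludes_zero(conv_a, n_per_arm, conv_b, n_per_arm):
            false_positives += 1
    return false_positives / runs


false_positive_rate = simulate_aa()
print(f"false positive rate: {false_positive_rate:.1%}")
```

The printed share should land near 5%. With only 100 runs, expect visible run-to-run variation around that figure.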
Mini challenge
Your colleague says: “Variant B is better because its conversion CI is [5.2%, 6.0%] and A’s is [4.6%, 5.4%]; intervals overlap slightly but B looks higher.” Write one sentence to correct this and one sentence for the decision you’d make today.
Possible response
We must use the CI for the difference (B − A); overlapping single-variant CIs don’t determine significance. Decision: compute the CI for B − A, check if it excludes 0, and compare the lower bound to our threshold before shipping.