Why this matters
Knowing how long to run an A/B test prevents wasted time and misleading results. As a Data Analyst, you will be asked to:
- Estimate how many days/weeks are needed to detect a specific lift.
- Balance business urgency with statistical rigor.
- Plan traffic allocation across variants (A/B or A/B/n).
- Communicate realistic timelines to product, marketing, and engineering.
Concept explained simply
Runtime estimation answers two questions: how many samples do we need per variant (sample size), and how many calendar days will it take with our traffic?
- Baseline metric: your current rate or mean (e.g., 5% conversion).
- MDE (Minimum Detectable Effect): the smallest change you care to detect (absolute or relative).
- Alpha (significance): usually 0.05 (5%).
- Power: usually 0.8 (80%).
- Traffic inputs: daily eligible users, allocation (e.g., 50/50), and any filters (geo, device).
Mental model
Think of each variant as a bucket you must fill with enough users to confidently tell two nearby percentages apart. The rarer the event (low conversion) or the smaller the MDE, the more users you need in each bucket. Calendar time is just the rate at which users drip into each bucket.
Key formulas (practical approximations)
Two-variant proportion metric (e.g., conversion rate):
Let p1 = baseline rate, p2 = target rate = p1 + delta_abs
Let p_bar = (p1 + p2) / 2
Let Z_alpha2 = 1.96 (for alpha = 0.05, two-sided)
Let Z_beta = 0.84 (for power = 0.80)
Approximate sample size per variant:
n_per_variant ≈ 2 * (Z_alpha2 + Z_beta)^2 * p_bar * (1 - p_bar) / (delta_abs)^2
Where delta_abs = p1 * relative_MDE if MDE is given as a % lift.
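The proportion formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `sample_size_proportion` and its parameters are hypothetical, and it takes the Z values from the standard library's normal distribution rather than the rounded 1.96 and 0.84.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p1, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided test on a rate."""
    delta_abs = p1 * relative_mde            # relative lift -> absolute difference
    p2 = p1 + delta_abs                      # target rate
    p_bar = (p1 + p2) / 2                    # average of baseline and target
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)             # about 0.84 for power = 0.80
    n = 2 * (z_alpha2 + z_beta) ** 2 * p_bar * (1 - p_bar) / delta_abs ** 2
    return ceil(n)


# Baseline 5%, +10% relative MDE
n = sample_size_proportion(p1=0.05, relative_mde=0.10)
print(n)
```

Because the exact quantiles are slightly larger than the rounded constants, the result comes out marginally higher than a hand calculation with 1.96 and 0.84.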
Continuous metric (e.g., revenue/order):
Let sigma = standard deviation, delta_abs = desired mean difference
n_per_variant ≈ 2 * (Z_alpha2 + Z_beta)^2 * sigma^2 / (delta_abs)^2
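The continuous-metric formula can be sketched the same way. The name `sample_size_mean` is hypothetical, and the sketch assumes both variants share the same standard deviation, as the formula does.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, delta_abs, alpha=0.05, power=0.80):
    """Approximate observations per variant to detect a mean difference."""
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)             # quantile for the chosen power
    n = 2 * (z_alpha2 + z_beta) ** 2 * sigma ** 2 / delta_abs ** 2
    return ceil(n)


# sigma = $40, detect a $3 difference in mean order value
n_orders = sample_size_mean(sigma=40.0, delta_abs=3.0)
print(n_orders)

# Halving the detectable difference roughly quadruples the requirement
print(sample_size_mean(sigma=40.0, delta_abs=1.5))
```

The second call shows the 1 / delta_abs^2 scaling: small MDEs get expensive quickly.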
Runtime in days:
eligible_daily = daily_users * eligibility_fraction
per_variant_daily = eligible_daily * allocation_fraction
runtime_days = ceil(n_per_variant / per_variant_daily)
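A small sketch of the runtime calculation, using illustrative inputs. The function name `runtime_days` and the guard against zero traffic are assumptions added for the example.

```python
from math import ceil


def runtime_days(n_per_variant, daily_users, eligibility_fraction, allocation_fraction):
    """Minimum whole days needed to fill one variant's bucket."""
    eligible_daily = daily_users * eligibility_fraction
    per_variant_daily = eligible_daily * allocation_fraction
    if per_variant_daily <= 0:
        raise ValueError("per-variant daily traffic must be positive")
    return ceil(n_per_variant / per_variant_daily)


# Illustrative: 30,000 users needed per variant, 20,000 daily visitors,
# 70% eligible, 50/50 split -> 7,000 users per variant per day
days = runtime_days(30_000, daily_users=20_000,
                    eligibility_fraction=0.70, allocation_fraction=0.5)
print(days)  # 5
```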
4-step process to estimate running time
- Translate MDE: If relative, convert to absolute. Example: baseline 5%, +10% relative -> +0.5 pp absolute, so p2 = 5.5%.
- Compute sample size: Use the approximate formula above to get n per variant.
- Map to days: Divide by daily eligible traffic per variant; round up.
- Add calendar safeguards: Run at least 1 full business cycle (often 1-2 weeks) to cover weekday/weekend and avoid novelty effects.
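One way to apply the calendar safeguard in the last step is to round the statistical minimum up to whole business cycles. The sketch below assumes a 7-day cycle; `recommended_run_days` and its parameters are hypothetical names.

```python
from math import ceil


def recommended_run_days(min_days, cycle_days=7, min_cycles=1):
    """Round a statistical minimum up to a whole number of business cycles."""
    cycles = max(min_cycles, ceil(min_days / cycle_days))
    return cycles * cycle_days


print(recommended_run_days(5))                 # 7
print(recommended_run_days(9))                 # 14
print(recommended_run_days(5, min_cycles=2))   # 14
```

Setting `min_cycles=2` encodes a team rule such as "never run for less than two weeks", even when the sample size is reached sooner.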
Worked examples
Example 1: Website signup conversion (proportion metric)
- Baseline p1 = 5% (0.05)
- MDE = +10% relative => delta_abs = 0.05 * 0.10 = 0.005; p2 = 0.055
- Alpha = 0.05, Power = 0.80 (Z_alpha2 ≈ 1.96, Z_beta ≈ 0.84)
- Daily visitors = 20,000; eligibility = 70%; split = 50/50
p_bar = (0.05 + 0.055)/2 = 0.0525. Then:
n_per_variant ≈ 2 * (2.8)^2 * 0.0525 * (1 - 0.0525) / (0.005)^2
≈ 31,200 users per variant (approx)
Eligible per day per variant = 20,000 * 0.70 * 0.5 = 7,000.
Runtime = ceil(31,200 / 7,000) = 5 days (minimum). Recommended: 1-2 weeks to span weekdays and weekends.
Example 2: Email subject line CTR (single send)
- Baseline p1 = 10% (0.10)
- MDE = +8% relative => delta_abs = 0.10 * 0.08 = 0.008; p2 = 0.108
- Alpha = 0.05, Power = 0.80
- Total recipients for test = 200,000; split = 50/50 (one send)
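p_bar = (0.10 + 0.108)/2 = 0.104. Then:
n_per_variant ≈ 2 * (2.8)^2 * 0.104 * (1 - 0.104) / (0.008)^2
≈ 22,830 recipients per variant (approx)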
Exposure available per variant = 100,000
=> Enough in one send; runtime = one campaign send.
Note: For once-off blasts, runtime is often one send, provided sample size per variant is met.
Example 3: Average order value (continuous metric)
- Mean = $60, sigma = $40
- MDE = +$3 (5% lift)
- Alpha = 0.05, Power = 0.80
- Daily orders = 800; split = 50/50
n_per_variant ≈ 2 * (2.8)^2 * 40^2 / 3^2
≈ 2,788 orders per variant (approx)
Per-variant daily orders = 800 * 0.5 = 400
Runtime = ceil(2,788 / 400) = 7 days (minimum). Recommend 1-2 weeks.
Notes on multi-variant (A/B/n)
- Each additional variant reduces per-variant traffic if you keep total traffic fixed, increasing runtime.
- Controlling overall false positive rate may require multiple-comparison adjustments (e.g., tighter alpha), which further increases required sample size.
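Both effects can be seen together in a rough sketch. It assumes an equal traffic split across all arms and a Bonferroni correction (alpha divided by the number of variant-vs-control comparisons), which is only one of several valid adjustments and is on the conservative side. The function `multi_arm_plan` and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def multi_arm_plan(p1, relative_mde, n_arms, daily_eligible, alpha=0.05, power=0.80):
    """Return (n_per_variant, min_days) for n_arms arms, control included."""
    comparisons = n_arms - 1                 # each variant vs. control
    alpha_adj = alpha / comparisons          # Bonferroni-adjusted alpha (assumption)
    delta_abs = p1 * relative_mde
    p_bar = p1 + delta_abs / 2               # same as (p1 + p2) / 2
    z = NormalDist().inv_cdf(1 - alpha_adj / 2) + NormalDist().inv_cdf(power)
    n = ceil(2 * z ** 2 * p_bar * (1 - p_bar) / delta_abs ** 2)
    per_variant_daily = daily_eligible / n_arms   # equal split (assumption)
    return n, ceil(n / per_variant_daily)


# Baseline 5%, +10% relative MDE, 14,000 eligible users per day
n2, days2 = multi_arm_plan(0.05, 0.10, n_arms=2, daily_eligible=14_000)
n3, days3 = multi_arm_plan(0.05, 0.10, n_arms=3, daily_eligible=14_000)
print(n2, days2)
print(n3, days3)
```

Going from two arms to three raises the per-variant sample size (tighter alpha) and lowers per-variant traffic at the same time, so the minimum runtime grows faster than either effect alone would suggest.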
Exercises (hands-on)
Exercise 1 (mirrors the task below):
Estimate runtime for a signup conversion A/B test with: baseline 4%, MDE +15% relative, alpha 0.05, power 0.80; daily visitors 12,000; eligibility 60%; split 50/50. Compute: (a) absolute MDE, (b) sample size per variant, (c) minimum days, (d) recommended calendar run.
Common mistakes and how to self-check
- Using relative MDE directly as absolute. Fix: delta_abs = baseline * relative_MDE.
- Ignoring eligibility filters. Fix: multiply daily traffic by the fraction that actually enters the test.
- Stopping mid-week because sample size hit. Fix: cover full weekday/weekend cycles unless you pre-registered a sequential design.
- Changing split mid-test. Fix: keep allocations stable; re-estimate if you must change.
- Multiple variants without alpha control. Fix: plan comparisons and adjust alpha or use a valid multi-arm framework.
- Variance underestimation for continuous metrics. Fix: use recent data to estimate sigma conservatively.
Mini challenge
You have baseline conversion 3% and want to detect +12% relative. Daily visitors = 30,000, eligibility = 50%, split = 50/50. Alpha = 0.05, Power = 0.80.
- 1) Compute absolute MDE.
- 2) Approximate n per variant.
- 3) Minimum days and a reasonable recommended run.
A possible approach
1) delta_abs = 0.03 * 0.12 = 0.0036; p2 = 0.0336.
2) Use the proportion formula with p_bar = (0.03 + 0.0336)/2.
3) Per-variant daily = 30,000 * 0.5 * 0.5 = 7,500; days = ceil(n_per_variant / 7,500). Extend the run to 1-2 weeks if feasible.
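The three steps can be checked with a few lines of Python. This sketch uses the rounded Z values from this section (1.96 and 0.84); the variable names are illustrative.

```python
from math import ceil

p1, relative_mde = 0.03, 0.12
z_alpha2, z_beta = 1.96, 0.84          # rounded values for alpha = 0.05, power = 0.80

# Step 1: absolute MDE and target rate
delta_abs = p1 * relative_mde          # 0.0036
p2 = p1 + delta_abs                    # 0.0336

# Step 2: approximate sample size per variant
p_bar = (p1 + p2) / 2
n_per_variant = ceil(2 * (z_alpha2 + z_beta) ** 2 * p_bar * (1 - p_bar) / delta_abs ** 2)

# Step 3: minimum days at 30,000 visitors, 50% eligible, 50/50 split
per_variant_daily = 30_000 * 0.5 * 0.5
min_days = ceil(n_per_variant / per_variant_daily)

print(n_per_variant, min_days)
```

The minimum lands under a week, so the calendar safeguard (at least one full weekly cycle) is what sets the recommended run here.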
Who this is for
- Data Analysts, Product Analysts, and Marketers planning A/B tests.
- Engineers and PMs who need quick, defensible test timelines.
Prerequisites
- Basic probability and understanding of conversion rates or means.
- Familiarity with alpha, power, and confidence intervals.
Learning path
- Understand metrics (binary vs continuous).
- Learn MDE, alpha, power, and sample size basics.
- Practice runtime estimation with real traffic inputs.
- Handle multi-variant and filtered traffic scenarios.
- Apply calendar safeguards and communicate timelines.
Practical projects
- Build a simple spreadsheet that calculates n per variant and runtime given inputs.
- Create a one-pager template to share test timelines with stakeholders.
- Backtest: compare your estimates to historical experiments and explain discrepancies.
Next steps
- Estimate runtimes for your next 3 planned experiments with different MDEs.
- Standardize a minimum runtime rule (e.g., at least 14 days) for your team.
- Learn when sequential testing is appropriate for faster, controlled stopping.