Why this matters
As a Product Analyst, you will be asked questions like: "When can we call this A/B test?" "Can we launch before the sale ends?" "How much traffic do we need to detect a 5% lift?" Experiment duration planning answers these with defensible numbers so stakeholders can make timely, low-risk decisions.
- Plan realistic launch dates and avoid underpowered tests.
- Balance time-to-decision with statistical rigor.
- Prevent false wins from peeking or seasonality.
Concept explained simply
Duration is how long you must run an experiment to collect enough high-quality data to detect the effect you care about. It depends on three things:
- How many samples you need (sample size).
- How quickly you collect those samples (traffic per variant).
- How long outcomes take to materialize (measurement delay).
For a binary metric (e.g., conversion rate), a common approximation for equal-sized variants is:
n_per_variant ≈ 2 × (Z_(1−α/2) + Z_power)^2 × p × (1−p) / δ^2
- p = baseline rate (e.g., 0.05)
- δ = absolute Minimum Detectable Effect (MDE), e.g., +0.5 percentage points = 0.005
- α = significance level (commonly 0.05, two-sided, so Z_(1−α/2) ≈ 1.96)
- power = typically 0.8 (Z_power ≈ 0.84)
For a continuous metric (e.g., revenue per user):
n_per_variant ≈ 2 × (Z_(1−α/2) + Z_power)^2 × σ^2 / δ^2
- σ = standard deviation of the metric
- δ = absolute lift you want to detect
Then convert to calendar time:
days ≈ n_per_variant / daily_eligible_units_per_variant
Add buffers for ramp-up, weekly cycles, and any measurement delay (e.g., D7 retention requires at least 7 extra days after exposure).
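The two sample-size formulas and the days conversion can be sketched in Python using only the standard library; the function names here are my own, and statistics.NormalDist supplies the Z values:

```python
from statistics import NormalDist

def n_per_variant_binary(p, delta, alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-sided test on a proportion."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ≈ 0.84 for power = 0.8
    return 2 * (z_alpha + z_power) ** 2 * p * (1 - p) / delta ** 2

def n_per_variant_continuous(sigma, delta, alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-sided test on a mean."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2

def days_needed(n_per_variant, daily_per_variant, delay_days=0):
    """Calendar days: recruitment time plus any outcome-measurement delay."""
    return n_per_variant / daily_per_variant + delay_days
```

Because these use exact (unrounded) Z values, results differ slightly from hand calculations done with 1.96 and 0.84.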
Mental model
- Levers: smaller MDE or higher variance → bigger sample → longer duration. More traffic → shorter duration. Delayed outcomes → longer calendar time.
- Quality gates: ensure stable traffic, avoid mid-test changes, cover full weekly cycles.
- Decision rule: commit to stop criteria (e.g., target sample size or fixed calendar window) before you start.
Quick reference: typical Z values
- Two-sided α = 0.05 → Z_(1−α/2) ≈ 1.96
- Power = 0.80 → Z_power ≈ 0.84
- Power = 0.90 → Z_power ≈ 1.28
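These values need not be memorized; Python's standard-library statistics.NormalDist can reproduce them:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse CDF of the standard normal

print(round(z(1 - 0.05 / 2), 2))  # two-sided alpha = 0.05 -> 1.96
print(round(z(0.80), 2))          # power = 0.80 -> 0.84
print(round(z(0.90), 2))          # power = 0.90 -> 1.28
```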
Inputs you need
- Primary metric type (binary or continuous) and unit of analysis (user, session, order).
- Baseline value (rate or mean) and variability (p(1−p) or standard deviation).
- MDE (absolute); optionally relate it to a % lift for stakeholders.
- Alpha (typically 0.05, two-sided) and power (typically 0.8).
- Traffic estimates: eligible population per day and allocation ratio (commonly 50/50).
- Outcome delay (e.g., D7 retention), ramp plan, and expected seasonality.
- Experiment length constraints (deadlines, releases, campaigns).
Worked examples
Example 1: Signup conversion
Goal: detect a +0.5 pp absolute lift from 5.0% to 5.5% (≈ +10% relative). α = 0.05 (two-sided), power = 0.8. Eligible traffic: 20,000 users/day, 50/50 allocation.
n_per_variant ≈ 2 × (1.96 + 0.84)^2 × 0.05 × 0.95 / 0.005^2
(1.96 + 0.84)^2 = 7.84; p(1−p) = 0.0475
n_per_variant ≈ 2 × 7.84 × 0.0475 / 0.000025 ≈ 29,800
Daily per variant ≈ 10,000 → ≈ 3.0 days of traffic. Practical plan: 1-day ramp + full week coverage → run ~8 days total.
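As a quick check, the arithmetic above can be reproduced directly, using the rounded Z values 1.96 and 0.84 from the quick reference:

```python
z_sum_sq = (1.96 + 0.84) ** 2                     # 7.84 (rounded Z values)
n = 2 * z_sum_sq * 0.05 * 0.95 / 0.005 ** 2       # binary formula
print(round(n))                                   # 29792 users per variant
print(round(n / 10_000, 1))                       # 3.0 days of traffic per variant
```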
Example 2: Average order value (AOV)
Goal: detect +$2 lift. Baseline mean = $50, σ = $70. α = 0.05, power = 0.8. Daily orders across both variants ≈ 3,000 (per variant ≈ 1,500).
n_per_variant ≈ 2 × 7.84 × 70^2 / 2^2 = 2 × 7.84 × 4,900 / 4 ≈ 19,200
Days ≈ 19,200 / 1,500 ≈ 12.8 days. Practical plan: 1–2 day ramp + 2 full weeks to cover weekly cycles.
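Reproducing the AOV arithmetic in a couple of lines (rounded Z values, so the result is approximate):

```python
n = 2 * 7.84 * 70 ** 2 / 2 ** 2    # continuous formula: 19208 per variant
days = n / 1_500                    # 1,500 orders/day per variant
print(round(n), round(days, 1))     # 19208 12.8
```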
Example 3: D7 retention
Goal: detect +2 pp absolute lift (25% → 27%). α = 0.05, power = 0.8. p(1−p) = 0.25 × 0.75 = 0.1875. Daily new eligible users = 15,000 (per variant 7,500). Outcome delay = 7 days.
n_per_variant ≈ 2 × 7.84 × 0.1875 / 0.02^2 = 2 × 1.47 / 0.0004 ≈ 7,350
Recruitment time ≈ 7,350 / 7,500 ≈ 1 day, then wait 7 days for D7 outcome → ~8 days minimum. Plan for a full 2-week calendar to cover weekly cycles and delays.
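The retention example adds a measurement delay to the calendar math; a quick sketch:

```python
import math

n = 2 * 7.84 * 0.25 * 0.75 / 0.02 ** 2     # binary formula: 7350 per variant
recruit_days = math.ceil(n / 7_500)         # 1 day of recruitment
total_days = recruit_days + 7               # + 7-day wait for the D7 outcome
print(round(n), total_days)                 # 7350 8
```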
Plan your experiment duration: step-by-step
- Define your primary metric and unit of analysis.
- Collect baseline and variability (historical data or prior tests).
- Choose α and power with stakeholders.
- Set a practical MDE that is meaningful to the business.
- Compute sample size per variant (proportion or mean formula).
- Estimate daily eligible traffic per variant and compute raw days.
- Add buffers: ramp-up, weekly cycle coverage, outcome delays.
- Pre-register stopping rule and guardrails (e.g., SRM checks).
- Communicate a calendar plan (start, checkpoints, expected stop).
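The buffering steps above can be sketched as a small helper. The policy here (ramp days first, then enough post-ramp days to hit the sample size, measure delayed outcomes, and cover full weeks) is one reasonable choice, not a standard:

```python
import math

def calendar_days(n_per_variant, daily_per_variant, ramp_days=1,
                  delay_days=0, min_weekly_cycles=1):
    """Hypothetical buffer policy: ramp, then raw recruitment days plus any
    outcome delay, padded to cover at least min_weekly_cycles full weeks."""
    raw = math.ceil(n_per_variant / daily_per_variant)
    return ramp_days + max(raw + delay_days, 7 * min_weekly_cycles)

print(calendar_days(29_800, 10_000, ramp_days=1))              # Example 1 -> 8
print(calendar_days(7_350, 7_500, ramp_days=0, delay_days=7))  # Example 3 -> 8
```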
Mini task: convert % MDE to absolute
If baseline conversion is 6% and stakeholders want +12% relative lift, absolute MDE δ = 0.06 × 0.12 = 0.0072 (0.72 pp).
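The conversion is a one-line multiplication; the helper name is just for illustration:

```python
def relative_to_absolute_mde(baseline, relative_lift):
    """Convert a relative lift (e.g., 0.12 for +12%) to an absolute delta."""
    return baseline * relative_lift

print(round(relative_to_absolute_mde(0.06, 0.12), 4))  # 0.0072, i.e. 0.72 pp
```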
Checklist before you start
- Primary metric, baseline, and MDE are documented.
- Alpha and power agreed with stakeholders.
- Traffic estimate verified against recent weeks.
- Ramp policy defined (e.g., 20% → 50% → 100%).
- Outcome delays accounted for (e.g., retention).
- Plan covers at least one full weekly cycle.
- Stopping rule is written and shared.
- Guardrails set: SRM check, error logs, crash rate, availability.
Exercises (practice)
Work these out with the formulas above, then compare your answers with the provided solutions.
Exercise 1: Binary metric duration
Baseline signup rate p = 8%. Absolute MDE δ = 0.8 pp (10% relative). α = 0.05 (two-sided), power = 0.8. Eligible traffic = 12,000 users/day total, 50/50 allocation. Compute sample size per variant and approximate days of traffic. Recommend a calendar plan including ramp and weekly coverage.
Exercise 2: Continuous metric duration
Primary metric: average session duration. Baseline mean = 120 s, σ = 90 s. Absolute MDE δ = 6 s. α = 0.05, power = 0.8. Eligible traffic = 4,000 users/day total, 50/50. Compute sample size per variant and days. What calendar duration would you propose and why?
Common mistakes and self-check
- Counting total visitors, not eligible units. Self-check: does your denominator match your unit of analysis?
- Using relative MDE without converting to absolute. Self-check: convert to pp or absolute delta before formulas.
- Ignoring outcome delays (e.g., D7 retention). Self-check: add delay days after last exposure needed for measurement.
- Ending mid-week with strong weekday seasonality. Self-check: ensure at least 1 full weekly cycle.
- Peeking at results and stopping early without corrections. Self-check: adhere to your pre-registered stop rule.
- Traffic instability during test (launches, promos). Self-check: note major events; pause or extend if conditions change.
- SRM (sample ratio mismatch) ignored. Self-check: monitor allocation; large deviations indicate issues—investigate.
Practical projects
- Build a duration calculator sheet: inputs (p or mean, σ, α, power, MDE, traffic) → outputs (n per variant, days, calendar plan).
- Audit a past experiment: recompute required duration, compare to actual run, note risks and what you’d change.
- Create a one-page test plan template including duration rationale, stop rule, ramp plan, and guardrails.
Next steps
- Learn variance reduction (e.g., covariate adjustment) to reduce required sample size.
- Study sequential testing basics if you must look early; use proper alpha spending.
- Practice communicating duration trade-offs with product and engineering.
Mini challenge
You plan a pricing page test. Baseline conversion = 3.5%, absolute MDE = 0.35 pp, α = 0.05, power = 0.8, traffic = 14,000/day total, 50/50. There’s a 7-day sale next week. How do you set duration so you cover a full week and avoid confounding? Write your plan in 3–4 bullet points.
One possible approach
- Compute n_per_variant with the binary formula; convert to days using 7,000/day per variant.
- Include a 1-day ramp and ensure at least 1 full week of normal traffic (avoid mixing sale vs. non-sale weeks).
- If the sale cannot be avoided, plan the test entirely within the sale window or entirely outside it, not spanning both.
- Document the stopping rule and guardrails (SRM, outages).
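One way to run the numbers for the challenge (using the rounded Z values from the quick reference, so the figures are approximate):

```python
p, delta = 0.035, 0.0035                 # baseline 3.5%, MDE 0.35 pp
n = 2 * 7.84 * p * (1 - p) / delta ** 2  # binary formula
days = n / 7_000                         # 7,000 users/day per variant
print(round(n), round(days, 1))          # 43232 6.2
```

With roughly 6 raw days plus a ramp day, the test fits inside one week, so it can be scheduled entirely within a normal (non-sale) week, or entirely within the sale window if that is unavoidable.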