Running Time And Sample Size Basics

Learn running time and sample size basics for free with explanations, exercises, and a quick test (for Marketing Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

As a Marketing Analyst, you will plan and judge A/B tests for landing pages, emails, pricing, creatives, and funnels. Choosing the right sample size and run time prevents two costly outcomes: stopping too early (false winners) and running too long (wasting traffic and time). Mastering these basics helps you ship reliable wins faster.

  • Estimate how many users you need before starting.
  • Plan how long to run a test based on traffic.
  • Avoid peeking mistakes and mid-week bias.

Concept explained simply

You want to detect whether Variant B truly performs differently from Variant A. Two practical decisions control this:

  • How small a change you care to detect (Minimum Detectable Effect, MDE).
  • How confident you want to be (significance level, alpha) and how often you want to catch true effects (power).

Inputs you typically choose:

  • Baseline metric: e.g., conversion rate (proportions) or average order value (means) and its variability.
  • MDE: smallest effect size that matters (absolute or relative).
  • Alpha: commonly 0.05 (two-sided) for marketing tests.
  • Power: commonly 0.8 (80%) or 0.9 (90%).
  • Allocation: often 50/50 for two variants to finish faster.
  • Daily traffic or events to estimate calendar days.

Mental model: a resolution dial

Think of MDE as a resolution dial. The smaller the change you want to see, the more data you need. Turn the dial down (tiny effects): tests take longer. Turn it up (bigger effects): tests finish faster.

  • Smaller MDE → larger sample → longer run.
  • Higher traffic or more events/day → shorter calendar time.
  • More variability (noisier metric) → larger sample.
  • Two-sided tests are safer defaults and usually require a bit more data than one-sided.

Quick formulas you can use

These are standard approximations to plan before you test. They are close enough for scoping and prioritization. For precise planning, use a stats calculator with the same inputs.

Two-proportion test (e.g., conversion rate)

Per-variant sample size ≈ 2 × p̄ × (1 − p̄) × (Zα/2 + Zβ)^2 / δ^2, where:

  • p1 = baseline rate, p2 = target rate, δ = |p2 − p1| (absolute difference)
  • p̄ = (p1 + p2) / 2
  • Zα/2 = 1.96 for α = 0.05 (two-sided)
  • Zβ = 0.84 for 80% power; 1.28 for 90% power
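
A minimal Python sketch of this formula, using the standard library's normal distribution for the z-values (the function name and defaults are illustrative, not from any specific library):

```python
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    delta = abs(p2 - p1)
    return 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / delta ** 2

# Baseline 5% vs. target 5.5%: roughly 31,000-31,300 per variant
print(round(n_two_proportions(0.05, 0.055)))
```

Note that exact z-values give a slightly larger n than the rounded 1.96 and 0.84; either is fine for scoping.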

Two-mean test (e.g., average order value)

Per-variant sample size ≈ 2 × (Zα/2 + Zβ)^2 × σ^2 / δ^2, where:

  • σ is the standard deviation of the metric
  • δ is the absolute difference you want to detect (MDE)
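
A matching Python sketch of the two-mean formula (illustrative function name; z-values from the standard normal inverse CDF):

```python
from statistics import NormalDist

def n_two_means(sigma, delta, alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-sided two-mean test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

# SD $20, MDE $3: roughly 700 per variant
print(round(n_two_means(20, 3)))
```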

From sample size to run time

  1. Compute per-variant sample size n from a formula or calculator.
  2. Estimate daily exposure per variant: daily_total × allocation (e.g., 50%).
  3. Days ≈ n / daily_exposure_per_variant. Add a buffer to cover full weekly cycles.
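
These three steps can be sketched as a small helper; the `min_days` default and the rounding to whole weeks are assumptions based on the buffer advice, not fixed rules:

```python
import math

def days_to_run(n_per_variant, daily_total, allocation=0.5, min_days=7):
    """Estimate calendar days, rounded up to cover full weekly cycles."""
    daily_per_variant = daily_total * allocation
    raw_days = n_per_variant / daily_per_variant
    # Round up to whole weeks and enforce a minimum duration
    return max(min_days, math.ceil(raw_days / 7) * 7)

# ~31,200 per variant at 10,000 visitors/day, 50/50 split:
# 6.24 raw days, rounded up to one full week
print(days_to_run(31200, 10000))  # 7
```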

Worked examples

Example 1: Landing page conversion uplift

Goal: Detect a 10% relative uplift, baseline 5% → target 5.5%.

  • Inputs: p1 = 0.05, p2 = 0.055, δ = 0.005, p̄ = 0.0525
  • α = 0.05 (two-sided), power = 0.8 → Zα/2 = 1.96, Zβ = 0.84
The math:

n ≈ 2 × 0.0525 × 0.9475 × (1.96 + 0.84)^2 / 0.005^2

= 2 × 0.04974 × 7.84 / 0.000025 ≈ 31,200 per variant

Runtime with 10,000 visitors/day, 50/50 split → 5,000 per variant/day → ~6–7 days. Good practice: run at least 1–2 full weeks to cover weekdays/weekends.

Example 2: Email open rate

Goal: Detect +2 percentage points, baseline 20% → 22%.

  • Inputs: p1 = 0.20, p2 = 0.22, δ = 0.02, p̄ = 0.21
  • α = 0.05 (two-sided), power = 0.9 → Zα/2 = 1.96, Zβ = 1.28
The math:

n ≈ 2 × 0.21 × 0.79 × (1.96 + 1.28)^2 / 0.02^2

= 2 × 0.1659 × 10.4976 / 0.0004 ≈ 8,700 per variant

If your list has 100,000 recipients split 50/50, one send can be enough.

Example 3: Average order value (means)

Goal: Detect +$3 AOV, baseline SD ≈ $20.

  • Inputs: σ = 20, δ = 3
  • α = 0.05 (two-sided), power = 0.8 → Zα/2 = 1.96, Zβ = 0.84
The math:

n ≈ 2 × (1.96 + 0.84)^2 × 20^2 / 3^2

= 2 × 7.84 × 400 / 9 ≈ 697 per variant

With ~300 orders/day total (150 per variant), you need ~5 days of orders; run for at least one full week.

How to plan runtime safely

  1. Always pre-set MDE, alpha, power, and guardrails.
  2. Compute n per variant; estimate days from your traffic.
  3. Run for whole business cycles (commonly ≥ 1–2 full weeks).
  4. Avoid peeking and stopping at the first p < 0.05; wait until the sample-size target and the minimum duration are both met.
  5. Prefer 50/50 allocation for speed unless there are strong risk reasons.

Guardrails to track during the test

  • Traffic quality: bots filtered, tracking firing consistently.
  • Variant parity: sample counts per variant are balanced within a few percent.
  • No major marketing or site changes mid-test that affect segments unevenly.

Common mistakes and how to self-check

  • Stopping early on a spike: Self-check by confirming both sample-size target and minimum calendar duration are met.
  • Choosing an unrealistically tiny MDE: Sanity-check against expected business impact and traffic; if runtime is months, increase MDE or choose a higher-impact change.
  • Using visitor counts when the unit is sessions or orders: Match sample to the unit your metric is computed on.
  • Ignoring variance for means tests: You need an SD estimate; use historical data.
  • Not covering full weeks: Seasonal bias can flip results. Include at least one full weekday-weekend cycle.

Exercises

Try these. Then compare with the solutions.

Exercise 1

You plan a conversion test with baseline 3% and want to detect +0.6 percentage points (absolute) at α = 0.05 (two-sided), power = 0.8. Estimate the per-variant sample size and the minimum days if you get 20,000 visitors/day at 50/50 split.

Exercise 2

You plan an AOV test. SD ≈ $45, MDE = $4, α = 0.05 (two-sided), power = 0.8. Estimate per-variant sample size. If you have 500 orders/day total (evenly split), how many days do you need (before adding weekly-cycle buffers)?

Exercise solutions

Exercise 1 solution (summary)

Two-proportion formula with p1 = 0.03, p2 = 0.036 → δ = 0.006, p̄ = 0.033. n ≈ 13,900 per variant. With 10,000 per variant/day (50% of 20k), ≈ 1.4 days; still run at least a full week.

Exercise 2 solution (summary)

Two-mean formula: n ≈ 2 × (1.96 + 0.84)^2 × 45^2 / 4^2 ≈ 1,980 per variant. With 250 orders/variant/day, ≈ 8 days; round to full weeks.
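
A quick way to double-check both solutions is to run the formulas with exact z-values (which is why the totals differ slightly from the rounded figures):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
z_sum_sq = (z(0.975) + z(0.8)) ** 2  # alpha = 0.05 two-sided, power = 0.8

# Exercise 1: proportions, p1 = 0.03, p2 = 0.036, delta = 0.006
p_bar = (0.03 + 0.036) / 2
n1 = 2 * p_bar * (1 - p_bar) * z_sum_sq / 0.006 ** 2
print(round(n1), round(n1 / 10000, 1))  # about 13,900 per variant, ~1.4 days

# Exercise 2: means, sigma = 45, delta = 4
n2 = 2 * z_sum_sq * 45 ** 2 / 4 ** 2
print(round(n2), round(n2 / 250, 1))    # about 1,990 per variant, ~8 days
```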

Checklist before you launch

  • MDE chosen and meaningful for the business.
  • Alpha and power set (e.g., 0.05 and 0.8).
  • Baseline and, if needed, SD estimated from recent data.
  • Per-variant sample size computed.
  • Planned start and end dates cover at least one full weekly cycle.
  • Allocation set (often 50/50) and tracking QA done.

Mini challenge

You have 12,000 daily sessions, baseline add-to-cart rate 8%. You care about a 1 percentage point absolute lift. Alpha 0.05, power 0.8. Roughly estimate per-variant n and how many days you would run at 50/50. Then list two guardrails you would monitor. Compare to the worked examples to sanity-check your answer.

Who this is for

  • Marketing Analysts planning or reviewing A/B tests.
  • Growth, CRM, and Product Marketers who need reliable experiment timelines.
  • Designers/PMs collaborating on test roadmaps.

Prerequisites

  • Basic statistics: proportions, means, standard deviation.
  • Comfort with metrics you test (e.g., conversion rate, AOV, CTR).
  • Access to recent baseline data for your metric.

Learning path

  1. Start here: MDE, alpha, power, baseline, and time-to-sample.
  2. Next: Test design choices (allocation, tail direction, guardrails).
  3. Then: Interpreting results (confidence intervals, lift, uncertainty).
  4. Advanced: Sequential testing and multiple comparisons control.

Practical projects

  1. Build a sample-size sheet: Create a spreadsheet with inputs (baseline, MDE, alpha, power, SD) and outputs (per-variant n, runtime). Include both proportions and means.
  2. Backtest a past experiment: Using historical data, recompute the needed sample size and check if the original run met it. Summarize risks if it did not.
  3. Traffic budgeting: For your next quarter’s test ideas, estimate n and calendar days for each. Prioritize by impact and feasibility.

Next steps

  • Apply these formulas to your next planned test and compare with a calculator.
  • Add minimum duration rules to your team playbook (e.g., ≥ 1–2 full weeks).
  • Track a small set of guardrails (traffic balance, bot filtering, tracking health).
