Why this matters
As a Marketing Analyst, you will plan and judge A/B tests for landing pages, emails, pricing, creatives, and funnels. Choosing the right sample size and run time prevents two costly outcomes: stopping too early (false winners) and running too long (wasting traffic and time). Mastering these basics helps you ship reliable wins faster.
- Estimate how many users you need before starting.
- Plan how long to run a test based on traffic.
- Avoid peeking mistakes and day-of-week bias.
Concept explained simply
You want to detect whether Variant B truly performs differently from Variant A. Two practical decisions control this:
- How small a change you care to detect (Minimum Detectable Effect, MDE).
- How confident you want to be (significance level, alpha) and how often you want to catch true effects (power).
Inputs you typically choose:
- Baseline metric: e.g., conversion rate (proportions) or average order value (means) and its variability.
- MDE: smallest effect size that matters (absolute or relative).
- Alpha: commonly 0.05 (two-sided) for marketing tests.
- Power: commonly 0.8 (80%) or 0.9 (90%).
- Allocation: often 50/50 for two variants to finish faster.
- Daily traffic or events to estimate calendar days.
Mental model: a resolution dial
Think of MDE as a resolution dial. The smaller the change you want to see, the more data you need: dial it toward tiny effects and tests take longer; accept only bigger effects and tests finish faster.
- Smaller MDE → larger sample → longer run.
- Higher traffic or more events/day → shorter calendar time.
- More variability (noisier metric) → larger sample.
- Two-sided tests are safer defaults and usually require a bit more data than one-sided.
Quick formulas you can use
These are standard approximations to plan before you test. They are close enough for scoping and prioritization. For precise planning, use a stats calculator with the same inputs.
Two-proportion test (e.g., conversion rate)
Per-variant sample size n ≈ 2 × p̄ × (1 − p̄) × (Zα/2 + Zβ)^2 / δ^2, where:
- p1 = baseline rate, p2 = target rate, δ = |p2 − p1| (absolute difference)
- p̄ = (p1 + p2) / 2
- Zα/2 = 1.96 for α = 0.05 (two-sided)
- Zβ = 0.84 for 80% power; 1.28 for 90% power
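A minimal Python sketch of this formula (the function name and defaults are illustrative; assumes Python 3.8+ for statistics.NormalDist):

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Per-variant sample size for a two-sided two-proportion test
    (pooled-variance approximation, matching the formula above)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    delta = abs(p2 - p1)
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / delta ** 2
    return ceil(n)
```

Because it uses exact Z-values (1.9600, 0.8416) rather than the rounded ones above, its answers run slightly higher than the hand math.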
Two-mean test (e.g., average order value)
Per-variant sample size n ≈ 2 × (Zα/2 + Zβ)^2 × σ^2 / δ^2, where:
- σ is the standard deviation of the metric
- δ is the absolute difference you want to detect (MDE)
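The same idea for means, continuing the sketch above (same imports; names again illustrative):

```python
def sample_size_two_means(sd, delta, alpha=0.05, power=0.8):
    """Per-variant sample size for a two-sided two-mean test,
    assuming both variants share standard deviation `sd`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)
```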
From sample size to run time
- Compute per-variant sample size n from a formula or calculator.
- Estimate daily exposure per variant: daily_total × allocation (e.g., 50%).
- Days ≈ n / daily_exposure_per_variant. Add a buffer to cover full weekly cycles.
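These three steps translate into a small helper; the min_days floor is an assumption that encodes the "full weekly cycles" advice:

```python
def runtime_days(n_per_variant, daily_total, allocation=0.5, min_days=7):
    """Calendar days to reach n per variant, floored at one full week."""
    daily_per_variant = daily_total * allocation
    return max(ceil(n_per_variant / daily_per_variant), min_days)
```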
Worked examples
Example 1: Landing page conversion uplift
Goal: Detect a 10% relative uplift, baseline 5% → target 5.5%.
- Inputs: p1 = 0.05, p2 = 0.055, δ = 0.005, p̄ = 0.0525
- α = 0.05 (two-sided), power = 0.8 → Zα/2 = 1.96, Zβ = 0.84
The math
n ≈ 2 × 0.0525 × 0.9475 × (1.96 + 0.84)^2 / 0.005^2
= 2 × 0.04974 × 7.84 / 0.000025 ≈ 0.780 / 0.000025 ≈ 31,200 per variant (approx.)
Runtime with 10,000 visitors/day, 50/50 split → 5,000 per variant/day → ~6–7 days. Good practice: run at least 1–2 full weeks to cover weekdays/weekends.
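With the sketches above, Example 1 reproduces in two lines (exact Z-values give a slightly larger n than the rounded hand math):

```python
n = sample_size_two_proportions(0.05, 0.055)  # ≈ 31,235 per variant
days = runtime_days(n, daily_total=10_000)    # 6.25 days, floored to 7
```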
Example 2: Email open rate
Goal: Detect +2 percentage points, baseline 20% → 22%.
- Inputs: p1 = 0.20, p2 = 0.22, δ = 0.02, p̄ = 0.21
- α = 0.05 (two-sided), power = 0.9 → Zα/2 = 1.96, Zβ = 1.28
The math
n ≈ 2 × 0.21 × 0.79 × (1.96 + 1.28)^2 / 0.02^2
= 2 × 0.1659 × 10.4976 / 0.0004 ≈ 3.483 / 0.0004 ≈ 8,700 per variant (approx.)
If your list has 100,000 recipients split 50/50, one send can be enough.
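As a sanity check, the same function reproduces this example (note the 90% power argument):

```python
n = sample_size_two_proportions(0.20, 0.22, power=0.9)  # ≈ 8,716 (hand math: ≈ 8,700)
```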
Example 3: Average order value (means)
Goal: Detect +$3 AOV, baseline SD ≈ $20.
- Inputs: σ = 20, δ = 3
- α = 0.05 (two-sided), power = 0.8 → Zα/2 = 1.96, Zβ = 0.84
The math
n ≈ 2 × (1.96 + 0.84)^2 × 20^2 / 3^2
= 2 × 7.84 × 400 / 9 = 6,272 / 9 ≈ 697 per variant (approx.)
With ~300 orders/day total (150 per variant), you need ~5 days of orders; run for at least one full week.
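And Example 3 with the means sketch:

```python
n = sample_size_two_means(sd=20, delta=3)  # ≈ 698 per variant
days = runtime_days(n, daily_total=300)    # ~4.7 days of orders, floored to 7
```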
How to plan runtime safely
- Always pre-set MDE, alpha, power, and guardrails.
- Compute n per variant; estimate days from your traffic.
- Run for whole business cycles (commonly ≥ 1–2 full weeks).
- Avoid peeking to stop at the first p < 0.05; wait until both the sample-size target and the minimum duration are met.
- Prefer 50/50 allocation for speed unless there are strong risk reasons.
Guardrails to track during the test
- Traffic quality: bots filtered, tracking firing consistently.
- Variant parity: sample counts per variant are balanced within a few percent.
- No major marketing or site changes mid-test that affect segments unevenly.
Common mistakes and how to self-check
- Stopping early on a spike: Self-check by confirming both sample-size target and minimum calendar duration are met.
- Choosing an unrealistically tiny MDE: Sanity-check against expected business impact and traffic; if runtime is months, increase MDE or choose a higher-impact change.
- Using visitor counts when the unit is sessions or orders: Match sample to the unit your metric is computed on.
- Ignoring variance for means tests: You need an SD estimate; use historical data.
- Not covering full weeks: Seasonal bias can flip results. Include at least one full weekday-weekend cycle.
Exercises
Try these. Then compare with the solutions.
Exercise 1
You plan a conversion test with baseline 3% and want to detect +0.6 percentage points (absolute) at α = 0.05 (two-sided), power = 0.8. Estimate the per-variant sample size and the minimum days if you get 20,000 visitors/day at 50/50 split.
Exercise 2
You plan an AOV test. SD ≈ $45, MDE = $4, α = 0.05 (two-sided), power = 0.8. Estimate per-variant sample size. If you have 500 orders/day total (evenly split), how many days do you need (before adding weekly-cycle buffers)?
Exercise solutions
Exercise 1 solution (summary)
Two-proportion formula with p1 = 0.03, p2 = 0.036 → δ = 0.006, p̄ = 0.033. n ≈ 13,900 per variant. With 10,000 per variant/day (50% of 20k), ≈ 1.4 days; still run at least a full week.
Exercise 2 solution (summary)
Two-mean formula: n ≈ 2 × (1.96 + 0.84)^2 × 45^2 / 4^2 ≈ 1,980 per variant. With 250 orders/variant/day, ≈ 8 days; round to full weeks.
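If you built the sketches above, both answers check out in two lines (exact Z-values land within about 1% of the rounded hand math):

```python
print(sample_size_two_proportions(0.03, 0.036))  # ≈ 13,915 (hand math: ≈ 13,900)
print(sample_size_two_means(sd=45, delta=4))     # ≈ 1,987 (hand math: ≈ 1,980)
```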
Checklist before you launch
- MDE chosen and meaningful for the business.
- Alpha and power set (e.g., 0.05 and 0.8).
- Baseline and, if needed, SD estimated from recent data.
- Per-variant sample size computed.
- Planned start and end dates cover at least one full weekly cycle.
- Allocation set (often 50/50) and tracking QA done.
Mini challenge
You have 12,000 daily sessions, baseline add-to-cart rate 8%. You care about a 1 percentage point absolute lift. Alpha 0.05, power 0.8. Roughly estimate per-variant n and how many days you would run at 50/50. Then list two guardrails you would monitor. Compare to the worked examples to sanity-check your answer.
Who this is for
- Marketing Analysts planning or reviewing A/B tests.
- Growth, CRM, and Product Marketers who need reliable experiment timelines.
- Designers/PMs collaborating on test roadmaps.
Prerequisites
- Basic statistics: proportions, means, standard deviation.
- Comfort with metrics you test (e.g., conversion rate, AOV, CTR).
- Access to recent baseline data for your metric.
Learning path
- Start here: MDE, alpha, power, baseline, and time-to-sample.
- Next: Test design choices (allocation, tail direction, guardrails).
- Then: Interpreting results (confidence intervals, lift, uncertainty).
- Advanced: Sequential testing and multiple comparisons control.
Practical projects
- Build a sample-size sheet: Create a spreadsheet with inputs (baseline, MDE, alpha, power, SD) and outputs (per-variant n, runtime). Include both proportions and means.
- Backtest a past experiment: Using historical data, recompute the needed sample size and check if the original run met it. Summarize risks if it did not.
- Traffic budgeting: For your next quarter’s test ideas, estimate n and calendar days for each. Prioritize by impact and feasibility.
Next steps
- Apply these formulas to your next planned test and compare with a calculator.
- Add minimum duration rules to your team playbook (e.g., ≥ 1–2 full weeks).
- Track a small set of guardrails (traffic balance, bot filtering, tracking health).