How to learn Sample Size and Power Basics for A/B Testing Basics in Data Analyst for free

Why this matters

As a Data Analyst, you will often plan experiments and answer: How much traffic do we need? How long should we run the test? Can we detect this lift? Getting sample size and power right avoids underpowered tests (wasted time) and overlong tests (wasted opportunity).

Decide if a feature can be tested this sprint based on required users/sessions.
Set Minimum Detectable Effect (MDE) so stakeholders know what lift is realistically detectable.
Estimate test runtime using daily traffic and target power.
Explain trade-offs: faster decision vs result sensitivity.

Concept explained simply

Think of an A/B test like taking a photo in low light:

Sample size = exposure time. More samples = clearer picture.
Power (1 − β) = your camera’s ability to capture a real object. Commonly 80% or 90%.
Alpha (α) = false alarm rate you accept (usually 5% two-sided).
MDE = the smallest lift you care to detect (effect size). Smaller MDE needs more data.
Variance/SD = shakiness of your hand. More variance = more samples needed.
Baseline = current performance (e.g., 5% conversion).

Mental model: You set how sensitive you want the test to be (MDE) and how confident you want to be (alpha, power). The math tells you the needed sample size.

Practical formulas you can use safely

These approximations work well for planning with equal group sizes and two-sided tests:

Two-sample proportions (e.g., conversion rate)
n per group ≈ [ 2 × p × (1 − p) × (Zα/2 + Zβ)² ] / MDE²
Where p = baseline rate (e.g., 0.05), MDE = absolute difference (e.g., +1 percentage point = 0.01).
Two-sample means (e.g., average order value)
n per group ≈ [ 2 × σ² × (Zα/2 + Zβ)² ] / MDE²
Where σ = standard deviation of the metric, MDE = absolute difference in the mean.

What Z values should I use?

Two-sided α = 0.05 → Zα/2 ≈ 1.96
Power = 80% → Zβ ≈ 0.84
Power = 90% → Zβ ≈ 1.28

One-sided vs two-sided tests

Two-sided is safer for most product tests. One-sided can reduce sample size but only use it when you truly care about one direction and would not act on a significant effect in the other direction.

Worked examples

Conversion rate (proportions):
Goal: Detect +1 percentage point on a 5% baseline (0.05 → 0.06), α=0.05 two-sided, power=80%.
p = 0.05, MDE = 0.01, Zα/2 = 1.96, Zβ = 0.84. Sum = 2.8; (Sum)² = 7.84.
2 × p(1−p) = 2 × 0.05 × 0.95 = 0.095.
n ≈ (0.095 × 7.84) / 0.01² = 0.7448 / 0.0001 ≈ 7,448 per group (≈ 14,896 total).
Click-through rate (proportions, relative MDE):
Baseline 2% (0.02). Target relative lift 15% → absolute MDE = 0.02 × 0.15 = 0.003.
2 × p(1−p) = 2 × 0.02 × 0.98 = 0.0392. With α=0.05, power=80%: factor = 7.84.
n ≈ (0.0392 × 7.84) / 0.003² ≈ 0.3073 / 0.000009 ≈ 34,149 per group.
Average order value (means):
Baseline mean $50, SD $60, MDE = +$5, α=0.05, power=90%.
Zα/2 = 1.96, Zβ = 1.28 → Sum = 3.24; (Sum)² = 10.4976.
2 × σ² = 2 × 3,600 = 7,200.
n ≈ (7,200 × 10.4976) / 5² ≈ 75,583 / 25 ≈ 3,024 per group.

Step-by-step planning checklist

Clarify the decision: what action will you take if the test wins/loses?
Choose primary metric and its type (proportion or mean).
Estimate baseline and variance (or SD); use recent, representative data.
Set MDE that is both meaningful and feasible.
Pick α (usually 0.05 two-sided) and desired power (80% or 90%).
Compute n per group and translate to days using your daily traffic.
Sanity-check duration against seasonality and release cadence.
Plan guardrails (e.g., sample ratio mismatch, data quality checks).

Exercises — practice

Do these without a calculator first to estimate, then compute precisely.

Exercise 1: Conversion rate sample size

You have a 3% signup rate. You want to detect an absolute lift of +0.6 percentage points (MDE = 0.006) with α=0.05 (two-sided) and power=80%. Use the simplified proportions formula.

Round up to the nearest whole number.
Report per-group and total sample sizes.

Hints

Zα/2 ≈ 1.96, Zβ ≈ 0.84 → (sum)^2 ≈ 7.84.
2 × p(1−p) with p=0.03.
MDE = 0.006.

Exercise 2: Mean metric sample size

Average session length is 8 minutes with SD = 5 minutes. You want to detect a decrease of 0.8 minutes (MDE = 0.8) with α=0.05 (two-sided) and power=80%. Use the means formula.

Round up to the nearest whole number.
Report per-group and total sample sizes.

Hints

Zα/2 ≈ 1.96, Zβ ≈ 0.84 → (sum)^2 ≈ 7.84.
σ = 5 → σ^2 = 25.
MDE = 0.8 → MDE^2 = 0.64.

Checklist: Did you use absolute MDE values? Did you round up? Did you state assumptions (two-sided, equal split)?

Common mistakes and self-checks

Using relative MDE in a formula that expects absolute MDE. Convert 15% relative on 2% baseline to 0.003 absolute.
Underestimating variance/SD. Use recent data; consider seasonality and segmentation.
Peeking and stopping early. If you look often without proper corrections, your α inflates; pre-plan duration.
Mismatched tails. Planning with one-sided α but analyzing two-sided changes the error rates.
Ignoring allocation ratio. If not 50/50, required total n increases; use a calculator that handles unequal splits.
Overly tiny MDEs. If MDE is too small, required n or duration may be impractical; reset expectations.
Poor baseline estimate. Baseline should reflect the audience and time window of the test.

Quick self-check

If you halve the MDE, sample size should roughly quadruple. Does your number reflect that?
If SD doubles (for means), n should roughly quadruple. Check your result.
Lowering α or increasing power should increase n. Did your plan reflect the trade-off?

Practical projects

Spreadsheet calculator: Build a simple sheet with inputs (baseline, SD, MDE, α, power) and outputs (n per group, test duration). Include a toggle for proportions vs means.
Traffic-to-time planner: Add daily unique counts per page and compute estimated days to reach required n with a 10% buffer.
Guardrail checklist: Create a one-pager you attach to every test plan: sample ratio match, data freshness, outlier handling, and seasonality notes.
Pilot variance check: Run a short A/A or pre-test measurement to validate SD/variance assumptions before the main test.

Learning path

Before this: Understanding metrics, variance, and basic hypothesis testing.
This lesson: Choose α, power, and MDE; compute required sample size and estimated duration.
Next: Test design details (guardrails, bucketing, segmentation) and interpreting results (p-values, confidence intervals, lift).

Next steps

Pick one upcoming A/B test and fill out the checklist above.
Compute two versions: an 80% power plan and a 90% power plan; discuss trade-offs with stakeholders.
Document assumptions (baseline window, tails, allocation) so analysis matches the plan.

Mini challenge

You have 12,000 daily unique visitors to a page. Baseline conversion is 4%. You want to detect a +0.8 percentage point lift (to 4.8%), α=0.05 two-sided, power=80%, equal split.

Estimate n per group and total.
How many days will the test take with a 10% buffer?

Suggested approach

Use n per group ≈ [2 × p(1−p) × (Zα/2 + Zβ)^2] / MDE^2. Then divide required total by daily eligible traffic per variant.

Quick test

Take the quick test below to lock in your understanding. Everyone can take it for free; progress is saved only if you are logged in.

Menu

Sample Size and Power Basics

Table of Contents

Why this matters

Concept explained simply

Practical formulas you can use safely

Worked examples

Step-by-step planning checklist

Exercises — practice

Exercise 1: Conversion rate sample size

Exercise 2: Mean metric sample size

Common mistakes and self-checks

Practical projects

Learning path

Next steps

Mini challenge

Quick test

Practice Exercises

Compute sample size for a conversion rate test

Instructions

Expected Output

Compute sample size for a mean metric test

Sample Size and Power Basics — Quick Test

Have questions about Sample Size and Power Basics?

AI Assistant