Power And Sample Size Basics

Learn power and sample size basics for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

As a Data Scientist, you will often be asked: How many users do we need? How long should the A/B test run? Is this effect detectable? Power and sample size let you plan experiments that are reliable, ethical, and cost-effective.

  • Product A/B tests: Estimate users required to detect a +0.8 percentage point lift in conversion.
  • ML offline eval: Decide dataset size to detect a 1-point accuracy improvement.
  • Clinical-like or risky changes: Avoid underpowered tests that expose users to risk without producing a conclusive result.
  • Stakeholder planning: Communicate timelines and trade-offs between effect size, risk, and speed.

Concept explained simply

Key terms:

  • Significance level (alpha, α): Probability of a false positive (e.g., 0.05).
  • Power (1 − β): Probability of detecting a true effect of a chosen size (e.g., 80%).
  • Effect size (δ or MDE): The smallest effect you care to detect (e.g., +0.6 percentage points).
  • Variability (σ or variance of the metric): More variability means more noise and larger samples.
  • Sample size (n): How many observations you need per group.
  • Two-sided vs one-sided: Two-sided tests detect change in either direction; one-sided tests focus on a specified direction.

Mental model: Detecting a signal in noise. To see a faint signal (small MDE) in a noisy environment (high variance) with high confidence (low α, high power), you need a stronger microscope (more data). If you only need to detect a bigger effect, can tolerate more risk (higher α or lower power), or can reduce noise, you need fewer samples.
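To make α and power concrete, here is a minimal simulation sketch. It assumes normally distributed data with known σ and a two-sided z-test; the function name `rejection_rate` and the numbers (σ=1, δ=0.2, n=100 or 400) are illustrative choices, not values from this lesson.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_delta, n, sigma, alpha=0.05, sims=20000, seed=0):
    """Share of simulated two-sided z-tests that reject 'no difference'."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_mean = sigma / sqrt(n)      # standard error of one group's sample mean
    se_diff = sigma * sqrt(2 / n)  # standard error of the difference in means
    rejections = 0
    for _ in range(sims):
        # Draw each group's sample mean directly from its sampling distribution.
        mean_a = rng.gauss(0.0, se_mean)
        mean_b = rng.gauss(true_delta, se_mean)
        if abs((mean_b - mean_a) / se_diff) > z_crit:
            rejections += 1
    return rejections / sims

# No true effect: the rejection rate estimates the false-positive rate (alpha).
false_positive_rate = rejection_rate(true_delta=0.0, n=400, sigma=1.0)
# True effect present: the rejection rate estimates power, which grows with n.
power_small_n = rejection_rate(true_delta=0.2, n=100, sigma=1.0)
power_large_n = rejection_rate(true_delta=0.2, n=400, sigma=1.0)

print(f"false-positive rate: {false_positive_rate:.3f}")
print(f"power at n=100: {power_small_n:.3f}")
print(f"power at n=400: {power_large_n:.3f}")
```

Under these assumptions the first number should land near α, and quadrupling n should move power from roughly 30% to roughly 80%.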

Cheat-sheet formulas (quick approximations)

Use these for ballpark planning. For production, confirm with a stats library.

  • Two-sample means (equal n, two-sided): n per group ≈ 2 × (z1−α/2 + zpower)² × σ² / δ²
  • Two-sample proportions (two-sided): n per group ≈ [z1−α/2√(2p(1−p)) + zpower√(p1(1−p1) + p2(1−p2))]² / (p2−p1)², where p = (p1+p2)/2 is the average of the two rates. If the baseline is unknown, use the conservative worst case p ≈ 0.5 in the variance terms.
  • MDE for means (given n per group): δ ≈ (z1−α/2 + zpower) × √(2σ² / n)
  • Common z-values: two-sided α=0.05 → z1−α/2=1.96; power 80% → zpower=0.84; power 90% → zpower=1.28.

Assumptions: independent samples, equal allocation, stable variance, large-sample normal approximations. For skewed metrics, heavy tails, or clustering, adjust methods.
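The three cheat-sheet formulas can be sketched in a few lines of Python using only the standard library. The function names below are hypothetical, and the code takes z-values from `statistics.NormalDist` instead of the rounded table values, so results can differ slightly from hand calculations.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_values(alpha=0.05, power=0.80, two_sided=True):
    """Return (z for the significance level, z for the power)."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail), NormalDist().inv_cdf(power)

def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Two-sample means, equal n, two-sided; rounded up."""
    z_a, z_b = z_values(alpha, power)
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Two-sample proportions, equal n, two-sided; rounded up."""
    z_a, z_b = z_values(alpha, power)
    p_bar = (p1 + p2) / 2  # average of the two rates
    t = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(t ** 2 / (p2 - p1) ** 2)

def mde_means(n, sigma, alpha=0.05, power=0.80):
    """Smallest detectable difference in means for a given n per group."""
    z_a, z_b = z_values(alpha, power)
    return (z_a + z_b) * sqrt(2 * sigma ** 2 / n)

print(n_per_group_proportions(0.05, 0.06))          # 5.0% -> 6.0% conversion
print(n_per_group_means(0.1, 1.2, power=0.90))      # 0.1s shift, sigma 1.2s
print(round(mde_means(1000, 1.2), 3))               # MDE with 1,000 per group
```

These are the same large-sample approximations as above, so the same assumptions apply; a dedicated stats library remains the better choice for production planning.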

Worked examples

Example 1: Proportions (A/B conversion)

Goal: Detect increase from 5.0% to 6.0% conversion (δ=+1.0 percentage point), two-sided α=0.05, power=80%.

  1. p1=0.05, p2=0.06, δ=0.01.
  2. z1−α/2=1.96, zpower=0.84.
  3. Approximate pooled p ≈ (0.05+0.06)/2=0.055.
  4. Compute term T = 1.96√(2×0.055×0.945) + 0.84√(0.05×0.95 + 0.06×0.94) ≈ 1.96×0.322 + 0.84×0.322 ≈ 0.903.
  5. n per group ≈ T² / δ² ≈ 0.903² / 0.0001 ≈ 8,150.

Answer: ~8.2k users per group (total ~16.3k).
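One way to sanity-check this answer is to invert the formula and ask what power a given n delivers. The sketch below does that under the same normal approximation; `power_two_proportions` is a hypothetical helper name, and it ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))       # pooled term (no effect)
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # unpooled term (true effect)
    z_power = (abs(p2 - p1) * sqrt(n_per_group) - z_alpha * sd_null) / sd_alt
    return NormalDist().cdf(z_power)

power_planned = power_two_proportions(0.05, 0.06, 8150)
power_half = power_two_proportions(0.05, 0.06, 4000)

print(f"power at n=8,150 per group: {power_planned:.2f}")
print(f"power at n=4,000 per group: {power_half:.2f}")
```

With about half the planned sample, power drops to roughly a coin flip, which is why cutting a test short is costly.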

Example 2: Means (page load time)

Goal: Detect a decrease from 2.0s to 1.9s (δ=0.1s). Known σ≈1.2s. Two-sided α=0.05, power=90%.

  1. z1−α/2=1.96, zpower=1.28.
  2. n per group ≈ 2×(1.96+1.28)²×σ²/δ² = 2×(3.24)²×1.44/0.01 ≈ 3,023.

Answer: ~3.0k users per group.
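To see how the planning choices move this number, the following sketch recomputes Example 2 for 80% vs 90% power and one- vs two-sided tests. The helper name is hypothetical, and exact z-values are used, so the two-sided 90% case comes out a few users above the hand calculation.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(delta, sigma, alpha, power, two_sided=True):
    """Sample size per group for a difference in means (normal approximation)."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

# Example 2 inputs: delta = 0.1s, sigma = 1.2s, alpha = 0.05.
scenarios = {
    (power, two_sided): n_per_group_means(0.1, 1.2, 0.05, power, two_sided)
    for power in (0.80, 0.90)
    for two_sided in (True, False)
}

for (power, two_sided), n in sorted(scenarios.items()):
    side = "two-sided" if two_sided else "one-sided"
    print(f"power={power:.0%}, {side}: n per group = {n}")
```

Dropping from 90% to 80% power or switching to a one-sided test both reduce n noticeably, but each trades away protection that should be justified before launch.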

Example 3: MDE with fixed sample size

Given n per group=1,000, σ=1.2s, α=0.05, power=80%.

  1. δ ≈ (1.96+0.84)×√(2×1.44/1000) = 2.80×√(0.00288) ≈ 2.80×0.0537 ≈ 0.150s.

Answer: MDE ≈ 0.15s.
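Because the MDE scales with 1/√n, halving it takes four times the data. A short sketch of that relationship, reusing the Example 3 inputs (the function name `mde_means` and the list of sample sizes are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def mde_means(n, sigma, alpha=0.05, power=0.80):
    """Minimum detectable difference in means for n observations per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 * sigma ** 2 / n)

# sigma = 1.2s as in Example 3; each step quadruples the sample size.
mde_by_n = {n: mde_means(n, sigma=1.2) for n in (250, 1000, 4000, 16000)}

for n, mde in mde_by_n.items():
    print(f"n per group = {n:>6}: MDE = {mde:.3f}s")
```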

Tip: When baseline is unknown

For conversion metrics with no baseline, use p=0.5 to get a conservative upper bound for n: the variance p(1−p) peaks at 0.5, so for a fixed absolute MDE this gives the largest required sample. If the true baseline is lower or higher than 50%, the required sample will be smaller.
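The worst-case claim can be checked numerically. The sketch below holds the absolute lift fixed at 1 percentage point (an illustrative choice) and compares several hypothetical baselines against the p=0.5 bound; function names are assumptions made for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Two-sided two-proportion sample size (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    t = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(t ** 2 / (p2 - p1) ** 2)

def worst_case_n(delta, alpha=0.05, power=0.80):
    """Upper bound on n: every variance term set to its maximum at p = 0.5."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil((z_a + z_b) ** 2 * 0.5 / delta ** 2)

delta = 0.01  # fixed absolute lift of 1 percentage point
baselines = (0.05, 0.20, 0.495, 0.80)
n_by_baseline = {p1: n_per_group_proportions(p1, p1 + delta) for p1 in baselines}

for p1, n in n_by_baseline.items():
    print(f"baseline {p1:.1%}: n per group = {n}")
print(f"worst case (p = 0.5): n per group = {worst_case_n(delta)}")
```

Note that this holds for an absolute MDE. If the target is a relative lift, low baselines imply tiny absolute differences and can require more data than the p=0.5 case.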

How to run a power analysis (step-by-step)

  1. Pick a single primary metric (e.g., conversion rate).
  2. Decide practical significance: the MDE you care about (business-relevant, not just statistically tiny).
  3. Set test type: two-sided by default; one-sided only with strong, pre-declared directional rationale.
  4. Choose α (commonly 0.05) and desired power (80–90%).
  5. Estimate variability: baseline rate for proportions; standard deviation for means (use historical data or pilot).
  6. Choose allocation (often 50/50; unequal increases total n).
  7. Compute n per group using a calculator or formula; round up.
  8. Check constraints: traffic, seasonality, run time, ethics. Adjust MDE or power if needed.
  9. Pre-register: freeze α, power, metrics, duration, and stopping rules before launching.
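Steps 7 and 8 usually end with a run-time estimate. A minimal sketch follows, assuming each daily user is new to the experiment and enters it at most once; the function name, the 2,000 daily users, and the round-up-to-full-weeks rule (a common way to cover weekly seasonality) are all illustrative assumptions.

```python
from math import ceil

def estimate_duration(n_per_group, daily_users, share_in_test=1.0, groups=2):
    """Translate a per-group sample size into days and full weeks of traffic."""
    total_n = n_per_group * groups
    users_per_day = daily_users * share_in_test  # traffic actually enrolled
    days = ceil(total_n / users_per_day)
    full_weeks = ceil(days / 7)                  # cover whole weekly cycles
    return {"total_n": total_n, "days": days, "full_weeks": full_weeks}

# Example 1's n per group with a hypothetical 2,000 eligible users per day.
plan = estimate_duration(n_per_group=8150, daily_users=2000)
print(plan)
```

If the resulting duration exceeds what the team can afford, step 8 applies: revisit the MDE or power instead of stopping the test early.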

Common mistakes and self-checks

  • Peeking/stopping early inflates Type I error. Self-check: commit to a stopping rule; if peeking, use sequential methods.
  • Underpowered tests: MDE not tied to business value. Self-check: Is the MDE meaningful and feasible within timeline?
  • Ignoring variance shifts: novelty or heterogeneity can change σ. Self-check: monitor variance; plan guardrail metrics.
  • Unequal allocation forgotten: 90/10 splits need more total n. Self-check: confirm allocation in your calculator.
  • Non-independence/clustered data: households, classrooms, or stores. Self-check: account for the design effect 1+(m−1)×ICC, where m is the average cluster size and ICC is the intraclass correlation.
  • Multiple testing on many metrics or variants. Self-check: pre-specify primary metric; adjust for multiplicity if needed.
  • Relying on post-hoc power. Self-check: use confidence intervals/MDE from CI width instead of retrospective power.

Quick self-audit before launch

  • Primary metric and MDE are documented.
  • α, power, test side, and allocation are fixed.
  • Variance estimate source is stated (historical/pilot).
  • Duration and traffic assumptions are realistic.
  • Stopping and analysis plans are pre-committed.
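Two of the adjustments from the common-mistakes list, unequal allocation and clustering, reduce to simple multipliers on the sample size. The sketch below assumes equal variance in both groups for the allocation factor (it follows from the two-sample means formula and is only approximate for proportions); the function names and example inputs are hypothetical.

```python
from math import ceil

def allocation_inflation(treatment_share):
    """Factor by which total n grows versus a 50/50 split (equal variances)."""
    w = treatment_share
    return 1 / (4 * w * (1 - w))

def design_effect(cluster_size, icc):
    """Variance inflation for clustered data: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

base_total_n = 16300  # total from Example 1 at 50/50 with independent users

# A 90/10 split needs far more total traffic than 50/50.
total_90_10 = ceil(base_total_n * allocation_inflation(0.10))

# Illustrative clustering: households of 4 users with ICC = 0.1.
total_clustered = ceil(base_total_n * design_effect(cluster_size=4, icc=0.1))

print(f"50/50 total: {base_total_n}")
print(f"90/10 total: {total_90_10}")
print(f"clustered total (m=4, ICC=0.1): {total_clustered}")
```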

Exercises

These mirror the graded tasks below.

  1. Exercise ex1: Compute sample size for a conversion lift (see details under Practice Exercises below).
  2. Exercise ex2: Compute the MDE for a means metric with fixed n.

Checklist for your work:

  • State α, power, and whether the test is one- or two-sided.
  • Show the formula you used and each numeric substitution.
  • Round up n per group; report total sample size and expected test duration (if you know daily traffic).

Who this is for

  • Data Scientists and Analysts planning A/B tests or offline evaluations.
  • PMs and Experimentation Leads who need credible estimates for timelines and risk.

Prerequisites

  • Comfort with basic probability and normal distributions.
  • Understanding of hypothesis tests and p-values.
  • Ability to compute means/proportions and standard deviations from data.

Learning path

  1. Review hypothesis testing (Type I/II errors, two- vs one-sided).
  2. Learn power and MDE concepts and their trade-offs.
  3. Practice with means and proportions sample size calculations.
  4. Account for practical issues: unequal allocation, clustering, variance shifts.
  5. Automate your calculations (spreadsheet or small script).

Practical projects

  • Build a power calculator spreadsheet: inputs (α, power, baseline, MDE, σ), outputs (n per group, duration).
  • Write a small function in your preferred language that returns n and MDE for means and proportions.
  • Plan an A/B test end-to-end for a hypothetical feature, producing a one-page pre-analysis plan.

Mini challenge

You have 10,000 daily users and a 4% conversion baseline. Leadership wants to detect +0.5 percentage points in two weeks at 80% power, α=0.05, two-sided. Is it feasible with a 50/50 split? Estimate n per group, total available traffic, and whether you can hit the target timeline. State any assumptions or compromises (e.g., adjust MDE or power).

Ready to check your understanding? Try the Quick Test below.

Practice Exercises

Instructions

Baseline conversion p1=3.0%. You care about detecting an increase to p2=3.6% (δ=+0.6 percentage points) with a two-sided test, α=0.05, power=80%, 50/50 allocation.

  • Use the approximation: n per group ≈ [z1−α/2√(2p(1−p)) + zpower√(p1(1−p1) + p2(1−p2))]² / (p2−p1)², with p ≈ (p1+p2)/2.
  • Report n per group and total sample size.

Expected Output

About 13,900 users per group (≈ 27,800 total), or roughly 14,000 per group after rounding. Small deviations are fine if your rounding differs.

Power And Sample Size Basics — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.
