
Experiment Duration And Seasonality

Learn Experiment Duration And Seasonality for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

Choosing the right experiment duration is one of the highest-leverage decisions you make as a Data Scientist. Too short, and you risk false positives or missed real effects. Too long, and you waste time and expose users to a poor variant. Seasonality (day-of-week, monthly cycles, holidays) can skew results if you stop at the wrong time.

  • Estimate when you will reach the needed sample size for your Minimum Detectable Effect (MDE).
  • Plan runs to cover seasonality cycles (e.g., full weeks).
  • Control novelty effects and avoid biased stopping decisions.

Who this is for

  • Data Scientists and Analysts running A/B/n tests or switchback tests.
  • Product managers who need realistic experiment timelines.
  • Engineers enabling experimentation who want guardrail-aware plans.

Prerequisites

  • Basic hypothesis testing (alpha, power) and confidence intervals.
  • Understanding of conversion rates or continuous metrics (mean, variance).
  • Ability to estimate traffic volume per group per day.

Concept explained simply

Think of duration as filling a bucket to a marked line. The faucet flow (traffic) pulses by day and season. You must fill to the line (required sample size) and stop at a good moment: when one or more full cycles have passed, so a single pulse doesn’t mislead you.

Key inputs

  • Baseline level and variance: conversion rate p for proportions, or standard deviation σ for continuous metrics.
  • MDE (minimum detectable effect): absolute or relative (convert relative to absolute units for formulas).
  • Significance (alpha) and power (1 - beta).
  • Daily events per group (after eligibility filters).
  • Seasonality pattern: day-of-week, monthly/quarterly, holidays, marketing campaigns.
  • Novelty or ramp time: initial stabilization period to exclude or downweight.

Simple formulas (back-of-envelope)

For a binary metric (conversion rate) with baseline p and absolute MDE = d (in proportion points, e.g., 0.005 for 0.5 pp), an approximate per-group sample size is:

n_per_group ≈ 2 × p × (1 − p) × (Z_(1−α/2) + Z_power)² / d²

For a continuous metric with standard deviation σ and absolute MDE = d:

n_per_group ≈ 2 × σ² × (Z_(1−α/2) + Z_power)² / d²

Duration (days) ≈ ceil(n_per_group / daily_events_per_group)

Good practice: commit to a fixed horizon that includes at least one full seasonality cycle (e.g., a full week), and stop only at cycle boundaries (end of week).

What Z-values should I use?
  • α = 0.05 (two-sided) → Z_(1−α/2) ≈ 1.96
  • Power = 0.80 → Z_power ≈ 0.84
  • Power = 0.90 → Z_power ≈ 1.28

These are standard approximations for planning.
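
These formulas are easy to script. The Python sketch below is a minimal back-of-envelope calculator; the function names are illustrative, and the default Z values are the rounded planning approximations above (swap in statistics.NormalDist().inv_cdf if you want exact quantiles).

from math import ceil

def n_per_group_binary(p, d, z_alpha=1.96, z_power=0.84):
    # Per-group sample size for a two-sided test on a proportion.
    # p: baseline conversion rate; d: absolute MDE (e.g., 0.005 for 0.5 pp).
    return ceil(2 * p * (1 - p) * (z_alpha + z_power) ** 2 / d ** 2)

def n_per_group_continuous(sigma, d, z_alpha=1.96, z_power=0.84):
    # Per-group sample size for a continuous metric with std dev sigma;
    # d is the absolute MDE in the metric's own units.
    return ceil(2 * sigma ** 2 * (z_alpha + z_power) ** 2 / d ** 2)

def naive_duration_days(n_per_group, daily_events_per_group):
    # Days needed to reach the sample size, before seasonality rounding.
    return ceil(n_per_group / daily_events_per_group)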

Seasonality playbook

  • Day-of-week: Always cover at least one full week. If n is large or traffic is variable, prefer multiples of full weeks.
  • Known events (launches, holidays): Avoid or explicitly include as full cycles. Use blackouts (pause enrollment) if needed.
  • Marketing bursts: Start after ramp settles, or run for full campaign windows.
  • Switchbacks (if applicable to supply/demand systems): Use full cycles of assignment to cover temporal patterns.
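
For switchbacks specifically, "full cycles of assignment" means every time slot should see both arms. A toy schedule generator, assuming 6-hour switch windows over two full weeks (both are illustrative choices):

def switchback_schedule(days=14, slots_per_day=4):
    # Checkerboard assignment: each day is internally balanced, and over an
    # even number of days every slot sees both arms equally often, covering
    # both time-of-day and day-of-week cycles.
    return [(day, slot, "treatment" if (day + slot) % 2 == 0 else "control")
            for day in range(days) for slot in range(slots_per_day)]

for day, slot, arm in switchback_schedule()[:8]:  # first two days
    print(day, slot, arm)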

Worked examples

Example 1: Website conversion A/B

Goal: Detect a 10% relative lift from 5% baseline. So absolute MDE d = 0.10 × 0.05 = 0.005 (0.5 pp). α = 0.05, power = 0.80 → Z sum ≈ 1.96 + 0.84 = 2.80.

n_per_group ≈ 2 × 0.05 × 0.95 × (2.80)² / 0.005² = 2 × 0.0475 × 7.84 / 0.000025 ≈ 0.095 × 7.84 / 0.000025 ≈ 0.7448 / 0.000025 ≈ 29,792 users per group.

Traffic: 20,000 eligible users per day total, 50/50 split → 10,000 per group per day. Duration for sample size ≈ ceil(29,792 / 10,000) = 3 days.

But: day-of-week seasonality exists. Plan: run at least one full week and include a 2-day novelty burn-in excluded from analysis. Practical plan: 9 calendar days total; analyze the last 7 days ending at a weekday boundary.

Example 2: Continuous metric (revenue per user)

Baseline σ = $3, MDE d = $0.20, α = 0.05, power = 0.90 → Z sum = 1.96 + 1.28 = 3.24.

n_per_group ≈ 2 × 3² × (3.24)² / 0.20² = 18 × 10.4976 / 0.04 ≈ 188.96 / 0.04 ≈ 4,724 users per group.

Traffic: 1,000 users per group per day → 4,724 / 1,000 = 4.73 → 5 days. Seasonality rule: round up to a full week. Practical plan: 7 days; optionally exclude day 1 as novelty.
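
Assuming the calculator sketch from the formulas section, both worked examples reproduce directly:

n1 = n_per_group_binary(p=0.05, d=0.005)
print(n1, naive_duration_days(n1, 10_000))   # ≈ 29,792 per group, 3 days

n2 = n_per_group_continuous(sigma=3.0, d=0.20, z_power=1.28)
print(n2, naive_duration_days(n2, 1_000))    # 4,724 per group, 5 days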

Example 3: Uneven weekly traffic

Per-group daily events: Mon–Thu 6k, Fri 8k, Sat 12k, Sun 12k. Weekly total = 56k per group. Required n_per_group = 50k. Even if you reach 50k in about 6 days (depending on the start day), stopping mid-week over-represents weekend behavior. Plan: stop only at a week boundary. One full week covers the cycle and exceeds the sample need.
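
The week-boundary rule for this example can be encoded directly (weekday volumes as above):

from math import ceil

weekday_events = {"Mon": 6000, "Tue": 6000, "Wed": 6000, "Thu": 6000,
                  "Fri": 8000, "Sat": 12000, "Sun": 12000}
weekly_total = sum(weekday_events.values())    # 56,000 per group per week
n_required = 50_000

# Stop only at week boundaries: round the required weeks up to an integer.
full_weeks = ceil(n_required / weekly_total)   # 1 full week
print(f"Run {full_weeks} full week(s) = {full_weeks * weekly_total:,} events per group")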

Example 4: Holiday interference

You plan a pricing test across late November. Black Friday/Cyber Monday will spike traffic and change buyer intent. Options:

  • Exclude enrollment during the holiday (blackout) and extend the run to complete a full post-holiday week.
  • Run two full-week phases: pre-holiday and post-holiday; analyze each phase separately and then meta-analyze if appropriate.
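
If you take the two-phase route, one standard way to combine phases is fixed-effect (inverse-variance) pooling of the per-phase lift estimates; the sketch below uses hypothetical numbers and assumes independent phases.

def pool_fixed_effect(estimates, std_errors):
    # Inverse-variance weighted average of independent per-phase estimates.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Hypothetical pre- and post-holiday lifts (absolute) with standard errors:
print(pool_fixed_effect([0.004, 0.006], [0.002, 0.003]))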

How to plan duration (step-by-step)

  1. Define metric, baseline (p or σ), MDE (absolute), alpha, and power.
  2. Compute back-of-envelope n_per_group with the formulas above.
  3. Estimate daily events per group and get a naive day count = ceil(n_per_group / daily_events_per_group).
  4. Map seasonality cycles. For day-of-week effects, round up to a whole number of weeks (see the planner sketch after this list).
  5. Add novelty stabilization (e.g., 1–2 days) to the calendar; either exclude it from analysis or pre-register a weighting scheme.
  6. Pre-register stop rules: stop only at cycle boundaries; avoid peeking-based early stopping.
  7. Check guardrails (e.g., error rates, latency). If guardrails degrade, define safety stop conditions.
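
Steps 2–5 condense into a small planner; the default cycle length and novelty buffer below are assumptions to adapt to your own cycles.

from math import ceil

def plan_duration(n_per_group, daily_events_per_group,
                  cycle_days=7, novelty_days=2):
    # Step 3: naive day count from sample size and traffic.
    naive_days = ceil(n_per_group / daily_events_per_group)
    # Step 4: round the analysis window up to whole seasonality cycles.
    analysis_days = cycle_days * ceil(naive_days / cycle_days)
    # Step 5: prepend a novelty burn-in that is excluded from analysis.
    return {"naive_days": naive_days,
            "analysis_days": analysis_days,
            "calendar_days": analysis_days + novelty_days}

print(plan_duration(29_792, 10_000))  # Example 1: 3 naive, 7 analyzed, 9 calendar
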
Mini reference: absolute vs relative MDE
  • Relative MDE → absolute: d = relative × baseline. Example: 10% of 5% = 0.5 pp = 0.005.
  • For continuous metrics, specify d in the same units as the metric (e.g., $0.20).

Common mistakes and self-check

  • Stopping mid-week: can bias results if weekends/weekday behavior differ. Self-check: Does the run end at a week boundary?
  • Ignoring novelty: early user curiosity inflates metrics. Self-check: Compare day 1–2 vs later days; large drifts suggest novelty (see the sketch after this list).
  • MDE confusion: using relative MDE in a formula that needs absolute. Self-check: Confirm units of d before calculating.
  • Peeking repeatedly without correction: raises false-positive risk. Self-check: Did you commit to a fixed horizon or use a sequential method?
  • Traffic overestimation: assuming all users are eligible. Self-check: Use observed eligible traffic from recent weeks.
  • Holiday blindness: running across major events unintentionally. Self-check: Review the calendar for launches, holidays, and campaigns.
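
The novelty self-check above is easy to automate once you have a daily lift series; the values and the 50% drift threshold here are hypothetical.

daily_lift = [0.012, 0.011, 0.006, 0.005, 0.006, 0.005, 0.006]  # hypothetical
early = sum(daily_lift[:2]) / 2
later = sum(daily_lift[2:]) / len(daily_lift[2:])
if later and abs(early - later) / abs(later) > 0.5:  # assumed threshold
    print(f"Possible novelty effect: early lift {early:.4f} vs later {later:.4f}")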

Pre-launch checklist

  • Baseline and MDE are documented; d is absolute.
  • n_per_group computed for primary metric; guardrails listed.
  • Daily eligible traffic per group is realistic.
  • Duration covers ≥1 full week (or ≥1 full relevant cycle).
  • Novelty period decided (exclude or include with note).
  • Stop rule: end at cycle boundary; no unplanned peeking.
  • Holiday/campaign calendar reviewed; blackout or split phases planned.

Exercises (practice here, then check solutions)

  1. Exercise 1: Baseline p = 0.04, target relative MDE = 12.5% (two-sided), α = 0.05, power = 0.80, total eligible traffic = 16,000/day (50/50 split). Compute n_per_group, naive duration, then propose a seasonality-safe plan with novelty handling.
  2. Exercise 2: You need n_per_group = 80,000 events for your add-to-cart rate. Your per-group daily events are Mon–Thu 7,000, Fri 9,000, Sat 13,000, Sun 11,000. Plan a stop date if you start on Tuesday, considering novelty = 1 day and week-boundary stopping.

Practical projects

  • Create a duration planner: a small spreadsheet that takes baseline, MDE, alpha, power, and daily traffic by weekday to output recommended end dates at week boundaries.
  • Seasonality audit: analyze 8 weeks of historical metric data to quantify day-of-week lift factors; propose experiment run rules based on the findings.
  • Post-hoc sensitivity: simulate different MDEs and powers to show how duration changes; present trade-offs to stakeholders.

Learning path

  • Before: Hypothesis testing basics, power and sample size.
  • This lesson: Duration planning and seasonality coverage.
  • Next: Guardrails, novelty effects, and sequential testing considerations.

Next steps

  • Use the checklist on your next A/B test plan.
  • Build or adapt a duration spreadsheet for your team.
  • Take the Quick Test below.

Mini challenge

You can only run for 10 days this month due to a planned campaign. Your naive calculation says 6 days is enough. Propose a plan that respects day-of-week coverage, includes novelty control, and avoids the campaign window. Write the exact calendar dates and which days you will analyze.

Practice Exercises


Instructions (Exercise 1)

Baseline p = 0.04. Relative MDE = 12.5% → convert to absolute d. α = 0.05 (two-sided), power = 0.80. Total eligible traffic = 16,000/day; 50/50 split between control and treatment. Tasks:

  • Compute n_per_group (use Z values 1.96 and 0.84).
  • Compute naive duration in days.
  • Propose a seasonality-safe calendar: include at least one full week and a 2-day novelty burn-in excluded from analysis.

Expected Output
n_per_group, naive days, and a plan that ends at a week boundary with novelty excluded and at least 7 analyzed days.

Experiment Duration And Seasonality — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
