Why this matters
Choosing the right experiment duration is one of the highest-leverage decisions you make as a Data Scientist. Too short, and you risk false positives or missed real effects. Too long, and you waste time and expose users to a poor variant longer than necessary. Seasonality (day-of-week, monthly cycles, holidays) can skew results if you stop at the wrong time.
- Estimate when you will reach the needed sample size for your Minimum Detectable Effect (MDE).
- Plan runs to cover seasonality cycles (e.g., full weeks).
- Control novelty effects and avoid biased stopping decisions.
Who this is for
- Data Scientists and Analysts running A/B/n tests or switchback tests.
- Product managers who need realistic experiment timelines.
- Engineers enabling experimentation who want guardrail-aware plans.
Prerequisites
- Basic hypothesis testing (alpha, power) and confidence intervals.
- Understanding of conversion rates or continuous metrics (mean, variance).
- Ability to estimate traffic volume per group per day.
Concept explained simply
Think of duration as filling a bucket to a marked line. The faucet flow (traffic) pulses by day and season. You must fill to the line (required sample size) and stop at a good moment: when one or more full cycles have passed, so a single pulse doesn’t mislead you.
Key inputs
- Baseline level/variance: for proportions p (conversion rate) or standard deviation σ for continuous metrics.
- MDE (minimum detectable effect): absolute or relative (convert relative to absolute units for formulas).
- Significance (alpha) and power (1 - beta).
- Daily events per group (after eligibility filters).
- Seasonality pattern: day-of-week, monthly/quarterly, holidays, marketing campaigns.
- Novelty or ramp time: initial stabilization period to exclude or downweight.
Simple formulas (back-of-envelope)
For a binary metric (conversion rate) with baseline p and absolute MDE = d (in proportion points, e.g., 0.005 for 0.5 pp), an approximate per-group sample size is:
n_per_group ≈ 2 × p × (1 − p) × (Z_{1−α/2} + Z_{power})² / d²
For a continuous metric with standard deviation σ and absolute MDE = d:
n_per_group ≈ 2 × σ² × (Z_{1−α/2} + Z_{power})² / d²
Duration (days) ≈ ceil(n_per_group / daily_events_per_group)
Good practice: commit to a fixed horizon that includes at least one full seasonality cycle (e.g., a full week), and stop only at cycle boundaries (end of week).
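These back-of-envelope formulas are easy to script. Below is a minimal Python sketch (function names are illustrative) that uses the standard library's NormalDist for the z-values instead of the rounded constants:

```python
import math
from statistics import NormalDist

def n_per_group_binary(p, d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided test on a proportion.
    p: baseline conversion rate, d: absolute MDE in proportion points."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * p * (1 - p) * z**2 / d**2)

def n_per_group_continuous(sigma, d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a continuous metric with std dev sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * sigma**2 * z**2 / d**2)

def naive_duration_days(n_per_group, daily_events_per_group):
    """Days to reach the required sample size, ignoring seasonality."""
    return math.ceil(n_per_group / daily_events_per_group)

# Example: 5% baseline, detect 0.5 pp absolute lift, 10k users/group/day
n = n_per_group_binary(0.05, 0.005)
print(n, naive_duration_days(n, 10_000))  # ~29.8k per group, 3 days
```

Note that exact quantiles give a slightly larger n than the hand-rounded 1.96 + 0.84; either is fine for planning.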
What Z-values should I use?
- α = 0.05 (two-sided) → Z_{1−α/2} ≈ 1.96
- Power = 0.80 → Z_{power} ≈ 0.84
- Power = 0.90 → Z_{power} ≈ 1.28
These are standard approximations for planning.
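If you'd rather compute these quantiles than memorize them, the standard normal quantile function is available in Python's standard library:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # quantile function of the standard normal

alpha = 0.05
print(round(z(1 - alpha / 2), 2))  # two-sided alpha = 0.05 → 1.96
print(round(z(0.80), 2))           # power 0.80 → 0.84
print(round(z(0.90), 2))           # power 0.90 → 1.28
```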
Seasonality playbook
- Day-of-week: Always cover at least one full week. If n is large or traffic variable, prefer multiples of full weeks.
- Known events (launches, holidays): Avoid or explicitly include as full cycles. Use blackouts (pause enrollment) if needed.
- Marketing bursts: Start after ramp settles, or run for full campaign windows.
- Switchbacks (if applicable to supply/demand systems): Use full cycles of assignment to cover temporal patterns.
Worked examples
Example 1: Website conversion A/B
Goal: Detect a 10% relative lift from 5% baseline. So absolute MDE d = 0.10 × 0.05 = 0.005 (0.5 pp). α = 0.05, power = 0.80 → Z sum ≈ 1.96 + 0.84 = 2.80.
n_per_group ≈ 2 × 0.05 × 0.95 × (2.80)² / 0.005² = 2 × 0.0475 × 7.84 / 0.000025 ≈ 0.095 × 7.84 / 0.000025 ≈ 0.7448 / 0.000025 ≈ 29,792 users per group.
Traffic: 20,000 eligible users per day total, 50/50 split → 10,000 per group per day. Duration for sample size ≈ ceil(29,792 / 10,000) = 3 days.
But: day-of-week seasonality exists. Plan: run at least one full week and include a 2-day novelty burn-in excluded from analysis. Practical plan: 9 calendar days total; analyze the last 7 days ending at a weekday boundary.
Example 2: Continuous metric (revenue per user)
Baseline σ = $3, MDE d = $0.20, α = 0.05, power = 0.90 → Z sum = 1.96 + 1.28 = 3.24.
n_per_group ≈ 2 × 3² × (3.24)² / 0.20² = 18 × 10.4976 / 0.04 ≈ 188.96 / 0.04 ≈ 4,724 users per group.
Traffic: 1,000 users per group per day → 4.7 → 5 days. Seasonality rule: round up to a full week. Practical plan: 7 days; optionally exclude day 1 as novelty.
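Both worked examples plug into the same shared formula; this short sketch (helper name is illustrative) reproduces the arithmetic with the rounded z-sums used above:

```python
import math

def n_per_group(var_term, z_sum, d):
    # var_term is p*(1-p) for proportions, sigma**2 for continuous metrics
    return 2 * var_term * z_sum**2 / d**2

# Example 1: p = 0.05, 10% relative lift → d = 0.005, z_sum = 1.96 + 0.84
n1 = n_per_group(0.05 * 0.95, 2.80, 0.005)
print(round(n1))                # ≈ 29,792 users per group
print(math.ceil(n1 / 10_000))   # naive duration at 10k/day/group → 3 days

# Example 2: sigma = 3, d = 0.20, z_sum = 1.96 + 1.28
n2 = n_per_group(3**2, 3.24, 0.20)
print(round(n2))                # ≈ 4,724 users per group
```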
Example 3: Uneven weekly traffic
Per-group daily events: Mon–Thu 6k, Fri 8k, Sat 12k, Sun 12k. Weekly total = 56k per group. Required n_per_group = 50k. Depending on the start day, you can reach 50k in as few as 6 days (e.g., starting Wednesday), but stopping mid-week over-represents weekend behavior. Plan: stop only at a week boundary. One full week covers the cycle and exceeds the sample need.
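The uneven-traffic accrual in Example 3 can be checked with a short simulation (variable and function names are illustrative):

```python
import itertools
import math

# Per-group daily events from Example 3, keyed by weekday
daily = {"Mon": 6000, "Tue": 6000, "Wed": 6000, "Thu": 6000,
         "Fri": 8000, "Sat": 12000, "Sun": 12000}
order = list(daily)

def naive_days(start_day, needed):
    """Days until cumulative events reach `needed`, starting on `start_day`."""
    start = order.index(start_day)
    total = 0
    for day_count, i in enumerate(itertools.count(start), start=1):
        total += daily[order[i % 7]]
        if total >= needed:
            return day_count

days = naive_days("Wed", 50_000)
print(days)                      # 6 — naive stop lands mid-week (a Monday)
print(math.ceil(days / 7) * 7)   # 7 — round up to a full week before stopping
```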
Example 4: Holiday interference
You plan a pricing test across late November. Black Friday/Cyber Monday will spike traffic and change buyer intent. Options:
- Exclude enrollment during the holiday (blackout) and extend the run to complete a full post-holiday week.
- Run two full-week phases: pre-holiday and post-holiday; analyze each phase separately and then meta-analyze if appropriate.
How to plan duration (step-by-step)
- Define metric, baseline (p or σ), MDE (absolute), alpha, and power.
- Compute back-of-envelope n_per_group with the formulas above.
- Estimate daily events per group and get a naive day count = ceil(n_per_group / daily_events_per_group).
- Map seasonality cycles. For day-of-week effects, round up to a whole number of weeks.
- Add novelty stabilization (e.g., 1–2 days) to the calendar; either exclude it from analysis or pre-register a weighting scheme.
- Pre-register stop rules: stop only at cycle boundaries; avoid peeking-based early stopping.
- Check guardrails (e.g., error rates, latency). If guardrails degrade, define safety stop conditions.
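The computational steps above can be combined into one small planner. This is a sketch, not a standard API; the function name, output keys, and the one-week seasonality floor are illustrative choices:

```python
import math
from statistics import NormalDist

def plan_duration(p, relative_mde, daily_per_group,
                  alpha=0.05, power=0.80, novelty_days=1):
    """Back-of-envelope duration plan for a binary metric (sketch)."""
    d = relative_mde * p                      # relative → absolute MDE
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = 2 * p * (1 - p) * z**2 / d**2
    naive = math.ceil(n / daily_per_group)
    weeks = max(1, math.ceil(naive / 7))      # cover >= 1 full week
    return {
        "n_per_group": math.ceil(n),
        "naive_days": naive,
        "analysis_days": weeks * 7,
        "calendar_days": weeks * 7 + novelty_days,  # burn-in excluded from analysis
    }

# Reproduces Example 1's plan: analyze 7 days after a 2-day novelty burn-in
print(plan_duration(p=0.05, relative_mde=0.10, daily_per_group=10_000, novelty_days=2))
```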
Mini reference: absolute vs relative MDE
- Relative MDE → absolute: d = relative × baseline. Example: 10% of 5% = 0.5 pp = 0.005.
- For continuous metrics, specify d in the same units as the metric (e.g., $0.20).
Common mistakes and self-check
- Stopping mid-week: can bias results if weekends/weekday behavior differ. Self-check: Does the run end at a week boundary?
- Ignoring novelty: early user curiosity inflates metrics. Self-check: Compare day 1–2 vs later days; large drifts suggest novelty.
- MDE confusion: using relative MDE in a formula that needs absolute. Self-check: Confirm units of d before calculating.
- Peeking repeatedly without correction: raises false-positive risk. Self-check: Did you commit to a fixed horizon or use a sequential method?
- Traffic overestimation: assuming all users are eligible. Self-check: Use observed eligible traffic from recent weeks.
- Holiday blindness: running across major events unintentionally. Self-check: Review the calendar for launches, holidays, and campaigns.
Pre-launch checklist
- Baseline and MDE are documented; d is absolute.
- n_per_group computed for primary metric; guardrails listed.
- Daily eligible traffic per group is realistic.
- Duration covers ≥1 full week (or ≥1 full relevant cycle).
- Novelty period decided (exclude or include with note).
- Stop rule: end at cycle boundary; no unplanned peeking.
- Holiday/campaign calendar reviewed; blackout or split phases planned.
Exercises (practice here, then check solutions)
- Exercise 1: Baseline p = 0.04, target relative MDE = 12.5%, two-sided test with α = 0.05, power = 0.80, total eligible traffic = 16,000/day (50/50 split). Compute n_per_group, naive duration, then propose a seasonality-safe plan with novelty handling.
- Exercise 2: You need n_per_group = 80,000 events for your add-to-cart rate. Your per-group daily events are Mon–Thu 7,000, Fri 9,000, Sat 13,000, Sun 11,000. Plan a stop date if you start on Tuesday, considering novelty = 1 day and week-boundary stopping.
Practical projects
- Create a duration planner: a small spreadsheet that takes baseline, MDE, alpha, power, and daily traffic by weekday to output recommended end dates at week boundaries.
- Seasonality audit: analyze 8 weeks of historical metric data to quantify day-of-week lift factors; propose experiment run rules based on the findings.
- Post-hoc sensitivity: simulate different MDEs and powers to show how duration changes; present trade-offs to stakeholders.
Learning path
- Before: Hypothesis testing basics, power and sample size.
- This lesson: Duration planning and seasonality coverage.
- Next: Guardrails, novelty effects, and sequential testing considerations.
Next steps
- Use the checklist on your next A/B test plan.
- Build or adapt a duration spreadsheet for your team.
- Take the Quick Test below.
Mini challenge
You can only run for 10 days this month due to a planned campaign. Your naive calculation says 6 days is enough. Propose a plan that respects day-of-week coverage, includes novelty control, and avoids the campaign window. Write the exact calendar dates and which days you will analyze.