
Pre-Experiment Analysis Plan

Learn the Pre-Experiment Analysis Plan for free with explanations, exercises, and a quick test (for Product Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

A solid pre-experiment analysis plan prevents bias, scope creep, and last-minute arguments. Product Analysts use it to:

  • Align teams on goals and success criteria before launch.
  • Lock hypotheses, metrics, and analysis choices to avoid p-hacking.
  • Estimate sample size and duration realistically.
  • Define guardrails and sanity checks to keep users safe and the business stable.
  • Decide how to deal with outliers, bots, missing data, and clustering.

Concept explained simply

Think of the plan as your experiment's flight plan. You set the destination (hypothesis), fuel and timing (sample size and duration), instruments (metrics and tests), safety checks (guardrails), and landing procedure (stopping and analysis rules). You commit to this plan before takeoff to avoid mid-air detours that can invalidate results.

Mental model

  • Define the question: What user behavior should change, and by how much?
  • Choose the lens: Primary metric + a small set of secondary and guardrail metrics.
  • Set the bar: Minimum Detectable Effect (MDE), alpha, power.
  • Clarify who counts: Experiment unit, population, eligibility, and segmentation.
  • Reduce noise: Variance reduction (e.g., CUPED, stratification), blocking, cluster effects.
  • Decide when to stop: Fixed horizon or sequential rules; how to handle peeking.
  • Commit to analysis: Exact statistical test, data filters, outlier handling, missing data plan.

Core components of a Pre-Experiment Analysis Plan

  • Business context & hypothesis
    • Primary hypothesis (directional or two-sided), framed clearly.
    • Expected mechanism: How the change should drive the metric.
  • Population & unit
    • Unit of randomization (user, session, account, household, geo).
    • Eligibility criteria and exposure rules (e.g., first session only).
    • Clustering risk and design effect if not independent.
  • Variants & allocation
    • Control, treatment(s), allocation ratio, ramp plan, holdouts if any.
  • Metrics
    • Primary metric: single success criterion.
    • Secondary metrics: limited and motivated.
    • Guardrail metrics: quantitative safety thresholds (e.g., error rate, latency).
  • Effect size & power
    • MDE (relative or absolute), alpha (usually 0.05), power (usually 0.8).
    • Sample size and duration estimate; adjust for design effect and seasonality (a sizing sketch follows this list).
  • Data & instrumentation
    • Event names, fields, and timing; attribution window.
    • Bot filtering, outlier rules, missing data handling.
  • Analysis plan
    • Primary test (e.g., two-proportion z-test, Welch t-test, Mann-Whitney), two-sided vs one-sided.
    • Variance reduction (e.g., CUPED covariate: recent baseline behavior).
    • Segmentation (pre-specified), multiple testing correction if needed.
  • Stopping & decision rules
    • Fixed horizon or sequential; peeking policy.
    • Decision thresholds for launch, iterate, or roll back.
  • Reporting template
    • Pre-defined table/graph and decision note.
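
A minimal sizing sketch for a binary primary metric, using the normal approximation. The function name, defaults, and numbers are illustrative, not a prescribed implementation.

```python
# Per-arm sample size for a two-sided, two-proportion z-test
# (normal approximation; illustrative helper, adapt to your stack).
from math import ceil
from scipy.stats import norm

def sample_size_per_arm(baseline, rel_mde, alpha=0.05, power=0.8, design_effect=1.0):
    p1 = baseline
    p2 = baseline * (1 + rel_mde)            # baseline shifted by the relative MDE
    z_a = norm.ppf(1 - alpha / 2)            # two-sided critical value
    z_b = norm.ppf(power)
    n = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
    return ceil(n * design_effect)           # inflate for clustering if needed

print(sample_size_per_arm(0.25, 0.05))       # -> 19145, i.e., ~19k per arm (see Example 1 below)
```
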
Copy/paste plan template
Title: [Experiment name]
Owner: [Name]
Date:

1) Business context:
2) Primary hypothesis (two-sided or one-sided):
3) Variants & allocation:
4) Population & unit of randomization:
   - Eligibility:
   - Exposure rule:
   - Clustering risk (ICC if known):
5) Metrics:
   - Primary:
   - Secondary (max 3):
   - Guardrails (with thresholds):
6) Effect size & power:
   - Baseline:
   - MDE:
   - Alpha / Power:
   - Sample size per arm (adjusted for design effect):
   - Estimated duration:
7) Data & instrumentation:
   - Events and fields:
   - Bot/outlier rules:
   - Missing data handling:
8) Analysis plan:
   - Test(s):
   - One/two-sided:
   - Variance reduction (e.g., CUPED covariate):
   - Segments (pre-specified):
   - Multiple-testing handling:
9) Stopping & decision:
   - Stopping rule:
   - Decision criteria:
10) Reporting:
   - Tables/graphs:
   - Decision note template:

Worked examples

Example 1: Signup conversion experiment

  • Hypothesis: A simplified form increases signup conversion.
  • Primary metric: User-level signup conversion (% of eligible users who sign up).
  • Baseline: 25%.
  • MDE: +5% relative (to 26.25%).
  • Alpha: 0.05 (two-sided); Power: 0.8.
  • Test: Two-proportion z-test; variance reduction: stratify by traffic source.

Approx sample size per arm: ~19,000 users (normal approximation; ~38,000 total across both arms). Duration example: if 10,000 eligible users enter each arm per day, you need ~2 days of enrollment, then analyze at the end of the window.

Stopping: Fixed horizon; no unplanned peeking.
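
At analysis time, the pre-registered primary test could look like the sketch below; statsmodels' proportions_ztest is one common choice, and the counts are illustrative stand-ins, not real results.

```python
# Pre-registered primary test for Example 1: two-sided two-proportion z-test.
from statsmodels.stats.proportion import proportions_ztest

signups = [5010, 4770]        # illustrative conversions: treatment, control
exposed = [19145, 19145]      # eligible users per arm

z_stat, p_value = proportions_ztest(signups, exposed, alternative='two-sided')
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")   # compare to the pre-registered alpha = 0.05
```
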

Why this plan is strong
  • MDE is realistic for a UX change.
  • Unit is user-level, aligned with metric.
  • Variance reduction pre-registered and minimal.
  • Clear stopping rule avoids bias.

Example 2: Revenue per user (ARPU) test

  • Hypothesis: Personalized recommendations increase ARPU.
  • Primary metric: 14-day ARPU per user.
  • Baseline mean: $2.50; SD: ~$9 (skewed).
  • MDE: +6% relative (+$0.15).
  • Alpha: 0.05; Power: 0.8.
  • Test: Welch t-test on log(1+revenue) with back-transform for interpretation; confirm with Mann-Whitney as a robustness check.
  • Outliers: Winsorize top 0.5% as pre-registered.

Approx sample size per arm (means, rough): ~56,000 users. If 15,000 users per arm enroll daily, duration ~4 days + 14-day measurement window. Guardrails: refund rate, latency, error rate.
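
A sketch of how the pre-registered ARPU analysis could run, assuming the winsorization and transform above. The data are synthetic stand-ins, and scipy is one of several suitable libraries.

```python
# Winsorize, log-transform, Welch t-test, then a Mann-Whitney robustness check.
import numpy as np
from scipy import stats

def winsorize_top(x, q=0.995):
    """Cap values above the pre-registered quantile (top 0.5%)."""
    return np.minimum(x, np.quantile(x, q))

rng = np.random.default_rng(7)
control = winsorize_top(rng.exponential(2.50, 56_000))    # stand-ins for 14-day revenue
treatment = winsorize_top(rng.exponential(2.65, 56_000))

t_stat, p_welch = stats.ttest_ind(np.log1p(treatment), np.log1p(control), equal_var=False)
u_stat, p_mwu = stats.mannwhitneyu(treatment, control, alternative='two-sided')

# Back-transform the mean log-difference into an approximate relative lift
# (roughly a geometric-mean ratio minus 1).
lift = np.expm1(np.log1p(treatment).mean() - np.log1p(control).mean())
print(f"Welch p = {p_welch:.4f}, Mann-Whitney p = {p_mwu:.4f}, approx lift = {lift:.2%}")
```
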

Notes
  • Skew handled via transform and winsorization decided upfront.
  • Launch depends on ARPU lift with guardrails stable.

Example 3: D7 retention

  • Hypothesis: A new onboarding flow increases D7 retention.
  • Primary metric: D7 retained (% of users active 7 days after first run).
  • Baseline: 18%.
  • MDE: +8% relative (to 19.44%).
  • Alpha: 0.05; Power: 0.8.
  • Test: Two-proportion z-test; block by platform (iOS/Android).

Approx sample size per arm: ~11,500 users (~23,000 total). Design effect: if randomizing at the household level with an average of 3 users/household and ICC = 0.05, inflate by 1 + (3-1)*0.05 = 1.10, i.e., ~10% more users are needed. Duration depends on daily eligible volume; add 7 days to observe D7.
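
The design-effect arithmetic from this example, as a tiny sketch (values illustrative):

```python
# Inflate the naive per-arm sample size for household-level clustering.
from math import ceil

n_naive = 11_500           # per-arm size ignoring clustering
m, icc = 3, 0.05           # avg users per household, intraclass correlation

deff = 1 + (m - 1) * icc             # 1 + (3-1)*0.05 = 1.10
n_per_arm = ceil(n_naive * deff)     # 12,650 users per arm
print(deff, n_per_arm)
```
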

How to write your plan in clear steps

  1. Write the hypothesis as a testable statement with direction and mechanism.
  2. Choose one primary metric tightly tied to the goal. Limit secondaries; add guardrails.
  3. Define population and unit (user, session, account), eligibility, and exposure.
  4. Pick MDE, alpha, power; compute sample size and duration; adjust for design effect.
  5. Lock the analysis: statistical test, one/two-sided, variance reduction, segments, multiple-testing rule (one way to freeze these choices is sketched after these steps).
  6. Specify data handling: logging fields, filters, bots, outliers, missing data.
  7. Set stopping rules and decision logic; commit to a reporting template.
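
One lightweight way to lock steps 4-7 is to commit the plan as a version-controlled config before launch. The field names and values below are illustrative, not a required schema.

```python
# Illustrative pre-registration config: commit this before launch so any
# later deviation from the plan is visible in version control.
PLAN = {
    "hypothesis": "Simplified form increases signup conversion",
    "unit": "user",
    "primary_metric": "signup_conversion",
    "secondary_metrics": ["form_completion_time"],
    "guardrails": {"error_rate": 0.01, "p95_latency_ms": 800},
    "mde_relative": 0.05,
    "alpha": 0.05,
    "power": 0.8,
    "test": "two_proportion_ztest",
    "sided": "two-sided",
    "variance_reduction": "stratify_by_traffic_source",
    "stopping_rule": "fixed_horizon",
    "filters": {"bots": "exclude_known_agents", "outliers": "winsorize_p995"},
}
```
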
Mini task: sanity checks before launch
  • Does the unit match the metric?
  • Is the MDE practical (not too small/large)?
  • Are guardrails measurable and monitored?
  • Is the stopping rule unambiguous?

Exercises (do these now)

These mirror the exercises below so your answers can be checked. Keep your notes in the template above.

Exercise 1: Draft a basic plan

Scenario: You test a new add-to-cart button style on product pages.

  • Baseline click-to-add rate: 20% per eligible session.
  • Choose a realistic MDE, unit, metrics (primary, secondary, guardrails), test, sample size, duration, stopping rule, and data filters.
Hints
  • Prefer user-level over session-level if sessions repeat; otherwise justify session-level.
  • Guardrails: bounce, latency, errors.
  • Two-proportion z-test for binary outcomes.

Exercise 2: Clustered design

Scenario: Email subject line test randomized at household level.

  • Baseline open rate: 40% (user-level metric).
  • Avg users/household: 1.8; ICC estimate: 0.08.
  • Naive sample size per arm (ignoring clustering): 2,500 users.
  • Compute design effect and adjusted sample size per arm. Update the plan accordingly.
Hint

Design effect = 1 + (m-1)*ICC. Multiply naive n by this factor.

Exercise 3: Skewed metric analysis

Scenario: Checkout flow change; primary metric: ARPU over 7 days; revenue is highly skewed.

  • Pick the primary test and any transformation or winsorization.
  • Define outlier and bot rules.
  • State how you will interpret back-transformed effects.
Hint

Consider Welch t-test on log(1+revenue), confirm with a nonparametric test for robustness.

Common mistakes and how to self-check

  • Unit mismatch: Randomizing by session but measuring user-level. Fix: match unit and metric or justify cluster adjustment.
  • Multiple primary metrics: Dilutes decisions. Fix: one primary; rest are secondary or guardrails.
  • MDE too small: Unrealistic durations. Fix: tie MDE to business value and traffic realities.
  • Undefined stopping rule: Encourages peeking. Fix: pre-register fixed horizon or sequential method.
  • Vague data filters: Post-hoc cherry-picking. Fix: specify bot/outlier/missing rules now.
  • Ignoring clustering/seasonality: Underpowered results. Fix: design effect and blocking/stratification.
Self-check checklist
  • Primary hypothesis and metric are singular and aligned.
  • MDE, alpha, power recorded with justification.
  • Sample size per arm computed and adjusted for design effect.
  • Stopping rule and decision thresholds are explicit.
  • Data instrumentation and filters fully specified.
  • Variance reduction and segments pre-registered.

Practical projects

  • Redesign plan: Rewrite an old A/B test plan from your org with clear MDE, guardrails, and stopping rules. Compare decisions you would make now.
  • Metric audit: Take a current product KPI and draft valid definitions (denominator, window, attribution) and guardrails.
  • Variance reduction pilot: Simulate a CUPED covariate (e.g., prior-week activity) and show variance reduction on historical data (a minimal simulation sketch follows this list).
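
For the variance reduction pilot above, a minimal CUPED simulation sketch on synthetic data (all names and distributions are illustrative):

```python
# CUPED on synthetic data: adjust the in-experiment metric with a
# pre-experiment covariate and measure the variance reduction.
import numpy as np

rng = np.random.default_rng(42)
prior = rng.gamma(2.0, 1.5, 50_000)                  # stand-in: prior-week activity
metric = 0.6 * prior + rng.normal(0.0, 1.0, 50_000)  # in-experiment metric, correlated

theta = np.cov(prior, metric)[0, 1] / np.var(prior, ddof=1)
cuped = metric - theta * (prior - prior.mean())      # CUPED-adjusted metric

reduction = 1 - cuped.var() / metric.var()           # equals corr(prior, metric) ** 2
print(f"variance reduction: {reduction:.1%}")        # ~60% on this synthetic data
```
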

Learning path

  • Before this: Hypothesis framing, metric design, basic statistics (proportions, means, alpha/power), traffic sizing.
  • Now: Pre-experiment analysis plan (this lesson).
  • Next: Implementation checks, randomization diagnostics, and post-experiment analysis.

Who this is for

  • Product Analysts, PMs, Data Scientists, Growth Analysts.

Prerequisites

  • Comfort with conversion and retention metrics.
  • Basic hypothesis testing (z-test, t-test), power and MDE concepts.
  • Familiarity with your product's logging/events.

Mini challenge

Pick a real idea you're considering. In 15 minutes, fill the template's sections 1-10 at a draft level. If you can't decide on the primary metric or MDE, pause the experiment and resolve those first.

Quick Test

Take the Quick Test below to check your understanding.

Practice Exercises

3 exercises to complete

Instructions

Scenario: You test a new add-to-cart button style on product pages.

  • Baseline click-to-add rate: 20% per eligible session.
  • Choose and state: hypothesis, unit, eligibility, primary/secondary/guardrail metrics, MDE, alpha/power, sample size per arm, duration estimate, analysis test, variance reduction, stopping and decision rules, data filters (bots/outliers/missing).
Expected Output
A filled plan template with clear values (e.g., MDE +5% relative, two-proportion z-test, ~30–40k sessions per arm, fixed 7-day horizon, guardrails defined).

Pre-Experiment Analysis Plan — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

