Why this matters
As a Data Scientist, your experiments influence product decisions, revenue, and user experience. A clear analysis plan prevents p-hacking, aligns stakeholders, and speeds up readouts. You will use it to: define decision rules before seeing data, choose correct statistical tests, size samples, handle messy data, and deliver confident, reproducible results. Typical scenarios:
- Ship/no-ship decisions for new features
- Choosing between two onboarding flows
- Pricing/discount tests with revenue impacts
- Personalization model rollouts
Concept explained simply
An analysis plan is a written checklist of what you will compute and how you will decide, fixed before you see the results. The readout is the concise story that presents the results, decisions, and next steps.
Mental model
Think of the analysis plan as a flight plan: route (metrics), instruments (tests), fuel (sample size), safety checks (guardrails), and landing criteria (decision rule). The readout is the landing report: clear, short, and actionable.
Core components of an analysis plan
- Primary objective and hypotheses: what you expect and why.
- Primary metric(s) and any secondary/diagnostic metrics.
- Experimental unit and inclusion/exclusion criteria.
- Statistical test(s) and assumptions; alternative if assumptions fail.
- Minimum Detectable Effect (MDE), power, sample size, and duration.
- Missing data, outliers, and logging issues handling.
- Guardrail metrics (e.g., latency, crash rate, unsubscribe rate).
- Segmentation plan (pre-specified only) and multiple testing control.
- Stopping rule (fixed horizon or sequential, and how often to peek).
- Decision rule (e.g., ship if p < 0.05 and lift ≥ MDE; or a Bayesian threshold).
- Readout template: visuals, narrative, and owner of the decision.
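One way to make these choices auditable is to capture them in a small machine-readable template committed alongside the analysis code. The sketch below is a minimal, illustrative Python version; the field names and values are assumptions borrowed from Example 1 below, not a required schema.

```python
# Minimal, illustrative analysis-plan template (field names are assumptions,
# not a required schema). Version-controlling this next to the analysis code
# makes the pre-specified choices auditable.
analysis_plan = {
    "objective": "Increase signup conversion with new button copy",
    "hypothesis": "New copy lifts signup conversion by at least +2pp",
    "primary_metric": "signup_conversion_per_visitor",
    "diagnostic_metrics": ["clicks_per_visitor", "bounce_rate"],  # illustrative
    "unit": "unique_visitor",
    "exclusions": ["bots", "internal_traffic"],
    "test": "two_proportion_z_test",
    "backup_test": "chi_square",
    "mde_abs": 0.02,
    "alpha": 0.05,
    "power": 0.80,
    "guardrails": {"error_rate_increase_pp_max": 0.2, "page_load_increase_ms_max": 100},
    "segments": ["new_vs_returning"],  # pre-specified only
    "multiple_testing": "holm",
    "stopping_rule": "fixed_horizon_14_days_no_peeking",
    "decision_rule": "ship if p < 0.05 and absolute lift >= +1.5pp and guardrails pass",
    "decision_owner": "product_manager",
}
```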
Worked examples
Example 1: Conversion rate (binary)
Goal: Increase account signups by changing button copy.
- Primary metric: Signup conversion rate per visitor.
- Unit: Unique visitor; exclude bots; 1 exposure per visitor.
- Test: Two-proportion z-test (or chi-square) for difference in proportions.
- MDE: +2pp absolute (e.g., 10% → 12%); Power 80%; Alpha 0.05.
- Sample size: Use a standard two-proportion power calculation; with baseline 10%, MDE +2pp, alpha 0.05, and 80% power, expect roughly 3,900 visitors per arm (illustrative; compute precisely, as in the sketch after this example).
- Assumptions: Independent visitors; enough counts in each cell.
- Guardrails: Error rate increase no more than 0.2pp; page load time increase no more than 100ms.
- Stopping rule: Fixed 14 days; no early peeks.
- Decision rule: Ship if p < 0.05 AND observed absolute lift ≥ +1.5pp AND guardrails pass.
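As a rough illustration of Example 1, here is a minimal Python sketch of the sample-size calculation and the two-proportion z-test, assuming statsmodels is installed; the observed signup counts are made up.

```python
# Minimal sketch for Example 1 (illustrative numbers; assumes statsmodels is installed).
import numpy as np
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize, proportions_ztest

# Sample size: baseline 10% vs. target 12%, alpha 0.05 (two-sided), power 80%.
effect = proportion_effectsize(0.12, 0.10)  # Cohen's h for the two proportions
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
)
print(f"Visitors needed per arm: {int(np.ceil(n_per_arm))}")  # roughly 3,800-3,900

# Analysis: two-proportion z-test on observed signups (illustrative counts).
signups = np.array([480, 430])     # treatment, control conversions
visitors = np.array([4000, 4000])  # treatment, control visitors
z_stat, p_value = proportions_ztest(count=signups, nobs=visitors)
lift = signups[0] / visitors[0] - signups[1] / visitors[1]
print(f"z={z_stat:.2f}, p={p_value:.4f}, absolute lift={lift:+.3f}")

# Mechanical decision rule from the plan: ship if p < 0.05 AND absolute lift >= +1.5pp.
ship = (p_value < 0.05) and (lift >= 0.015)
print("Decision:", "ship (pending guardrails)" if ship else "do not ship")
```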
Example 2: Revenue per user (skewed)
Goal: Test free shipping threshold impact on revenue per visitor.
- Primary metric: Revenue per visitor (RPV), heavy-tailed.
- Test: Nonparametric (Mann-Whitney) or a permutation test on the mean; confirm with a bootstrap CI for the mean difference.
- Data handling: Winsorize top 0.1% or apply log-transform for modeling; keep raw for final mean difference reporting.
- MDE: +3% in mean RPV; Power 80%; Alpha 0.05; larger N due to high variance.
- Guardrails: Conversion rate does not decrease; refund rate does not increase by more than 0.3pp.
- Segmentation: New vs returning (pre-specified only); Holm correction across 2 segments.
- Decision rule: Ship if the lower bound of the 95% bootstrap CI for the mean lift is ≥ 0, the point estimate is ≥ +2%, and guardrails pass (see the sketch after this example).
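Here is a minimal Python sketch of the Example 2 analysis on simulated heavy-tailed revenue data, assuming numpy and scipy are available; the percentile bootstrap shown is one of several reasonable bootstrap variants.

```python
# Minimal sketch for Example 2 (simulated heavy-tailed revenue; assumes numpy and scipy).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)
# Simulated revenue per visitor: most visitors spend nothing, a few spend a lot.
control = rng.lognormal(mean=3.0, sigma=1.2, size=20_000) * rng.binomial(1, 0.10, 20_000)
treatment = rng.lognormal(mean=3.05, sigma=1.2, size=20_000) * rng.binomial(1, 0.10, 20_000)

# Rank-based check for a distributional shift (robust to heavy tails).
u_stat, p_value = mannwhitneyu(treatment, control, alternative="two-sided")

# Percentile bootstrap CI for the difference in mean RPV (raw data, per the plan).
boot_diffs = np.empty(5_000)
for i in range(5_000):
    t = rng.choice(treatment, size=treatment.size, replace=True)
    c = rng.choice(control, size=control.size, replace=True)
    boot_diffs[i] = t.mean() - c.mean()
lo, hi = np.percentile(boot_diffs, [2.5, 97.5])

observed_lift = treatment.mean() - control.mean()
rel_lift = observed_lift / control.mean()
print(f"Mann-Whitney p={p_value:.3f}; mean lift={observed_lift:.2f} "
      f"({rel_lift:+.1%}), 95% CI [{lo:.2f}, {hi:.2f}]")

# Decision rule from the plan: CI lower bound >= 0 and point estimate >= +2% relative lift.
ship = (lo >= 0) and (rel_lift >= 0.02)
print("Decision:", "ship (pending guardrails)" if ship else "do not ship")
```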
Example 3: Sessions per user (overdispersed count)
Goal: Reduce spam complaints while keeping engagement via notifications.
- Primary metric: Sessions per user in 14 days (count, overdispersed).
- Test/model: Negative binomial regression with treatment indicator; report marginal mean difference and 95% CI.
- Guardrails: Unsubscribe rate does not increase by more than 0.5pp; complaint rate by no more than 0.1pp.
- Stopping: Fixed horizon (21 days).
- Decision rule: Ship if the adjusted change in mean sessions is no worse than -1% (non-inferiority on engagement) and guardrails hold.
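A minimal Python sketch of the Example 3 model on simulated session counts, assuming pandas and statsmodels are available; reading the non-inferiority rule off the confidence interval for the treatment rate ratio is one reasonable way to make the decision mechanical.

```python
# Minimal sketch for Example 3 (simulated overdispersed counts; assumes pandas and statsmodels).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 50_000
treatment = rng.integers(0, 2, n)
# Gamma-Poisson mixture => overdispersed session counts per user.
base_rate = rng.gamma(shape=1.5, scale=2.0, size=n)
sessions = rng.poisson(base_rate * np.where(treatment == 1, 0.995, 1.0))
df = pd.DataFrame({"sessions": sessions, "treatment": treatment})

# Negative binomial regression with a treatment indicator; dispersion is estimated from the data.
model = smf.negativebinomial("sessions ~ treatment", data=df).fit(disp=False)
rate_ratio = np.exp(model.params["treatment"])            # multiplicative change in mean sessions
rr_lo, rr_hi = np.exp(model.conf_int().loc["treatment"])
control_mean = df.loc[df["treatment"] == 0, "sessions"].mean()
mean_diff = control_mean * (rate_ratio - 1)               # approximate marginal mean difference
print(f"Rate ratio {rate_ratio:.4f} (95% CI {rr_lo:.4f} to {rr_hi:.4f}); "
      f"approx. mean difference {mean_diff:+.3f} sessions")

# Non-inferiority reading of the decision rule: engagement no worse than -1%,
# i.e. the CI lower bound for the rate ratio stays at or above 0.99.
ship = rr_lo >= 0.99
print("Decision:", "ship (pending guardrails)" if ship else "do not ship")
```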
Step-by-step to write your analysis plan
- State business goal and primary hypothesis in one sentence.
- Select the primary metric (one) and diagnostic metrics (few).
- Define unit, exposure, and inclusion/exclusion rules.
- Pick your test based on metric type and distribution; note backup method.
- Choose MDE, alpha, power; compute sample size and duration (see the duration sketch after this list).
- Predefine missing/outlier handling and data quality checks.
- List guardrails and their acceptable ranges.
- Specify segmentation and multiple testing control (only pre-specified).
- Write the stopping rule and decision rule.
- Draft the readout: visual(s), narrative bullets, and the decision owner.
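For the duration step, the arithmetic is simple enough to sketch in a few lines; the traffic figure below is an assumption, not a recommendation.

```python
# Turn a per-arm sample size into a run duration (illustrative numbers).
import math

n_per_arm = 3_900                   # from the power calculation
arms = 2
eligible_visitors_per_day = 1_200   # daily traffic actually entering the experiment

days_needed = math.ceil(n_per_arm * arms / eligible_visitors_per_day)
days_needed = math.ceil(days_needed / 7) * 7   # round up to whole weeks for day-of-week effects
print(f"Run for at least {days_needed} days")
```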
Simple readout template you can reuse
- What we tested: [Feature], audience, dates, exposure.
- Primary result: [Metric], lift, CI/p-value, decision rule outcome.
- Diagnostics: Secondary metrics and guardrails.
- Segments (pre-specified only): Key differences or none.
- Decision: Ship / Don't ship / Iterate. Why.
- Risks and follow-ups: What to monitor post-ship; next experiment.
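If you write many readouts, a small helper can fill this template mechanically from a results dictionary; the function and field names below are hypothetical, not part of any standard tooling.

```python
# Hypothetical helper that fills the readout template from a results dict
# (keys and wording are assumptions; adapt to your own reporting stack).
def render_readout(r: dict) -> str:
    return "\n".join([
        f"What we tested: {r['feature']} for {r['audience']}, {r['dates']}.",
        f"Primary result: {r['metric']} lift {r['lift']:+.1%} "
        f"(95% CI {r['ci'][0]:+.1%} to {r['ci'][1]:+.1%}, p={r['p']:.3f}); "
        f"decision rule {'met' if r['rule_met'] else 'not met'}.",
        f"Guardrails: {r['guardrails']}.",
        f"Decision: {r['decision']}. {r['why']}",
        f"Follow-ups: {r['follow_ups']}",
    ])

print(render_readout({
    "feature": "New button copy", "audience": "all web visitors", "dates": "May 1-14",
    "metric": "signup conversion", "lift": 0.018, "ci": (0.004, 0.031), "p": 0.012,
    "rule_met": True, "guardrails": "all within limits",
    "decision": "Ship", "why": "Lift exceeds +1.5pp with guardrails passing.",
    "follow_ups": "Monitor error rate for two weeks post-launch.",
}))
```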
Exercises (do these now)
These mirror the graded exercises. Draft your answers, then compare with the solutions.
Exercise 1: Binary conversion test
Scenario: Checkout page tweak expected to increase purchase conversion from 5% to 5.8%.
- Define: primary metric, test, MDE/power/alpha, stopping rule, decision rule.
- Add two guardrails.
Exercise 2: Heavy-tailed revenue
Scenario: New recommendation module may raise mean order value; distribution is skewed with outliers.
- Choose analysis method(s).
- State how you'll handle outliers and missing data.
- Write a three-bullet readout summary template for this case.
Self-check checklist
- ✓ One clear primary metric and hypothesis
- ✓ Correct test chosen for metric type and distribution
- ✓ MDE, alpha, power, and sample size logic explained
- ✓ Guardrails and segmentation pre-specified
- ✓ Decision rule is mechanical and unambiguous
- ✓ Readout template is concise and actionable
Common mistakes and how to self-check
- Vague decision rules: Write a crisp rule that anyone could apply blindly.
- Too many metrics: Pick one primary; keep others diagnostic.
- Unplanned peeking: If you peek, predefine alpha-spending or use sequential methods.
- Ignoring distribution shape: Use nonparametric, robust, or bootstrap methods for skew/heavy tails.
- Uncontrolled multiple comparisons: Limit segments or adjust (Holm/Benjamini-Hochberg); see the sketch after this list.
- Post-hoc exclusions: Predefine inclusion/exclusion; document any deviations with sensitivity checks.
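For the multiple-comparisons point, here is a minimal sketch of a Holm correction across two pre-specified segments, assuming statsmodels is installed; the p-values are illustrative.

```python
# Minimal sketch of pre-specified multiple-testing control across two segments
# (p-values are illustrative; assumes statsmodels is installed).
from statsmodels.stats.multitest import multipletests

segment_pvalues = {"new_users": 0.030, "returning_users": 0.048}
reject, adjusted, _, _ = multipletests(list(segment_pvalues.values()), alpha=0.05, method="holm")
for (segment, raw), adj, sig in zip(segment_pvalues.items(), adjusted, reject):
    print(f"{segment}: raw p={raw:.3f}, Holm-adjusted p={adj:.3f}, significant={sig}")
```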
Practical projects
- Write an analysis plan for an email subject line A/B test (opens as primary; unsub as guardrail).
- Draft a readout for a pricing banner test with revenue and conversion diagnostics.
- Design a non-inferiority test plan for reducing notification frequency without harming 7-day retention.
Who this is for
- Data Scientists and Analysts running product or growth experiments
- Product Managers who consume experiment readouts
- Engineers interested in trustworthy decision-making from tests
Prerequisites
- Basic statistics (hypothesis testing, confidence intervals)
- Understanding of experiment setup (randomization, exposure)
- Ability to compute sample size with a standard tool or library
Learning path
- Before: Hypotheses & metrics; Sampling & randomization
- Now: Analysis plan and readout
- Next: Interpreting results and making product decisions; Sequential testing and Bayesian readouts
Mini challenge
In 10 minutes, draft a one-paragraph analysis plan for testing a new search ranking algorithm. Include primary metric, test choice, MDE, guardrails, and decision rule. Keep it to 6 sentences max.
Next steps
- Turn your latest experiment into a one-page analysis plan using the checklist.
- Run a dry run readout with a teammate. Timebox to 5 minutes.
- Create a personal template you can reuse for future tests.