Who this is for
- Data Scientists who plan A/B tests, A/A tests, or model rollouts.
- Analysts and Product Managers converting ideas into measurable experiments.
- Engineers who need crisp, testable specs for changes.
Prerequisites
- Basic statistics: proportions/means, p-values, confidence intervals.
- Familiarity with key metrics (conversion rate, CTR, retention).
- Fundamentals of A/B testing (exposure, randomization, sample size).
Why this matters
In real data science work you will:
- Turn vague requests like “boost engagement” into testable statements.
- Choose a primary metric and guardrails before any code changes.
- Prevent p-hacking by defining direction, audience, and thresholds upfront.
- Communicate clearly with stakeholders and speed up approvals.
- Reduce wasted experiments and make your results decision-ready.
Concept explained simply
A hypothesis is a clear, testable statement about how a change will affect a metric for a specific audience within a timeframe, and why.
Use this format:
Because [mechanism], changing [X] for [audience] will [increase/decrease/no change] [primary metric] by [Δ or at least Δ] within [timeframe], without worsening [guardrail metrics] beyond [limits].
You’ll state both:
- H0 (Null): No effect (or effect < threshold).
- H1 (Alternative): Effect in the stated direction (or ≥ threshold).
Mental model: SMART-HG
- Subject & Scope: who/where the change applies.
- Mechanism: why the change should work (the causal story).
- Action: what exactly is changing (the treatment).
- Result metric: one primary metric tied to the goal.
- Threshold & Time: minimum detectable improvement and evaluation window.
- H0/H1: explicit null and alternative hypotheses.
- Guardrails: must-not-worsen metrics with limits.
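To make this concrete, here is a minimal sketch that captures each SMART-HG field in one object before any experiment code exists (Python; the class and field names are illustrative, not a standard schema):

```python
# Minimal sketch: record every SMART-HG field up front.
# Class and field names are illustrative, not a standard schema.
from dataclasses import dataclass, field

@dataclass
class HypothesisSpec:
    subject_scope: str    # who/where the change applies
    mechanism: str        # the causal story for why it should work
    action: str           # the treatment: what exactly changes
    primary_metric: str   # one metric tied to the decision
    mde: str              # minimum effect worth shipping, with units
    timeframe: str        # evaluation window
    h0: str               # explicit null hypothesis
    h1: str               # explicit alternative hypothesis
    guardrails: dict = field(default_factory=dict)  # metric -> limit

spec = HypothesisSpec(
    subject_scope="Desktop web, all geographies",
    mechanism="Higher contrast improves visual salience",
    action="Change checkout button from gray to green",
    primary_metric="Purchase conversion (session -> order)",
    mde="+1 pp absolute",
    timeframe="2 weeks",
    h0="Conversion increase < 1 pp",
    h1="Conversion increase >= 1 pp",
    guardrails={"refund rate": "<= 0.5%", "AOV drop": "<= 1%"},
)
```

If a field is hard to fill in, the hypothesis is not ready to test.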
Tip: Directional vs. two-sided
Use a one-sided hypothesis when your decision rule is asymmetric (e.g., you’ll only ship if it increases conversion). Use two-sided when any change (up or down) matters (e.g., latency must be stable).
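A minimal sketch of the difference in practice, using statsmodels' two-proportion z-test; the counts below are invented for illustration:

```python
# Same invented data, tested one-sided vs. two-sided (requires statsmodels).
from statsmodels.stats.proportion import proportions_ztest

conversions = [530, 500]     # treatment, control
sessions = [10_000, 10_000]

# One-sided: asymmetric decision rule (ship only if treatment is higher).
_, p_one = proportions_ztest(conversions, sessions, alternative="larger")
# Two-sided: any movement, up or down, matters.
_, p_two = proportions_ztest(conversions, sessions, alternative="two-sided")

print(f"one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

The one-sided p-value is half the two-sided one here, which is exactly why the choice must be pre-committed rather than made after seeing the data.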
Worked examples
Example 1 — E-commerce checkout button color
Hypothesis (H1): Because higher contrast improves visual salience, changing the checkout button from gray to green for all desktop users will increase purchase conversion by at least 1 percentage point (absolute) over 2 weeks, without increasing refund rate above 0.5% or decreasing average order value by more than 1%.
H0: Conversion increase < 1 pp, or guardrails violated.
- Primary metric: Purchase conversion (session → order).
- Audience: Desktop web, all geographies.
- Timeframe: 2 weeks.
- MDE: 1 pp absolute.
- Guardrails: Refund rate, AOV.
Why this is framed well
- Includes mechanism (salience).
- Sets direction and minimal threshold (decision rule).
- Defines scope (desktop) and guardrails.
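Before launch you would also size this test against the 1 pp MDE. A rough sketch, assuming a 5% baseline conversion rate (the baseline is an assumption; substitute your own) and statsmodels:

```python
# Rough per-arm sample size for Example 1. The 5% baseline is assumed.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.05   # assumed current conversion rate
mde_abs = 0.01    # +1 pp absolute, from the hypothesis

effect = proportion_effectsize(baseline + mde_abs, baseline)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="larger"
)
print(f"~{n_per_arm:,.0f} sessions per arm")  # roughly 6,400 with these inputs
```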
Example 2 — Recommendation ranking model
Hypothesis (H1): Because the new ranking model increases relevance via better user-item embeddings, replacing the current ranker for logged-in mobile users will increase homepage CTR by at least 3% relative (3–5% expected) within 3 weeks, without increasing bounce rate by more than 0.5 pp or increasing average latency by more than 50 ms.
H0: CTR lift < 3% or guardrails violated.
- Primary metric: Homepage CTR.
- Audience: Logged-in mobile users only.
- Timeframe: 3 weeks.
- MDE: 3% relative.
- Guardrails: Bounce rate, latency.
Why this is framed well
- Aligns to product goal (engagement) with relevance mechanism.
- Sets explicit latency guardrail for user experience.
- Segmented audience reduces noise and focuses impact.
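One subtlety here: a relative MDE has to be converted to an absolute delta before any power math. A rough sketch, assuming a 4% baseline homepage CTR (an assumption for illustration) and statsmodels:

```python
# Convert the relative MDE to absolute, then size the test (statsmodels).
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_ctr = 0.04               # assumed homepage CTR
target_ctr = baseline_ctr * 1.03  # +3% relative lift

effect = proportion_effectsize(target_ctr, baseline_ctr)
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="larger"
)
# Small relative lifts on low base rates need very large samples
# (on the order of hundreds of thousands of users per arm here).
print(f"+3% relative = +{(target_ctr - baseline_ctr) * 100:.2f} pp absolute; "
      f"~{n_per_arm:,.0f} users per arm")
```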
Example 3 — Pricing page copy
Hypothesis (H1): Because clarifying benefits reduces confusion, replacing jargon with plain-language bullets on the pricing page for new visitors will decrease exit rate by at least 5% relative over 1 week, without reducing free-trial starts by more than 1%.
H0: Exit rate reduction < 5% or free-trial starts drop > 1%.
- Primary metric: Pricing page exit rate.
- Audience: New visitors.
- Timeframe: 1 week.
- MDE: 5% relative.
- Guardrail: Free-trial start rate.
- Assumption: Traffic sources remain stable.
Why this is framed well
- States a plausible mechanism (clarity).
- Protects downstream conversion via guardrail.
- Short evaluation window suited for high-traffic page.
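Guardrails like this one are really non-inferiority questions: is the drop within the allowed margin? A minimal sketch with statsmodels, using invented counts and an absolute margin that approximates the 1% relative limit:

```python
# Non-inferiority check on the free-trial guardrail (invented counts).
from statsmodels.stats.proportion import proportions_ztest

starts = [985, 1000]        # treatment, control free-trial starts
visitors = [50_000, 50_000]

control_rate = starts[1] / visitors[1]   # ~2.0%
margin = 0.01 * control_rate             # 1% relative drop, as absolute

# H0: treatment rate - control rate <= -margin (guardrail violated)
# H1: treatment rate - control rate >  -margin (drop within the margin)
_, p = proportions_ztest(starts, visitors, value=-margin, alternative="larger")
print(f"non-inferiority p = {p:.3f} (small p = evidence the drop is within margin)")
```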
Step-by-step: framing a strong hypothesis
- Define outcome — Pick one primary metric tightly tied to the decision.
- Specify audience — Segment by platform, geography, user state, or funnel stage.
- Articulate mechanism — The causal reason this change should move the metric.
- Set threshold — Minimum effect size worth shipping (MDE) and direction.
- Add guardrails — Metrics that must not degrade beyond set limits.
- Choose timeframe — Enough time to capture behavior and stabilize variance.
- Write H0/H1 — Make them falsifiable and tied to the decision rule.
- Pre-commit — Record the hypothesis before running the experiment (a sketch follows below).
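For the pre-commit step, even a plain string template forces every field to exist before launch. A minimal sketch (field names mirror SMART-HG and are illustrative):

```python
# Render the pre-committed hypothesis as a doc snippet before launch.
from datetime import date

PREREG_TEMPLATE = """\
Hypothesis (pre-registered {date})
H1: {h1}
H0: {h0}
Primary metric: {metric} | Audience: {audience}
MDE: {mde} | Timeframe: {timeframe}
Guardrails: {guardrails}
"""

print(PREREG_TEMPLATE.format(
    date=date.today().isoformat(),
    h1="Green checkout button raises conversion by >= 1 pp",
    h0="Conversion increase < 1 pp",
    metric="Purchase conversion (session -> order)",
    audience="Desktop web, all geographies",
    mde="+1 pp absolute",
    timeframe="2 weeks",
    guardrails="refund rate <= 0.5%; AOV drop <= 1%",
))
```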
Checklist — does your hypothesis pass?
- Primary metric is single, decision-aligned, and measurable.
- Audience/segment is explicit.
- Direction and threshold are stated (or two-sided if required).
- Mechanism is plausible and specific.
- Guardrails and limits are defined.
- Timeframe is realistic for traffic and behavior cycles.
- H0/H1 are explicit and testable.
Exercises
Practice here, then compare with solutions. Everyone can do the exercises; only logged-in users will have their progress saved.
Exercise 1 — Rewrite vague goals as testable hypotheses
Take each vague statement and rewrite it using the template and SMART-HG. Then state H0 and H1.
- “Improve onboarding.”
- “Reduce churn.”
- “Make search better.”
Hints
- Pick one primary metric per statement.
- Specify the audience and timeframe.
- Add a minimum detectable effect and guardrails.
Exercise 2 — Define metrics, guardrails, and thresholds
Scenario: You will add personalized subject lines to marketing emails for active subscribers.
- Choose a primary metric and 1–2 guardrails.
- State a plausible mechanism.
- Write H0/H1 with direction and threshold.
- Pick an evaluation window.
Hints
- Email experiments typically optimize for open rate or downstream conversion.
- Guardrails might include unsubscribe rate or spam complaints.
- Short windows can work if you send at high volume each week.
Self-check after exercises
- Can someone reading your hypothesis run the test without extra clarifications?
- Is the decision rule obvious from the threshold and guardrails?
- Would you accept “no ship” if results don’t meet H1? If not, refine.
Common mistakes and self-check
- Too many primary metrics: Pick one; others are guardrails or secondary.
- Vague audience: Name the platform, user type, geography, or funnel stage.
- No mechanism: Add the causal story; it guides diagnostics.
- No threshold: Without MDE you can’t decide to ship or stop.
- Missing guardrails: Prevent harmful trade-offs (e.g., CTR vs. latency).
- Open-ended timeframe: Predefine the window to avoid peeking bias.
Quick self-audit
- Can H0/H1 be falsified with planned data?
- Is the metric stable enough in the chosen window?
- Are seasonal or campaign confounders controlled?
Practical projects
- Audit 5 past experiments: rewrite each hypothesis with SMART-HG and note what changed in decisions.
- Create a hypothesis library: 10 ideas mapped to metrics, thresholds, and guardrails for your product area.
- Run a simulated A/A test plan with a fully pre-registered hypothesis and a “no effect” decision rule (a simulation sketch follows below).
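For the A/A project, a minimal simulation sketch: draw many A/A splits under a true null and confirm the false-positive rate is close to alpha (numpy and scipy assumed; all parameters are illustrative):

```python
# Simulate A/A tests under a true null; expect ~alpha false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_sims, n_per_arm, p_true = 0.05, 2_000, 10_000, 0.05

false_positives = 0
for _ in range(n_sims):
    a = rng.binomial(n_per_arm, p_true)   # "conversions" in arm A
    b = rng.binomial(n_per_arm, p_true)   # "conversions" in arm B
    p_pool = (a + b) / (2 * n_per_arm)
    se = np.sqrt(2 * p_pool * (1 - p_pool) / n_per_arm)
    z = ((a - b) / n_per_arm) / se
    if 2 * stats.norm.sf(abs(z)) < alpha:  # two-sided p-value
        false_positives += 1

print(f"false-positive rate: {false_positives / n_sims:.3f} (expect ~{alpha})")
```

If the observed rate is far from alpha, fix the measurement pipeline before trusting any A/B result.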
Learning path
- Before: Metrics design, randomization basics, power/MDE intuition.
- Now: Hypothesis framing (this lesson) — tie ideas to measurable outcomes and guardrails.
- Next: Sample size and power, experiment execution, effect interpretation, and iteration planning.
Next steps
- Convert one live idea into a SMART-HG hypothesis and review with your team.
- Pre-register the hypothesis in your experiment doc before implementation.
- Take the quick test below to check retention.
Mini challenge
Write a one-sentence hypothesis for a navigation redesign on mobile that targets task completion. Include audience, mechanism, metric, threshold, timeframe, and one guardrail.
Quick Test
Everyone can take the test. Only logged-in users will have their progress saved.