Why this matters
In real product work, you will run A/B tests to ship features safely. Primary metrics tell you if the change achieved the goal. Guardrail metrics make sure you didn’t harm the business or users while chasing that goal.
- Decide if a test ships: Did the primary metric move as intended?
- Protect the business: Did any guardrail breach your safety thresholds?
- Speed up iteration: Clear metrics avoid debates and post-hoc fishing.
Concept explained simply
Primary metric: the single best success indicator for your experiment’s objective.
Guardrail metrics: safety checks that must not worsen beyond agreed limits.
Mental model: Goal Gate + Safety Rails
Picture your experiment as a road:
- The Goal Gate is your primary metric: if it improves, you can pass.
- Safety Rails are guardrails: if any rail is hit (breached), you stop or investigate, even if the primary looks good.
Primary vs. secondary vs. guardrail
- Primary: the main decision-maker aligned to the experiment goal.
- Secondary: diagnostic metrics that help explain why the primary moved (not used for ship/no-ship decisions).
- Guardrail: critical protections (e.g., churn, latency, errors, costs).
Qualities of good metrics
- Aligned: Directly maps to the objective (e.g., Purchase Conversion for checkout changes).
- Sensitive: Likely to move if the change works (not overly lagged).
- Unambiguous: Clear directionality (up = good or down = good).
- Low latency: Measurable within the test window (e.g., 7–14 days).
- Stable and reliable: Not overly noisy or seasonality-skewed.
- Ethical and user-centric: Avoids incentivizing harmful behavior.
How to choose metrics (step-by-step)
- State the objective: What user or business outcome is the experiment meant to change?
- Map the funnel or system: Identify the stage you’re influencing.
- Pick one primary metric: The most direct, sensitive measure of success.
- Select guardrails across pillars:
- User experience/quality (e.g., error rate, latency, complaint rate)
- Engagement/retention (e.g., D7 retention, unsubscribe rate)
- Revenue/cost/compliance (e.g., refunds, CPA, fraud)
- Define measurement window and unit: User or session; 1, 7, or 14-day window.
- Pre-spec thresholds: What counts as improvement, and what breach stops the test?
- Plan analysis: Test length, power, alpha; one- or two-tailed for the primary; two-tailed for guardrails.
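These steps amount to a pre-registration you can write down before launch. Below is a minimal sketch of such a spec in Python; the class and field names (MetricSpec, ExperimentPlan, window_days, and so on) are illustrative assumptions, not a standard library.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MetricSpec:
    name: str
    direction: str                     # which way is "good": "up" or "down"
    window_days: int                   # measurement window
    threshold: Optional[float] = None  # guardrail breach limit; None for the primary

@dataclass
class ExperimentPlan:
    objective: str
    primary: MetricSpec                # exactly one primary
    guardrails: List[MetricSpec]
    unit: str                          # "user" or "session"
    alpha: float = 0.05
    power: float = 0.80

# Example: the checkout test from the worked examples below
plan = ExperimentPlan(
    objective="Increase purchase conversion",
    primary=MetricSpec("purchase_conversion", "up", 14),
    guardrails=[
        MetricSpec("refund_rate", "down", 14, threshold=0.003),           # max +0.3 pp
        MetricSpec("support_contact_rate", "down", 14, threshold=0.002),  # max +0.2 pp
    ],
    unit="user",
)
```

Writing the plan as data keeps the single-primary rule honest: the structure has room for exactly one primary and a list of guardrails.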
Common formulas
- Conversion rate = Conversions / Eligible users
- Revenue per user (RPU) = Total revenue / Users
- Refund rate = Refunds / Orders
- Error rate = Error events / Requests
- Unsubscribe rate = Unsubscribes / Recipients
- Latency (p95) = 95th percentile of response times
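A quick sketch of these formulas in Python, with made-up counts and NumPy for the percentile:

```python
import numpy as np

# Illustrative counts (assumptions, not real baselines)
conversions, eligible_users = 420, 10_000
total_revenue, users = 52_300.0, 10_000
refunds, orders = 18, 430
errors, requests = 57, 120_000
unsubscribes, recipients = 35, 25_000
rng = np.random.default_rng(0)
response_times_ms = rng.lognormal(mean=5.0, sigma=0.4, size=10_000)

conversion_rate = conversions / eligible_users        # 0.042
rpu = total_revenue / users                           # revenue per user
refund_rate = refunds / orders
error_rate = errors / requests
unsubscribe_rate = unsubscribes / recipients
p95_latency = np.percentile(response_times_ms, 95)    # 95th percentile
```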
Worked examples
Example 1: Homepage banner
- Goal: Increase product page visits.
- Primary: Click-through rate to product pages (CTR to PDP).
- Guardrails: Bounce rate (must not increase), p95 page load (must not worsen), error rate (no spike).
- Decision: If CTR improves and no guardrail breaches, ship.
Example 2: Checkout UX simplification
- Goal: Improve purchase conversion.
- Primary: Purchase conversion rate (eligible users).
- Guardrails: Refund rate, fraud rate, support contact rate, average order value (no material drop).
- Decision: If conversion rises but refund rate exceeds threshold, pause and investigate.
Example 3: Push notification timing
- Goal: Improve D7 retention.
- Primary: D7 retention.
- Guardrails: Unsubscribe rate, complaint/spam rate, crash rate, battery-intensive background time (proxy if available).
- Decision: If D7 retention improves but unsubscribes spike, do not ship.
Example 4: Pricing test
- Goal: Maximize revenue per visitor (RPV).
- Primary: RPV.
- Guardrails: Conversion rate (must not collapse), refund rate, NPS/complaints (if surveyed), CAC changes (if applicable).
Measurement window and assignment
- Unit of analysis: Match to how users experience the change (user-level for onboarding; session-level for page tweaks).
- Window: Choose the shortest window that still captures the effect (e.g., 7 days for retention; 1–3 days for CTR).
- Normalization: Per user or per session as needed; avoid mixed units.
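To see why mixed units mislead, here is a small pandas sketch contrasting session-level and user-level conversion on the same (made-up) event log; the column names are assumptions:

```python
import pandas as pd

# Assumed event log: one row per session
events = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 3, 3],
    "session_id": [10, 11, 12, 20, 30, 31],
    "converted":  [0, 0, 1, 1, 0, 0],
})

# Session-level: every session counts once
session_cr = events["converted"].mean()                        # 2/6 ≈ 0.33

# User-level: a user converts if any of their sessions did
user_cr = events.groupby("user_id")["converted"].max().mean()  # 2/3 ≈ 0.67
```

Heavy users carry more weight at the session level, which is why the two numbers differ; pick the unit that matches how users experience the change and keep it consistent.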
Thresholds and decision rules
- Primary success: Pre-specify minimum detectable effect (MDE) and significance (e.g., +3% relative at 95% confidence).
- Guardrail breach: Pre-specify limits (e.g., refund rate must not increase by more than 0.3 pp; p95 latency must not exceed +5%).
- Stopping: Stop early only under strict pre-specified rules (e.g., alpha spending); otherwise run to the planned sample size.
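One way to encode these rules is a small decision function that requires the primary to win and every guardrail to stay inside its pre-specified limit. This is a sketch under simplifying assumptions: it compares guardrail point estimates to limits and ignores guardrail uncertainty, which your real stat plan should handle.

```python
def decide(primary_lift, primary_p, guardrail_deltas, limits, alpha=0.05):
    """Ship only if the primary wins and no guardrail breaches its limit.

    primary_lift     -- observed relative lift on the primary metric
    primary_p        -- p-value for the primary (tails per your stat plan)
    guardrail_deltas -- {metric: observed change in the harmful direction}
    limits           -- {metric: maximum tolerated change, e.g. 0.003 = 0.3 pp}
    """
    breaches = [m for m, d in guardrail_deltas.items() if d > limits[m]]
    if breaches:
        return f"pause and investigate: breached {breaches}"
    if primary_lift > 0 and primary_p < alpha:
        return "ship"
    return "do not ship: primary did not win"

# Checkout example: conversion up 3% (p = 0.01), but refunds up 0.4 pp
print(decide(0.03, 0.01,
             {"refund_rate": 0.004, "p95_latency_rel": 0.02},
             {"refund_rate": 0.003, "p95_latency_rel": 0.05}))
# -> pause and investigate: breached ['refund_rate']
```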
Practical template you can reuse
- Experiment: [Name]
- Objective: [e.g., Increase purchase conversion]
- Primary metric: [One metric, with window]
- Guardrails: [List 3–5 with thresholds and window]
- Unit/Window: [User-level, 14 days]
- Stat plan: [Alpha, power, tails, MDE]
- Stop rules: [Guardrail breach, data quality issue]
- Notes: [Risks, assumptions]
Common mistakes (and self-check)
- Too many primary metrics: Choose one primary. Others are secondary or guardrails.
- Vague directionality: Define which way is good/bad.
- No guardrail thresholds: Add numeric limits before launch.
- Lagging primary: Pick a metric that moves within the test window.
- Ignoring variance: Noisy metrics make tests inconclusive; prefer stable proxies.
Self-check questions
- Can I state the objective in one sentence?
- Is the primary the most direct measure of that objective?
- Do guardrails cover user experience, business risk, and quality?
- Are thresholds numeric and pre-specified?
- Will my window capture effects without excessive noise?
Exercises
Complete these tasks, then compare with the solutions.
- Exercise 1: Pick a primary and guardrails for a new onboarding tutorial.
- Exercise 2: Turn baselines into numeric thresholds for a checkout test.
- Exercise 3: Plan guardrails for a notification timing experiment.
- Checklist before checking solutions:
- Objective clearly written.
- Exactly one primary metric.
- 3–5 guardrails with quantitative thresholds.
- Unit and window defined.
Practical projects
- Audit an old experiment: Re-classify metrics into primary, secondary, guardrails, and propose thresholds.
- Metric sheet: For your product area, create a ready-to-use table of typical primaries and guardrails with formulas and windows.
- Simulation: Using sample baselines, estimate test length under different MDEs and pick pragmatic thresholds.
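For the simulation project, the standard two-proportion sample-size formula is enough to see how test length scales with the MDE. A sketch using SciPy; the baseline rate and traffic figures are made-up assumptions:

```python
from scipy.stats import norm

def n_per_arm(p_base, mde_rel, alpha=0.05, power=0.80):
    """Approximate users per arm for a two-sided two-proportion z-test."""
    p_var = p_base * (1 + mde_rel)            # variant rate implied by the MDE
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return (z_a + z_b) ** 2 * variance / (p_base - p_var) ** 2

baseline_cr, daily_users_per_arm = 0.04, 5_000  # assumed baseline and traffic
for mde in (0.03, 0.05, 0.10):                  # relative MDEs
    n = n_per_arm(baseline_cr, mde)
    print(f"MDE {mde:.0%}: {n:,.0f} users/arm, ~{n / daily_users_per_arm:.0f} days")
```

Halving the MDE roughly quadruples the required sample, which is usually the deciding factor between a 3% and a 10% target.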
Who this is for
- Product Analysts and Data Scientists running A/B tests.
- PMs who define success criteria for experiments.
- Engineers owning experiment instrumentation.
Prerequisites
- Basic A/B testing concepts (control vs variant, p-values, confidence intervals).
- Comfort with ratios and averages (conversion rate, RPU).
- Understanding of your product funnel.
Learning path
- Start: Define objectives and map your funnel.
- Then: Select one primary metric; add guardrails across UX, retention, and risk.
- Next: Set windows and numeric thresholds; write a brief stat plan.
- Practice: Do the exercises and the quick test.
- Apply: Use the template in your next experiment PRD.
Next steps
- Use the template to pre-spec metrics for your next test.
- Share with your team for alignment before launch.
- Run the quick test below to validate your understanding.
Mini challenge
You launch a new recommendation widget aiming to increase add-to-cart rate. Pick one primary metric and three guardrails with thresholds. Write them down in under five minutes, then sanity-check against the self-check questions above.