Who this is for
Product Analysts and data-curious PMs who run experiments and need to understand how different user groups respond to changes. If you already know A/B test basics, this lesson helps you find where the impact truly lives.
Prerequisites
- Comfort with conversion rate, lift, confidence intervals, and p-values
- Basic understanding of experiment design (control vs. variant, randomization)
- Ability to compute metrics per group (e.g., SQL or spreadsheet skills)
Why this matters
In real products, user behavior varies by device, country, lifecycle stage, acquisition channel, and more. An experiment may be neutral overall but highly positive for a key segment—or vice versa. Segment-level analysis helps you:
- Find where the lift is coming from (heterogeneous treatment effects)
- Detect adverse effects hidden in averages (avoid shipping regressions)
- Decide targeted rollouts (e.g., ship to iOS first)
- Diagnose data issues like SRM (Sample Ratio Mismatch) within segments
Concept explained simply
Segment-level analysis means splitting users into meaningful groups and measuring the experiment effect within each group. You still follow the same steps as overall analysis—compute the metric for control and variant, estimate lift and uncertainty—but you do it repeatedly per segment.
Mental model
Think of your overall result as a weighted average of segment results. Large segments with small lift can outweigh small segments with big lift. If segment behavior differs a lot, the average can mislead you (Simpson’s paradox).
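To make that concrete, here is a tiny sketch with made-up segment numbers (all counts below are hypothetical): two segments that move in opposite directions can blend into a flat overall result.

```python
# All numbers below are hypothetical. Two segments move in opposite
# directions, and the overall result is their size-weighted blend.
segments = {
    # name: (control_users, control_conversions, variant_users, variant_conversions)
    "mobile":  (10_000, 1_000, 10_000, 1_150),   # 10.0% -> 11.5%
    "desktop": (30_000, 4_500, 30_000, 4_350),   # 15.0% -> 14.5%
}

for name, (cu, cc, vu, vc) in segments.items():
    lift = (vc / vu) / (cc / cu) - 1
    print(f"{name}: control {cc / cu:.1%}, variant {vc / vu:.1%}, lift {lift:+.1%}")

# Pool the segments to get the overall numbers.
cu_all, cc_all, vu_all, vc_all = (sum(vals) for vals in zip(*segments.values()))
overall_lift = (vc_all / vu_all) / (cc_all / cu_all) - 1
print(f"overall lift: {overall_lift:+.1%}")  # flat overall despite a +15% mobile win
```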
Good segments to consider
- Platform: iOS, Android, Web
- Lifecycle: New vs. Returning users, Tenure buckets
- Acquisition: Paid vs. Organic, Channel
- Geography: Country, Region, Language
- User attributes: Price tier, Device class, App version
- Context: Time-of-day, Day-of-week, Seasonality
When to segment (and how much)
- Pre-specify a short list of primary segments tied to your hypothesis (e.g., “This navigation change likely helps mobile more”).
- Limit to a manageable number to avoid power loss and false positives.
- Use segments that existed before the experiment (no post-hoc cherry-picking).
- If exploring many segments, treat findings as directional and confirm with follow-up tests.
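A rough back-of-envelope on why long segment lists inflate false positives: if each segment comparison independently uses a 5% significance level, the chance of at least one spurious "win" grows quickly with the number of segments (the independence assumption is a simplification).

```python
# Rough illustration: probability of at least one false positive
# across k independent segment comparisons at alpha = 0.05.
alpha = 0.05
for k in (1, 3, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:>2} segments: P(at least one false positive) ~ {p_any:.0%}")
# 1 -> 5%, 5 -> ~23%, 10 -> ~40%, 20 -> ~64%
```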
Workflow: step-by-step
- Define segments and metrics upfront (primary vs. exploratory).
- Check data health per segment: sample balance and SRM.
- Compute per-segment metrics for control and variant.
- Estimate lift and uncertainty per segment (CI or p-value); see the sketch after this list.
- Look for consistent patterns (e.g., mobile segments all positive).
- Make a decision: ship globally, ship to a segment, iterate, or stop.
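A minimal sketch of steps 3 and 4 for a binary conversion metric, assuming a simple normal-approximation confidence interval on the difference in rates; the function name and layout are illustrative choices, not a prescribed method. It is run here on the iOS numbers from Example 1 below.

```python
import math

def segment_readout(c_users, c_conv, v_users, v_conv, z=1.96):
    """Per-segment conversion, absolute difference with a ~95% CI, and relative lift.

    Normal-approximation CI on the difference in proportions: reasonable for
    large segments, too optimistic for tiny ones.
    """
    p_c = c_conv / c_users
    p_v = v_conv / v_users
    diff = p_v - p_c
    se = math.sqrt(p_c * (1 - p_c) / c_users + p_v * (1 - p_v) / v_users)
    ci = (diff - z * se, diff + z * se)
    rel_lift = diff / p_c
    return p_c, p_v, diff, ci, rel_lift

# iOS numbers from Example 1: 10,000 / 1,000 in control vs. 10,500 / 1,190 in variant
p_c, p_v, diff, ci, rel = segment_readout(10_000, 1_000, 10_500, 1_190)
print(f"control {p_c:.2%}, variant {p_v:.2%}, diff {diff:+.2%} "
      f"(95% CI {ci[0]:+.2%} to {ci[1]:+.2%}), relative lift {rel:+.1%}")
```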
Tip: Guardrails per segment
Track critical guardrails (e.g., crash rate, latency) by segment to catch regressions even if primary metrics look fine.
Worked examples
Example 1: Conversion by device
Metric: Signup conversion. Hypothesis: New layout helps small screens more.
- iOS: Control 10,000 users, 1,000 signups (10.0%); Variant 10,500 users, 1,190 signups (11.33%) → Lift ≈ +13.3%
- Android: Control 12,000 users, 1,560 signups (13.0%); Variant 11,800 users, 1,416 signups (12.0%) → Lift ≈ −7.7%
- Overall: Control 22,000 users, 2,560 signups (11.64%); Variant 22,300 users, 2,606 signups (11.69%) → Overall lift ≈ +0.4%
Insight: Overall looks flat, but iOS is clearly up and Android down. Consider shipping to iOS only while iterating on Android UX.
Example 2: Revenue per user by acquisition channel
Metric: Revenue/user (RPU). Hypothesis: Paywall tweak helps Organic more than Paid.
- Organic: Control $1.80; Variant $2.00 → +11.1%
- Paid: Control $2.40; Variant $2.36 → −1.7%
- Overall depends on mix: at a 70/30 Organic/Paid split, blended RPU goes from $1.98 (control) to about $2.11 (variant), roughly +6.5%; if Paid dominates the mix, the effect shrinks toward zero.
Decision: If Paid users are sensitive to pricing, roll out to Organic first; run a tailored variant for Paid.
Example 3: Retention by lifecycle stage
Metric: D7 retention. Hypothesis: New onboarding helps new users more.
- New users: Control 24%; Variant 27% → +3pp
- Returning users: Control 44%; Variant 43% → −1pp
The weighted overall effect will depend on the share of new vs. returning users; for example, if new users are 40% of the experiment population, the blended effect is roughly 0.4 × (+3pp) + 0.6 × (−1pp) ≈ +0.6pp. If your growth strategy focuses on acquisition, the +3pp for new users may justify shipping with a guardrail for returning users.
Quality checks before trusting segment results
- SRM by segment: Are control/variant allocations close to expected (e.g., 50/50)? Large deviations can indicate tracking or assignment issues.
- Power: Tiny segments may show noisy swings. Prefer confidence intervals and directionality over just p-values in small segments (see the sketch after this list).
- Multiple looks: If you peek at many segments, treat results as exploratory and confirm.
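To make the power point concrete, here is a rough sketch of the smallest absolute lift a segment of a given size could reliably detect, using the standard two-proportion approximation at 95% confidence and 80% power (the 10% baseline and the segment sizes are hypothetical).

```python
import math

# Rough minimum detectable (absolute) effect for a conversion metric,
# 95% confidence / 80% power, equal users per arm. Illustrative only.
def approx_mde(baseline_rate, users_per_arm, z_alpha=1.96, z_power=0.84):
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / users_per_arm)
    return (z_alpha + z_power) * se

baseline = 0.10  # hypothetical 10% baseline conversion
for n in (1_000, 5_000, 20_000, 100_000):
    mde = approx_mde(baseline, n)
    print(f"{n:>7} users/arm: can reliably detect ~{mde:.2%} absolute "
          f"({mde / baseline:.0%} relative)")
```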
Quick SRM self-check
Compare the share of each segment in control vs. variant. If the difference is >2 percentage points for a big segment, investigate randomization and event logging.
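A more formal version of the per-segment allocation check (control vs. variant counts against the intended 50/50 split) is a chi-square goodness-of-fit test. A minimal sketch, assuming SciPy is available; the counts and the strict p < 0.001 threshold are illustrative.

```python
from scipy.stats import chisquare  # assumes SciPy is installed

# SRM check within one segment: did we get the allocation we expected?
# Hypothetical counts for a segment with an intended 50/50 split.
control_users, variant_users = 3_500, 3_100
total = control_users + variant_users
expected = [total * 0.5, total * 0.5]

stat, p_value = chisquare([control_users, variant_users], f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p_value:.4f}")
if p_value < 0.001:  # a common, deliberately strict SRM threshold
    print("Likely SRM: investigate assignment and event logging before trusting results.")
```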
Common mistakes and how to self-check
- Over-segmentation: Splitting too thinly reduces power. Self-check: Do your segments each have enough users to stabilize metrics?
- Post-hoc storytelling: Only highlighting favorable segments. Self-check: Which segments were pre-registered vs. exploratory?
- Ignoring guardrails: Shipping wins that hurt stability. Self-check: Review error/crash/latency metrics per segment.
- Simpson’s paradox: Trusting the overall average. Self-check: Compare segment effects to overall; if mixed signs, never ship blindly.
- Mix shift: Comparing segments across time windows where the traffic mix changed. Self-check: Use the same test window and consider weighting.
Exercises
Try the dataset below. Compute per-segment conversion, lift, SRM check, and a rollout recommendation.
- Segments: iOS_New, iOS_Returning, Android_New, Android_Returning
Data:
- iOS_New: Control 4,000 users, 360 signups; Variant 4,200 users, 504 signups
- iOS_Returning: Control 3,000 users, 420 signups; Variant 3,100 users, 403 signups
- Android_New: Control 3,500 users, 350 signups; Variant 3,100 users, 279 signups
- Android_Returning: Control 2,500 users, 325 signups; Variant 2,800 users, 336 signups
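If you would rather work in a notebook than a spreadsheet, here is the same dataset as a starter structure; the layout and variable names are just one convenient option, and the lift, overall, and SRM steps are left for you to fill in.

```python
# Exercise dataset: (control_users, control_signups, variant_users, variant_signups)
data = {
    "iOS_New":           (4_000, 360, 4_200, 504),
    "iOS_Returning":     (3_000, 420, 3_100, 403),
    "Android_New":       (3_500, 350, 3_100, 279),
    "Android_Returning": (2_500, 325, 2_800, 336),
}

# Starting point: conversion rates per segment (compute lift, overall, and SRM yourself).
for segment, (cu, cs, vu, vs) in data.items():
    print(f"{segment}: control {cs / cu:.1%}, variant {vs / vu:.1%}")
```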
- Compute conversion rates and lifts per segment.
- Compute overall conversion and lift.
- SRM check: Compare each segment’s share of users in control vs. variant. Flag any segment with >2pp difference.
- Recommend: Global ship, segment-only ship, or iterate. Justify with numbers.
- Checklist:
- Per-segment conversion and lift calculated
- Overall numbers calculated
- SRM check done per segment
- Decision noted with reasoning
Compare your work with the solution in the exercise section below.
Practical projects
- Build a reusable notebook or spreadsheet that takes segment-level inputs and outputs lift, CIs, and SRM checks per segment.
- Create a segment-aware dashboard for your growth or monetization experiments, including guardrails.
- Run a follow-up targeted experiment on the best-performing segment to validate and size the opportunity.
Mini challenge
Your test shows +0.5% overall revenue/user, but Paid traffic is −3% and Organic is +2%. Paid makes up 30% of traffic and its customers have higher ARPU. What do you do?
A good answer:
Ship to Organic only; run a tailored variant for Paid. Monitor guardrails for Paid closely. Consider pricing sensitivity or creative changes for Paid users, then re-test before global rollout.
Learning path
- Before this: A/B test fundamentals (metrics, power, bias)
- This lesson: Finding heterogeneous effects across segments
- Next: Multiple testing control and confirmatory follow-ups
Next steps
- Apply this workflow to your last experiment’s raw data.
- Pre-register 3–5 meaningful segments for your next test.
- Take the quick test below to cement your understanding.