Why this matters on the job
Defining the experiment population is the foundation of trustworthy A/B tests. As a Product Analyst, you will be asked to:
- Scope who is eligible for a test so results reflect the intended users.
- Prevent contamination (users seeing both variants, or receiving treatment outside the exposure window).
- Make results reproducible and explainable to product, engineering, and leadership.
- Protect fairness and comply with legal constraints (e.g., age, region restrictions).
- Power the experiment correctly by estimating the reachable sample size.
Typical real tasks you’ll handle
- Write inclusion/exclusion criteria for onboarding experiments.
- Choose the unit of randomization (user, device, session, account, market).
- Define the exposure window and baseline activity filters.
- Decide how to handle returning users, bots, and employees.
- Plan guardrails to catch cross-test interference.
Who this is for
- Product Analysts and Data Scientists designing or reviewing experiments.
- Product Managers who need to interpret experiment results.
- Engineers implementing experiment assignment.
Prerequisites
- Basic understanding of A/B testing concepts (control vs. treatment, randomization).
- Comfort with event data (users, sessions, events, attributes).
- Basic statistics (sampling, independence, measuring outcomes).
Concept explained simply
Experiment population definition answers: “Who exactly can be part of this experiment, when, and under what conditions?” It is more than just “all users.” It sets the rules for eligibility, assignment unit, timing, and exclusions so your results reflect the intended target users and are free of avoidable bias.
Mental model: The 3 rings
- Target population: The real-world users we care about (e.g., new iOS users in the US).
- Eligible population: Subset of target users who meet strict inclusion/exclusion rules (e.g., created account in last 7 days, not employees).
- Assigned sample: Those who actually get randomized during the assignment window.
Every experiment should state these rings explicitly.
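To make the rings concrete, here is a minimal Python sketch that treats each ring as a successive filter over a user table. The field names (platform, country, signup_days_ago, is_employee) are hypothetical placeholders; substitute whatever your profile or event data actually exposes.

```python
# Hypothetical user records; in practice these come from your user/profile table.
users = [
    {"user_id": 1, "platform": "ios", "country": "US", "signup_days_ago": 3, "is_employee": False},
    {"user_id": 2, "platform": "ios", "country": "US", "signup_days_ago": 40, "is_employee": False},
    {"user_id": 3, "platform": "android", "country": "US", "signup_days_ago": 2, "is_employee": False},
    {"user_id": 4, "platform": "ios", "country": "US", "signup_days_ago": 5, "is_employee": True},
]

# Ring 1 -- target population: new iOS users in the US.
target = [u for u in users if u["platform"] == "ios" and u["country"] == "US"]

# Ring 2 -- eligible population: target users who also pass inclusion/exclusion rules.
eligible = [u for u in target if u["signup_days_ago"] <= 7 and not u["is_employee"]]

# Ring 3 -- assigned sample: eligible users actually randomized during the assignment
# window (here, everyone; in production, only those who hit the exposure surface).
assigned = list(eligible)

print(len(users), len(target), len(eligible), len(assigned))  # 4 3 1 1
```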
Core definitions you’ll use
- Unit of randomization: The entity randomized (user, device, session, account, store, region).
- Inclusion criteria: Conditions that must be true to be eligible (platform, geography, lifecycle stage, baseline activity).
- Exclusion criteria: Conditions that disqualify users (prior exposure, employees, bots, legal restrictions).
- Exposure window: When assignment and first exposure can happen.
- Attribution window: How long you measure outcomes after exposure.
- De-duplication rule: Ensure a unit is assigned once and stays in that variant.
- Interference: One unit’s treatment affecting another’s outcomes (e.g., shared devices or teams).
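These definitions come together in how you actually assign units. A common pattern, shown here only as a minimal sketch under assumed names, is to hash the randomization unit's ID with an experiment-specific salt: assignment is pseudo-random across units but deterministic per unit, which gives you sticky assignment and de-duplication for free.

```python
import hashlib

def assign_variant(unit_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a randomization unit to a variant.

    Hashing unit_id together with the experiment name yields a stable,
    roughly uniform bucket in [0, 1]: the same unit always gets the same
    variant (sticky assignment), and different experiments split
    independently of each other.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # first 32 bits of the hash
    return "treatment" if bucket < treatment_share else "control"

# Same unit, same experiment -> same variant every time it is evaluated.
print(assign_variant("user_123", "onboarding_v2"))
print(assign_variant("user_123", "onboarding_v2"))
```

Whatever ID you pass as unit_id is your unit of randomization; switching between user, device, session, or account changes nothing in the code but everything about interference and contamination.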
Define your experiment population in 7 steps
1. State the decision question. What business decision will the test inform? Mini task: Write one sentence: “We need to know if X improves Y for Z.”
2. Define the target population. Platform, region, language, lifecycle stage. Mini task: List exact attributes (e.g., iOS, US, new users <= 7 days).
3. Choose the unit of randomization. Prefer the smallest independent unit; if spillover is likely, move up (e.g., team or account). Mini task: Write “Unit = ____ because ____.”
4. Write inclusion criteria. Clear, measurable rules; specify data fields and time windows. Mini task: Draft 3–6 bullet rules.
5. Write exclusion criteria. Prior exposure, employees, bots, legal restrictions, conflicting tests. Mini task: Draft 3+ bullet rules.
6. Set timing windows. Assignment start/end, exposure window, attribution window. Mini task: Specify dates or relative windows (e.g., 14-day attribution).
7. Define de-duplication and edge-case handling. Sticky assignment by unit ID, one-sided holdout if needed, cross-device policy. Mini task: Write the exact ID used and the cross-device rule.
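One way to keep these seven steps honest is to capture them in a structured, version-controlled spec that gets reviewed before any assignment code ships. The sketch below is illustrative only; every value in it is a hypothetical example, not a prescription.

```python
# A minimal, reviewable population spec covering the seven steps.
population_spec = {
    "decision_question": "Does a shorter onboarding improve Day-1 activation for new iOS users?",
    "target_population": {"platform": "ios", "countries": ["US", "CA"], "lifecycle": "new, <= 7 days"},
    "unit_of_randomization": {"unit": "user_id", "reason": "effect persists across sessions and devices"},
    "inclusion_criteria": [
        "account created within the last 7 days",
        "app version >= 5.0",
    ],
    "exclusion_criteria": [
        "employee or test device",
        "prior exposure to any onboarding experiment",
        "bot traffic",
    ],
    "timing": {
        "assignment_window": ("2024-06-01", "2024-06-30"),
        "exposure": "first app open during the assignment window",
        "attribution_window_days": 7,
    },
    "deduplication": {
        "sticky_by": "user_id",
        "cross_device_rule": "pre-login assignment reconciled to user_id after login",
    },
}
```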
Common pitfalls at each step
- Too broad target: Dilutes effect; narrow to the real decision scope.
- Wrong unit: Session-level when effect persists across sessions → contamination.
- Missing exclusion: Users with prior exposure bias results toward the null.
- Loose timing: Mixed seasonality or release cycles increase noise.
- No sticky assignment: Users switch variants across devices.
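Several of these pitfalls surface as the same unit being exposed to more than one variant. A simple post-hoc check over the exposure log catches that; the log schema here is a hypothetical example.

```python
from collections import defaultdict

# Hypothetical exposure log rows: (unit_id, variant seen).
exposure_log = [
    ("user_1", "control"),
    ("user_2", "treatment"),
    ("user_1", "treatment"),  # contaminated: this unit has seen both variants
    ("user_3", "control"),
]

variants_seen = defaultdict(set)
for unit_id, variant in exposure_log:
    variants_seen[unit_id].add(variant)

contaminated = {u for u, seen in variants_seen.items() if len(seen) > 1}
contamination_rate = len(contaminated) / len(variants_seen)

print(contaminated)                 # {'user_1'}
print(f"{contamination_rate:.1%}")  # 33.3%
```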
Worked examples
Example 1: New onboarding flow
Decision: Should we ship a shorter onboarding to improve Day-1 activation?
- Target population: New mobile app users in US/CA, English locale.
- Unit: User (stable user_id across devices once logged in).
- Inclusion: First app install or account within last 24 hours; mobile app v5.0+.
- Exclusion: Employees, test devices, users who saw onboarding before (prior exposure), bots.
- Timing: Assignment on first app open within 30 days of start date; 7-day attribution.
- De-duplication: Sticky by user_id; pre-login assignment stored and reconciled after login.
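Rules like these are easiest to audit when they are written as one predicate over named fields rather than scattered across queries. A minimal sketch for Example 1 follows; the field names and the simplified version parsing are assumptions, not your real schema.

```python
def is_eligible(user: dict) -> bool:
    """Eligibility check for the onboarding experiment in Example 1.

    Field names (hours_since_signup, app_version, saw_onboarding_before, ...)
    are placeholders for whatever your profile and event tables expose.
    """
    included = (
        user["country"] in {"US", "CA"}
        and user["locale"].startswith("en")
        and user["hours_since_signup"] <= 24
        and tuple(map(int, user["app_version"].split("."))) >= (5, 0)
    )
    excluded = (
        user["is_employee"]
        or user["is_test_device"]
        or user["saw_onboarding_before"]
        or user["is_bot"]
    )
    return included and not excluded

example = {
    "country": "US", "locale": "en_US", "hours_since_signup": 5,
    "app_version": "5.2.1", "is_employee": False, "is_test_device": False,
    "saw_onboarding_before": False, "is_bot": False,
}
print(is_eligible(example))  # True
```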
Example 2: Email reminder subject line
Decision: Does subject line A increase open rate among churn-risk users?
- Target: Users flagged churn-risk last 7 days.
- Unit: Email address (hashed), with user_id as a fallback when it is unique.
- Inclusion: Has marketing consent; valid email; flagged churn-risk score ≥ threshold.
- Exclusion: Unsubscribed; bounced in last 30 days; employees; already opened similar campaign.
- Timing: Assignment when campaign audience is built; single-send exposure.
- De-duplication: One email address → one variant; suppression list honored.
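For Example 2, the de-duplication and suppression rules can be applied directly to a normalized, hashed email address. This is a minimal sketch under that assumption; the variant names and suppression handling are illustrative.

```python
import hashlib
from typing import Optional

def hash_email(email: str) -> str:
    """Normalize and hash an email so raw addresses never enter the experiment tables."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

def assign_subject_line(email: str, suppression_hashes: set) -> Optional[str]:
    """One email address maps to one variant; suppressed addresses get no send at all."""
    h = hash_email(email)
    if h in suppression_hashes:
        return None  # unsubscribed, recently bounced, or employee: not in the population
    bucket = int(h[:8], 16) / 0xFFFFFFFF
    return "subject_a" if bucket < 0.5 else "subject_b"

suppressed = {hash_email("optout@example.com")}
print(assign_subject_line("  User@Example.com", set()))        # deterministic variant
print(assign_subject_line("optout@example.com", suppressed))   # None
```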
Example 3: Search ranking tweak
Decision: Does a new ranking model improve CTR without harming latency?
- Target: Web search users globally excluding regions with legal limits.
- Unit: Session (to avoid cross-request interference and allow fast iteration).
- Inclusion: Desktop web; logged-in or cookie-consented; JS enabled.
- Exclusion: Bot traffic; blocked regions; employees.
- Timing: Assignment per session ID; attribution per session only.
- De-duplication: New session → new assignment; assignment stays sticky for the lifetime of the session.
Example 4: Price test on subscription
Decision: Adjust monthly price for new subscribers in the UK.
- Target: New UK users hitting paywall for the first time.
- Unit: User (or Account) to avoid price-shopping across sessions.
- Inclusion: UK IP + billing country UK; first-ever paywall exposure; app v4.3+.
- Exclusion: Existing subscribers; staff; VPN anomalies; prior exposure to price variants.
- Timing: Assignment at first paywall view; 30-day conversion attribution.
- De-duplication: Sticky per user across platforms; server-side gating to prevent variant switching.
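The server-side gating note in Example 4 usually means the first assignment is persisted and every later paywall view reads it back instead of re-randomizing, so a user can never see two prices. A minimal in-memory sketch of that pattern follows; a real system would back the store with a durable table.

```python
import hashlib

# Stand-in for a durable assignment store (e.g., a table keyed by user_id).
_assignment_store = {}

def get_price_variant(user_id: str) -> str:
    """Return the user's price variant, assigning it only on the first paywall view.

    Persisting the first assignment, rather than recomputing it, guarantees the
    user keeps the same price even if the split or salt changes mid-experiment.
    """
    if user_id in _assignment_store:
        return _assignment_store[user_id]
    digest = hashlib.sha256(f"uk_price_test:{user_id}".encode()).hexdigest()
    variant = "new_price" if int(digest[:8], 16) % 2 == 0 else "current_price"
    _assignment_store[user_id] = variant
    return variant

print(get_price_variant("user_42"))
print(get_price_variant("user_42"))  # same variant on every later paywall view
```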
Pre-randomization checklist
- Decision question and target population are written in one sentence each.
- Unit of randomization chosen with a stated reason.
- Inclusion and exclusion criteria are measurable from available fields.
- Assignment, exposure, and attribution windows are explicit.
- Sticky assignment and cross-device policy defined.
- Conflicts with other live tests checked; guardrails set.
- Sample size and expected traffic validated against the target population.
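The last checklist item, validating sample size against expected traffic, can start as a back-of-the-envelope calculation before a full power analysis. The sketch below uses the standard normal-approximation formula for comparing two proportions; the baseline rate, minimum detectable effect, and traffic figures are hypothetical.

```python
from math import ceil

def n_per_group(p_baseline: float, mde_abs: float,
                z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Approximate per-group sample size for a two-proportion test.

    Defaults correspond to a two-sided alpha of 0.05 (z = 1.96) and
    80% power (z = 0.8416).
    """
    p_treat = p_baseline + mde_abs
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# Hypothetical inputs: 20% baseline activation, want to detect a +2pp lift.
needed = n_per_group(p_baseline=0.20, mde_abs=0.02)

# Hypothetical reachable traffic: 3,000 eligible users/day, 50/50 split, 30-day window.
reachable_per_group = 3_000 * 30 // 2

print(needed, reachable_per_group, reachable_per_group >= needed)  # 6507 45000 True
```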
Common mistakes and how to self-check
- Mistake: Mixing new and returning users when outcome is onboarding-related. Self-check: Confirm inclusion is “first-time only.”
- Mistake: Randomizing by session for long-lived effects. Self-check: Ask “Does treatment persist?” If yes, use user/account.
- Mistake: Prior exposure to similar features. Self-check: Add an exclusion on historical exposure flags.
- Mistake: Region/legal mismatch. Self-check: Verify geo and consent filters are enforceable.
- Mistake: Overlapping experiments. Self-check: Query whether eligible users are enrolled in other tests on the same surface.
- Mistake: Inconsistent IDs across platforms. Self-check: Define exact ID precedence (device_id → temp_id → user_id) and reconciliation.
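That last self-check is easier to enforce when the ID precedence is a single function every pipeline calls. Below is a minimal sketch that reads the device_id → temp_id → user_id chain as an escalation path (prefer the most durable identifier present); the field names are assumptions.

```python
def canonical_unit_id(ids: dict) -> str:
    """Pick one canonical ID following the device_id -> temp_id -> user_id escalation.

    Prefers the most durable identifier available: user_id once the user logs in,
    otherwise temp_id, otherwise device_id. Assignments keyed on a weaker ID
    should be reconciled to this canonical ID as soon as it appears.
    """
    for key in ("user_id", "temp_id", "device_id"):
        if ids.get(key):
            return f"{key}:{ids[key]}"
    raise ValueError("no identifier available for this unit")

print(canonical_unit_id({"device_id": "d-9", "temp_id": None, "user_id": "u-7"}))  # user_id:u-7
print(canonical_unit_id({"device_id": "d-9", "temp_id": "t-3", "user_id": None}))  # temp_id:t-3
```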
Practical projects
- Draft a population spec for a homepage layout test and review it with a pretend PM. Include all seven steps and a data dictionary of used fields.
- Audit a past experiment (real or hypothetical) and list 5 improvements to eligibility, timing, or unit choice. Estimate the bias reduced.
- Create a “population template” document your team can reuse. Include fill-in sections and example phrasing.
Hands-on exercises
Do these before the test. They mirror the graded exercises below.
Exercise 1 (ex1): Define the population for a push notification experiment
Scenario: You plan to test a new push notification that reminds users about items left in their wish list.
- Platforms: iOS and Android apps.
- Goal metric: 7-day return rate and add-to-cart events.
- Constraints: Exclude users who disabled notifications; exclude employees and test devices; avoid users who received a similar push in the last 14 days.
Write:
- Target population.
- Unit of randomization and why.
- Inclusion and exclusion criteria.
- Exposure and attribution windows.
- De-duplication and cross-device rules.
Exercise 2 (ex2): Define the population for a paywall A/B test with a 14-day free trial
Scenario: You test a new paywall design for new US iOS users with a 14-day free trial.
- Goal metric: Trial start rate; secondary: Trial-to-paid conversion.
- Constraints: Only first-ever paywall exposure; exclude VPN masking; exclude users with past subscription on any platform.
Write:
- Target population.
- Unit of randomization and why.
- Inclusion and exclusion criteria.
- Exposure and attribution windows.
- De-duplication and cross-platform rules.
Mini challenge
Pick any recent feature idea and in 5 minutes draft the three rings (target, eligible, assigned) plus the unit of randomization. If you can’t state each in one sentence, your scope is probably unclear.
Quick Test
When ready, take the quick test below.
Next steps
- Apply the seven-step template to your next experiment plan.
- Peer-review a teammate’s population definition; check for the common mistakes above.
- Move on to sampling, power, and minimum detectable effect to confirm your defined population can deliver a decision within the sign-off timeline.
Learning path
- Before: Experiment goals and metrics; Baseline analysis.
- Now: Experiment population definition (this lesson).
- Next: Randomization strategy; Sample size and power; Exposure and attribution design; Guardrail metrics and monitoring.