Why this matters
Choosing the right randomization unit (user, session, device, account, store, region, etc.) determines whether your A/B test measures the true causal effect. Pick too small a unit and the same person can be exposed to both variants, contaminating the comparison; pick too large and your test becomes underpowered and slow. Product Analysts make these calls for experiments on pricing, onboarding, recommendations, notifications, and promotions.
- Real tasks: design a test for a new homepage, estimate power when randomizing by store, prevent cross-device contamination, and document assignment logic for engineering.
- Impact: credible results, faster decisions, fewer rollbacks, and safer launches.
Who this is for
- Product Analysts and Data Scientists running experiments.
- Product Managers who scope A/B tests and need credible metrics.
- Engineers implementing assignment logic.
Prerequisites
- Basic A/B testing concepts: control vs treatment, metrics, power, sample size.
- Understanding of your product identities: user ID, account ID, device ID, store/region codes.
Learning path
- Understand what a randomization unit is and why interference matters.
- Apply the selection checklist to your product surface.
- Estimate power changes from cluster-level randomization.
- Implement persistent assignment and guard against contamination.
- Practice with scenarios, then take the quick test.
Concept explained simply
The randomization unit is the entity you assign to variants (A or B). Examples: a user, a session, a device, a household, a store, or a region.
Your goal: choose the smallest unit that cleanly captures exposure while preventing spillovers between A and B. Spillovers happen when one unit’s treatment affects another unit’s outcome (network effects, multi-device usage, word-of-mouth). This violates the stable unit treatment value assumption (SUTVA) and biases estimates.
Common units and when to use them
- User: most product UI changes, recommendations, notifications.
- Session: short-lived UX changes with no carryover (rarely ideal).
- Device: when identity is device-bound (e.g., TV app) and cross-device is minimal.
- Account/household: B2B features, shared subscriptions, family plans.
- Store/region: offline promos, supply constraints, or legal constraints.
- Time blocks (day/week): when implementation must switch globally but you can alternate blocks, being careful about time trends.
Mental model: cones of influence
Imagine each treated unit has a cone of influence. If cones overlap across treatment and control, you have contamination. Start with the smallest plausible unit (e.g., session), then move up (user → account → store → region) until cones no longer overlap in meaningful ways.
Choosing the unit: checklist and flow
Checklist:
- Exposure uniqueness: Can a unit be exposed to both A and B? If yes, pick a larger unit.
- Identity stability: Is the ID stable during the experiment window?
- Interference risk: Are there cross-unit effects (referrals, word-of-mouth, shared devices)?
- Sample size and power: Larger units reduce effective sample size; estimate the impact.
- Implementation feasibility: Can engineering randomize at this unit and log it reliably?
- Metric alignment: Does the unit match the metric (user-level metric → user-level unit is often best)?
- Fairness/ethics/compliance: Especially for pricing, promotions, and regions.
Flow:
- Map exposure: What exactly changes, and who experiences it?
- List candidate units: session, user, device, account, store, region.
- Score candidates against the checklist.
- Prototype assignment logic and logging.
- Run a dry-run to check assignment stability and covariate balance (see the sketch below).
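A minimal dry-run sketch in Python, assuming a pandas DataFrame of last week's users; the column names, salt, and 50/50 split are illustrative, not a prescribed schema:

```python
import hashlib

import pandas as pd

# Illustrative dry-run data: last week's users (all column names are assumptions).
users = pd.DataFrame({
    "user_id": [f"u{i}" for i in range(10_000)],
    "country": ["US", "DE", "BR", "IN"] * 2_500,
    "platform": ["web", "ios", "android", "web"] * 2_500,
})

SALT = "homepage_test_v1"  # per-experiment salt so buckets don't correlate across tests

def assign_variant(unit_id: str) -> str:
    """Deterministic 50/50 split: hash the stable ID together with the salt."""
    digest = hashlib.sha256(f"{SALT}:{unit_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

users["variant"] = users["user_id"].map(assign_variant)

# Stability: re-running assignment must reproduce identical buckets.
assert (users["user_id"].map(assign_variant) == users["variant"]).all()

# Balance: variant shares should be near 50/50 overall and within each stratum.
print(users["variant"].value_counts(normalize=True))
print(pd.crosstab(users["country"], users["variant"], normalize="index"))
```

If shares drift far from 50/50 within a stratum, investigate ID quality before launch rather than re-bucketing.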
Worked examples
Example 1: New onboarding flow
- Candidate units: session, user, device.
- Pick: user. Onboarding spans multiple sessions; session-level risks cross-over.
- Notes: persist assignment across sessions and devices; exclude users who completed onboarding before the test.
Example 2: Pricing on cart page
- Candidate units: session, user, account, region.
- Pick: user (B2C) or account (B2B). Session-level lets the same shopper see different prices across sessions (price shopping); region-level introduces extra variance and confounding.
- Notes: document ethical review; ensure persistent assignment and guard against arbitrage.
Example 3: Store-level in-person promotion
- Candidate units: customer, store, region.
- Pick: store. The treatment is deployed at stores, and customers often make repeat visits to the same store.
- Notes: account for intra-store correlation; estimate design effect.
Example 4: Push notification timing
- Candidate units: device, user.
- Pick: user. Users can have multiple devices; device-level risks conflicting experiences.
- Notes: throttle frequency; ensure quiet hours are respected in both groups.
Example 5: Referral incentives
- Candidate units: user, ego-network (small social clusters), region.
- Pick: cluster (ego-network or community) if peer spillovers are strong; otherwise user with explicit spillover measurement.
- Notes: expect design effect; predefine how you attribute conversions to referrers.
Power and sample size implications
Cluster-level randomization increases variance because outcomes within a cluster are correlated. Use the design effect to adjust power:
Design effect (DE) ≈ 1 + (m - 1) × ICC
- m = average cluster size (e.g., users per store)
- ICC = intra-cluster correlation (0 to 1)
Example calculation
If m = 20 and ICC = 0.05, DE ≈ 1 + 19 × 0.05 = 1.95. You need about 1.95× the sample to achieve the same power as user-level randomization.
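The same arithmetic as a small Python helper; the 10,000-user baseline below is an illustrative requirement, not a recommendation:

```python
def design_effect(m: float, icc: float) -> float:
    """DE ≈ 1 + (m - 1) * ICC for average cluster size m."""
    return 1 + (m - 1) * icc

def effective_sample_size(n: int, m: float, icc: float) -> float:
    """The user-level sample size that n clustered observations are worth."""
    return n / design_effect(m, icc)

de = design_effect(m=20, icc=0.05)
print(round(de, 2))        # 1.95
print(round(10_000 * de))  # 19500: a 10k user-level plan grows ~1.95x under clustering
```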
Practical tips
- Reduce ICC by stratifying (blocking) on strong predictors of the outcome.
- Increase the number of clusters rather than the size per cluster when possible (the comparison below makes this concrete).
- Use longer test duration only if traffic can’t increase cluster count.
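To make the second tip concrete, compare two hypothetical designs with the same total of 2,000 customers; the ICC and counts are assumed for illustration:

```python
def design_effect(m: float, icc: float) -> float:
    return 1 + (m - 1) * icc

icc, total_n = 0.05, 2_000  # illustrative values

for n_clusters, m in [(100, 20), (50, 40)]:  # same total sample, different structure
    de = design_effect(m, icc)
    print(f"{n_clusters} clusters of {m}: DE={de:.2f}, effective n≈{total_n / de:.0f}")
# 100 clusters of 20: DE=1.95, effective n≈1026
# 50 clusters of 40: DE=2.95, effective n≈678
```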
Implementation notes
- Persistent assignment: hash a stable ID (e.g., user_id) with a per-experiment salt to create a bucket; never re-bucket mid-test (see the sketch after this list).
- Identity resolution: ensure cross-device mapping to the chosen unit; define fallback behavior for unknown users.
- Stratification/blocking: split within strata (e.g., country, platform) to control variance and ensure balance.
- Exposure logging: store unit_id, variant, timestamp, and key attributes at exposure time.
- Guardrails: prevent a unit from seeing different variants (e.g., after a login merges identities), and document exceptions.
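A minimal sketch of salted-hash assignment and exposure logging, assuming a hypothetical experiment key and event schema; a real implementation would emit the event to your logging pipeline instead of returning JSON:

```python
import hashlib
import json
from datetime import datetime, timezone

EXPERIMENT = "checkout_button_v1"  # hypothetical experiment key, also used as the salt
N_BUCKETS = 1_000                  # fine-grained buckets make gradual ramps easier

def bucket(unit_id: str) -> int:
    """Map a stable ID to a stable bucket in [0, N_BUCKETS)."""
    digest = hashlib.sha256(f"{EXPERIMENT}:{unit_id}".encode()).hexdigest()
    return int(digest, 16) % N_BUCKETS

def assign(unit_id: str) -> str:
    """50/50 split; bucket ranges must stay fixed for the life of the test."""
    return "control" if bucket(unit_id) < 500 else "treatment"

def log_exposure(unit_id: str, attrs: dict) -> str:
    """Capture unit, variant, and key attributes at the moment of exposure."""
    event = {
        "experiment": EXPERIMENT,
        "unit_id": unit_id,
        "variant": assign(unit_id),
        "ts": datetime.now(timezone.utc).isoformat(),
        **attrs,
    }
    return json.dumps(event)

print(log_exposure("user_42", {"platform": "web", "country": "DE"}))
```

Using many fine-grained buckets (here 1,000) rather than two lets you ramp a variant gradually without reassigning anyone.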
Exercises
Complete these before the quick test. Tip: write assumptions explicitly.
Exercise 1: Cross-device checkout button test
You plan an A/B test for a new checkout button style across web and mobile apps. Users often browse on mobile and purchase on desktop.
- Decide the randomization unit and justify it.
- List two steps to prevent contamination.
Exercise 2: Design effect and duration
You must randomize by store for an in-person promotion. You expect 30 customers per store during the test and ICC ≈ 0.04. Your user-level plan required 20,000 customers. Estimate the design effect, effective sample size, and whether you need to extend duration.
Checklist before the quick test:
- [ ] I picked the smallest unit that prevents cross-variant exposure.
- [ ] I can explain why session-level is or isn’t acceptable.
- [ ] I adjusted power for clustering (if applicable).
- [ ] I documented identity resolution and assignment persistence.
Common mistakes and self-check
- Mistake: Randomizing by session for features with memory (e.g., onboarding). Self-check: Will behavior carry over to future sessions?
- Mistake: Device-level for multi-device users. Self-check: Can the same user see A on phone and B on desktop?
- Mistake: Ignoring ICC for cluster tests. Self-check: Did you compute DE and adjust power?
- Mistake: Re-bucketing mid-test. Self-check: Is assignment stable given ID merges or logins?
- Mistake: Misaligned metrics. Self-check: Do you evaluate at the same level as randomization, or use appropriate hierarchical models/aggregation? (See the aggregation sketch below.)
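For that last self-check, one simple alignment strategy is to aggregate outcomes to the randomization unit before testing. A sketch with simulated store-level data (all parameters assumed), using scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)

def simulate_arm(n_stores: int, customers_per_store: int, lift: float = 0.0) -> np.ndarray:
    """Return one outcome mean per store; shared store effects create ICC > 0."""
    store_effects = rng.normal(0.0, 0.05, n_stores)
    return np.array([
        rng.normal(1.0 + lift + effect, 0.3, customers_per_store).mean()
        for effect in store_effects
    ])

control = simulate_arm(n_stores=40, customers_per_store=30)
treatment = simulate_arm(n_stores=40, customers_per_store=30, lift=0.03)

# One observation per store keeps the analysis at the randomization unit.
print(stats.ttest_ind(treatment, control, equal_var=False))
```

Cluster means discard within-store sample size, so hierarchical models can recover some power, but the aggregate test is a safe default.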
Practical projects
- Draft a unit selection memo for three upcoming tests (onboarding, pricing, notifications) with assumptions, risks, and DE estimates.
- Design a stratified randomization plan for a store-level promotion (block by region and store size).
- Run a dry-run: assign last week’s users to buckets and check balance across key covariates.
Next steps
- Discuss your unit selection with engineering and PM to validate feasibility.
- Set up monitoring for assignment stability and exposure logging.
- When ready, take the quick test below. Note: the test is available to everyone; only logged-in users get saved progress.
Mini challenge
You’re testing a new recommendation carousel on the homepage. Users sometimes share accounts within a household, and recommendations may influence what others in the household watch. Choose the randomization unit and list two measurements you’d add to assess spillovers.