Why this matters
Choosing the right randomization unit (user, session, device, account, store, region, etc.) determines whether your A/B test measures the true causal effect. Pick too small a unit and the same person can be exposed to both variants, contaminating the comparison; pick too large and your test becomes underpowered and slow. Product Analysts make these calls for experiments on pricing, onboarding, recommendations, notifications, and promotions.
- Real tasks: design a test for a new homepage, estimate power when randomizing by store, prevent cross-device contamination, and document assignment logic for engineering.
- Impact: credible results, faster decisions, fewer rollbacks, and safer launches.
Who this is for
- Product Analysts and Data Scientists running experiments.
- Product Managers who scope A/B tests and need credible metrics.
- Engineers implementing assignment logic.
Prerequisites
- Basic A/B testing concepts: control vs treatment, metrics, power, sample size.
- Understanding of your product identities: user ID, account ID, device ID, store/region codes.
Learning path
- Understand what a randomization unit is and why interference matters.
- Apply the selection checklist to your product surface.
- Estimate power changes from cluster-level randomization.
- Implement persistent assignment and guard against contamination.
- Practice with scenarios, then take the quick test.
Concept explained simply
The randomization unit is the entity you assign to variants (A or B). Examples: a user, a session, a device, a household, a store, or a region.
Your goal: choose the smallest unit that cleanly captures exposure while preventing spillovers between A and B. Spillovers happen when one unit’s treatment affects another unit’s outcome (network effects, multi-device usage, word-of-mouth). This violates the stable unit treatment value assumption (SUTVA) and biases estimates.
Common units and when to use them
- User: most product UI changes, recommendations, notifications.
- Session: short-lived UX changes with no carryover (rarely ideal).
- Device: when identity is device-bound (e.g., TV app) and cross-device is minimal.
- Account/household: B2B features, shared subscriptions, family plans.
- Store/region: offline promos, supply constraints, or legal constraints.
- Time blocks (day/week): when implementation must switch globally but you can alternate blocks, being careful about time trends.
Mental model: cones of influence
Imagine each treated unit has a cone of influence. If cones overlap across treatment and control, you have contamination. Start with the smallest plausible unit (e.g., session), then move up (user → account → store → region) until cones no longer overlap in meaningful ways.
Choosing the unit: checklist and flow
Checklist:
- Exposure uniqueness: Can a unit be exposed to both A and B? If yes, pick a larger unit.
- Identity stability: Is the ID stable during the experiment window?
- Interference risk: Are there cross-unit effects (referrals, word-of-mouth, shared devices)?
- Sample size and power: Larger units reduce effective sample size; estimate the impact.
- Implementation feasibility: Can engineering randomize at this unit and log it reliably?
- Metric alignment: Does the unit match the metric (user-level metric → user-level unit is often best)?
- Fairness/ethics/compliance: Especially for pricing, promotions, and regions.
Flow:
- Map exposure: What exactly changes, and who experiences it?
- List candidate units: session, user, device, account, store, region.
- Score candidates against the checklist.
- Prototype assignment logic and logging.
- Run a dry-run to check assignment stability and covariate balance (see the sketch below).
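A minimal dry-run sketch in Python, assuming a pandas DataFrame of last week's users; the column names, salt, and 50/50 split are illustrative, not a prescribed schema:

```python
import hashlib

import pandas as pd

# Illustrative dry-run data: last week's users (all column names are assumptions).
users = pd.DataFrame({
    "user_id": [f"u{i}" for i in range(10_000)],
    "country": ["US", "DE", "BR", "IN"] * 2_500,
    "platform": ["web", "ios", "android", "web"] * 2_500,
})

SALT = "homepage_test_v1"  # per-experiment salt so buckets don't correlate across tests

def assign_variant(unit_id: str) -> str:
    """Deterministic 50/50 split: hash the stable ID together with the salt."""
    digest = hashlib.sha256(f"{SALT}:{unit_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

users["variant"] = users["user_id"].map(assign_variant)

# Stability: re-running assignment must reproduce identical buckets.
assert (users["user_id"].map(assign_variant) == users["variant"]).all()

# Balance: variant shares should be near 50/50 overall and within each stratum.
print(users["variant"].value_counts(normalize=True))
print(pd.crosstab(users["country"], users["variant"], normalize="index"))
```

If shares drift far from 50/50 within a stratum, investigate ID quality before launch rather than re-bucketing.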
Worked examples
Example 1: New onboarding flow
- Candidate units: session, user, device.
- Pick: user. Onboarding spans multiple sessions; session-level risks cross-over.
- Notes: persist assignment across sessions and devices; exclude users who completed onboarding before the test.
Example 2: Pricing on cart page
- Candidate units: session, user, account, region.
- Pick: user (B2C) or account (B2B). Session-level lets the same shopper see different prices across sessions (price shopping); region-level introduces extra variance and confounding.
- Notes: document ethical review; ensure persistent assignment and guard against arbitrage.
Example 3: Store-level in-person promotion
- Candidate units: customer, store, region.
- Pick: store. The treatment is deployed at stores, and customers often make repeat visits to the same store.
- Notes: account for intra-store correlation; estimate design effect.
Example 4: Push notification timing
- Candidate units: device, user.
- Pick: user. Users can have multiple devices; device-level risks conflicting experiences.
- Notes: throttle frequency; ensure quiet hours are respected in both groups.
Example 5: Referral incentives
- Candidate units: user, ego-network (small social clusters), region.
- Pick: cluster (ego-network or community) if peer spillovers are strong; otherwise user with explicit spillover measurement.
- Notes: expect design effect; predefine how you attribute conversions to referrers.
Power and sample size implications
Cluster-level randomization increases variance because outcomes within a cluster are correlated. Use the design effect to adjust power:
Design effect (DE) ≈ 1 + (m - 1) × ICC
- m = average cluster size (e.g., users per store)
- ICC = intra-cluster correlation (0 to 1)
Example calculation
If m = 20 and ICC = 0.05, DE ≈ 1 + 19 × 0.05 = 1.95. You need about 1.95× the sample to achieve the same power as user-level randomization.
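The same arithmetic as a small Python helper; the 10,000-user baseline below is an illustrative requirement, not a recommendation:

```python
def design_effect(m: float, icc: float) -> float:
    """DE ≈ 1 + (m - 1) * ICC for average cluster size m."""
    return 1 + (m - 1) * icc

def effective_sample_size(n: int, m: float, icc: float) -> float:
    """The user-level sample size that n clustered observations are worth."""
    return n / design_effect(m, icc)

de = design_effect(m=20, icc=0.05)
print(round(de, 2))        # 1.95
print(round(10_000 * de))  # 19500: a 10k user-level plan grows ~1.95x under clustering
```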
Practical tips
- Reduce ICC by stratifying (blocking) on strong predictors of the outcome.
- Increase the number of clusters rather than the size per cluster when possible (the comparison below makes this concrete).
- Use longer test duration only if traffic can’t increase cluster count.
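To make the second tip concrete, compare two hypothetical designs with the same total of 2,000 customers; the ICC and counts are assumed for illustration:

```python
def design_effect(m: float, icc: float) -> float:
    return 1 + (m - 1) * icc

icc, total_n = 0.05, 2_000  # illustrative values

for n_clusters, m in [(100, 20), (50, 40)]:  # same total sample, different structure
    de = design_effect(m, icc)
    print(f"{n_clusters} clusters of {m}: DE={de:.2f}, effective n≈{total_n / de:.0f}")
# 100 clusters of 20: DE=1.95, effective n≈1026
# 50 clusters of 40: DE=2.95, effective n≈678
```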
Implementation notes
- Persistent assignment: hash a stable ID (e.g., user_id) with a per-experiment salt to create a bucket; never re-bucket mid-test (see the sketch after this list).
- Identity resolution: ensure cross-device mapping to the chosen unit; define fallback behavior for unknown users.
- Stratification/blocking: split within strata (e.g., country, platform) to control variance and ensure balance.
- Exposure logging: store unit_id, variant, timestamp, and key attributes at exposure time.
- Guardrails: prevent a unit from seeing different variants (e.g., after a login merges identities), and document exceptions.
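A minimal sketch of salted-hash assignment and exposure logging, assuming a hypothetical experiment key and event schema; a real implementation would emit the event to your logging pipeline instead of returning JSON:

```python
import hashlib
import json
from datetime import datetime, timezone

EXPERIMENT = "checkout_button_v1"  # hypothetical experiment key, also used as the salt
N_BUCKETS = 1_000                  # fine-grained buckets make gradual ramps easier

def bucket(unit_id: str) -> int:
    """Map a stable ID to a stable bucket in [0, N_BUCKETS)."""
    digest = hashlib.sha256(f"{EXPERIMENT}:{unit_id}".encode()).hexdigest()
    return int(digest, 16) % N_BUCKETS

def assign(unit_id: str) -> str:
    """50/50 split; bucket ranges must stay fixed for the life of the test."""
    return "control" if bucket(unit_id) < 500 else "treatment"

def log_exposure(unit_id: str, attrs: dict) -> str:
    """Capture unit, variant, and key attributes at the moment of exposure."""
    event = {
        "experiment": EXPERIMENT,
        "unit_id": unit_id,
        "variant": assign(unit_id),
        "ts": datetime.now(timezone.utc).isoformat(),
        **attrs,
    }
    return json.dumps(event)

print(log_exposure("user_42", {"platform": "web", "country": "DE"}))
```

Using many fine-grained buckets (here 1,000) rather than two lets you ramp a variant gradually without reassigning anyone.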
Exercises
Complete these before the quick test. Tip: write assumptions explicitly.
Exercise 1: Cross-device checkout button test
You plan an A/B test for a new checkout button style across web and mobile apps. Users often browse on mobile and purchase on desktop.
- Decide the randomization unit and justify it.
- List two steps to prevent contamination.
Exercise 2: Design effect and duration
You must randomize by store for an in-person promotion. You expect 30 customers per store during the test and ICC ≈ 0.04. Your user-level plan required 20,000 customers. Estimate the design effect, effective sample size, and whether you need to extend duration.
Checklist before the quick test:
- [ ] I picked the smallest unit that prevents cross-variant exposure.
- [ ] I can explain why session-level is or isn’t acceptable.
- [ ] I adjusted power for clustering (if applicable).
- [ ] I documented identity resolution and assignment persistence.
Common mistakes and self-check
- Mistake: Randomizing by session for features with memory (e.g., onboarding). Self-check: Will behavior carry over to future sessions?
- Mistake: Device-level for multi-device users. Self-check: Can the same user see A on phone and B on desktop?
- Mistake: Ignoring ICC for cluster tests. Self-check: Did you compute DE and adjust power?
- Mistake: Re-bucketing mid-test. Self-check: Is assignment stable given ID merges or logins?
- Mistake: Misaligned metrics. Self-check: Do you evaluate at the same level as randomization, or use appropriate hierarchical models/aggregation? (See the aggregation sketch below.)
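For that last self-check, one simple alignment strategy is to aggregate outcomes to the randomization unit before testing. A sketch with simulated store-level data (all parameters assumed), using scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)

def simulate_arm(n_stores: int, customers_per_store: int, lift: float = 0.0) -> np.ndarray:
    """Return one outcome mean per store; shared store effects create ICC > 0."""
    store_effects = rng.normal(0.0, 0.05, n_stores)
    return np.array([
        rng.normal(1.0 + lift + effect, 0.3, customers_per_store).mean()
        for effect in store_effects
    ])

control = simulate_arm(n_stores=40, customers_per_store=30)
treatment = simulate_arm(n_stores=40, customers_per_store=30, lift=0.03)

# One observation per store keeps the analysis at the randomization unit.
print(stats.ttest_ind(treatment, control, equal_var=False))
```

Cluster means discard within-store sample size, so hierarchical models can recover some power, but the aggregate test is a safe default.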
Practical projects
- Draft a unit selection memo for three upcoming tests (onboarding, pricing, notifications) with assumptions, risks, and DE estimates.
- Design a stratified randomization plan for a store-level promotion (block by region and store size).
- Run a dry-run: assign last week’s users to buckets and check balance across key covariates.
Next steps
- Discuss your unit selection with engineering and PM to validate feasibility.
- Set up monitoring for assignment stability and exposure logging.
- When ready, take the quick test below. Note: the test is available to everyone; only logged-in users get saved progress.
Mini challenge
You’re testing a new recommendation carousel on the homepage. Users sometimes share accounts within a household, and recommendations may influence what others in the household watch. Choose the randomization unit and list two measurements you’d add to assess spillovers.