Why this matters
Choosing the right experimental unit and randomization scheme is the foundation of trustworthy A/B tests. As a Data Scientist, you will be asked to:
- Decide whether to randomize by user, session, device, household, store, city, or time.
- Prevent spillovers (interference) and contamination between groups.
- Balance covariates, detect assignment issues, and ensure results are analyzable.
- Communicate trade-offs between precision, power, and practicality.
Concept explained simply
Two questions define solid experiments:
- Unit selection: Who or what receives the treatment? (e.g., a user, a store, a city)
- Randomization: How do we assign units to groups in a fair, reproducible way?
Pick the smallest unit that:
- Experiences the treatment consistently (exposure is well-defined).
- Does not affect other units' outcomes (no or acceptable interference).
- Matches how outcomes are measured (analysis unit aligns with assignment or is properly modeled).
Mental model
- Who is treated? The entity that actually experiences the variant.
- Where can interference flow? Within device, across devices for a user, across people in a household, across stores in a region, across friends in a network.
- What do we measure? Choose metrics at or above the assignment level to avoid bias or use models that account for clustering.
- How does randomness happen? Deterministic hashing or random draws that assign each unit to treatment or control in a reproducible, sticky way.
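A minimal sketch of sticky, hash-based assignment in Python (the experiment salt and bucket names are illustrative, not from the source):

    import hashlib

    def assign_variant(unit_id: str, salt: str = "exp_layout_v1", treat_share: float = 0.5) -> str:
        # Hash the salted unit id, map it to [0, 1], and compare against the split.
        digest = hashlib.sha256(f"{salt}:{unit_id}".encode()).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF
        return "treatment" if bucket < treat_share else "control"

    # Sticky and reproducible: the same unit always lands in the same variant.
    assert assign_variant("user_123") == assign_variant("user_123")

Using a different salt per experiment keeps a unit from inheriting the same bucket across unrelated tests.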
Core techniques
- User-level randomization: Best for logged-in, cross-device experiences.
- Session/device/cookie-level randomization: Use only if exposure is session-bound and cross-session contamination is unlikely.
- Cluster randomization (e.g., by household, store, city): Use when within-cluster spillovers are strong.
- Blocked/stratified randomization: Balance important covariates (e.g., platform, country, pre-period activity) before assignment.
- Sticky assignment: A unit always gets the same variant across the test window.
- SRM checks: Sample Ratio Mismatch indicates randomization or tracking problems.
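As a concrete illustration of the SRM check, a minimal sketch assuming a planned 50/50 split (the counts below are made up):

    from scipy.stats import chisquare

    observed = [101_203, 99_512]            # units actually logged in control / treatment
    total = sum(observed)
    expected = [total * 0.5, total * 0.5]   # planned allocation shares

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    if p_value < 0.001:                     # a common, conservative SRM alarm threshold
        print(f"Possible SRM (p = {p_value:.2g}): check assignment and logging before trusting results.")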
Worked examples
Example 1: Free shipping banner on an e-commerce site
Situation: Banner appears for logged-in users across web and app.
- Unit: User account (to stay consistent across devices).
- Randomization: Deterministic hash of user_id to buckets, 50/50 split.
- Risks: Logged-out traffic. Mitigation: For anonymous users, either exclude them from the test or randomize by cookie with a short exposure window and analyze that cohort separately.
- Outcome: Per-user conversion rate and revenue per user over test window.
Example 2: Notification send-time experiment
Situation: Compare two send schedules for push notifications.
- Unit: User (schedules influence multiple days; per-session assignment would cross-contaminate).
- Randomization: User-level hashing to Schedule A or B, sticky for entire test.
- Outcome: Per-user opens, conversions, and opt-out rate during the test.
- Note: Repeated measures per user are aggregated at user-level for analysis.
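A minimal sketch of that aggregation step, assuming event-level rows with user_id, variant, and opened columns (names are illustrative):

    import pandas as pd

    events = pd.DataFrame({
        "user_id": [1, 1, 2, 2, 3],
        "variant": ["A", "A", "B", "B", "A"],
        "opened":  [1, 0, 1, 1, 0],
    })

    # Collapse repeated measures to one row per user so the analysis unit matches the assignment unit.
    per_user = (events
                .groupby(["user_id", "variant"], as_index=False)
                .agg(opens=("opened", "sum"), events=("opened", "size")))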
Example 3: Price change with potential arbitrage
Situation: Price differences may leak across users in the same location.
- Unit: City (cluster randomization).
- Randomization: Randomly assign cities to treatment/control with stratification by pre-period revenue and region.
- Outcome: City-level revenue and units sold.
- Trade-off: Fewer clusters reduce power; account for design effect and use pre-period covariates to improve precision.
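A minimal sketch of the design-effect adjustment (the individual-level sample size below is a hypothetical placeholder):

    def design_effect(m: float, rho: float) -> float:
        # m = average units per cluster, rho = intra-cluster correlation
        return 1 + (m - 1) * rho

    n_individual = 20_000                    # from an individual-level power calculation (hypothetical)
    deff = design_effect(m=500, rho=0.02)    # 10.98
    n_cluster = n_individual * deff          # roughly the total units the cluster design needs
    print(round(deff, 2), round(n_cluster))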
How to decide your unit and randomization
- Map exposure: Where does the treatment actually touch the user/system?
- List interference paths: Same user across devices? Users influencing each other? Shared caches?
- Choose the smallest safe unit: Avoid interference and ensure consistent exposure.
- Make assignment sticky: Deterministic and reproducible through the whole test.
- Balance covariates: Block/stratify on key variables (platform, country, activity); a minimal assignment sketch follows this list.
- Plan analysis: Aggregate to the assignment level or use appropriate clustered models.
- Add guardrails: Monitor SRM and key health metrics.
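A minimal sketch of blocked/stratified assignment, randomizing within each stratum so groups stay balanced on the blocking variable (the platform column and seed are illustrative assumptions):

    import numpy as np
    import pandas as pd

    users = pd.DataFrame({
        "user_id": range(8),
        "platform": ["ios", "ios", "android", "android", "web", "web", "ios", "web"],
    })

    rng = np.random.default_rng(seed=42)     # fixed seed keeps the draw reproducible
    parts = []
    for platform, group in users.groupby("platform"):
        # Shuffle within each stratum, then split it as evenly as possible between variants.
        shuffled = group.sample(frac=1, random_state=int(rng.integers(1_000_000))).copy()
        half = len(shuffled) // 2
        shuffled["variant"] = ["treatment"] * half + ["control"] * (len(shuffled) - half)
        parts.append(shuffled)

    assignments = pd.concat(parts).sort_values("user_id")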
Common mistakes and self-check
- Mixing assignment and analysis units: Analyzing per-session when randomizing by user inflates Type I error. Self-check: Aggregate metrics at user-level if assignment is by user.
- Non-sticky assignment: Users switch variants across sessions. Self-check: Verify a unit's variant is constant over time.
- Ignoring spillovers: Friends, households, or stores influence each other. Self-check: Sketch likely interference paths; consider cluster randomization.
- No stratification: Imbalance on platform or country increases variance. Self-check: Compare pre-period covariates across groups before launch.
- Undefined exposure window: Partial exposure leads to diluted effects. Self-check: Define inclusion criteria (e.g., active users during test period).
- Sample Ratio Mismatch (SRM) not monitored: Assignment or tracking bugs go unnoticed. Self-check: Run chi-square test for expected allocation shares.
- Underestimating cluster variance: Using individual-level formulas for cluster designs. Self-check: Apply design effect = 1 + (m - 1)ρ, where m is the average cluster size and ρ is the intra-cluster correlation.
- Forgetting repeated measures correlation: Per-event analysis pretends observations are independent. Self-check: Aggregate per unit or use cluster-robust methods.
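A minimal sketch of the cluster-robust option, using simulated event-level data and statsmodels (variable names and effect size are illustrative):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n_users, events_per_user = 200, 5
    df = pd.DataFrame({
        "user_id": np.repeat(np.arange(n_users), events_per_user),
        "treated": np.repeat(rng.integers(0, 2, n_users), events_per_user),
    })
    df["outcome"] = 0.1 * df["treated"] + rng.normal(size=len(df))

    # Clustering standard errors by the assignment unit accounts for repeated measures per user.
    model = smf.ols("outcome ~ treated", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["user_id"]})
    print(model.summary().tables[1])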
Exercises
Complete these in order. Then check your answers below the tasks.
Exercise 1: Choose the unit and randomization
A music app tests a new playlist layout that persists across app and web. Some users are logged-in; some are anonymous. Define:
- Your assignment unit(s) and handling for anonymous visitors.
- How you ensure sticky assignment.
- Primary analysis unit and key guardrails.
Hints
- Think about cross-device consistency.
- Decide whether to exclude or separately handle anonymous traffic.
- Guardrails often include SRM and platform-level balance.
Suggested answer:
- Unit: Logged-in users randomized by user_id. Anonymous traffic either excluded or randomized by cookie with separate analysis.
- Sticky assignment: Deterministic hash of user_id (or cookie_id) into buckets; same value every visit.
- Analysis: Per-user outcomes (e.g., weekly listening minutes) for logged-in cohort; anonymous cohort analyzed separately or excluded.
- Guardrails: SRM check, platform mix balance (iOS/Android/Web), app crash rate, opt-out/uninstall rate.
Exercise 2: Cluster design effect
You randomize by city. Average users per city m = 500. Intra-cluster correlation ρ = 0.02. Compute the design effect and describe how it impacts required sample size.
Hints
- Use design effect = 1 + (m - 1)ρ.
- Design effect inflates variance; required N scales roughly by this factor.
Compute: 1 + (500 - 1) × 0.02 = 1 + 499 × 0.02 = 1 + 9.98 = 10.98.
Impact: You need about 10.98× the sample (or time) compared to an individual-level design for the same detectable effect size and power.
Exercise checklist
- I stated a clear unit aligned with exposure.
- I ensured sticky assignment and reproducibility.
- I identified interference and justified cluster vs. individual randomization.
- I aligned analysis with assignment or planned clustered methods.
- I considered stratification and SRM monitoring.
Practical projects
- Project 1: Write a one-page randomization plan for three scenarios: UI layout change (user-level), store signage change (store-level), regional pricing (city-level). Include unit, randomization, stratification, analysis unit, guardrails.
- Project 2: Build a mock assignment table (in a spreadsheet) using a hash-like deterministic rule for 10,000 synthetic users; verify stickiness and 50/50 balance by platform.
- Project 3: For a cluster test with ρ values {0.005, 0.02, 0.05} and m = 300, compute design effects and rewrite your power assumptions accordingly.
Learning path
- Hypotheses and outcome metrics.
- Randomization and unit selection (this page).
- Blocking/stratification and guardrail metrics.
- Power, MDE, and sample size with cluster adjustments when needed.
- Execution playbook: instrumentation, SRM monitoring, and data QA.
- Analysis: aggregation, variance estimation, and cluster-robust methods.
- Sequential testing and test governance.
- Advanced: network experiments and interference-aware designs.
Mini tasks
- List two potential interference paths in your current product and how you would block them.
- Draft a one-sentence exposure definition for your next experiment.
- Pick one covariate to stratify on and explain why it matters for variance.
Next steps
- Apply these principles to your next planned experiment and document unit, randomization, and analysis alignment.
- Prepare a short checklist your team can reuse before launching any test.
- Move on to blocking/stratification and power analysis to tighten precision.
Mini challenge
Your marketplace launches a new "bundle discount" that can be seen by buyers and sellers. Buyers and sellers often interact repeatedly within a city. Propose:
- The experimental unit (and why).
- Your randomization approach (include stratification if any).
- Primary analysis unit/metrics and how you will handle interference.
- Guardrails and SRM plan.
Considerations
- Cross-role spillovers (buyer-seller) and geographic clustering.
- Design effect and number of clusters.
- Pre-period covariate balance to improve precision.