
Randomization Checks

Learn Randomization Checks for free with explanations, exercises, and a quick test (for Product Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Who this is for

Product Analysts and experiment owners who launch or QA A/B tests and need to ensure groups are comparable before trusting results.

Prerequisites

  • Basic A/B testing concepts (variants, metrics, exposures).
  • Descriptive statistics (means, proportions, standard deviation).
  • Comfort reading simple statistical test outputs (t-test, chi-square).

Learning path

  • Understand what randomization checks are and why they matter.
  • Learn core checks: Sample Ratio Mismatch (SRM), covariate balance, missingness.
  • Run checks with simple formulas and thresholds.
  • Diagnose and act on issues without overreacting.
  • Practice with worked examples and a quick test.

Why this matters

Randomization checks tell you if your A/B test actually created comparable groups. If randomization failed or traffic was skewed, you can get misleading lifts. Real tasks where this matters:

  • QA a newly launched experiment for SRM within the first hours.
  • Verify device, geography, and traffic-source balance before reporting results.
  • Catch instrumentation bugs that send more qualified users to one variant.
  • Decide whether to continue, pause, or re-run a test after detecting imbalance.

Concept explained simply

Randomization checks are early diagnostic tests that confirm your A and B groups are similar on factors unrelated to the treatment. You compare pre-treatment attributes (e.g., device, country, prior usage) across variants using simple statistics. If differences are too large, either fix the assignment or adjust your analysis.

Mental model

Imagine two jars filled by flipping a fair coin for each marble. If the coin or the process is faulty, one jar ends up with more red marbles (certain user types). Randomization checks look inside the jars early to ensure both jars have similar marbles before judging which jar is heavier (treatment effect).
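The jar analogy is easy to simulate: assign users with a slightly biased "coin" and watch a chi-square check catch the skew. This is a toy sketch; the 52% bias and sample size are made-up numbers.

```python
import math
import random

random.seed(7)

# Faulty coin: sends 52% of users to A instead of 50%
n_a = sum(random.random() < 0.52 for _ in range(100_000))
n_b = 100_000 - n_a

expected = 50_000  # planned 50/50 split
chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df=1
print(p < 0.01)  # → True: the biased split is flagged as SRM
```

At this scale even a 2-point skew is caught almost immediately, which is why SRM checks are worth running within hours of launch.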

What to check

  • Sample Ratio Mismatch (SRM): Are observed allocations close to the planned split (e.g., 50/50)?
  • Covariate balance (pre-treatment): device, platform, country/region, traffic source, user tenure, pre-period activity/spend.
  • Missingness/eligibility: are data missing at similar rates across variants?
  • Exposure parity: are both groups eligible and exposed under the same rules?

Quick thresholds to remember

  • SRM: chi-square test on counts, often flag if p < 0.01.
  • Standardized Mean Difference (SMD) for numeric covariates: |SMD| < 0.1 generally acceptable.
  • Chi-square test for categorical distributions: large p-values mean balance; flag small p-values (e.g., p < 0.05) with practical judgment.

Step-by-step workflow

  1. Check SRM first. Compare observed counts to planned split using a chi-square goodness-of-fit test.
  2. Check covariate balance. For numeric variables (e.g., prior 7-day sessions), compute SMD; for categorical variables (e.g., device), use chi-square test.
  3. Check missingness. Compare missing rates of key fields across variants (proportion test).
  4. Investigate if flagged. Review assignment logic, eligibility filters, bot filters, rollout timing, and any targeting or holdouts.
  5. Decide. If issues are minor, proceed with covariate-adjusted analysis; if major (e.g., SRM, misrouting), pause and fix before continuing.
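Steps 1–3 can be sketched with the Python standard library alone. The function names and demo numbers below are illustrative; in practice you would likely reach for scipy.stats or statsmodels instead of hand-rolling the p-values.

```python
import math

def srm_pvalue(n_a, n_b, ratio_a=0.5):
    """Chi-square goodness-of-fit p-value (df=1) for a two-arm split."""
    total = n_a + n_b
    exp_a, exp_b = total * ratio_a, total * (1 - ratio_a)
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df=1

def smd(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled SD."""
    pooled = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                       / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled

def missingness_z(p_a, n_a, p_b, n_b):
    """Two-proportion z statistic for comparing missing rates."""
    p = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Demo numbers (hypothetical):
print(round(srm_pvalue(5050, 4950), 3))                    # → 0.317 (no SRM)
print(round(smd(4.0, 3.0, 8000, 4.3, 3.2, 8000), 3))       # → -0.097 (borderline)
print(round(missingness_z(0.04, 10000, 0.045, 10000), 2))  # → -1.75
```

Apply the thresholds above: flag SRM when the first value drops below 0.01, flag |SMD| above 0.1, and treat |z| beyond roughly 2 as a missingness difference worth a closer look.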

Worked examples

Example 1: Numeric covariate — SMD

Goal: Check balance on pre-experiment 7-day sessions.

  • Variant A: n=10,000, mean=4.10, sd=3.0
  • Variant B: n=10,050, mean=4.22, sd=3.1

Pooled SD ≈ sqrt(((9999*3.0^2)+(10049*3.1^2))/(10000+10050-2)) ≈ 3.05. SMD = (4.10−4.22)/3.05 ≈ −0.039. |SMD|=0.039 < 0.1 ⇒ balanced.
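A quick way to verify the arithmetic above (standard-library Python; the variable names are mine):

```python
import math

n_a, mean_a, sd_a = 10_000, 4.10, 3.0
n_b, mean_b, sd_b = 10_050, 4.22, 3.1

# Pooled SD weights each variance by its degrees of freedom
pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                      / (n_a + n_b - 2))
smd = (mean_a - mean_b) / pooled_sd
print(round(pooled_sd, 2), round(smd, 3))  # → 3.05 -0.039
```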

Interpretation

The groups have very similar prior engagement; any outcome difference is unlikely due to prior sessions.

Example 2: Categorical covariate — device (chi-square)

Counts:

  • A: Mobile=6,400, Desktop=3,200, Tablet=400
  • B: Mobile=6,300, Desktop=3,500, Tablet=250

Totals: A=10,000, B=10,050. Expected counts for each device category are proportional to the variant totals. Running a chi-square test of independence (3 categories, df=2) yields chi-square ≈ 48.7, p < 0.001.

Interpretation

The device mix is imbalanced: B has proportionally more Desktop users and A more Tablet users than expected. A p-value this small, combined with visible shifts in category shares, warrants investigating device-dependent assignment or eligibility before reporting results.
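You can reproduce the statistic from the device counts with a few lines (standard library only; for df=2 the chi-square survival function simplifies to exp(−chi2/2)):

```python
import math

# (A count, B count) per device category
table = {"Mobile": (6400, 6300), "Desktop": (3200, 3500), "Tablet": (400, 250)}

n_a = sum(a for a, _ in table.values())  # 10,000
n_b = sum(b for _, b in table.values())  # 10,050
total = n_a + n_b

chi2 = 0.0
for a, b in table.values():
    row = a + b
    exp_a, exp_b = row * n_a / total, row * n_b / total
    chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b

p = math.exp(-chi2 / 2)  # chi-square survival function, df=2
print(round(chi2, 1), p < 0.001)  # → 48.7 True
```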

Example 3: SRM

Planned 50/50 split. First hour after launch: A=5,400 users, B=4,600 users (total 10,000). Under 50/50, expected count is 5,000 each. Chi-square statistic = sum((obs−exp)^2/exp) = (400^2/5000) + (400^2/5000) = 32 + 32 = 64, p < 0.001 (df=1) ⇒ SRM flagged.

Interpretation

Pause and investigate: assignment code, feature flag bucketing, geo rollouts, bots, or eligibility filters.
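The SRM arithmetic above, as code (stdlib only; for df=1 the p-value is erfc(sqrt(chi2/2))):

```python
import math

obs = {"A": 5400, "B": 4600}
expected = sum(obs.values()) / 2  # planned 50/50 → 5,000 each

chi2 = sum((o - expected) ** 2 / expected for o in obs.values())
p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df=1
print(chi2, p < 0.001)  # → 64.0 True
```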

Exercises

Do these before taking the Quick Test. The exercises are available to everyone; only logged-in users get saved progress.

  • Exercise ex1 mirrors the data and questions below.
  • Exercise ex2 focuses on diagnosis and action planning.

Checklist before you start

  • Identify planned split and total sample size.
  • List pre-treatment covariates to check (device, country, traffic source, prior usage).
  • Pick tests: chi-square for categorical, SMD for numeric, proportion test for missingness.
  • Decide thresholds (e.g., SRM p < 0.01, |SMD| < 0.1).

Common mistakes and self-check

  • Using post-treatment variables (e.g., conversions) for balance checks. Self-check: only use data from before exposure.
  • Overreacting to tiny p-values with huge sample sizes. Self-check: also review effect sizes (SMD) and practical impact.
  • Ignoring SRM because outcomes look good. Self-check: SRM questions the validity of all estimates.
  • Not re-checking after rollout stages. Self-check: re-run checks after major traffic shifts.
  • Multiple testing without context. Self-check: expect a few small p-values by chance; look for patterns and magnitude.

Practical projects

  • Build a reusable randomization check template: inputs (counts, means/SDs, category tables) and outputs (SRM p-value, SMDs, chi-square p-values, pass/fail flags).
  • Create a covariate balance dashboard for top geos, devices, and prior engagement bands.
  • Draft a runbook: what to do when SRM or imbalance is detected (contacts, logs to pull, decision tree).

Mini challenge

Your test shows SRM p=0.003 in the first 2 hours, driven by more traffic in B from one country. After a geo rollout completes, SRM disappears. What do you report? Write 3 bullet points: (1) finding, (2) cause hypothesis, (3) impact on analysis plan.

Next steps

  • Automate: schedule daily randomization checks for active experiments.
  • Augment: add covariate-adjusted estimators to reduce variance when balanced.
  • Document: include a “Randomization Checks” section in every experiment readout.

Quick Test

Ready to check your understanding? The Quick Test is available to everyone; only logged-in users get saved progress.

Practice Exercises

2 exercises to complete

Instructions

Use the aggregates below to assess balance.

Data
Planned split: 50/50
A: n=12,000
B: n=12,200

Numeric covariate (prior 7-day sessions):
A: mean=3.8, sd=2.9
B: mean=3.9, sd=3.0

Categorical covariate (device):
           A       B
Mobile   7,800   7,700
Desktop  3,600   4,000
Tablet     600     500

Missingness (country missing):
A: 4.2%   B: 4.7%

  1. Test SRM on counts (A=12,000, B=12,200). Flag if p < 0.01.
  2. Compute SMD for prior sessions. Use pooled SD.
  3. Run a chi-square test for device distribution.
  4. Compare missingness rates with a two-proportion check. Comment on whether the difference is likely material.

Expected Output

SRM: not flagged or borderline; SMD small (<0.1); chi-square p-value around moderate level; missingness difference small and likely acceptable.

Randomization Checks — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

