Who this is for
Product Analysts and experiment owners who launch or QA A/B tests and need to ensure groups are comparable before trusting results.
Prerequisites
- Basic A/B testing concepts (variants, metrics, exposures).
- Descriptive statistics (means, proportions, standard deviation).
- Comfort reading simple statistical test outputs (t-test, chi-square).
Learning path
- Understand what randomization checks are and why they matter.
- Learn core checks: Sample Ratio Mismatch (SRM), covariate balance, missingness.
- Run checks with simple formulas and thresholds.
- Diagnose and act on issues without overreacting.
- Practice with worked examples and a quick test.
Why this matters
Randomization checks tell you if your A/B test actually created comparable groups. If randomization failed or traffic was skewed, you can get misleading lifts. Real tasks where this matters:
- QA a newly launched experiment for SRM within the first hours.
- Verify device, geography, and traffic-source balance before reporting results.
- Catch instrumentation bugs that send more qualified users to one variant.
- Decide whether to continue, pause, or re-run a test after detecting imbalance.
Concept explained simply
Randomization checks are early diagnostic tests that confirm your A and B groups are similar on factors unrelated to the treatment. You compare pre-treatment attributes (e.g., device, country, prior usage) across variants using simple statistics. If differences are too large, either fix the assignment or adjust your analysis.
Mental model
Imagine two jars filled by flipping a fair coin for each marble. If the coin or the process is faulty, one jar ends up with more red marbles (certain user types). Randomization checks look inside the jars early to ensure both jars have similar marbles before judging which jar is heavier (treatment effect).
What to check
- Sample Ratio Mismatch (SRM): Are observed allocations close to the planned split (e.g., 50/50)?
- Covariate balance (pre-treatment): device, platform, country/region, traffic source, user tenure, pre-period activity/spend.
- Missingness/eligibility: are data missing at similar rates across variants?
- Exposure parity: are both groups eligible and exposed under the same rules?
Quick thresholds to remember
- SRM: chi-square test on counts, often flag if p < 0.01.
- Standardized Mean Difference (SMD) for numeric covariates: |SMD| < 0.1 generally acceptable.
- Chi-square test for categorical distributions: a large p-value gives no evidence of imbalance; flag small p-values (e.g., p < 0.05), but apply practical judgment (the sketch below turns these rules of thumb into pass/fail flags).
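A minimal Python sketch of these thresholds as pass/fail flags; the function name, inputs, and cutoffs are illustrative, not a standard API:

```python
# Illustrative cutoffs; tune to your organization's standards.
SRM_ALPHA = 0.01   # flag SRM when the chi-square p-value is below this
SMD_LIMIT = 0.10   # flag numeric covariates when |SMD| exceeds this
CAT_ALPHA = 0.05   # flag categorical covariates when the chi-square p-value is below this

def flag_checks(srm_p, smd_by_covariate, cat_p_by_covariate):
    """Return the checks/covariates that breach the thresholds above."""
    flags = {}
    if srm_p < SRM_ALPHA:
        flags["srm"] = srm_p
    for name, smd in smd_by_covariate.items():
        if abs(smd) > SMD_LIMIT:
            flags[name] = smd
    for name, p in cat_p_by_covariate.items():
        if p < CAT_ALPHA:
            flags[name] = p
    return flags

print(flag_checks(0.003, {"prior_7d_sessions": -0.04}, {"device": 0.30}))
# {'srm': 0.003}
```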
Step-by-step workflow
- Check SRM first. Compare observed counts to planned split using a chi-square goodness-of-fit test.
- Check covariate balance. For numeric variables (e.g., prior 7-day sessions), compute SMD; for categorical variables (e.g., device), use chi-square test.
- Check missingness. Compare missing rates of key fields across variants with a two-proportion test (see the sketch after this list).
- Investigate if flagged. Review assignment logic, eligibility filters, bot filters, rollout timing, and any targeting or holdouts.
- Decide. If issues are minor, proceed with covariate-adjusted analysis; if major (e.g., SRM, misrouting), pause and fix before continuing.
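For the missingness check, a pooled two-proportion z-test is enough. A minimal sketch, with hypothetical counts of users missing the country field:

```python
from math import sqrt
from scipy.stats import norm

def missingness_ztest(missing_a, n_a, missing_b, n_b):
    """Two-sided pooled two-proportion z-test comparing missing-data rates."""
    rate_a, rate_b = missing_a / n_a, missing_b / n_b
    pooled = (missing_a + missing_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical: country is missing for 180/10,000 users in A vs 230/10,050 in B
z, p = missingness_ztest(180, 10_000, 230, 10_050)
print(f"z = {z:.2f}, p = {p:.3f}")
```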
Worked examples
Example 1: Numeric covariate — SMD
Goal: Check balance on pre-experiment 7-day sessions.
- Variant A: n=10,000, mean=4.10, sd=3.0
- Variant B: n=10,050, mean=4.22, sd=3.1
Pooled SD ≈ sqrt(((9999*3.0^2)+(10049*3.1^2))/(10000+10050-2)) ≈ 3.05. SMD = (4.10−4.22)/3.05 ≈ −0.039. |SMD|=0.039 < 0.1 ⇒ balanced.
Interpretation
The groups have very similar prior engagement; any outcome difference is unlikely due to prior sessions.
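The same calculation as a small helper, using the pooled-SD version of SMD shown above:

```python
from math import sqrt

def smd(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

print(round(smd(4.10, 3.0, 10_000, 4.22, 3.1, 10_050), 3))  # -0.039
```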
Example 2: Categorical covariate — device (chi-square)
Counts:
- A: Mobile=6,400, Desktop=3,200, Tablet=400
- B: Mobile=6,300, Desktop=3,500, Tablet=250
Totals: A=10,000, B=10,050. Expected counts for each device are proportional to each variant's share of total traffic. A chi-square test of independence on the 2×3 table (df = 2) gives chi-square ≈ 48.7, p < 0.001.
Interpretation
The small p-value flags a device imbalance, driven mainly by Tablet share (4.0% in A vs 2.5% in B) and Desktop share (32.0% vs 34.8%). Investigate whether tablet users are assigned, logged, or filtered differently, and judge whether a shift of this size could plausibly move the outcome metric before reporting.
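To reproduce the result, the test can be run on the 2×3 table with scipy (a minimal sketch):

```python
from scipy.stats import chi2_contingency

device_counts = [
    [6_400, 3_200, 400],  # A: Mobile, Desktop, Tablet
    [6_300, 3_500, 250],  # B: Mobile, Desktop, Tablet
]
chi2, p, dof, expected = chi2_contingency(device_counts)
print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p:.2g}")
```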
Example 3: SRM
Planned 50/50 split. First 1 hour after launch: A=5,400 users, B=4,600 users (total 10,000). Under 50/50, expected each=5,000. Chi-square statistic = sum((obs−exp)^2/exp) = (400^2/5000) + ((−400)^2/5000) = 32 + 32 = 64, p < 0.001 (df=1) ⇒ SRM flagged.
Interpretation
Pause and investigate: assignment code, feature flag bucketing, geo rollouts, bots, or eligibility filters.
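The same check in code, using scipy's chi-square goodness-of-fit test (a minimal sketch):

```python
from scipy.stats import chisquare

observed = [5_400, 4_600]   # users in A and B after the first hour
expected = [5_000, 5_000]   # planned 50/50 split of 10,000 users
stat, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {stat:.0f}, p = {p:.2g}")  # chi2 = 64, p far below 0.001 -> SRM flagged
```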
Exercises
Do these before taking the Quick Test. The exercises are available to everyone; only logged-in users get saved progress.
- Exercise ex1 mirrors the data and questions below.
- Exercise ex2 focuses on diagnosis and action planning.
Checklist before you start
- Identify planned split and total sample size.
- List pre-treatment covariates to check (device, country, traffic source, prior usage).
- Pick tests: chi-square for categorical, SMD for numeric, proportion test for missingness.
- Decide thresholds (e.g., SRM p < 0.01, |SMD| < 0.1).
Common mistakes and self-check
- Using post-treatment variables (e.g., conversions) for balance checks. Self-check: only use data from before exposure.
- Overreacting to tiny p-values with huge sample sizes. Self-check: also review effect sizes (SMD) and practical impact.
- Ignoring SRM because outcomes look good. Self-check: SRM questions the validity of all estimates.
- Not re-checking after rollout stages. Self-check: re-run checks after major traffic shifts.
- Multiple testing without context. Self-check: expect a few small p-values by chance; look for patterns and magnitude.
Practical projects
- Build a reusable randomization check template: inputs (counts, means/SDs, category tables) and outputs (SRM p-value, SMDs, chi-square p-values, pass/fail flags). A starter skeleton follows this list.
- Create a covariate balance dashboard for top geos, devices, and prior engagement bands.
- Draft a runbook: what to do when SRM or imbalance is detected (contacts, logs to pull, decision tree).
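One possible starting point for the template project; the function name, inputs, and defaults are illustrative, not a prescribed interface:

```python
from scipy.stats import chisquare, chi2_contingency

def randomization_report(counts, planned_split, smds, category_tables,
                         srm_alpha=0.01, smd_limit=0.10, cat_alpha=0.05):
    """Summarize SRM, numeric balance (SMD), and categorical balance checks."""
    total = sum(counts)
    expected = [total * share for share in planned_split]
    _, srm_p = chisquare(counts, f_exp=expected)

    report = {
        "srm": {"p": srm_p, "pass": srm_p >= srm_alpha},
        "numeric": {name: {"smd": v, "pass": abs(v) <= smd_limit}
                    for name, v in smds.items()},
        "categorical": {},
    }
    for name, table in category_tables.items():
        _, p, _, _ = chi2_contingency(table)
        report["categorical"][name] = {"p": p, "pass": p >= cat_alpha}
    return report

# Reusing numbers from the worked examples above
print(randomization_report(
    counts=[10_000, 10_050],
    planned_split=[0.5, 0.5],
    smds={"prior_7d_sessions": -0.039},
    category_tables={"device": [[6_400, 3_200, 400], [6_300, 3_500, 250]]},
))
```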
Mini challenge
Your test shows SRM p=0.003 in the first 2 hours, driven by more traffic in B from one country. After a geo rollout completes, SRM disappears. What do you report? Write 3 bullet points: (1) finding, (2) cause hypothesis, (3) impact on analysis plan.
Next steps
- Automate: schedule daily randomization checks for active experiments.
- Augment: add covariate-adjusted estimators to reduce variance when balanced.
- Document: include a “Randomization Checks” section in every experiment readout.
Quick Test
Ready to check your understanding? The Quick Test is available to everyone; only logged-in users get saved progress.