
A/B Testing Basics

Learn A/B Testing Basics for Data Analysts for free: roadmap, examples, subskills, and a skill exam.

Published: December 20, 2025 | Updated: December 20, 2025

Why A/B Testing matters for a Data Analyst

A/B testing is how Data Analysts turn ideas into measurable business impact. You help teams choose the right metric, size tests correctly, ensure clean randomization, monitor guardrails, and make confident decisions using statistical evidence. Mastering these basics lets you answer: Did the change work? By how much? Is it safe to ship?

Who this is for

  • Data Analysts who support product, growth, or marketing teams
  • People moving from reporting/BI into experimentation
  • Anyone who needs to interpret experiment results and communicate decisions

Prerequisites

  • Basic SQL: filtering, grouping, joins, simple window functions
  • Familiarity with core product metrics (conversion, CTR, retention)
  • Comfort with percentages, proportions, confidence intervals
  • A spreadsheet or a scripting language (Python/R) to run quick checks

Learning path

Step 1 — Frame the hypothesis

Write a clear, testable statement with an expected direction and minimum detectable effect (MDE).

Mini-task: Turn an idea into H0/H1

Example: Changing the signup button text from “Start” to “Get Started” will increase signup rate by at least 1.5 percentage points.

  • H0: No difference in signup rate
  • H1: Variant increases signup rate by ≥ 1.5 pp
  • Audience: New visitors on web

Step 2 — Select metrics

Pick one primary metric aligned to the goal and 1–3 guardrails that should not degrade (e.g., error rate, latency, unsubscribe rate).

Checklist: Good primary metric?
  • Directly reflects the goal
  • Sensitive to the change
  • Stable and well-tracked
  • Not easily gamed

Step 3 — Plan power and runtime

Estimate sample size using the baseline rate, MDE, alpha (usually 0.05), and power (usually 0.8). Convert the required sample size into runtime using daily eligible traffic and allocation.

Formula cheat sheet (two-proportion, rough)
Given baseline p, target uplift d (absolute), alpha=0.05 (Z=1.96), power=0.8 (Z=0.84), equal groups:
p0 = p; p1 = p + d
p0_var = p0*(1-p0); p1_var = p1*(1-p1)
SE = sqrt(p0_var + p1_var)
num = (1.96 + 0.84) * SE
n_per_group ≈ (num**2) / (d**2)
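
The same arithmetic as a small runnable sketch (Python standard library only). The helper names, defaults, and traffic numbers are illustrative assumptions, not from any particular library:

import math

def n_per_group(p_base, mde_abs, z_alpha=1.96, z_power=0.84):
    # Rough per-group sample size to detect an absolute uplift of mde_abs.
    p1 = p_base + mde_abs
    var_sum = p_base * (1 - p_base) + p1 * (1 - p1)
    return math.ceil(((z_alpha + z_power) ** 2) * var_sum / (mde_abs ** 2))

def runtime_days(n_group, daily_eligible, exposure=1.0, n_variants=2):
    # Convert a per-group sample size into days of data collection.
    per_group_per_day = daily_eligible * exposure / n_variants
    return math.ceil(n_group / per_group_per_day)

n = n_per_group(0.10, 0.015)                    # ~6,700 per group
print(n, runtime_days(n, 20000, exposure=0.6))  # ~2 days at 6k users/group/day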

Step 4 — Design the experiment

Choose unit of randomization (user, session, geo), allocation (50/50 unless constrained), inclusion/exclusion rules, and duration. Document power, metrics, and stop rules.

Decision hints
  • User-level randomization for user experience changes
  • Session-level for UI changes with low carryover risk
  • Switchback or geo holdout when network effects or supply/demand coupling exist
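
For consistent user-level assignment, a common pattern is to hash the user ID together with an experiment key, so the same user always lands in the same variant. A minimal sketch (the experiment key and user ID are made-up examples):

import hashlib

def assign_variant(user_id, experiment, variants=("A", "B")):
    # Hash user_id + experiment key so assignment is stable per user
    # and independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100            # 0-99, roughly uniform
    split = 100 // len(variants)              # equal allocation
    return variants[min(bucket // split, len(variants) - 1)]

print(assign_variant("user_123", "signup_button_v1"))  # same output on every call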

Step 5 — Run and monitor quality

Check randomization balance and tracking daily. Watch guardrails; pause if they degrade significantly.

Invariant checks to run
  • Sample counts per variant near allocation (e.g., 50/50 ± 1–2%)
  • Stable device/geo/browser mix across variants
  • No sudden drops in key events (tracking health)
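
The first invariant can be tested formally with a sample ratio mismatch (SRM) check: a chi-square goodness-of-fit test of observed counts against the planned allocation. A minimal sketch with made-up counts; SRM checks typically use a strict threshold such as alpha = 0.001 because even small assignment bugs invalidate results:

# SRM check for a planned 50/50 split; counts below are illustrative.
n_a, n_b = 50210, 49790
total = n_a + n_b
expected = total / 2
chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
# df = 1; the critical value at alpha = 0.001 is about 10.83.
print(chi2, "SRM suspected" if chi2 > 10.83 else "allocation looks OK")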

Step 6 — Analyze, interpret, decide

Compute effect size with confidence intervals, assess significance, consider guardrails, and recommend ship/iterate/stop. Document assumptions and caveats.

Readout template
  • Hypothesis and design summary
  • Primary metric effect (point estimate, CI, p-value)
  • Guardrails and key secondary metrics
  • Data quality notes
  • Decision and rationale
  • Follow-ups (iterations, further tests)
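
To produce the primary-metric line (point estimate, CI, p-value), a two-proportion z-test is usually sufficient for conversion-style metrics. A minimal sketch using only the standard library; the function name and example counts are illustrative:

import math

def two_prop_ztest(x_a, n_a, x_b, n_b):
    # Absolute difference, 95% CI (unpooled SE), two-sided p-value (pooled SE).
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    se_ci = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_test = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_test
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, (diff - 1.96 * se_ci, diff + 1.96 * se_ci), p_value

diff, ci, p = two_prop_ztest(x_a=1000, n_a=10000, x_b=1080, n_b=10000)
print(f"diff={diff:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f}), p={p:.3f}")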

Worked examples

1) Framing hypothesis + metrics

Idea: Move “Apply coupon” earlier in checkout.

  • Primary: Purchase conversion rate
  • Guardrails: Refund rate, page load time p95
  • H1: Variant increases purchase rate by ≥ 0.8 pp

2) SQL: Compute conversion and uplift

-- sessions table: one row per user; columns: user_id, variant ('A','B'), event_date, is_eligible (0/1), converted (0/1)
WITH agg AS (
  SELECT variant,
         COUNT(*) AS users,
         SUM(converted) AS conv
  FROM sessions
  WHERE is_eligible = 1
  GROUP BY variant
), rates AS (
  SELECT a.variant,
         a.conv * 1.0 / a.users AS cr
  FROM agg a
)
SELECT rA.cr AS cr_A, rB.cr AS cr_B,
       (rB.cr - rA.cr) AS abs_diff,
       (rB.cr - rA.cr) / NULLIF(rA.cr,0) AS rel_diff
FROM rates rA
JOIN rates rB ON rA.variant = 'A' AND rB.variant = 'B';
Add a simple 95% CI for the difference
-- Using normal approximation for proportions
WITH agg AS (
  SELECT variant, COUNT(*) AS n, SUM(converted) AS x
  FROM sessions WHERE is_eligible=1 GROUP BY variant
), stats AS (
  SELECT 
    MAX(CASE WHEN variant='A' THEN x*1.0/n END) AS pA,
    MAX(CASE WHEN variant='A' THEN n END) AS nA,
    MAX(CASE WHEN variant='B' THEN x*1.0/n END) AS pB,
    MAX(CASE WHEN variant='B' THEN n END) AS nB
  FROM agg
)
SELECT 
  pB - pA AS diff,
  1.96 * SQRT( (pA*(1-pA))/nA + (pB*(1-pB))/nB ) AS margin,
  (pB - pA) - 1.96 * SQRT( (pA*(1-pA))/nA + (pB*(1-pB))/nB ) AS ci_low,
  (pB - pA) + 1.96 * SQRT( (pA*(1-pA))/nA + (pB*(1-pB))/nB ) AS ci_high
FROM stats;

Note: Normal approximation works well for large samples and rates not too close to 0 or 1.

3) Randomization check

-- Compare device mix across variants
SELECT variant, device_type, COUNT(*) AS ct
FROM sessions
WHERE is_eligible=1
GROUP BY variant, device_type;
Quick chi-square in Python (copy data from SQL)
# counts is a 2D list [[A_mobile, A_desktop, ...], [B_mobile, B_desktop, ...]]
import numpy as np

def chi2_independence(counts):
    counts = np.array(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)
    total = counts.sum()
    expected = row_sums @ col_sums / total     # expected counts under independence
    chi2 = ((counts - expected) ** 2 / expected).sum()
    df = (counts.shape[0] - 1) * (counts.shape[1] - 1)
    # Compare chi2 to the critical value for df at alpha 0.05
    # (3.84 for df=1, 5.99 for df=2, 7.81 for df=3).
    return chi2, df

If device mix is imbalanced, investigate assignment logic or eligibility filters before continuing.

4) Sample size and runtime (rough)

Baseline conversion p = 0.10, MDE d = 0.015, alpha 0.05, power 0.8, equal groups.

import math

p = 0.10; d = 0.015
p0 = p; p1 = p + d
p0_var = p0*(1-p0); p1_var = p1*(1-p1)
SE = math.sqrt(p0_var + p1_var)
num = (1.96 + 0.84) * SE
n_per_group = math.ceil((num**2) / (d**2))
print(n_per_group)  # ≈ 6,700 per group (rough)

With 20k eligible users/day, 60% exposure, 2 variants: per group per day ≈ 20k * 0.6 / 2 = 6k. Runtime ≈ 6,700 / 6,000 ≈ 1.2 days, so about 2 days after steady-state.

5) Readout decision

Primary: +0.4 pp, 95% CI [−0.1, +0.9], p=0.11. Guardrail (error rate): +0.2 pp, p=0.02.

Decision: Do not ship. The benefit is statistically uncertain and a guardrail shows significant degradation. Iterate to fix errors and retest.

Drills and exercises

  • Write H0/H1 and MDE for two ideas you’ve heard this week.
  • Pick a primary metric and justify why it’s sensitive and aligned.
  • List two guardrails relevant to your product area.
  • Query last month’s eligible traffic and compute days to reach 10k per group at 50/50.
  • Run a randomization balance check across device, geo, and browser.
  • Calculate a 95% CI for a proportion difference from a past test.
  • Explain a p-value and a confidence interval to a non-technical partner.
  • Identify one potential bias (novelty, seasonality, contamination) in a recent test.
  • Draft an experiment design doc for a small UI change.
  • Practice a 3-slide readout: context, results, decision.

Mini project: End-to-end experiment

Scenario: You’re testing a new onboarding tooltip.

  1. Frame hypothesis with audience and MDE.
  2. Choose primary (activation rate D1) and guardrails (time on task, error rate).
  3. Estimate sample size and runtime from last week’s traffic.
  4. Define eligibility and unit of randomization; document stop rules.
  5. Run invariant checks daily for 5 days; log any anomalies.
  6. Analyze results: effect size, CI, significance, guardrails.
  7. Write a decision memo: ship/iterate/stop + follow-ups.
Acceptance criteria
  • Clear H0/H1 with MDE
  • Primary metric and 2 guardrails with rationale
  • Sample size math shown
  • SQL or spreadsheet used for analysis
  • Readout includes CI and decision

Practical projects

  • Optimize email subject line: click-through as primary, unsubscribe as guardrail.
  • Checkout UI change: purchase conversion primary, latency and errors as guardrails.
  • Recommendation ranking tweak: CTR primary, dwell time and complaint rate guardrails.

Common mistakes and debugging tips

  • Peeking too often: Plan interim looks in advance or use sequential methods; otherwise wait until the pre-planned sample size is reached (the simulation sketch after this list shows why).
  • Underpowered tests: Increase MDE or duration; stack-rank ideas to test larger expected effects first.
  • Poor randomization: Verify assignment code; ensure consistent user bucketing and eligibility filters.
  • Metric mismatch: If the change affects discovery, avoid using deep-funnel metrics as primary.
  • Seasonality or events: Avoid launches during holidays; if unavoidable, ensure both variants see the same periods.
  • Novelty and learning effects: Run long enough to pass initial novelty; consider ramping traffic.
  • Contamination: Prevent users seeing both variants; prefer user-level bucketing.
  • Tracking bugs: Monitor event volumes; keep a stable set of invariant events.
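
To see why uncontrolled peeking inflates false positives, here is a small Monte Carlo sketch (simulated data, illustrative parameters): both variants share the same true conversion rate, yet stopping at the first look with p < 0.05 declares a "winner" far more often than the nominal 5%:

import math, random

def z_pvalue(x_a, n_a, x_b, n_b):
    # Two-sided p-value from a pooled two-proportion z-test.
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (x_b / n_b - x_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
true_rate, batch, looks, sims = 0.10, 200, 10, 1000
false_positives = 0
for _ in range(sims):
    x_a = x_b = n = 0
    for _ in range(looks):                      # peek after every batch
        x_a += sum(random.random() < true_rate for _ in range(batch))
        x_b += sum(random.random() < true_rate for _ in range(batch))
        n += batch
        if z_pvalue(x_a, n, x_b, n) < 0.05:     # stop at first "significant" look
            false_positives += 1
            break
print(false_positives / sims)  # well above the nominal 0.05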

Subskills

  • Hypothesis Framing — Turn ideas into testable H0/H1 with direction and MDE.
  • Primary Metric Selection — Choose one metric tightly aligned to the goal.
  • Guardrail Metrics — Define metrics that must not degrade (safety/quality).
  • Randomization Basics — Assign units consistently and check balance.
  • Sample Size and Power Basics — Plan detectable effects with alpha/power.
  • Running Time Estimation — Convert sample needs into days with traffic.
  • Experiment Design Basics — Unit, allocation, eligibility, and duration.
  • Data Quality Checks — Track invariants and detect logging issues.
  • Statistical Significance Basics — Interpret p-values and Type I/II errors.
  • Confidence Intervals Interpretation — Read effect ranges and uncertainty.
  • Multiple Testing Awareness — Avoid false positives across many looks/segments.
  • Experiment Readout and Decision Making — Communicate results and actions.
  • Common Pitfalls and Biases — Spot novelty, seasonality, contamination.

Next steps

  • Practice with the mini project and at least one practical project.
  • Take the skill exam below to check your readiness. Everyone can take the exam; logged-in learners get progress saved.
  • Once confident, continue to more advanced experimentation topics (e.g., sequential testing, CUPED, geo experiments).

A/B Testing Basics — Skill Exam

Complete this exam to check your readiness in A/B Testing Basics. There is no time limit. You can retry as many times as you like. Everyone can take the exam; if you are logged in, your progress and best score will be saved automatically.

14 questions | 70% to pass
