
A/B Testing Basics

Learn A/B Testing Basics for Data Analysts for free: roadmap, examples, subskills, and a skill exam.

Published: December 20, 2025 | Updated: December 20, 2025

Why A/B Testing matters for a Data Analyst

A/B testing is how Data Analysts turn ideas into measurable business impact. You help teams choose the right metric, size tests correctly, ensure clean randomization, monitor guardrails, and make confident decisions using statistical evidence. Mastering these basics lets you answer: Did the change work? By how much? Is it safe to ship?

Who this is for

  • Data Analysts who support product, growth, or marketing teams
  • People moving from reporting/BI into experimentation
  • Anyone who needs to interpret experiment results and communicate decisions

Prerequisites

  • Basic SQL: filtering, grouping, joins, simple window functions
  • Familiarity with core product metrics (conversion, CTR, retention)
  • Comfort with percentages, proportions, confidence intervals
  • A spreadsheet or a scripting language (Python/R) to run quick checks

Learning path

Step 1 — Frame the hypothesis

Write a clear, testable statement with an expected direction and minimum detectable effect (MDE).

Mini-task: Turn an idea into H0/H1

Example: Changing the signup button text from “Start” to “Get Started” will increase signup rate by at least 1.5 percentage points.

  • H0: No difference in signup rate
  • H1: Variant increases signup rate by ≥ 1.5 pp
  • Audience: New visitors on web

Step 2 — Select metrics

Pick one primary metric aligned to the goal and 1–3 guardrails that should not degrade (e.g., error rate, latency, unsubscribe rate).

Checklist: Good primary metric?
  • Directly reflects the goal
  • Sensitive to the change
  • Stable and well-tracked
  • Not easily gamed

Step 3 — Plan power and runtime

Estimate sample size using the baseline rate, MDE, alpha (usually 0.05), and power (usually 0.8). Convert the required sample size into runtime using daily eligible traffic and allocation.

Formula cheat sheet (two-proportion, rough)
Given baseline p, target uplift d (absolute), alpha=0.05 (Z=1.96), power=0.8 (Z=0.84), equal groups:
p0 = p; p1 = p + d
p0_var = p0*(1-p0); p1_var = p1*(1-p1)
SE = sqrt(p0_var + p1_var)
num = (1.96 + 0.84) * SE
n_per_group ≈ (num**2) / (d**2)
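
The same arithmetic as a small runnable sketch (Python standard library only). The helper names, defaults, and traffic numbers are illustrative assumptions, not from any particular library:

import math

def n_per_group(p_base, mde_abs, z_alpha=1.96, z_power=0.84):
    # Rough per-group sample size to detect an absolute uplift of mde_abs.
    p1 = p_base + mde_abs
    var_sum = p_base * (1 - p_base) + p1 * (1 - p1)
    return math.ceil(((z_alpha + z_power) ** 2) * var_sum / (mde_abs ** 2))

def runtime_days(n_group, daily_eligible, exposure=1.0, n_variants=2):
    # Convert a per-group sample size into days of data collection.
    per_group_per_day = daily_eligible * exposure / n_variants
    return math.ceil(n_group / per_group_per_day)

n = n_per_group(0.10, 0.015)                    # ~6,700 per group
print(n, runtime_days(n, 20000, exposure=0.6))  # ~2 days at 6k users/group/day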

Step 4 — Design the experiment

Choose unit of randomization (user, session, geo), allocation (50/50 unless constrained), inclusion/exclusion rules, and duration. Document power, metrics, and stop rules.

Decision hints
  • User-level randomization for user experience changes
  • Session-level for UI changes with low carryover risk
  • Switchback or geo holdout when network effects or supply/demand coupling exist
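
For consistent user-level assignment, a common pattern is to hash the user ID together with an experiment key, so the same user always lands in the same variant. A minimal sketch (the experiment key and user ID are made-up examples):

import hashlib

def assign_variant(user_id, experiment, variants=("A", "B")):
    # Hash user_id + experiment key so assignment is stable per user
    # and independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100            # 0-99, roughly uniform
    split = 100 // len(variants)              # equal allocation
    return variants[min(bucket // split, len(variants) - 1)]

print(assign_variant("user_123", "signup_button_v1"))  # same output on every call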

Step 5 — Run and monitor quality

Check randomization balance and tracking daily. Watch guardrails; pause if they degrade significantly.

Invariant checks to run
  • Sample counts per variant near allocation (e.g., 50/50 ± 1–2%)
  • Stable device/geo/browser mix across variants
  • No sudden drops in key events (tracking health)
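
The first invariant can be tested formally with a sample ratio mismatch (SRM) check: a chi-square goodness-of-fit test of observed counts against the planned allocation. A minimal sketch with made-up counts; SRM checks typically use a strict threshold such as alpha = 0.001 because even small assignment bugs invalidate results:

# SRM check for a planned 50/50 split; counts below are illustrative.
n_a, n_b = 50210, 49790
total = n_a + n_b
expected = total / 2
chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
# df = 1; the critical value at alpha = 0.001 is about 10.83.
print(chi2, "SRM suspected" if chi2 > 10.83 else "allocation looks OK")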

Step 6 — Analyze, interpret, decide

Compute effect size with confidence intervals, assess significance, consider guardrails, and recommend ship/iterate/stop. Document assumptions and caveats.

Readout template
  • Hypothesis and design summary
  • Primary metric effect (point estimate, CI, p-value)
  • Guardrails and key secondary metrics
  • Data quality notes
  • Decision and rationale
  • Follow-ups (iterations, further tests)
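
To produce the primary-metric line (point estimate, CI, p-value), a two-proportion z-test is usually sufficient for conversion-style metrics. A minimal sketch using only the standard library; the function name and example counts are illustrative:

import math

def two_prop_ztest(x_a, n_a, x_b, n_b):
    # Absolute difference, 95% CI (unpooled SE), two-sided p-value (pooled SE).
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    se_ci = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_test = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_test
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, (diff - 1.96 * se_ci, diff + 1.96 * se_ci), p_value

diff, ci, p = two_prop_ztest(x_a=1000, n_a=10000, x_b=1080, n_b=10000)
print(f"diff={diff:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f}), p={p:.3f}")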

Worked examples

1) Framing hypothesis + metrics

Idea: Move “Apply coupon” earlier in checkout.

  • Primary: Purchase conversion rate
  • Guardrails: Refund rate, page load time p95
  • H1: Variant increases purchase rate by ≥ 0.8 pp

2) SQL: Compute conversion and uplift

-- sessions table: one row per user; columns: user_id, variant ('A','B'), event_date, is_eligible (0/1), converted (0/1)
WITH agg AS (
  SELECT variant,
         COUNT(*) AS users,
         SUM(converted) AS conv
  FROM sessions
  WHERE is_eligible = 1
  GROUP BY variant
), rates AS (
  SELECT a.variant,
         a.conv * 1.0 / a.users AS cr
  FROM agg a
)
SELECT rA.cr AS cr_A, rB.cr AS cr_B,
       (rB.cr - rA.cr) AS abs_diff,
       (rB.cr - rA.cr) / NULLIF(rA.cr,0) AS rel_diff
FROM rates rA
JOIN rates rB ON rA.variant = 'A' AND rB.variant = 'B';
Add a simple 95% CI for the difference
-- Using normal approximation for proportions
WITH agg AS (
  SELECT variant, COUNT(*) AS n, SUM(converted) AS x
  FROM sessions WHERE is_eligible=1 GROUP BY variant
), stats AS (
  SELECT 
    MAX(CASE WHEN variant='A' THEN x*1.0/n END) AS pA,
    MAX(CASE WHEN variant='A' THEN n END) AS nA,
    MAX(CASE WHEN variant='B' THEN x*1.0/n END) AS pB,
    MAX(CASE WHEN variant='B' THEN n END) AS nB
  FROM agg
)
SELECT 
  pB - pA AS diff,
  1.96 * SQRT( (pA*(1-pA))/nA + (pB*(1-pB))/nB ) AS margin,
  (pB - pA) - 1.96 * SQRT( (pA*(1-pA))/nA + (pB*(1-pB))/nB ) AS ci_low,
  (pB - pA) + 1.96 * SQRT( (pA*(1-pA))/nA + (pB*(1-pB))/nB ) AS ci_high
FROM stats;

Note: Normal approximation works well for large samples and rates not too close to 0 or 1.

3) Randomization check

-- Compare device mix across variants
SELECT variant, device_type, COUNT(*) AS ct
FROM sessions
WHERE is_eligible=1
GROUP BY variant, device_type;
Quick chi-square in Python (copy data from SQL)
# counts is a 2D list [[A_mobile, A_desktop, ...], [B_mobile, B_desktop, ...]]
import numpy as np

def chi2_independence(counts):
    counts = np.array(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)
    total = counts.sum()
    expected = row_sums @ col_sums / total     # expected counts under independence
    chi2 = ((counts - expected) ** 2 / expected).sum()
    df = (counts.shape[0] - 1) * (counts.shape[1] - 1)
    # Compare chi2 to the critical value for df at alpha 0.05
    # (3.84 for df=1, 5.99 for df=2, 7.81 for df=3).
    return chi2, df

If device mix is imbalanced, investigate assignment logic or eligibility filters before continuing.

4) Sample size and runtime (rough)

Baseline conversion p = 0.10, MDE d = 0.015, alpha 0.05, power 0.8, equal groups.

import math

p = 0.10; d = 0.015
p0 = p; p1 = p + d
p0_var = p0*(1-p0); p1_var = p1*(1-p1)
SE = math.sqrt(p0_var + p1_var)
num = (1.96 + 0.84) * SE
n_per_group = math.ceil((num**2) / (d**2))
print(n_per_group)  # ≈ 6,700 per group (rough)

With 20k eligible users/day, 60% exposure, 2 variants: per group per day ≈ 20k * 0.6 / 2 = 6k. Runtime ≈ 6,700 / 6,000 ≈ 1.2 days, so about 2 days after steady-state.

5) Readout decision

Primary: +0.4 pp, 95% CI [−0.1, +0.9], p=0.11. Guardrail (error rate): +0.2 pp, p=0.02.

Decision: Do not ship. The benefit is statistically uncertain and a guardrail shows significant degradation. Iterate to fix errors and retest.

Drills and exercises

  • Write H0/H1 and MDE for two ideas you’ve heard this week.
  • Pick a primary metric and justify why it’s sensitive and aligned.
  • List two guardrails relevant to your product area.
  • Query last month’s eligible traffic and compute days to reach 10k per group at 50/50.
  • Run a randomization balance check across device, geo, and browser.
  • Calculate a 95% CI for a proportion difference from a past test.
  • Explain a p-value and a confidence interval to a non-technical partner.
  • Identify one potential bias (novelty, seasonality, contamination) in a recent test.
  • Draft an experiment design doc for a small UI change.
  • Practice a 3-slide readout: context, results, decision.

Mini project: End-to-end experiment

Scenario: You’re testing a new onboarding tooltip.

  1. Frame hypothesis with audience and MDE.
  2. Choose primary (activation rate D1) and guardrails (time on task, error rate).
  3. Estimate sample size and runtime from last week’s traffic.
  4. Define eligibility and unit of randomization; document stop rules.
  5. Run invariant checks daily for 5 days; log any anomalies.
  6. Analyze results: effect size, CI, significance, guardrails.
  7. Write a decision memo: ship/iterate/stop + follow-ups.
Acceptance criteria
  • Clear H0/H1 with MDE
  • Primary metric and 2 guardrails with rationale
  • Sample size math shown
  • SQL or spreadsheet used for analysis
  • Readout includes CI and decision

Practical projects

  • Optimize email subject line: click-through as primary, unsubscribe as guardrail.
  • Checkout UI change: purchase conversion primary, latency and errors as guardrails.
  • Recommendation ranking tweak: CTR primary, dwell time and complaint rate guardrails.

Common mistakes and debugging tips

  • Peeking too often: Plan interim looks in advance or use sequential methods; otherwise wait until the pre-planned sample size is reached (the simulation sketch after this list shows why).
  • Underpowered tests: Increase MDE or duration; stack-rank ideas to test larger expected effects first.
  • Poor randomization: Verify assignment code; ensure consistent user bucketing and eligibility filters.
  • Metric mismatch: If the change affects discovery, avoid using deep-funnel metrics as primary.
  • Seasonality or events: Avoid launches during holidays; if unavoidable, ensure both variants see the same periods.
  • Novelty and learning effects: Run long enough to pass initial novelty; consider ramping traffic.
  • Contamination: Prevent users seeing both variants; prefer user-level bucketing.
  • Tracking bugs: Monitor event volumes; keep a stable set of invariant events.
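
To see why uncontrolled peeking inflates false positives, here is a small Monte Carlo sketch (simulated data, illustrative parameters): both variants share the same true conversion rate, yet stopping at the first look with p < 0.05 declares a "winner" far more often than the nominal 5%:

import math, random

def z_pvalue(x_a, n_a, x_b, n_b):
    # Two-sided p-value from a pooled two-proportion z-test.
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (x_b / n_b - x_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
true_rate, batch, looks, sims = 0.10, 200, 10, 1000
false_positives = 0
for _ in range(sims):
    x_a = x_b = n = 0
    for _ in range(looks):                      # peek after every batch
        x_a += sum(random.random() < true_rate for _ in range(batch))
        x_b += sum(random.random() < true_rate for _ in range(batch))
        n += batch
        if z_pvalue(x_a, n, x_b, n) < 0.05:     # stop at first "significant" look
            false_positives += 1
            break
print(false_positives / sims)  # well above the nominal 0.05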

Subskills

  • Hypothesis Framing — Turn ideas into testable H0/H1 with direction and MDE.
  • Primary Metric Selection — Choose one metric tightly aligned to the goal.
  • Guardrail Metrics — Define metrics that must not degrade (safety/quality).
  • Randomization Basics — Assign units consistently and check balance.
  • Sample Size and Power Basics — Plan detectable effects with alpha/power.
  • Running Time Estimation — Convert sample needs into days with traffic.
  • Experiment Design Basics — Unit, allocation, eligibility, and duration.
  • Data Quality Checks — Track invariants and detect logging issues.
  • Statistical Significance Basics — Interpret p-values and Type I/II errors.
  • Confidence Intervals Interpretation — Read effect ranges and uncertainty.
  • Multiple Testing Awareness — Avoid false positives across many looks/segments.
  • Experiment Readout and Decision Making — Communicate results and actions.
  • Common Pitfalls and Biases — Spot novelty, seasonality, contamination.

Next steps

  • Practice with the mini project and at least one practical project.
  • Take the skill exam below to check your readiness. Everyone can take the exam; logged-in learners get progress saved.
  • Once confident, continue to more advanced experimentation topics (e.g., sequential testing, CUPED, geo experiments).

A/B Testing Basics — Skill Exam

Complete this exam to check your readiness in A/B Testing Basics. There is no time limit. You can retry as many times as you like. Everyone can take the exam; if you are logged in, your progress and best score will be saved automatically.

14 questions | 70% to pass
