
Statistical Significance Basics

Learn Statistical Significance Basics for free with explanations, worked examples, exercises, and a quick test, aimed at Data Analysts.

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

As a Data Analyst, you are often asked: Did the new variant actually improve the metric, or is it just random noise? Statistical significance helps you make that call with confidence. You will use it to:

  • Decide if a new design increases conversion rate.
  • Verify whether average order value changed after a pricing tweak.
  • Communicate results clearly with confidence intervals and error rates.

Concept explained simply

Think of experimentation as listening for a faint signal (the true effect) in a noisy room (random variation). Statistical significance sets a rule for when the signal is strong enough to act on.

Mental model

  • Null hypothesis (H0): There is no real difference; any observed change is noise.
  • Alternative (H1): There is a real difference.
  • Alpha (α): The false alarm tolerance (commonly 0.05). If p-value < α, you reject H0.
  • p-value: How unusual your data would be if H0 were true. Smaller p = stronger evidence against H0.
  • Confidence interval (CI): A range of plausible true effects. If the CI excludes 0, it’s significant at that level.
  • Power (1−β): Chance to detect a real effect. Plan sample size to get adequate power for your Minimum Detectable Effect (MDE).
  • Type I error: False positive. Type II error: False negative.

Formula mini-cheat sheet (practical)

  • Two-proportion z-test (conversion): compare pA=xA/nA vs pB=xB/nB. Pooled p=(xA+xB)/(nA+nB). SE = sqrt(p(1−p)(1/nA+1/nB)). z = (pB−pA)/SE.
  • Two-sample t-test (means): diff = meanB−meanA; SE = sqrt(sA^2/nA + sB^2/nB). t = diff/SE (Welch approximation is robust).
  • 95% CI (approx): diff ± 1.96×SE.
  • Sample size (rough, proportions): n per group ≈ 16×p×(1−p)/MDE^2 for 80% power at α≈0.05.
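The cheat-sheet formulas translate directly into a few lines of Python. The sketch below is stdlib-only and the function names are my own, not from any library; for production analyses, prefer scipy.stats or statsmodels.

```python
import math

def two_prop_z(xa, na, xb, nb):
    """Two-proportion z-test with pooled SE.
    Returns (diff, se, z, two-sided p) for pB - pA."""
    pa, pb = xa / na, xb / nb
    pooled = (xa + xb) / (na + nb)
    se = math.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
    z = (pb - pa) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pb - pa, se, z, p

def rough_n_per_group(p, mde):
    """Rough sample size per group for ~80% power at alpha = 0.05 (two-sided)."""
    return 16 * p * (1 - p) / mde ** 2
```

For instance, `two_prop_z(460, 10_000, 520, 10_000)` reproduces the conversion example worked out below (z ≈ 1.97).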

Quick reference

  • Use a two-proportion z-test for conversion rates; a two-sample t-test for averages (AOV, time, revenue/user).
  • Default to two-sided tests unless you have a strong, pre-registered one-sided hypothesis.
  • Avoid “peeking” repeatedly at the p-value without correction; it inflates false positives. If you must, use sequential designs or adjust α.
  • Report effect size and CI, not just the p-value.
  • Plan MDE and sample size before launching. Underpowered tests waste time.

Worked examples

Example 1: Conversion rate (two-proportion z-test)

Variant A: 460 conversions / 10,000 users (4.6%). Variant B: 520 / 10,000 (5.2%).

Calculation:
  • pA = 0.046, pB = 0.052, diff = 0.006 (0.6 pp).
  • Pooled p = (460+520)/(10000+10000) = 980/20000 = 0.049.
  • SE = sqrt(0.049×0.951×(1/10000+1/10000)) ≈ 0.00305.
  • z = 0.006 / 0.00305 ≈ 1.97 → two-sided p ≈ 0.049.
  • 95% CI ≈ 0.006 ± 1.96×0.00305 ≈ [0.000, 0.012].

Interpretation: Borderline significant at α=0.05; effect is small but likely positive.

Example 2: Average order value (two-sample t-test)

A: n=600, mean=52, sd=20. B: n=620, mean=55, sd=21.

Calculation:
  • diff = 55−52 = 3.
  • SE = sqrt(20^2/600 + 21^2/620) = sqrt(400/600 + 441/620) ≈ sqrt(0.6667 + 0.7113) ≈ 1.17.
  • t ≈ 3 / 1.17 ≈ 2.56 → two-sided p ≈ 0.01.
  • 95% CI ≈ 3 ± 1.96×1.17 ≈ [0.7, 5.3].

Interpretation: Significant increase in AOV, with a plausible lift between ~0.7 and ~5.3 currency units.
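Example 2 can be checked without a stats library by coding the Welch formulas directly; a normal approximation to the t distribution is adequate here because both groups have hundreds of observations (for small samples, use the real t distribution, e.g. scipy.stats.ttest_ind_from_stats). A minimal sketch:

```python
import math

def welch_from_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch two-sample t statistic with a normal-approximation two-sided p.
    Fine for large n; small samples need a proper t distribution."""
    diff = mean_b - mean_a
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    t = diff / se
    p = math.erfc(abs(t) / math.sqrt(2))       # normal approximation
    ci = (diff - 1.96 * se, diff + 1.96 * se)  # approximate 95% CI
    return t, p, ci

t, p, ci = welch_from_stats(52, 20, 600, 55, 21, 620)
# Matches the worked example: t ≈ 2.56, p ≈ 0.01, CI ≈ [0.7, 5.3]
```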

Example 3: Rough sample size for a conversion lift

Baseline p=5%, desired MDE=+0.5 pp, α=0.05, power≈80%.

Calculation:
  • n per group ≈ 16×p×(1−p)/MDE^2.
  • n ≈ 16×0.05×0.95 / 0.005^2 ≈ 30,400 per variant (rough planning figure).

Interpretation: You need large samples to detect small lifts. Use this as a ballpark, then refine with a proper calculator.

How to do it (step-by-step)

  1. Define primary metric and direction of interest (two-sided by default).
  2. Predefine α (commonly 0.05) and MDE. Estimate sample size and duration.
  3. Choose test: two-proportion z-test for conversion, t-test for means.
  4. Collect clean, randomized data with stable tracking.
  5. Compute effect size and SE; get p-value and CI.
  6. Decide: if p < α (or CI excludes 0), reject H0. Also check if the effect is practically meaningful.
  7. Document: metric, α, test type, effect size, CI, p-value, duration, traffic, caveats.
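Steps 5 and 6 for a conversion metric can be sketched as a single helper (stdlib only; `ab_decision` is an illustrative name, not a standard API):

```python
import math

def ab_decision(xa, na, xb, nb, alpha=0.05):
    """Steps 5-6: effect size, SE, p-value, 95% CI, then the decision rule."""
    pa, pb = xa / na, xb / nb
    pooled = (xa + xb) / (na + nb)
    se = math.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
    z = (pb - pa) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    ci = (pb - pa - 1.96 * se, pb - pa + 1.96 * se)
    return {"diff": pb - pa, "se": se, "p": p, "ci": ci, "reject_h0": p < alpha}
```

Keep the second half of step 6 in mind: even when `reject_h0` is True, check that the CI covers practically meaningful lifts before acting.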

Pre-launch checklist

  • [ ] Clear primary metric and variant naming.
  • [ ] α and test type documented.
  • [ ] MDE and sample size estimate.
  • [ ] Randomization and data quality checks.
  • [ ] Decision rules (stop, extend, or iterate).

Exercises (hands-on)

Try these before peeking at solutions.

Exercise 1: Two-proportion z-test (conversion)

A: 300 conversions / 8,000 users. B: 360 / 8,000. Two-sided, α=0.05. Is B significantly better?

  • Compute pA, pB, pooled p, SE, z, p-value, CI, and decision.

Exercise 2: Two-sample t-test (mean)

A: n=500, mean=24.0, sd=9.5. B: n=480, mean=25.2, sd=9.2. Two-sided, α=0.05. Is the mean higher in B?

  • Compute diff, SE, t, p-value, CI, and decision.

Common mistakes and self-checks

  • Peeking too often: Repeated looks inflate Type I error. Self-check: Did you predefine looks or adjust α?
  • Ignoring effect size: A tiny but significant effect may be useless. Self-check: Did you report CI and practical impact?
  • Underpowered tests: Inconclusive results waste time. Self-check: Did you plan MDE and sample size?
  • Multiple comparisons: Testing many metrics/segments without correction increases false positives. Self-check: Limit primary metrics or adjust for multiplicity.
  • Mismatched test: Using a t-test for rates or z-test for skewed means without care. Self-check: Choose tests matched to metric type.
  • Dirty data: Bot traffic, tracking bugs, or non-random exposure bias results. Self-check: Run data quality checks.
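The peeking problem is easy to demonstrate by simulation: run A/A tests (both arms share the same true rate, so any "significant" result is a false positive), check the z-test at several interim looks, and count how often any look crosses α = 0.05. A stdlib-only sketch with illustrative parameters:

```python
import math
import random

def peeking_fpr(n_sims=500, looks=10, batch=200, rate=0.05, alpha=0.05, seed=7):
    """A/A simulation: peek at a two-proportion z-test after each batch
    and count experiments where ANY look appears significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = b = n = 0
        for _ in range(looks):
            a += sum(rng.random() < rate for _ in range(batch))
            b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            pooled = (a + b) / (2 * n)
            if not 0 < pooled < 1:
                continue
            se = math.sqrt(pooled * (1 - pooled) * 2 / n)
            z = (b - a) / n / se
            if math.erfc(abs(z) / math.sqrt(2)) < alpha:  # a "significant" peek
                hits += 1
                break
    return hits / n_sims
```

With ten looks, the any-look false-positive rate lands well above the nominal 5%; sequential designs and alpha-spending corrections exist precisely to control this.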

Practical projects

  • Build a simple AB significance calculator in a spreadsheet: inputs (xA, nA, xB, nB), outputs (diff, SE, z, p, CI).
  • Simulate experiments: Generate fake conversions with a known true lift; verify how often you detect significance at α=0.05.
  • Experiment audit: Take a past test, recompute p-value and CI, and write a one-page results summary with decision and caveats.
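The second project (simulate experiments with a known true lift) could start from something like this stdlib sketch, which estimates empirical power, i.e. how often the z-test detects the lift at α = 0.05:

```python
import math
import random

def empirical_power(p_base=0.05, lift=0.01, n=4000, n_sims=400, alpha=0.05, seed=42):
    """Simulate A/B tests where B truly converts at p_base + lift and
    report the fraction that reach significance (empirical power)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xa = sum(rng.random() < p_base for _ in range(n))
        xb = sum(rng.random() < p_base + lift for _ in range(n))
        pooled = (xa + xb) / (2 * n)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n)
        z = (xb - xa) / n / se
        hits += math.erfc(abs(z) / math.sqrt(2)) < alpha
    return hits / n_sims
```

At n = 4,000 per arm this setup is underpowered for a +1 pp lift on a 5% base: the simulation detects the lift only roughly half the time, while the cheat-sheet rule 16×0.05×0.95/0.01² suggests about 7,600 users per arm for 80% power.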

Who this is for

  • Data Analysts and growth practitioners running or validating A/B tests.
  • Anyone interpreting experiment results for product decisions.

Prerequisites

  • Comfort with basic arithmetic and percentages.
  • Understanding of mean, standard deviation, and proportions.
  • Basic familiarity with spreadsheets or analytical tools.

Learning path

  • Before: Experiment design, randomization, metric selection.
  • Now: Statistical significance, p-values, confidence intervals.
  • Next: Power analysis, MDE tuning, sequential testing and multiple comparisons.

Mini challenge

Baseline conversion is 7%. You want to detect a +0.7 pp lift with 80% power at α=0.05. Approximately how many users per variant do you need?

Answer

n ≈ 16×0.07×0.93 / 0.007^2 = 1.0416 / 0.000049 ≈ 21,257 per variant (rough planning figure).

Next steps

  • Turn these steps into a repeatable checklist and template.
  • Implement a lightweight review process: pre-analysis plan, post-mortem, and dashboarding.
  • Advance to power analysis and sequential methods to run faster, safer experiments.

Quick Test

Take the short quiz (8 questions; 70% to pass) to check your understanding.

Practice Exercises

Expected output for Exercise 1 (two-sided, α = 0.05): diff ≈ +0.75 pp; z ≈ 2.39; p ≈ 0.017; 95% CI roughly [0.13 pp, 1.37 pp]; reject H0.

