
Experiment Result Visuals

Learn Experiment Result Visuals for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

Data Scientists turn experiments into decisions. Strong visuals help stakeholders quickly see what changed, how sure we are, and what action to take. You will use these visuals to present A/B tests, model evaluations, feature rollouts, and hyperparameter results.

  • Communicate effect size and uncertainty clearly
  • Compare models or variants fairly
  • Summarize many trials without overwhelming your audience

Who this is for

  • Data Scientists and ML Engineers who present test results
  • Analysts who run A/B/n experiments
  • Anyone who needs decision-ready charts, not just pretty ones

Prerequisites

  • Basic statistics: mean/proportion, confidence intervals, standard error
  • Classification metrics: precision, recall, ROC/PR basics
  • Comfort reading bar charts, line charts, heatmaps

Concept explained simply

Experiment Result Visuals are charts that show three things together:

  • What changed (effect size: difference, lift, delta)
  • How sure (uncertainty: confidence interval, credible interval, variability)
  • So what (recommended action: ship, iterate, or stop)

Mental model

Think in layers:

  • Layer 1: Baseline vs Variant(s) — show metric values
  • Layer 2: Uncertainty — add error bars/intervals
  • Layer 3: Decision — add a brief takeaway (e.g., "+1.2 pp lift, likely positive; safe to ship")

Helpful defaults

  • Effect size unit: percentage point (pp) for absolute change, % for relative change
  • Intervals: 95% CI for frequentist, or 95% credible interval for Bayesian
  • Footnotes: n (sample size), window, metric definition, method for CI

Visual grammar by goal

  • Show difference and uncertainty: Difference plot (dot with CI) or bar chart with error bars
  • Compare classifier performance: ROC and PR curves; confusion matrix heatmap at chosen threshold
  • Hyperparameter search: Heatmap (2D grid) or parallel coordinates for >2 params
  • Trend over time or sequential tests: Cumulative metric line with CI ribbon
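For the last item, the data prep behind a cumulative metric line with a CI ribbon can be sketched in a few lines. This is a minimal sketch using a normal-approximation CI; the daily counts are hypothetical, and plotting is left to whatever charting library you use:

```python
import math

def cumulative_ci(daily_counts, z=1.96):
    """Cumulative conversion rate with a normal-approximation CI per day.

    daily_counts: list of (n, conversions) tuples, one per day.
    Returns a list of (rate, lower, upper) tuples for the CI ribbon.
    """
    total_n = total_conv = 0
    out = []
    for n, conv in daily_counts:
        total_n += n
        total_conv += conv
        p = total_conv / total_n
        se = math.sqrt(p * (1 - p) / total_n)
        out.append((p, p - z * se, p + z * se))
    return out

# Hypothetical daily (n, conversions); the ribbon tightens as n accumulates
ribbon = cumulative_ci([(1000, 110), (1200, 150), (900, 108)])
```

Plot the rate as a line and fill between the lower/upper bounds to get the ribbon; the narrowing width visually communicates growing certainty over the test window.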

Worked examples

Example 1 — A/B test (conversion rate)

Scenario: A (n=8000, conv=920 → 11.5%), B (n=7900, conv=1007 → 12.75%).

  • Absolute lift: 12.75% − 11.5% = +1.25 pp
  • Relative lift: 1.25/11.5 ≈ +10.9%
  • 95% CI (normal approx):
    • A: 11.5% ± 0.70% → [10.8%, 12.2%]
    • B: 12.75% ± 0.74% → [12.01%, 13.49%]
    • Difference: 1.25 pp ± 1.02 pp → [0.23, 2.27] pp
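The arithmetic above can be verified with a short script (normal-approximation CIs, counts from the scenario):

```python
import math

def prop_ci(x, n, z=1.96):
    """Proportion with normal-approximation CI: p ± z·sqrt(p(1-p)/n)."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

p_a, se_a, ci_a = prop_ci(920, 8000)    # A: 11.5%
p_b, se_b, ci_b = prop_ci(1007, 7900)   # B: ~12.75%

diff = p_b - p_a                         # absolute lift, ~+1.25 pp
se_diff = math.sqrt(se_a**2 + se_b**2)   # SE of the difference
ci_diff = (diff - 1.96 * se_diff,
           diff + 1.96 * se_diff)        # ~[0.23, 2.27] pp
rel = diff / p_a                         # relative lift, ~+11%
```

The lower bound of `ci_diff` staying above 0 is what justifies the "likely positive" decision note below.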

Recommended visuals:

  • Bar chart with 95% CI error bars for A and B
  • Difference plot: a single dot at +1.25 pp with a horizontal CI line

Decision note: The CI of the difference lies entirely above 0 → likely positive; consider rollout.

What to annotate
  • Title: Variant B improved conversion
  • Subtitle: +1.25 pp (≈+10.9% rel), 95% CI [0.23, 2.27] pp
  • Footnote: nA=8000, nB=7900, window: 14 days, CI: normal approx

Example 2 — Classification model comparison

Scenario: Validation set with class imbalance. Model M1 vs M2.

  • Show ROC and PR curves for both models
  • Mark recommended threshold on each curve
  • Add confusion matrix heatmap at recommended threshold

Example threshold (M1 at 0.4) with actual positives=1200, negatives=8800:

  • TP=880, FN=320, FP=440, TN=8360
  • Precision=880/(880+440)=0.667, Recall=880/1200=0.733, F1≈0.698
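These metrics follow directly from the confusion-matrix counts; a quick check using the numbers from the scenario:

```python
# Confusion-matrix counts at threshold 0.40 (from the scenario)
TP, FN, FP, TN = 880, 320, 440, 8360

precision = TP / (TP + FP)    # 880/1320 ≈ 0.667
recall = TP / (TP + FN)       # 880/1200 ≈ 0.733
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.698
```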

Decision note: If false positives are costly, adjust threshold to boost precision; show trade-off with a threshold vs metric line chart.

What to annotate
  • Title: M1 vs M2 — ROC (AUC) and PR (AUC)
  • Subtitle: Chosen threshold 0.40 → P=0.667, R=0.733, F1=0.698
  • Footnote: Validation split details; positive rate; metrics definitions

Example 3 — Hyperparameter tuning

Scenario: Grid search over (max_depth, learning_rate) for a gradient boosting model; score = validation AUC.

  • Heatmap: x=learning_rate, y=max_depth, color=AUC
  • Annotate best cell and top-3 candidates
  • Optionally add a small multiple showing train vs validation AUC to spot overfitting

Decision note: Choose the simplest hyperparameters within 0.5% of the best score to reduce variance and complexity.
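This decision rule can be sketched as a small selection helper. The grid results below are hypothetical, "within 0.5%" is taken as an absolute AUC tolerance of 0.005, and "simplest" is taken to mean lowest max_depth, then lowest learning_rate:

```python
def pick_simplest(results, tol=0.005):
    """Pick the simplest hyperparameters within `tol` of the best score.

    results: list of ((max_depth, learning_rate), auc) tuples.
    Simplicity order: lower max_depth first, then lower learning_rate.
    """
    best = max(auc for _, auc in results)
    candidates = [(params, auc) for params, auc in results
                  if auc >= best - tol]
    return min(candidates, key=lambda r: (r[0][0], r[0][1]))

# Hypothetical grid-search results: ((depth, lr), validation AUC)
results = [((3, 0.10), 0.908), ((5, 0.05), 0.912), ((7, 0.05), 0.911)]
params, auc = pick_simplest(results)  # (3, 0.10): within tolerance of the best
```

On the heatmap, you would annotate both the peak cell and the chosen simpler cell so the audience sees the trade-off explicitly.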

What to annotate
  • Title: Validation AUC by hyperparameters
  • Subtitle: Best AUC=0.912 at (depth=5, lr=0.05)
  • Footnote: 5-fold CV mean±std; highlight robustness, not just peak

How to build decision-ready visuals (steps)

Step 1: Choose the core message (difference, trade-off, trend)
Step 2: Pick the minimal chart that shows it (plus uncertainty)
Step 3: Add annotations: effect size, CI, n, dates, metric definition
Step 4: Add a one-line decision note (ship / iterate / stop)
Step 5: Add a footnote for methods (CI type, split method)

Common mistakes and self-check

  • Only showing means without uncertainty → Always add CI/error bars
  • Using relative lift without baseline → Add absolute difference and baseline
  • Overlapping CIs interpreted as "no effect" → Check CI of the difference
  • Truncated y-axes exaggerating effects → Start at 0 for rates; justify if not
  • Cherry-picking time windows → Show full window or justify filters
  • Ignoring class imbalance → Use PR curve, not just ROC, for imbalanced data
  • Hiding sample sizes → Always include n and date range

Self-check checklist

  • [ ] Effect size shown (absolute and relative)
  • [ ] Uncertainty shown (CI/error bars)
  • [ ] Sample size and window stated
  • [ ] Metric and CI method defined
  • [ ] Decision note present
  • [ ] Axes not misleading

Exercises

These exercises mirror the practice tasks below. Do them first, then compare with the solutions, and finish with the quick test at the end.

Exercise 1 — A/B test CI bars and difference plot

You ran an A/B test on signups: A: n=8000, signups=920; B: n=7900, signups=1007. Create a decision-ready visual.

  • Compute absolute and relative lift
  • Approximate 95% CI for each group (normal approx)
  • Compute 95% CI for the difference
  • Describe the bar chart with error bars and the difference plot, including title, subtitle, and footnote

Hints

  • p = x/n; SE = sqrt(p(1-p)/n); 95% CI ≈ p ± 1.96·SE
  • SE of difference ≈ sqrt(SE_A^2 + SE_B^2)

Exercise 2 — Model threshold visuals

Binary classifier with class imbalance (positives=1200, negatives=8800). At threshold 0.40: TP=880, FN=320, FP=440, TN=8360. Create visuals for a product review.

  • Compute Precision, Recall, F1
  • Pick two visuals to show and justify (e.g., PR curve + confusion matrix heatmap)
  • Draft a decision note for a scenario where false positives are costly

Hints

  • Precision = TP/(TP+FP); Recall = TP/(TP+FN)
  • If FP cost is high → favor precision (raise threshold)

Practical projects

  • A/B Decision Card: Bar + difference plot; include CI, n, and action
  • Model Evaluation Panel: ROC, PR, threshold vs F1, confusion matrix
  • Tuning Heatmap: Highlight top-3 parameter sets; annotate robustness

Learning path

  • Before: Basic plotting; Intro to experiment design; Classification metrics
  • Now: Experiment Result Visuals (this lesson)
  • Next: Dashboarding and storytelling with data; Experiment design pitfalls

Mini challenge

In one slide, summarize a week-long feature flag rollout that improved retention by +0.8 pp with 95% CI [0.1, 1.5] pp, n=120k users, segmented by platform. Include:

  • One overall difference plot
  • A small multiples bar chart (iOS vs Android) with CIs
  • One-line decision note
  • Footnote with dates, metric definition, CI method

Keep it concise and decision-ready.

Next steps

  • Convert your most common analyses into reusable visualization templates
  • Standardize footnotes: metric, n, window, CI method
  • Practice writing one-line decision notes for every chart

Practice Exercises

2 exercises to complete

Instructions

You ran an A/B test on signups.

  • Group A: n=8000, signups=920
  • Group B: n=7900, signups=1007

Tasks:

  • Compute absolute and relative lift
  • Approximate 95% CI for each proportion using normal approximation
  • Approximate 95% CI for the difference
  • Describe what your bar chart with error bars and your difference plot would show, including title, subtitle, footnote

Expected Output

A decision-ready visual plan: A and B bars with 95% CI, plus a difference plot showing +1.25 pp (≈+10.9% rel), CI of difference ~[0.23, 2.27] pp, with footnote including n, window, and CI method.

Experiment Result Visuals — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

