Who this is for
This lesson is for Data Scientists and ML practitioners who need to make decisions under uncertainty, set sensible thresholds, and communicate risks clearly.
- Binary/multi-class classification owners choosing operating points
- Experimenters running A/B tests
- Forecasters producing predictions with intervals
- Anyone explaining model risks to stakeholders
Prerequisites
- Probability basics: random variables, independence, Bayes' rule (at a high level)
- Basic statistics: mean, variance, confidence vs. prediction intervals (conceptual)
- Comfort with simple arithmetic; Python/R experience helps but not required here
Why this matters
Real DS work is rarely certain. Probabilistic thinking lets you:
- Pick classification thresholds that minimize expected business cost, not just error rate
- Quantify uplift and risk in A/B tests before rolling out changes
- Forecast with prediction intervals to plan inventory or staffing
- Handle missing data and measurement noise without overconfidence
- Communicate uncertainty credibly so decisions are robust
Concept explained simply
Probabilistic thinking means treating unknowns as distributions, not single numbers. You ask: What could be true? How likely? What action has the lowest expected cost (or highest expected value)?
- Data-generating process (DGP): a story for how the data came to be (signals + noise).
- Prior → Likelihood → Posterior: start with beliefs (prior), see data (likelihood), update beliefs (posterior); a small numeric sketch follows this list.
- Predictive distribution: uncertainty about future observations, not just parameters.
- Decision by expected loss: combine probabilities with costs/benefits to choose actions.
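To make the prior → likelihood → posterior loop concrete, here is a minimal grid-based update sketch in Python. The visitor counts and grid size are illustrative assumptions for this sketch, not data from the lesson.

```python
import numpy as np

# Illustrative data (assumed for this sketch): 7 conversions out of 50 visitors
successes, trials = 7, 50

# Prior: uniform belief over candidate conversion rates on a grid
rates = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(rates) / len(rates)

# Likelihood of the observed data at each candidate rate (binomial kernel)
likelihood = rates**successes * (1 - rates)**(trials - successes)

# Posterior: prior times likelihood, renormalized so it sums to 1
posterior = prior * likelihood
posterior /= posterior.sum()

# Posterior mean; for a Bernoulli outcome this is also the predictive
# probability that the next visitor converts
posterior_mean = float(np.sum(rates * posterior))
print(f"posterior mean ≈ {posterior_mean:.3f}")
print(f"P(next visitor converts) ≈ {posterior_mean:.3f}")
```

The same loop works with any prior: replace the uniform weights with a more informative starting belief and the data simply reweights it.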
Mental model (quick)
Imagine a dim room with a noisy meter. Your model is the meter. The world is the room. You never see the true value directly; you see noisy readings. Instead of guessing a single value, keep a distribution that says where the true value likely is. Then choose the action that does best on average under that distribution.
Core tools and terms
- Likelihood: probability of observed data given parameters.
- Posterior: updated belief over parameters after seeing data.
- Predictive interval: range where a future data point is likely to fall.
- Calibration: predicted probabilities match observed frequencies.
- Conditional independence: simplifies models (e.g., Naive Bayes).
- Expected loss: the sum of each possible outcome's loss weighted by its probability.
- MCAR/MAR/MNAR: missingness mechanisms; model them to avoid bias.
Worked examples
1) Cost-sensitive threshold for a spam filter
Suppose P(spam|x) = p. Costs: false positive (marking real mail as spam) c_fp = 1, false negative (missing spam) c_fn = 5.
- If you predict "spam": expected cost = c_fp × (1 − p)
- If you predict "not spam": expected cost = c_fn × p
Choose "spam" when c_fp(1 − p) ≤ c_fn p → p ≥ c_fp / (c_fp + c_fn) = 1 / (1 + 5) ≈ 0.167. The optimal threshold is ~0.167, not 0.5.
2) A/B test with a simple Bayesian update
Prior for conversion rate per variant: Beta(1,1). Observations:
- Variant A: 20 conversions / 200 visitors ⇒ posterior A ~ Beta(21,181), mean ≈ 0.104
- Variant B: 28 conversions / 200 visitors ⇒ posterior B ~ Beta(29,173), mean ≈ 0.144
Approximate each posterior as normal to compare means:
- Var(A) ≈ 0.000459, Var(B) ≈ 0.000606 → SD of difference ≈ 0.033
- Mean difference ≈ 0.040 → Z ≈ 1.21 ⇒ P(B > A) ≈ 0.89
Interpretation: B likely beats A, but not certain. If the upside justifies the risk, ship B; otherwise keep testing. This frames rollout as a decision under uncertainty, not a binary accept/reject.
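A short sketch that reproduces this comparison by sampling the two Beta posteriors directly, rather than using the normal approximation; it assumes `numpy` and `scipy` are available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Posteriors from the example: Beta(1 + conversions, 1 + non-conversions)
post_a = stats.beta(1 + 20, 1 + 180)   # Variant A: 20/200
post_b = stats.beta(1 + 28, 1 + 172)   # Variant B: 28/200

# Monte Carlo estimate of P(B > A) and of the uplift distribution
a = post_a.rvs(100_000, random_state=rng)
b = post_b.rvs(100_000, random_state=rng)
lo, hi = np.percentile(b - a, [2.5, 97.5])

print(f"mean A ≈ {post_a.mean():.3f}, mean B ≈ {post_b.mean():.3f}")
print(f"P(B > A) ≈ {(b > a).mean():.2f}")                  # ≈ 0.89
print(f"95% interval for uplift B - A ≈ [{lo:.3f}, {hi:.3f}]")
```

Sampling sidesteps the normal approximation entirely and extends naturally to quantities like relative uplift or expected loss from shipping the wrong variant.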
3) Forecast with predictive uncertainty
A regression predicts mean demand μ = 120 units with predictive SD σ = 15.
- 95% prediction interval ≈ 120 ± 1.96×15 ⇒ [90.6, 149.4]
- P(demand > 150) = P(Z > 2) ≈ 0.023
- For a 95% one-sided service level, stock at the 95th percentile ≈ μ + 1.64σ ≈ 144.6
Decision example: If a stockout costs more than overstock, target a higher quantile accordingly.
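A minimal sketch of these interval and quantile calculations with `scipy.stats.norm`; the stockout/overstock costs at the end are assumed for illustration and are not part of the example above.

```python
from scipy import stats

mu, sigma = 120.0, 15.0
demand = stats.norm(mu, sigma)

# 95% two-sided prediction interval and tail probability from the example
lo, hi = demand.ppf(0.025), demand.ppf(0.975)
print(f"95% prediction interval ≈ [{lo:.1f}, {hi:.1f}]")   # ≈ [90.6, 149.4]
print(f"P(demand > 150) ≈ {demand.sf(150):.3f}")            # ≈ 0.023

# One-sided service levels: stock at the matching quantile
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} service level: stock ≈ {demand.ppf(level):.1f} units")

# Newsvendor-style critical ratio (illustrative costs, assumed for this sketch):
# stockout cost 4, overstock cost 1 -> target quantile = 4 / (4 + 1) = 0.8
c_under, c_over = 4.0, 1.0
q = c_under / (c_under + c_over)
print(f"cost-based quantile {q:.2f}: stock ≈ {demand.ppf(q):.1f} units")
```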
Hands-on exercises
Do these now. The solutions are available below each exercise in the Exercises panel and include full workings.
Exercise 1: Cost-sensitive decision
You have P(positive|x) = 0.12. Costs: c_fn = 10, c_fp = 1.
- Compute the optimal threshold t.
- Decide whether to predict positive or negative for this x.
- Checklist:
  - Set expected cost for both actions
  - Solve for indifference threshold
  - Compare p to the threshold
Exercise 2: Posterior means for an A/B test
Prior Beta(1,1) for each variant. Data: A has 12/100, B has 9/60 conversions.
- Write the posterior Beta parameters for A and B.
- Compute the posterior mean for each and pick the higher expected conversion.
- Checklist:
  - Update: Beta(α+successes, β+failures)
  - Mean = α / (α + β)
  - State your decision and brief justification
Common mistakes and how to self-check
- Using 0.5 threshold by default. Fix: align threshold with costs and class balance.
- Reporting point estimates only. Fix: include intervals or distributions and explain their meaning.
- Confusing confidence vs. prediction intervals. Fix: prediction intervals are for future observations; they are wider.
- Ignoring calibration. Fix: check reliability curves/Brier score; calibrate if needed (e.g., isotonic, Platt). A short sketch follows this list.
- Assuming independence carelessly. Fix: justify independence or test sensitivity.
- Forgetting the DGP. Fix: write a short DGP story before modeling (signals, noise, missingness).
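As referenced in the calibration item above, here is a minimal sketch of a reliability check with scikit-learn; the synthetic dataset and logistic model are placeholders for your own classifier and hold-out set.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Placeholder data and model: swap in your own classifier and hold-out scores
X, y = make_classification(n_samples=5000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# Reliability curve: mean predicted probability vs. observed frequency per bin
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)
for pred, obs in zip(mean_pred, frac_pos):
    print(f"predicted ≈ {pred:.2f}  observed ≈ {obs:.2f}")

# Brier score: lower is better; compare before and after any recalibration
print(f"Brier score: {brier_score_loss(y_te, probs):.4f}")
```

If the binned points deviate systematically from the diagonal, recalibrate (e.g., isotonic regression or Platt scaling) on data not used for training.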
Practical projects
- Threshold tuner: take a trained classifier, define a cost matrix, and compute the optimal operating point. Show expected cost vs. threshold (a starter sketch follows this list).
- A/B uplift explorer: model conversion with Beta-Binomial, plot posterior means and 95% intervals, and estimate P(B > A).
- Forecast with service levels: produce weekly demand forecasts with predictive intervals and choose reorder points for 90%, 95%, and 99% service levels.
- Missing data simulation: create MCAR/MAR scenarios and compare bias for simple deletion vs. modeled imputation.
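A starter sketch for the threshold tuner project. The synthetic scores here are generated so the predicted probabilities are calibrated by construction, which is what makes the closed-form threshold a useful reference; with a real classifier you would pass in its validation-set scores and labels instead.

```python
import numpy as np

def expected_cost(probs, labels, threshold, c_fp, c_fn):
    """Average cost per case when flagging cases with P(positive|x) >= threshold."""
    preds = probs >= threshold
    fp = np.sum(preds & (labels == 0))
    fn = np.sum(~preds & (labels == 1))
    return (c_fp * fp + c_fn * fn) / len(labels)

# Toy, calibrated scores stand in for a real validation set: labels are drawn
# from the predicted probabilities, so probabilities match frequencies by design.
rng = np.random.default_rng(0)
probs = rng.beta(2, 5, size=5000)
labels = rng.binomial(1, probs)

c_fp, c_fn = 1.0, 5.0
thresholds = np.linspace(0.01, 0.99, 99)
costs = [expected_cost(probs, labels, t, c_fp, c_fn) for t in thresholds]
best = thresholds[int(np.argmin(costs))]
print(f"empirical best threshold ≈ {best:.2f}; closed form gives {c_fp / (c_fp + c_fn):.3f}")
```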
Learning path
- Refresh: conditional probability, Bayes' rule, expectation, variance.
- Practice: derive cost-sensitive thresholds and apply to a confusion-matrix simulation.
- Bayesian basics: Beta-Binomial for proportions; Normal-Normal for means.
- Predictive uncertainty: compute and interpret prediction intervals.
- Calibration: diagnose and fix miscalibrated probabilities.
- Communicate: write a 1-page brief that states assumptions, uncertainties, and decision rule.
Next steps
- Deepen Bayes: conjugate priors and simple probabilistic programming.
- Probabilistic graphical models for conditional independence structure.
- Decision theory: utilities, risk constraints, and expected value of information.
- Experimentation: sequential tests and multi-armed bandits.
Mini challenge (15–20 min)
Your classifier outputs P(default|applicant) and your business costs are: c_fn (missed default) = 100, c_fp (unnecessary decline) = 5.
- Compute the optimal threshold t.
- For three applicants with p = 0.03, 0.07, 0.22, choose approve/decline to minimize expected loss.
- Write two sentences to justify your policy to a stakeholder.
Tip: sanity checks
- If c_fn ≫ c_fp, the threshold should be low (you predict positive more easily).
- Predicted probabilities averaged over many similar cases should match observed frequencies if well calibrated.
Take the quick test