What you’ll learn
How to think in priors and likelihoods, apply Bayes’ rule to real data problems, pick simple conjugate priors (Beta-Binomial, Normal-Normal), and interpret posterior estimates and credible intervals. You’ll practice small, realistic calculations used in product analytics, A/B testing, and ML.
Note: Tests are available to everyone; only logged-in users get saved progress.
Why this matters
- A/B tests: Combine historical knowledge (prior) with new experiment data to make faster, more stable decisions.
- Spam/fraud detection: Update risk in real time as new evidence arrives.
- Forecasting: Start from prior beliefs (e.g., seasonal rates) and refine as new data streams in.
- ML modeling: Bayesian thinking helps with uncertainty estimates, regularization, and small-data robustness.
Concept explained simply
Bayesian inference updates what you believe after seeing evidence.
- Prior: Your belief before new data (e.g., baseline conversion rate).
- Likelihood: How probable the observed data is under a hypothesis.
- Posterior: Updated belief after seeing data.
Bayes’ rule: Posterior ∝ Likelihood × Prior.
Mental model: “Belief thermostat”
Your prior is the current setting. New data nudges the setting. If data is strong/reliable, it nudges more; if noisy/weak, it nudges less. Over time, the thermostat stabilizes near the truth.
Worked examples
Example 1: Medical test positive
Prevalence P(D)=0.01; sensitivity P(+|D)=0.99; false positive P(+|¬D)=0.05.
- Compute numerator: P(+|D)P(D)=0.99×0.01=0.0099
- Compute denominator via total probability: P(+|D)P(D) + P(+|¬D)P(¬D) = 0.0099 + 0.05×0.99 = 0.0099 + 0.0495 = 0.0594 (here 0.99 = P(¬D) = 1 − 0.01)
- Posterior: P(D|+)=0.0099/0.0594≈0.1667 (≈ 16.7%)
Interpretation: Even a good test can yield a modest posterior when the base rate is low.
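The calculation above can be sketched in a few lines of Python (same numbers as the worked example):

```python
# Bayes' rule for the medical-test example: P(D|+) = P(+|D)P(D) / P(+).
prevalence = 0.01    # P(D): prior probability of disease
sensitivity = 0.99   # P(+|D)
false_pos = 0.05     # P(+|not D)

numerator = sensitivity * prevalence                 # 0.0099
evidence = numerator + false_pos * (1 - prevalence)  # 0.0099 + 0.0495 = 0.0594
posterior = numerator / evidence

print(round(posterior, 4))  # 0.1667
```

Swapping in a higher prevalence (say 0.10) shows how quickly the posterior climbs when the base rate is less extreme.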
Example 2: Beta-Binomial (conversion rate)
Prior Beta(2,2) for conversion p (weakly informative, centered at 0.5). You observe 30 successes out of 100 trials.
- Posterior parameters: α=2+30=32, β=2+70=72 → Beta(32,72)
- Posterior mean: 32/(32+72)=32/104≈0.3077
- MAP: (α−1)/(α+β−2)=31/102≈0.3039
Interpretation: The prior gently pulls the estimate toward 0.5 relative to the raw rate of 30/100 = 0.30, acting as mild regularization.
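The conjugate update needs no special libraries; a minimal sketch with the example's numbers:

```python
# Beta-Binomial update for the conversion-rate example.
alpha0, beta0 = 2, 2            # Beta(2, 2) prior
successes, failures = 30, 70    # 30 of 100 trials converted

alpha_post = alpha0 + successes  # 32
beta_post = beta0 + failures     # 72

post_mean = alpha_post / (alpha_post + beta_post)           # 32/104 ≈ 0.3077
post_map = (alpha_post - 1) / (alpha_post + beta_post - 2)  # 31/102 ≈ 0.3039
print(round(post_mean, 4), round(post_map, 4))  # 0.3077 0.3039
```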
Example 3: Normal-Normal (mean with known variance)
Prior for the mean μ: Normal(μ0=50, τ0^2=25), where τ0^2 is the prior variance. Data: sample mean x̄=54, n=20, known variance σ^2=16.
- Precision: 1/τ0^2=0.04; n/σ^2=20/16=1.25; sum=1.29
- Posterior variance: 1/1.29≈0.7752
- Posterior mean: (μ0/τ0^2 + n·x̄/σ^2) / (1/τ0^2 + n/σ^2) = (50/25 + 20·54/16) / 1.29 = (2 + 67.5)/1.29 ≈ 53.88
Interpretation: The posterior mean balances prior and data weighted by their precisions.
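The precision-weighted update translates directly into code (same numbers as the example):

```python
# Normal-Normal update with known data variance.
mu0, tau0_sq = 50.0, 25.0           # prior: Normal(50, 25)
xbar, n, sigma_sq = 54.0, 20, 16.0  # data summary; variance assumed known

prior_prec = 1.0 / tau0_sq  # 0.04
data_prec = n / sigma_sq    # 1.25

post_var = 1.0 / (prior_prec + data_prec)                     # 1/1.29 ≈ 0.7752
post_mean = (mu0 * prior_prec + xbar * data_prec) * post_var  # 69.5/1.29 ≈ 53.88
print(round(post_mean, 2), round(post_var, 4))  # 53.88 0.7752
```

Note that the precisions (inverse variances) are what get added, not the variances themselves.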
Key ideas you’ll reuse
- Odds form: Posterior odds = Prior odds × Likelihood ratio (Bayes factor). Great for step-by-step evidence updates.
- Conjugate priors: Pick priors that yield posteriors in the same family (e.g., Beta-Binomial, Normal-Normal) for fast, exact updates.
- Credible interval: A 95% credible interval contains 95% posterior mass. It is a probability statement about the parameter.
- MAP vs posterior mean: MAP is the mode; posterior mean averages uncertainty. With symmetric unimodal posteriors they’re often similar.
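A 95% equal-tailed credible interval is just the 2.5% and 97.5% posterior quantiles. A sketch for the Beta(32, 72) posterior from Example 2, assuming SciPy is installed:

```python
from scipy import stats

# 95% equal-tailed credible interval for the Beta(32, 72) posterior:
# the interval between the 2.5% and 97.5% quantiles.
lo, hi = stats.beta.interval(0.95, 32, 72)
print(round(lo, 3), round(hi, 3))
```

Reading it off: the parameter p lies in [lo, hi] with 95% posterior probability, a direct probability statement about p.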
How to do it (step-by-step)
- Write the prior. Choose Beta(α,β) for probabilities; Normal(μ0,τ0^2) for means.
- Define the likelihood. Binomial for counts; Normal for averages with known variance.
- Update. Use conjugate formulas or Bayes’ rule.
- Summarize. Posterior mean/MAP and a credible interval.
- Decide. Compare to a threshold, or compute odds/expected utility.
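The odds-form update mentioned under "Key ideas" fits the decide step neatly. A sketch with hypothetical spam-filter numbers (deliberately not the exercise values):

```python
# Odds-form updating: posterior odds = prior odds × likelihood ratio.
# Hypothetical numbers for illustration only.
prior_p = 0.3    # P(Spam)
lr = 0.5 / 0.1   # P(word|Spam) / P(word|NotSpam) = 5.0

prior_odds = prior_p / (1 - prior_p)  # 3/7 ≈ 0.4286
post_odds = prior_odds * lr           # ≈ 2.1429
post_p = post_odds / (1 + post_odds)  # ≈ 0.6818

flag_as_spam = post_p > 0.5           # decide against a threshold
print(round(post_p, 4), flag_as_spam)  # 0.6818 True
```

Multiplying in one likelihood ratio per piece of evidence is why the odds form is convenient for step-by-step updates.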
Exercises
Match these with the exercise panel below. Show your work and round to 4 decimals where needed.
- ex1 — Spam word “free”: P(Spam)=0.2, P("free"|Spam)=0.4, P("free"|NotSpam)=0.05. Compute P(Spam|"free").
- ex2 — Beta-Binomial update: Prior Beta(2,2). Observe 8 successes, 2 failures. Find posterior, posterior mean, and MAP.
- ex3 — Normal-Normal mean: Prior μ0=50, τ0^2=25. Data: n=20, x̄=54, σ^2=16. Compute posterior mean and variance.
Self-check checklist
- I identified prior, likelihood, and posterior for each exercise.
- I computed denominators using total probability when needed.
- For Beta-Binomial, I updated α and β correctly.
- For Normal-Normal, I combined precisions (not variances) when updating.
- I stated results with clear interpretation (what the number means).
Common mistakes and how to self-check
- Ignoring base rates: High sensitivity doesn’t imply high posterior when prevalence is low. Always include the prior (e.g., P(D) in the medical example).
- Mixing up α, β updates: For Beta-Binomial, α += successes, β += failures. Verify counts.
- Using variances instead of precisions: In Normal-Normal, weight by 1/variance. Recompute carefully.
- Overconfident priors: If the prior is too sharp, new data barely moves the posterior. Try a weaker prior unless you truly have strong evidence.
- Confusing credible vs confidence intervals: Credible is about parameter probability; confidence is about long-run frequency of procedures.
Quick self-audit
- Did I write the full Bayes numerator and denominator?
- Did I check that the posterior mean sits between the prior mean and the data estimate (unless the prior is extremely strong)?
- Do my parameters stay in valid ranges (probabilities in [0,1])?
Who this is for
- Aspiring and practicing Data Scientists who want principled uncertainty quantification.
- Analysts running A/B tests and business experiments.
- ML engineers adding calibrated probabilities to models.
Prerequisites
- Basic probability (events, conditional probability).
- Distributions: Bernoulli/Binomial and Normal.
- Algebra and comfort with fractions/ratios.
Learning path
- Refresh conditional probability and odds.
- Bayes’ rule and posterior interpretation (this lesson).
- Conjugate priors: Beta-Binomial, Normal-Normal.
- Bayes factors and odds updates.
- From closed-form to computation: brief intro to MCMC/approximate methods.
Practical projects
- Bayesian A/B test dashboard: Posterior for two proportions using Beta priors with real or simulated data.
- Spam word scorer: Maintain prior odds and update with likelihood ratios per word.
- Forecast with uncertainty: Normal-Normal update for a daily metric’s mean.
Next steps
- Try Bayesian A/B testing on a historical experiment to compare decisions to frequentist methods.
- Explore sensitivity: vary priors (weak to strong) and see how posteriors shift.
- Move to hierarchical models for partial pooling when you have many related groups.
Mini challenge
You launch a feature to 1,000 users: 70 conversions. Prior Beta(5,5). Compute the posterior Beta parameters, posterior mean, and decide if the mean exceeds 0.06. Explain your decision rule.
Quick Test
Take the short test below. Note: Everyone can take the test; only logged-in users have progress saved.