Who this is for
This lesson is for aspiring and practicing Data Scientists who need quick, reliable bounds on probabilities when full distributions are unknown or sampling is limited.
Prerequisites
- Random variables, expectation, variance, independence
- Sample mean and basic concentration ideas
- Comfort with algebra and reading inequalities
Why this matters
Probability inequalities are safety nets. They give worst-case guarantees without requiring full distribution knowledge.
- Product analytics: bound the chance your A/B test result deviates from truth when traffic is small.
- Risk estimates: ensure metrics (e.g., downtime, cost overruns) stay below thresholds with high probability.
- Monitoring: set alert thresholds with guarantees on false alarm probabilities.
Concept explained simply
Probability inequalities upper-bound the probability of rare-but-bad events using a few summary stats (mean, variance, bounds). They trade tightness for universality: they work under minimal assumptions.
- Markov: if X ≥ 0 and you know E[X], then P(X ≥ a) ≤ E[X]/a.
- Chebyshev: if you know mean μ and variance σ², then P(|X − μ| ≥ kσ) ≤ 1/k².
- Union bound (Boole): P(A₁ ∪ ... ∪ A_n) ≤ Σ P(A_i).
- Hoeffding (for averages of bounded independent variables): deviations shrink exponentially with n.
Mental model
Imagine guardrails on a mountain road: they don’t tell you the exact path of the car (full distribution) but guarantee you won’t fall off the cliff (catastrophic probabilities stay small). More information gives tighter guardrails:
- Only mean known → Markov (loose, but minimal assumptions).
- Mean + variance known → Chebyshev (tighter).
- Independent, bounded samples → Hoeffding (tightest here; exponential decay).
Core inequalities overview
Markov's inequality (nonnegative X)
If X ≥ 0 and a > 0:
P(X ≥ a) ≤ E[X] / a.
Use when you only know a mean and X can't be negative (times, counts, losses).
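As a minimal sketch (the function name is ours, not from any library), Markov's bound is one line, with a cap at 1 since the raw ratio can exceed 1 for small thresholds:

```python
def markov_bound(mean: float, a: float) -> float:
    """Upper bound on P(X >= a) for a nonnegative X with E[X] = mean."""
    if a <= 0:
        raise ValueError("threshold a must be positive")
    # E[X]/a can exceed 1 when a < E[X]; probabilities never do, so cap it.
    return min(1.0, mean / a)

print(markov_bound(5, 15))  # ≈ 0.333, matching Example 1 below
```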
Chebyshev's inequality (finite variance)
For any random variable with mean μ and variance σ² and any k > 0:
P(|X − μ| ≥ kσ) ≤ 1 / k².
For sample mean X̄ of n i.i.d. variables with variance σ²: P(|X̄ − μ| ≥ ε) ≤ σ² / (n ε²).
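Both forms translate directly to code (helper names are ours for illustration):

```python
def chebyshev_bound(k: float) -> float:
    """Upper bound on P(|X - mu| >= k*sigma) for any X with finite variance."""
    return min(1.0, 1.0 / k**2)

def chebyshev_mean_bound(sigma2: float, n: int, eps: float) -> float:
    """Upper bound on P(|Xbar - mu| >= eps) for the mean of n i.i.d. samples
    with per-sample variance sigma2."""
    return min(1.0, sigma2 / (n * eps**2))

print(chebyshev_bound(2))              # 0.25: at most 25% mass beyond 2 sigma
print(chebyshev_mean_bound(16, 64, 1))  # 0.25, as in Example 2 below
```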
Union bound (Boole's inequality)
For any events A₁,...,A_n (no independence needed):
P(∪ A_i) ≤ Σ P(A_i).
Use to control multiple risks or multiple comparisons.
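A sketch of the same idea, again with a cap at 1 since the sum of probabilities can exceed it:

```python
def union_bound(probs) -> float:
    """Upper bound on P(at least one event occurs); no independence needed."""
    return min(1.0, sum(probs))

print(union_bound([0.01, 0.02, 0.01]))  # ≈ 0.04, as in Example 3 below
```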
One-sided Chebyshev (Cantelli)
For mean μ, variance σ², and t ≥ 0:
P(X − μ ≥ t) ≤ σ² / (σ² + t²). Equivalently, P(X − μ ≥ kσ) ≤ 1 / (1 + k²).
Use when you're worried about only the upper tail.
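A one-line sketch of Cantelli's bound (our function name); note that at t = σ it gives 1/2, versus Chebyshev's two-sided 1 at k = 1:

```python
def cantelli_bound(sigma2: float, t: float) -> float:
    """One-sided upper bound on P(X - mu >= t) for t >= 0, given variance sigma2."""
    if t < 0:
        raise ValueError("t must be nonnegative")
    return sigma2 / (sigma2 + t**2)

print(cantelli_bound(4, 2))  # 0.5: one standard deviation above the mean
```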
Hoeffding's inequality (bounded, independent)
Let X₁,...,X_n be independent with a_i ≤ X_i ≤ b_i and X̄ their mean. For any ε > 0:
P(|X̄ − E[X̄]| ≥ ε) ≤ 2 exp( −2 n² ε² / Σ (b_i − a_i)² ).
For identical bounds a ≤ X_i ≤ b: P(|X̄ − μ| ≥ ε) ≤ 2 exp( −2 n ε² / (b − a)² ).
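The identical-bounds form is the one used most in practice; a minimal sketch (function name ours):

```python
import math

def hoeffding_bound(n: int, eps: float, a: float = 0.0, b: float = 1.0) -> float:
    """Two-sided bound on P(|Xbar - mu| >= eps) for n independent samples in [a, b]."""
    return min(1.0, 2 * math.exp(-2 * n * eps**2 / (b - a) ** 2))

print(hoeffding_bound(2000, 0.02))  # ≈ 0.404, as in Example 4 below
```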
Worked examples
Example 1 — Markov for support tickets
Let X be daily time spent on high-priority tickets (hours). Assume X ≥ 0 and E[X] = 5. What is P(X ≥ 15)?
By Markov: P(X ≥ 15) ≤ 5 / 15 = 1/3 ≈ 0.333.
Interpretation: Without more info, the chance of ≥15 hours is at most 33.3%.
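A quick Monte Carlo sanity check. The exponential distribution here is purely illustrative: it is just one of many distributions consistent with "X ≥ 0 and E[X] = 5", and Markov's bound must hold for all of them:

```python
import random

random.seed(0)
n = 100_000
# Draw X ~ Exponential with mean 5 (rate 1/5) and count draws at or above 15.
hits = sum(random.expovariate(1 / 5) >= 15 for _ in range(n))
freq = hits / n
print(freq, "<=", 1 / 3)  # the empirical frequency sits well under Markov's cap
```

For this particular distribution the true probability is e⁻³ ≈ 0.05, far below the 1/3 bound, which illustrates how conservative Markov can be.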
Example 2 — Chebyshev for sample mean accuracy
A metric has variance σ² = 16. You take n = 64 i.i.d. samples. Bound P(|X̄ − μ| ≥ 0.5).
Chebyshev for the mean: P(|X̄ − μ| ≥ 0.5) ≤ σ² / (n ε²) = 16 / (64 × 0.25) = 1, so the bound is trivial (probabilities never exceed 1). Chebyshev can be loose at tight tolerances. Widening to ε = 1 gives 16 / (64 × 1) = 0.25, a useful guarantee.
Example 3 — Union bound for multiple alerts
Three monitoring alerts have false-trigger probabilities 0.01, 0.02, and 0.01 in a given hour. What is P(at least one fires)?
Union bound: ≤ 0.01 + 0.02 + 0.01 = 0.04. Independence not required for this bound.
Example 4 — Hoeffding for click-through rate
In an A/B test, each user click indicator X_i ∈ {0,1}. True mean μ is the CTR. With n = 2000 users and ε = 0.02, bound P(|X̄ − μ| ≥ 0.02).
Here a=0, b=1, so (b − a)² = 1. Hoeffding: ≤ 2 exp(−2 n ε²) = 2 exp(−2 × 2000 × 0.0004) = 2 exp(−1.6) ≈ 2 × 0.2019 ≈ 0.4038.
Note: This is conservative. Tighter bounds (e.g., Bernstein/Chernoff) can be smaller when variance is low.
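Inverting Hoeffding's bound answers the planning question "how many users do I need?": setting 2 exp(−2 n ε² / (b − a)²) ≤ δ and solving for n gives n ≥ (b − a)² ln(2/δ) / (2ε²). A sketch (function name ours):

```python
import math

def hoeffding_sample_size(eps: float, delta: float,
                          a: float = 0.0, b: float = 1.0) -> int:
    """Smallest n with Hoeffding bound 2*exp(-2*n*eps^2/(b-a)^2) <= delta."""
    return math.ceil((b - a) ** 2 * math.log(2 / delta) / (2 * eps ** 2))

# To push the CTR example's bound from ~0.40 down to 0.05 at eps = 0.02:
print(hoeffding_sample_size(0.02, 0.05))  # 4612
```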
Exercises
These exercises mirror the practice tasks below. Try them first, then open the solutions.
- Exercise ex1: Nonnegative X with E[X] = 8. Bound P(X ≥ 20) using Markov. Improve the bound at threshold 40.
- Exercise ex2: A variable has variance 9. For n = 100 samples, bound P(|X̄ − μ| ≥ 1) using Chebyshev. What n makes this bound ≤ 0.05?
Tips
- Use the correct inequality for the information given.
- Write the formula before plugging numbers.
- State assumptions (e.g., nonnegative, bounded, independence).
Common mistakes and self-check
- Using Markov when X can be negative. Self-check: confirm X ≥ 0.
- Forgetting that Chebyshev needs finite variance. Self-check: do you have σ²?
- Applying Hoeffding without bounded support or independence. Self-check: verify bounds a,b and independence across samples.
- Interpreting bounds as exact probabilities. Self-check: remember these are upper bounds.
- Ignoring that union bound can be loose. Self-check: if events overlap heavily, bound may be much larger than the true probability.
Learning path
- Reinforce expectation and variance basics.
- Master Markov and Chebyshev with quick calculations.
- Learn union bound and one-sided Cantelli for directional risks.
- Apply Hoeffding to sample means of bounded data.
- Compare bounds on the same problem to choose the tightest valid one.
Practical projects
- Alert design: Given hourly false alarm targets, use union bounds to set per-alert thresholds.
- A/B test guardrails: Use Hoeffding to compute sample sizes for target error probabilities at chosen effect sizes.
- SLA risk audit: With only mean or variance of response times known, provide worst-case exceedance probabilities using Markov and Chebyshev.
Mini challenge
You track daily API errors per region (nonnegative counts). All you know is that each region's daily mean is 12 errors, and there are 6 regions. What is an upper bound on the probability that at least one region sees 60 or more errors tomorrow?
Hint
Use Markov per region, then union bound across regions.
Solution
Per region: P(X ≥ 60) ≤ 12/60 = 0.2. Any of 6 regions: ≤ 6 × 0.2 = 1.2 → cap at 1, so bound is 1 (trivial). This shows bounds can be loose; consider better information or tighter inequalities if available.
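The two steps chain together in a few lines (a sketch using the same numbers):

```python
def markov_bound(mean: float, a: float) -> float:
    """Markov: upper bound on P(X >= a) for nonnegative X."""
    return min(1.0, mean / a)

per_region = markov_bound(12, 60)       # 0.2 for a single region
any_region = min(1.0, 6 * per_region)   # union bound over 6 regions, capped at 1
print(per_region, any_region)  # 0.2 1.0 — the combined bound is trivial
```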
Next steps
- Repeat the exercises with different thresholds and sample sizes.
- Compare Markov, Chebyshev, and Hoeffding on the same dataset to see which is tightest under valid assumptions.
- Move on to stronger inequalities (e.g., Bernstein/Chernoff) when variance or moment info is available.