
Probability Inequalities Basics

Learn Probability Inequalities Basics for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Who this is for

This lesson is for aspiring and practicing Data Scientists who need quick, reliable bounds on probabilities when full distributions are unknown or sampling is limited.

Prerequisites

  • Random variables, expectation, variance, independence
  • Sample mean and basic concentration ideas
  • Comfort with algebra and reading inequalities

Why this matters

Probability inequalities are safety nets. They give worst-case guarantees without requiring full distribution knowledge.

  • Product analytics: bound the chance your A/B test result deviates from truth when traffic is small.
  • Risk estimates: ensure metrics (e.g., downtime, cost overruns) stay below thresholds with high probability.
  • Monitoring: set alert thresholds with guarantees on false alarm probabilities.

Concept explained simply

Probability inequalities upper-bound the probability of rare-but-bad events using only a few summary statistics (mean, variance, support bounds). They trade tightness for universality: they hold under minimal assumptions.

  • Markov: if X ≥ 0 and you know E[X], then P(X ≥ a) ≤ E[X]/a.
  • Chebyshev: if you know mean μ and variance σ², then P(|X − μ| ≥ kσ) ≤ 1/k².
  • Union bound (Boole): P(A₁ ∪ ... ∪ A_n) ≤ Σ P(A_i).
  • Hoeffding (for averages of bounded independent variables): deviations shrink exponentially with n.

Mental model

Imagine guardrails on a mountain road: they don’t tell you the exact path of the car (full distribution) but guarantee you won’t fall off the cliff (catastrophic probabilities stay small). More information gives tighter guardrails:

  1. Only mean known → Markov (loose, but minimal assumptions).
  2. Mean + variance known → Chebyshev (tighter).
  3. Independent, bounded samples → Hoeffding (tightest here; exponential decay).

Core inequalities overview

Markov's inequality (nonnegative X)

If X ≥ 0 and a > 0:

P(X ≥ a) ≤ E[X] / a.

Use when you only know a mean and X can't be negative (times, counts, losses).
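
A minimal Python sketch of the bound (the helper name markov_bound is ours, not a library function):

    def markov_bound(mean: float, a: float) -> float:
        """Markov: P(X >= a) <= E[X] / a for nonnegative X and a > 0; capped at 1."""
        return min(1.0, mean / a)

    print(markov_bound(5.0, 15.0))  # 0.333... (matches Example 1 below)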

Chebyshev's inequality (finite variance)

For any random variable with mean μ and variance σ² and any k > 0:

P(|X − μ| ≥ kσ) ≤ 1 / k².

For sample mean X̄ of n i.i.d. variables with variance σ²: P(|X̄ − μ| ≥ ε) ≤ σ² / (n ε²).
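
Both forms are one-liners in Python; the function names are illustrative, not a library API:

    def chebyshev_bound(k: float) -> float:
        """Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2, capped at 1."""
        return min(1.0, 1.0 / k**2)

    def chebyshev_mean_bound(sigma2: float, n: int, eps: float) -> float:
        """Sample-mean form: P(|Xbar - mu| >= eps) <= sigma2 / (n * eps**2)."""
        return min(1.0, sigma2 / (n * eps**2))

    print(chebyshev_mean_bound(16.0, 64, 1.0))  # 0.25 (see Example 2 below)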

Union bound (Boole's inequality)

For any events A₁,...,A_n (no independence needed):

P(∪ A_i) ≤ Σ P(A_i).

Use to control multiple risks or multiple comparisons.
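
A sketch; the sum is capped at 1 because it can exceed a valid probability:

    def union_bound(probs: list[float]) -> float:
        """Boole: P(A1 or ... or An) <= sum of P(Ai), capped at 1."""
        return min(1.0, sum(probs))

    print(union_bound([0.01, 0.02, 0.01]))  # 0.04 (see Example 3 below)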

One-sided Chebyshev (Cantelli)

For mean μ, variance σ², and t ≥ 0:

P(X − μ ≥ t) ≤ σ² / (σ² + t²). Equivalently, P(X − μ ≥ kσ) ≤ 1 / (1 + k²).

Use when you're worried about only the upper tail.
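
A sketch of the one-sided form. At k = 2 (here σ = 1, t = 2) it gives 1/5, slightly tighter than the 1/4 from two-sided Chebyshev:

    def cantelli_bound(sigma2: float, t: float) -> float:
        """Cantelli: P(X - mu >= t) <= sigma2 / (sigma2 + t**2) for t >= 0."""
        return sigma2 / (sigma2 + t**2)

    print(cantelli_bound(1.0, 2.0))  # 0.2, vs 0.25 from two-sided Chebyshev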

Hoeffding's inequality (bounded, independent)

Let X₁,...,X_n be independent with a_i ≤ X_i ≤ b_i and X̄ their mean. For any ε > 0:

P(|X̄ − E[X̄]| ≥ ε) ≤ 2 exp( −2 n² ε² / Σ (b_i − a_i)² ).

For identical bounds a ≤ X_i ≤ b: P(|X̄ − μ| ≥ ε) ≤ 2 exp( −2 n ε² / (b − a)² ).
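
A sketch of the identical-bounds form (again an illustrative helper, not a library call):

    import math

    def hoeffding_bound(n: int, eps: float, a: float = 0.0, b: float = 1.0) -> float:
        """P(|Xbar - mu| >= eps) <= 2 * exp(-2 * n * eps**2 / (b - a)**2)."""
        return min(1.0, 2.0 * math.exp(-2.0 * n * eps**2 / (b - a) ** 2))

    print(hoeffding_bound(2000, 0.02))  # ~0.404 (see Example 4 below)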

Worked examples

Example 1 — Markov for support tickets

Let X be daily time spent on high-priority tickets (hours). Assume X ≥ 0 and E[X] = 5. What is P(X ≥ 15)?

By Markov: P(X ≥ 15) ≤ 5 / 15 = 1/3 ≈ 0.333.

Interpretation: Without more info, the chance of ≥15 hours is at most 33.3%.
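
To see how loose Markov can be, a quick simulation helps. The exponential distribution below is purely an illustrative assumption; the example itself uses only X ≥ 0 and E[X] = 5:

    import random

    random.seed(0)
    n = 100_000
    # Exponential with mean 5 (assumed for illustration; Markov needs only the mean).
    exceed = sum(random.expovariate(1 / 5) >= 15 for _ in range(n)) / n
    print(exceed)   # ~0.05 empirically for this distribution
    print(5 / 15)   # 0.333..., the Markov bound: valid, but loose here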

Example 2 — Chebyshev for sample mean accuracy

A metric has variance σ² = 16. You take n = 64 i.i.d. samples. Bound P(|X̄ − μ| ≥ 0.5).

Chebyshev for the mean: ≤ σ² / (n ε²) = 16 / (64 × 0.25) = 16 / 16 = 1. So the bound is 1 (trivial). Chebyshev can be loose. If ε = 1, then 16 / (64 × 1) = 0.25.
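
You can also invert the bound to choose a sample size: requiring σ² / (n ε²) ≤ δ gives n ≥ σ² / (δ ε²). A sketch with the numbers above:

    import math

    def chebyshev_n(sigma2: float, eps: float, delta: float) -> int:
        """Smallest n with sigma2 / (n * eps**2) <= delta."""
        return math.ceil(sigma2 / (delta * eps**2))

    print(chebyshev_n(16.0, 0.5, 0.05))  # 1280 samples for a 5% guarantee at eps = 0.5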

Example 3 — Union bound for multiple alerts

Three independent monitoring alerts with false trigger probabilities 0.01, 0.02, 0.01 in an hour. What is P(at least one fires)?

Union bound: ≤ 0.01 + 0.02 + 0.01 = 0.04. Independence not required for this bound.
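
Because these particular alerts are stated to be independent, you can also compute the exact probability and see how close the bound is:

    p = [0.01, 0.02, 0.01]
    exact = 1 - (1 - p[0]) * (1 - p[1]) * (1 - p[2])
    print(exact)    # ~0.0395, exact under independence
    print(sum(p))   # 0.04, the union bound: close here because the events are rare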

Example 4 — Hoeffding for click-through rate

In an A/B test, each user click indicator X_i ∈ {0,1}. True mean μ is the CTR. With n = 2000 users and ε = 0.02, bound P(|X̄ − μ| ≥ 0.02).

Here a=0, b=1, so (b − a)² = 1. Hoeffding: ≤ 2 exp(−2 n ε²) = 2 exp(−2 × 2000 × 0.0004) = 2 exp(−1.6) ≈ 2 × 0.2019 ≈ 0.4038.

Note: This is conservative. Tighter bounds (e.g., Bernstein/Chernoff) can be smaller when variance is low.
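
Inverting Hoeffding is the usual way to size such a test: requiring 2 exp(−2 n ε²) ≤ δ gives n ≥ ln(2/δ) / (2 ε²). A sketch:

    import math

    def hoeffding_n(eps: float, delta: float) -> int:
        """Smallest n with 2 * exp(-2 * n * eps**2) <= delta (for X_i in [0, 1])."""
        return math.ceil(math.log(2 / delta) / (2 * eps**2))

    print(hoeffding_n(0.02, 0.05))  # 4612 users for a 5% guarantee at eps = 0.02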

Exercises

These exercises mirror the practice tasks below. Try them first, then open the solutions.

  1. Exercise ex1: Nonnegative X with E[X] = 8. Bound P(X ≥ 20) using Markov, then repeat at threshold 40 and note how the bound changes.
  2. Exercise ex2: A variable has variance 9. For n = 100 samples, bound P(|X̄ − μ| ≥ 1) using Chebyshev. What n makes this bound ≤ 0.05?

Tips:

  • Use the correct inequality for the information given.
  • Write the formula before plugging in numbers.
  • State your assumptions (e.g., nonnegative, bounded, independent).

Common mistakes and self-check

  • Using Markov when X can be negative. Self-check: confirm X ≥ 0.
  • Forgetting that Chebyshev needs finite variance. Self-check: do you have σ²?
  • Applying Hoeffding without bounded support or independence. Self-check: verify bounds a,b and independence across samples.
  • Interpreting bounds as exact probabilities. Self-check: remember these are upper bounds.
  • Ignoring that union bound can be loose. Self-check: if events overlap heavily, bound may be much larger than the true probability.

Learning path

  1. Reinforce expectation and variance basics.
  2. Master Markov and Chebyshev with quick calculations.
  3. Learn union bound and one-sided Cantelli for directional risks.
  4. Apply Hoeffding to sample means of bounded data.
  5. Compare bounds on the same problem to choose the tightest valid one (see the sketch after this list).
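
For step 5, a minimal comparison sketch with illustrative numbers (X bounded in [0, 10] with variance 4; n = 50 samples, ε = 1):

    import math

    n, eps, sigma2, rng = 50, 1.0, 4.0, 10.0
    chebyshev = sigma2 / (n * eps**2)                    # 0.08
    hoeffding = 2 * math.exp(-2 * n * eps**2 / rng**2)   # ~0.74
    print(min(chebyshev, hoeffding))                     # Chebyshev is tighter here

Chebyshev wins here because the variance (4) is much smaller than the squared range (100); with a narrower range, say b − a = 2, Hoeffding would give 2e⁻²⁵ and win by a wide margin.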

Practical projects

  1. Alert design: Given hourly false alarm targets, use union bounds to set per-alert thresholds.
  2. A/B test guardrails: Use Hoeffding to compute sample sizes for target error probabilities at chosen effect sizes.
  3. SLA risk audit: With only mean or variance of response times known, provide worst-case exceedance probabilities using Markov and Chebyshev.

Mini challenge

You track daily API errors per region (a nonnegative count). All you know is that each region's daily mean is 12, and there are 6 regions. What is an upper bound on the probability that at least one region exceeds 60 errors tomorrow?

Hint

Use Markov per region, then union bound across regions.

Solution

Per region: P(X ≥ 60) ≤ 12/60 = 0.2. Across all 6 regions (union bound): ≤ 6 × 0.2 = 1.2, which caps at 1, so the bound is trivial. This shows bounds can be loose; seek better information or tighter inequalities where available.

Next steps

  • Repeat the exercises with different thresholds and sample sizes.
  • Compare Markov, Chebyshev, and Hoeffding on the same dataset to see which is tightest under valid assumptions.
  • Move on to stronger inequalities (e.g., Bernstein/Chernoff) when variance or moment info is available.

Practice Exercises

2 exercises to complete

Instructions

Let X ≥ 0 with E[X] = 8.

  1. Compute an upper bound on P(X ≥ 20) using Markov.
  2. Compute an upper bound on P(X ≥ 40).
  3. Briefly explain how changing the threshold affects the bound.
Expected Output

Numerical bounds for the two probabilities and a short explanation of how the threshold affects the bound.

Probability Inequalities Basics — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
