How to learn Probability for Data Scientists for free

Why Probability matters for Data Scientists

Probability is the language of uncertainty. As a Data Scientist, you use it to reason about noisy data, assess model risk, design A/B tests, build Bayesian models, simulate outcomes, and communicate confidence. Mastering probability helps you make sound decisions under uncertainty and avoid common analytical traps.

Experimentation: compute p-values, power, and credible intervals.
Modeling: choose and fit appropriate distributions for data (Poisson, Binomial, Gaussian, etc.).
Inference: apply Bayes’ rule and understand priors/posteriors.
Simulation: validate assumptions and estimate quantities you cannot solve analytically.
Sequential behavior: represent processes with Markov chains (e.g., user states in a funnel).

Who this is for

Career-switchers into Data Science wanting a strong statistical foundation.
Analysts/engineers who run experiments or build predictive models.
Students reinforcing theory with practical, code-first examples.

Prerequisites

Comfort with basic algebra and functions.
Familiarity with Python and NumPy/Pandas is helpful for examples (not strictly required).
Basic descriptive statistics (mean, variance, percentiles).

Learning path (practical roadmap)

Core probability rules — events, complements, independence, conditional probability, Bayes’ rule.
Random variables & distributions — Bernoulli, Binomial, Poisson, Normal, Exponential; PMF/PDF/CDF.
Moments — expectation, variance, covariance, correlation; linearity of expectation.
Asymptotics — Law of Large Numbers (LLN) and Central Limit Theorem (CLT) for confidence intervals.
Inequalities — Markov and Chebyshev for conservative bounds.
Markov chains — state transitions, powers of transition matrices, stationary distributions.
Simulation/Monte Carlo — random sampling, estimators, variance reduction, experiment simulation.
Probabilistic thinking for modeling — choosing distributions, priors, assumptions, and validation.

Milestone tips

Pair every concept with at least one coded example (even a small one-liner).
Simulate to build intuition when formulas feel abstract.
Keep a personal “assumptions checklist” for every analysis.

Worked examples (with code)

1) Bayes for simple spam filtering

Compute P(Spam | contains “win”).

# Suppose: P(Spam)=0.2, P(Word="win"|Spam)=0.4, P(Word="win"|Ham)=0.05
p_spam = 0.2
p_win_given_spam = 0.4
p_win_given_ham = 0.05
p_ham = 1 - p_spam
# Law of total probability: P("win") summed over both classes
p_win = p_spam*p_win_given_spam + p_ham*p_win_given_ham
# Bayes' rule: P(Spam | "win")
posterior = (p_spam*p_win_given_spam) / p_win
print(posterior)  # 0.08 / 0.12 ≈ 0.667
# Interpretation: if posterior > threshold (like 0.5 or cost-weighted), flag as spam.
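
One way to sanity-check the Bayes calculation is to simulate labelled messages and count. The sketch below assumes NumPy is available and reuses the three probabilities from the example; the sample size and seed are arbitrary choices, not part of the original.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n = 200_000                     # arbitrary number of simulated messages

p_spam, p_win_given_spam, p_win_given_ham = 0.2, 0.4, 0.05

# Draw the class of each message, then whether it contains "win"
is_spam = rng.random(n) < p_spam
p_win_each = np.where(is_spam, p_win_given_spam, p_win_given_ham)
has_win = rng.random(n) < p_win_each

# Among messages containing "win", what fraction is spam?
posterior_sim = is_spam[has_win].mean()
posterior_exact = (p_spam * p_win_given_spam) / (
    p_spam * p_win_given_spam + (1 - p_spam) * p_win_given_ham
)
print(posterior_sim, posterior_exact)
```

The simulated fraction should land close to the exact posterior of about 0.667, with the gap shrinking as n grows.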

2) Binomial probability for an A/B test

What is P(X ≥ 60) for X ~ Binomial(n=200, p=0.25)?

from math import comb
n, p = 200, 0.25
prob = sum(comb(n, k) * (p**k) * ((1-p)**(n-k)) for k in range(60, n+1))
prob

Use a Normal approximation if you need speed: mean = n·p = 50, variance = n·p·(1−p) = 37.5 (here n·p is the product of n and p, not the NumPy alias).
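
To see how close the approximation is, the exact tail can be compared with a Normal tail using a continuity correction. This is a minimal sketch using only the standard library; the helper normal_cdf is a hypothetical name introduced here for illustration.

```python
from math import comb, erf, sqrt

n, p = 200, 0.25
mean = n * p                  # 50
sd = sqrt(n * p * (1 - p))    # sqrt(37.5)

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma^2) via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Exact Binomial tail P(X >= 60)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(60, n + 1))

# Normal approximation with continuity correction: P(X >= 60) ~ P(Z >= 59.5)
approx = 1 - normal_cdf(59.5, mean, sd)
print(exact, approx)
```

The two values should agree to roughly two decimal places here, since n·p = 50 and n·(1−p) = 150 are both well above 10.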

3) Expected value and variance of revenue

Let daily revenue R = 5X - 2Y, where X and Y are independent counts with E[X]=12, Var[X]=9; E[Y]=4, Var[Y]=4.

E[R] = 5E[X] - 2E[Y] = 5*12 - 2*4 = 60 - 8 = 52.

Var[R] = 25 Var[X] + 4 Var[Y] (no cross term since independent) = 25*9 + 4*4 = 225 + 16 = 241.
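
The formulas can be checked by simulation. The example does not say which distributions X and Y follow, so the sketch below assumes X ~ Binomial(48, 0.25) and Y ~ Poisson(4), chosen only because they match the stated means and variances.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 500_000

# Assumed distributions that match the stated moments:
X = rng.binomial(48, 0.25, size=n)  # E[X] = 12, Var[X] = 48 * 0.25 * 0.75 = 9
Y = rng.poisson(4.0, size=n)        # E[Y] = 4,  Var[Y] = 4

R = 5 * X - 2 * Y
print(R.mean(), R.var())  # should be near 52 and 241
```

Any other independent pair with the same moments should give the same mean and variance for R, because the derivation uses only linearity of expectation and independence.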

4) CLT-based confidence interval for a mean

Sample of n=400 sessions, sample mean = 5.4 min, sample sd = 2.0 min. Approximate 95% CI for the true mean.

SE = 2 / sqrt(400) = 0.1, so 95% CI ≈ 5.4 ± 1.96*0.1 = (5.204, 5.596).
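
The same interval can be computed in a few lines. This sketch uses statistics.NormalDist from the standard library (Python 3.8+) for the critical value; the function name mean_ci is a hypothetical helper, not something from the original.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sample_sd, n, level=0.95):
    """CLT-based confidence interval for a mean (Normal critical value)."""
    se = sample_sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return sample_mean - z * se, sample_mean + z * se

lo, hi = mean_ci(5.4, 2.0, 400)
print(round(lo, 3), round(hi, 3))  # 5.204 5.596
```

For small samples a t critical value would be the more cautious choice; with n = 400 the difference is negligible.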

5) Markov chain: predicting user state

States: {New, Active, Churn}. Transition matrix rows sum to 1:

import numpy as np
P = np.array([
  [0.1, 0.8, 0.1],  # New → New,Active,Churn
  [0.0, 0.9, 0.1],  # Active → New,Active,Churn
  [0.0, 0.0, 1.0],  # Churn → absorbing
])
pi0 = np.array([1.0, 0.0, 0.0])  # start with all users New
pi2 = pi0 @ np.linalg.matrix_power(P, 2)
pi2

Interpretation

pi2 shows the distribution over states after 2 periods. Use for forecasting churn and planning re-engagement.
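
The same matrix can be pushed forward over more periods to get a churn forecast curve. The sketch below reuses the transition matrix from the example, validates it first, and tracks the state distribution period by period; the 12-period horizon is an arbitrary assumption.

```python
import numpy as np

P = np.array([
    [0.1, 0.8, 0.1],  # New
    [0.0, 0.9, 0.1],  # Active
    [0.0, 0.0, 1.0],  # Churn (absorbing)
])

# Validate: non-negative entries and each row sums to 1
assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)

pi = np.array([1.0, 0.0, 0.0])  # everyone starts as New
history = [pi]
for _ in range(12):             # assumed horizon of 12 periods
    pi = pi @ P                 # one-step update of the state distribution
    history.append(pi)
history = np.array(history)

print(history[2])               # distribution after 2 periods: [0.01, 0.80, 0.19]
churn_by_period = history[:, 2] # cumulative share churned per period
print(churn_by_period.round(3))
```

Because Churn is absorbing, the churned share can only grow from one period to the next, which makes a quick plausibility check on the output.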

Drills and quick exercises

☐ Compute P(A∪B) given P(A), P(B), and P(A∩B).
☐ For X ~ Poisson(λ=3), calculate P(X ≤ 2).
☐ Show that E[aX + b] = aE[X] + b for any constants a, b.
☐ Simulate 10,000 experiments of 100 fair coin flips each and estimate P(≥ 60 heads in 100 flips).
☐ Use CLT to build a 95% CI for a sample mean of your choosing.
☐ Construct a 2-state Markov chain and find its stationary distribution.
☐ Apply Bayes’ rule to a medical test with any plausible parameters you pick.

Common mistakes and debugging tips

Confusing independence with disjointness: disjoint events cannot both occur, independent events can. Check P(A∩B) = P(A)P(B) for independence.
Forgetting base rates in Bayes: a highly accurate test can still yield many false positives when prevalence is low. Always compute P(+) correctly.
Using Normal approximations too casually: check n·p and n·(1−p) ≥ ~10 for Binomial; otherwise consider exact methods or continuity corrections.
Ignoring variance in decision-making: compare expected value and uncertainty. Report intervals, not just point estimates.
Misusing CLT with heavy tails: large outliers slow convergence. Consider robust estimators or transformations.
Markov chain misuse: ensure each row sums to 1 and entries are non-negative. Validate with small power checks (P², P³).
Simulation bugs: seed randomness for reproducibility; verify simple moments (mean/variance) match theory before complex metrics.
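
The base-rate point is easiest to see with numbers. The sketch below assumes a test with sensitivity 0.99 and specificity 0.95 (illustrative values, not from the original) and shows how the probability of disease given a positive result changes with prevalence; posterior_positive is a hypothetical helper name.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    # P(+) = true positives + false positives
    p_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    return prevalence * sensitivity / p_pos

# Illustrative prevalences; sensitivity and specificity are assumed values
for prev in (0.001, 0.01, 0.1):
    print(prev, round(posterior_positive(prev, 0.99, 0.95), 3))
```

At 1% prevalence the posterior is only about 0.167, so roughly five in six positives are false despite the test's high accuracy.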

Mini project: A/B Test Outcome Simulator

Build a tool that simulates an A/B test end-to-end and compares frequentist and Bayesian conclusions.

Define true conversion rates pA and pB and choose sample sizes.
Simulate outcomes with Binomial sampling for each variant.
Compute: (a) z-test and 95% CI for the difference; (b) Bayesian posterior with Beta priors and the probability that B > A.
Repeat many times (Monte Carlo) to estimate power (with pB ≠ pA) and the false positive rate (with pB = pA).
Visualize distributions and intervals; log assumptions and decisions.

Starter code

import numpy as np
from scipy.stats import beta, norm
rng = np.random.default_rng(42)

pA, pB = 0.10, 0.12
nA, nB = 1000, 1000
sims = 5000

z_wins = 0
bayes_wins = 0

for _ in range(sims):
    xA = rng.binomial(nA, pA)
    xB = rng.binomial(nB, pB)
    pA_hat, pB_hat = xA/nA, xB/nB

    # z-test for difference in proportions
    se = np.sqrt(pA_hat*(1-pA_hat)/nA + pB_hat*(1-pB_hat)/nB)
    z = (pB_hat - pA_hat) / (se + 1e-12)
    pval = 2*(1 - norm.cdf(abs(z)))
    if pval < 0.05 and pB_hat > pA_hat:
        z_wins += 1

    # Bayesian with Beta(1,1) priors
    postA = beta(xA+1, nA-xA+1)
    postB = beta(xB+1, nB-xB+1)
    # Monte Carlo posterior comparison
    drawA = postA.rvs(2000, random_state=rng)
    drawB = postB.rvs(2000, random_state=rng)
    prob_B_better = np.mean(drawB > drawA)
    if prob_B_better > 0.95:
        bayes_wins += 1

z_power = z_wins / sims
bayes_power = bayes_wins / sims
z_power, bayes_power

Deliverables: (1) notebook or script, (2) chart of power vs. sample size, (3) a short write-up of assumptions and recommendations.
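
For the power-versus-sample-size chart, one possible starting point is a vectorized version of the z-test loop. This is a sketch under assumptions: z_test_power is a hypothetical helper, it uses the same unpooled standard error as the starter code, and it replaces the p-value check with the equivalent test |z| > critical value so that only NumPy and the standard library are needed.

```python
import numpy as np
from statistics import NormalDist

def z_test_power(pA, pB, n, sims=5000, alpha=0.05, seed=0):
    """Share of simulated tests where B is significantly better than A."""
    rng = np.random.default_rng(seed)
    xA = rng.binomial(n, pA, size=sims)
    xB = rng.binomial(n, pB, size=sims)
    pA_hat, pB_hat = xA / n, xB / n
    se = np.sqrt(pA_hat * (1 - pA_hat) / n + pB_hat * (1 - pB_hat) / n)
    z = (pB_hat - pA_hat) / np.maximum(se, 1e-12)  # guard against se == 0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    return float(np.mean((np.abs(z) > z_crit) & (pB_hat > pA_hat)))

sizes = [500, 1000, 2000, 4000, 8000]  # assumed per-variant sample sizes
powers = [z_test_power(0.10, 0.12, n) for n in sizes]
for n, pw in zip(sizes, powers):
    print(n, round(pw, 3))

# Setting pB equal to pA estimates the rate of false "B wins" calls
false_positive_rate = z_test_power(0.10, 0.10, 2000)
print(false_positive_rate)
```

The sizes and powers lists can be passed directly to a plotting library. Because only wins for B are counted, the false positive rate here should sit near alpha / 2 rather than alpha.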

Practical projects

Churn Markov Model: define states (Active, Passive, Churn), estimate transition matrix from data, forecast retention.
Demand Modeling: fit Poisson/Negative Binomial to daily orders, simulate inventory risk and stockout probabilities.
Risk Scoring: build a simple Bayesian spam/fraud score using word/feature likelihoods and a tunable prior.

Subskills

Random Variables and Distributions — Understand PMF/PDF/CDF and when to use Bernoulli, Binomial, Poisson, Normal, Exponential.
Conditional Probability and Bayes Rule — Compute posteriors and reason with base rates in practical settings.
Expectation, Variance, Covariance — Calculate and combine moments; interpret correlation vs. causation carefully.
Law of Large Numbers and CLT — Use sampling distributions to form confidence intervals and sanity-check estimates.
Probability Inequalities Basics — Apply Markov and Chebyshev for conservative bounds when assumptions are weak.
Markov Chains Basics — Model sequential user states and long-run behavior.
Simulation and Monte Carlo — Estimate complex probabilities, validate models, and plan experiments.
Probabilistic Thinking for Modeling — Map business questions to probabilistic structures and test assumptions.
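
As a small illustration of the inequalities subskill, the Markov and Chebyshev bounds can be compared with observed tail frequencies. The sketch below assumes an Exponential sample with mean 1 purely as an example; the thresholds a = 3 and k = 2 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)                # arbitrary seed
x = rng.exponential(scale=1.0, size=200_000)  # assumed example: mean 1, variance 1
mu, sigma = x.mean(), x.std()

# Markov: P(X >= a) <= E[X] / a for non-negative X
a = 3.0
markov_bound = mu / a
markov_actual = np.mean(x >= a)

# Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2
k = 2.0
chebyshev_bound = 1 / k**2
chebyshev_actual = np.mean(np.abs(x - mu) >= k * sigma)

print(markov_actual, markov_bound)
print(chebyshev_actual, chebyshev_bound)
```

Both bounds hold but are loose: the observed tail is about 0.05 against bounds of roughly 0.33 and 0.25. That looseness is the price of assuming almost nothing about the distribution.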

Next steps

Re-implement every example with your own numbers and validate via simulation.
Apply probability to one real dataset (experimentation, funnel, or demand).
Move on to statistical inference and causal analysis after you are comfortable with CLT and Bayesian basics.
