Random Variables And Distributions

Learn random variables and distributions through plain-language explanations, worked examples, and exercises aimed at data scientists.

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters for Data Scientists

Random variables and distributions let you quantify uncertainty. You will use them to:

  • Design and interpret A/B tests (Binomial, Normal approximation).
  • Model event counts like errors, tickets, or clicks per minute (Poisson).
  • Estimate SLAs and risk using tail probabilities (Normal, Exponential).
  • Calibrate model outputs and set decision thresholds (distributions and quantiles).
  • Simulate outcomes to compare product or policy choices (Monte Carlo with known distributions).

Concept explained simply

A random variable is a rule that turns uncertain outcomes into numbers. A distribution tells you how likely each number (or range) is.

  • Discrete random variable: takes separate values (e.g., number of signups). Described by a PMF P(X = x).
  • Continuous random variable: takes any value in a range (e.g., time in seconds). Described by a PDF f(x). Probabilities come from areas: P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
  • CDF F(x): probability that X ≤ x. It works for both discrete and continuous cases.
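
As a minimal sketch of these three objects, the following uses only the Python standard library to build a PMF, derive a CDF from it, and compute a continuous probability as a difference of CDF values. The fair die and the standard Normal are illustrative assumptions, not examples from the text above.

```python
from statistics import NormalDist

# Discrete: a fair six-sided die (illustrative assumption).
die_pmf = {face: 1 / 6 for face in range(1, 7)}  # P(X = x) for each face

def discrete_cdf(pmf, x):
    """F(x) = P(X <= x): add up the PMF over all values not exceeding x."""
    return sum(prob for value, prob in pmf.items() if value <= x)

p_at_most_4 = discrete_cdf(die_pmf, 4)  # four faces out of six

# Continuous: P(a <= X <= b) is the area under the PDF, which equals F(b) - F(a).
std_normal = NormalDist(mu=0.0, sigma=1.0)
p_within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(f"P(die <= 4)     = {p_at_most_4:.4f}")
print(f"P(-1 <= Z <= 1) = {p_within_one_sd:.4f}")
```

The same CDF idea serves both cases: a running sum of bar heights for the die, and an accumulated area for the Normal.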

Cheat sheet: quantities you will use often

  • Expectation (mean): E[X]. Linearity: E[aX + b] = aE[X] + b.
  • Variance: Var(X) = E[(X − E[X])^2]. For constants a and b: Var(aX + b) = a^2 Var(X), because shifting by b does not change the spread.
  • Bernoulli(p): E[X] = p, Var(X) = p(1−p).
  • Binomial(n, p): E[X] = np, Var(X) = np(1−p).
  • Poisson(λ): E[X] = Var(X) = λ.
  • Normal(μ, σ^2): Z = (X − μ)/σ ~ Normal(0,1).
  • Exponential(λ): E[X] = 1/λ, Var(X) = 1/λ^2; memoryless, meaning P(X > s + t | X > s) = P(X > t).
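
One way to convince yourself of the cheat-sheet identities is to compute the moments directly from a PMF. The sketch below does this for a Binomial and for a linear transformation of it. The values n = 12, p = 0.3, a = 2, and b = 5 are hypothetical, chosen only for illustration.

```python
from math import comb

def mean_and_variance(pmf):
    """Return (E[X], Var(X)) for a PMF given as {value: probability}."""
    mean = sum(x * prob for x, prob in pmf.items())
    variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())
    return mean, variance

# Hypothetical parameters for illustration.
n, p = 12, 0.3
binomial = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
mean_x, var_x = mean_and_variance(binomial)

# Y = aX + b keeps the same probabilities but moves the values.
a, b = 2.0, 5.0
transformed = {a * k + b: prob for k, prob in binomial.items()}
mean_y, var_y = mean_and_variance(transformed)

print(f"E[X] = {mean_x:.4f}  (np       = {n * p:.4f})")
print(f"Var  = {var_x:.4f}  (np(1-p)  = {n * p * (1 - p):.4f})")
print(f"E[Y] = {mean_y:.4f}  (aE[X]+b  = {a * mean_x + b:.4f})")
print(f"Var  = {var_y:.4f}  (a^2 Var  = {a * a * var_x:.4f})")
```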

Mental model

Think of a distribution as a landscape of likelihood. For discrete variables, it is like a row of bars (heights = probabilities). For continuous variables, it is a smooth hill (height = density). The probability of a single value is the bar height (discrete); the probability of a range is the area under the curve over that range (continuous).

Core formulas you will actually use

  • Binomial probability: P(X = k) = C(n,k) p^k (1−p)^{n−k}.
  • Poisson probability: P(X = k) = e^{−λ} λ^k / k!.
  • Normal standardization: Z = (X − μ)/σ; then P(X ≤ x) = Φ((x − μ)/σ), where Φ is the standard Normal CDF.
  • Normal approximation to Binomial: if np ≥ 10 and n(1−p) ≥ 10, X ≈ Normal(np, np(1−p)).
  • Law of total expectation: E[X] = E[E[X | Y]].
  • Scaling Poisson: if rate is λ per unit time, then over t units, use λt.
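
The formulas above translate almost line for line into code. The sketch below, using only the standard library, implements the two PMFs, the rule-of-thumb check for the Normal approximation, and the Poisson rate scaling. The function names and the numbers (n = 500, p = 0.1, a cutoff of 60, and 4 events per hour) are hypothetical choices for illustration.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb from the text: both np and n(1-p) should be at least 10."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Exact Binomial CDF versus the Normal(np, np(1-p)) approximation.
n, p, cutoff = 500, 0.1, 60
exact = sum(binomial_pmf(k, n, p) for k in range(cutoff + 1))
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(cutoff)

# Poisson scaling: a rate of 4 per hour over a quarter hour gives lambda * t = 1.
rate_per_hour, hours = 4.0, 0.25
p_zero_in_window = poisson_pmf(0, rate_per_hour * hours)

print(f"approximation reasonable? {normal_approx_ok(n, p)}")
print(f"P(X <= {cutoff}): exact = {exact:.4f}, normal approx = {approx:.4f}")
print(f"P(0 events in 15 minutes) = {p_zero_in_window:.4f}")
```

Note that `NormalDist` takes the standard deviation, not the variance, so the variance np(1−p) is passed through `sqrt`.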

Worked examples

1) Binomial: at least 3 signups out of 20 with p = 0.1

Let X ~ Binomial(n=20, p=0.1). We want P(X ≥ 3) = 1 − [P(0) + P(1) + P(2)].

  • P(0) = 0.9^{20} ≈ 0.1216
  • P(1) = 20·0.1·0.9^{19} ≈ 0.2702
  • P(2) = C(20,2)·0.1^2·0.9^{18} ≈ 0.2852

Sum ≈ 0.6769, so P(X ≥ 3) ≈ 0.3231.
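
The hand calculation can be cross-checked with a few lines of standard-library Python. This is a sketch of the same complement approach, not a general-purpose routine.

```python
from math import comb

n, p = 20, 0.1

# P(X = k) for k = 0, 1, 2, exactly as in the bullets above.
low_terms = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3)]

# Complement rule: P(X >= 3) = 1 - P(X <= 2).
p_at_least_3 = 1 - sum(low_terms)

for k, term in enumerate(low_terms):
    print(f"P(X = {k}) = {term:.4f}")
print(f"P(X >= 3) = {p_at_least_3:.4f}")
```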

2) Normal: late deliveries beyond 40 minutes

Assume delivery time T ~ Normal(μ=30, σ=5), in minutes. A delivery is late when T > 40. Standardizing gives Z = (40 − 30)/5 = 2, so P(T > 40) = P(Z > 2) ≈ 0.0228 (about 2.3%).

95th percentile: 30 + 1.645·5 ≈ 38.2 minutes, so about 95% of deliveries arrive within 38.2 minutes.
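
If you would rather not look up Z tables, Python's `statistics.NormalDist` gives both the tail probability and the percentile. A short sketch with the example's parameters:

```python
from statistics import NormalDist

# NormalDist is parameterized by the standard deviation (sigma), not the variance.
delivery = NormalDist(mu=30, sigma=5)

p_late = 1 - delivery.cdf(40)   # upper tail beyond 40 minutes
p95 = delivery.inv_cdf(0.95)    # time that 95% of deliveries beat

print(f"P(T > 40) = {p_late:.4f}")
print(f"95th percentile = {p95:.1f} minutes")
```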

3) Poisson: chance of at least one defect per batch

Defects per batch D ~ Poisson(λ=2.5). P(D ≥ 1) = 1 − P(0) = 1 − e^{−2.5} ≈ 0.918.

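Two independent batches: P(no defects in either batch) = e^{−2.5}·e^{−2.5} = e^{−5} ≈ 0.0067, so P(at least one defect across the two) ≈ 0.9933. Equivalently, the total defect count over two batches is Poisson(5).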
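
The same numbers fall out of a short calculation. This sketch multiplies the two single-batch probabilities to make the independence assumption explicit.

```python
from math import exp

lam = 2.5  # average defects per batch

p_at_least_one = 1 - exp(-lam)          # one batch: 1 - P(D = 0)
p_none_in_two = exp(-lam) * exp(-lam)   # independence: multiply the two P(D = 0) terms
p_any_in_two = 1 - p_none_in_two

print(f"P(D >= 1), one batch           = {p_at_least_one:.4f}")
print(f"P(no defects in two batches)   = {p_none_in_two:.4f}")
print(f"P(at least one in two batches) = {p_any_in_two:.4f}")
```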

4) Mixtures: conversion rate across platforms

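Suppose 40% of users are on mobile with conversion probability p = 0.04 and 60% are on desktop with p = 0.06. By the law of total expectation, the overall conversion probability is 0.4·0.04 + 0.6·0.06 = 0.052, so the expected number of conversions per 100 users is 100 · 0.052 = 5.2.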
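
A small Monte Carlo run is a useful sanity check on a mixture calculation like this one. The sketch below computes the analytic answer and then simulates users one at a time. The seed and the sample size of 100,000 are arbitrary assumptions.

```python
import random

# (share of traffic, conversion probability) for each platform.
segments = [
    (0.4, 0.04),  # mobile
    (0.6, 0.06),  # desktop
]

# Law of total expectation: weight each platform's rate by its share.
overall_p = sum(share * p for share, p in segments)
expected_per_100 = 100 * overall_p

# Monte Carlo check: draw a platform, then draw a conversion for that platform.
rng = random.Random(0)
n_users = 100_000
conversions = 0
for _ in range(n_users):
    p = 0.04 if rng.random() < 0.4 else 0.06
    conversions += rng.random() < p
simulated_p = conversions / n_users

print(f"analytic:  {expected_per_100:.2f} conversions per 100 users")
print(f"simulated: {100 * simulated_p:.2f} conversions per 100 users")
```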

Exercises you can try now

  1. CTR estimation (Binomial/Normal approximation): You observe 20 clicks out of 200 impressions. Compute p-hat, a 95% normal-approximation confidence interval, and the expected number of clicks in the next 1000 impressions.
  2. Ticket times (Normal): T ~ Normal(μ=30, σ=8). Find P(T > 45) and the 90th percentile time.
  3. Calls (Poisson): Calls arrive at 2.5 per minute on average. Compute P(X ≥ 5) in one minute, and P(0 calls) in 30 seconds.

Hints

  • For a proportion, p-hat = x/n and SE ≈ sqrt(p-hat(1 − p-hat)/n); 95% CI ≈ p-hat ± 1.96·SE.
  • For Normal, standardize with Z = (x − μ)/σ; common Z values are 1.28, 1.645, 1.96, and 2.33 (the 90th, 95th, 97.5th, and 99th percentiles).
  • For Poisson, scale λ by the time window: λ_new = λ·t.

Checklist

  • I computed each answer symbolically before plugging in numbers.
  • I checked units (minutes vs. seconds; impressions vs. clicks).
  • I validated that probabilities are between 0 and 1.
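
The proportion hint can be wrapped in a small helper. This is a sketch of the Normal-approximation interval only. The function name `proportion_ci` and the 45-out-of-300 data are hypothetical, chosen so the exercise answers are not given away.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, trials, confidence=0.95):
    """Normal-approximation interval: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / trials
    se = sqrt(p_hat * (1 - p_hat) / trials)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return p_hat, p_hat - z * se, p_hat + z * se

# Hypothetical data, not the exercise numbers.
p_hat, low, high = proportion_ci(45, 300)

print(f"p-hat = {p_hat:.3f}, 95% CI = ({low:.4f}, {high:.4f})")
print(f"expected successes in the next 1000 trials = {1000 * p_hat:.0f}")
```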

Common mistakes and how to self-check

  • Confusing density with probability: For continuous X, P(X = a) = 0, and the density f(a) can even exceed 1. Use areas under the curve, not f(a).
  • Using Normal approximation when np or n(1−p) is too small: Check both are at least around 10.
  • Forgetting to scale Poisson rates with time or area: Always adjust λ by the interval length.
  • Confusing variance and standard deviation: SD is the square root of variance.
  • Ignoring independence assumptions: Binomial needs independent trials with constant p. If not, consider alternative models.
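
The first mistake is easy to see numerically. In this sketch, a narrow Normal (σ = 0.1, an illustrative assumption) has a density above 1 at its center, while every actual probability stays between 0 and 1.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=0.1)

# The density at a point is a height, not a probability, and can exceed 1.
density_at_zero = narrow.pdf(0.0)

# A single point has zero width, hence zero area.
p_exactly_zero = narrow.cdf(0.0) - narrow.cdf(0.0)

# Probability over a small interval: the area, roughly height times width.
half_width = 0.01
p_near_zero = narrow.cdf(half_width) - narrow.cdf(-half_width)
height_times_width = density_at_zero * 2 * half_width

print(f"f(0) = {density_at_zero:.3f}")
print(f"P(X = 0) = {p_exactly_zero}")
print(f"P(-0.01 <= X <= 0.01) = {p_near_zero:.4f} (height x width = {height_times_width:.4f})")
```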

Self-check mini-list

  • Did I write down the distribution and its parameters before calculating?
  • Did I verify assumptions (independence, identical p, rate stability)?
  • Are my results sensible in magnitude and units?

Practical projects

  • A/B test simulator: Simulate Binomial outcomes for control and treatment, compute lift distributions, and visualize overlap.
  • Queue risk dashboard: Model ticket arrivals with Poisson; compute the probability of exceeding capacity in each 15-minute window.
  • Anomaly thresholding: Fit a Normal distribution to a stable metric and set dynamic alert thresholds using quantiles (e.g., 99.5th percentile).

Suggested steps for the A/B simulator

  1. Choose n and p for control and treatment.
  2. Simulate outcomes 10,000 times for each arm.
  3. Compute the lift for each simulated pair and the proportion of simulations with lift > 0.
  4. Plot histograms and report the 95% interval of lift.
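
The steps above can be sketched with the standard library alone. Several choices here are assumptions, not part of the steps: lift is defined as the absolute difference in conversion rates (treatment minus control), the arm size is 200, the true rates are 0.10 and 0.12, and the seed is fixed for reproducibility. Plotting is left out so the sketch has no third-party dependencies.

```python
import random
from statistics import quantiles

def simulate_rate(rng, n, p):
    """One simulated arm: observed conversion rate from n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n)) / n

# Step 1: choose n and p for each arm (hypothetical values).
rng = random.Random(42)
n_per_arm = 200
p_control, p_treatment = 0.10, 0.12

# Step 2: simulate each arm 10,000 times.
n_sims = 10_000
lifts = [
    simulate_rate(rng, n_per_arm, p_treatment) - simulate_rate(rng, n_per_arm, p_control)
    for _ in range(n_sims)
]

# Step 3: summarize the lift and how often it is positive.
mean_lift = sum(lifts) / n_sims
prob_lift_positive = sum(lift > 0 for lift in lifts) / n_sims

# Step 4: 95% interval from the 2.5th and 97.5th percentiles.
cuts = quantiles(lifts, n=40)  # 39 cut points in steps of 2.5%
interval_95 = (cuts[0], cuts[-1])

print(f"mean lift = {mean_lift:.4f}")
print(f"P(lift > 0) = {prob_lift_positive:.3f}")
print(f"95% interval = ({interval_95[0]:.4f}, {interval_95[1]:.4f})")
```

With only 200 users per arm the interval comfortably straddles zero, which is the overlap the project asks you to visualize. Increasing `n_per_arm` narrows it.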

Learning path

  1. Discrete basics: Bernoulli, Binomial, Geometric; expectation and variance.
  2. Counts: Poisson and Poisson processes; scaling and sums.
  3. Continuous basics: Uniform, Normal, Exponential; PDFs vs. CDFs.
  4. Approximations: Normal approx to Binomial; Central Limit Theorem intuition.
  5. Intervals and quantiles: Using Z-scores; interpreting tail risk.
  6. Mixtures and conditioning: Law of total expectation/variance.
  7. Simulation: Monte Carlo to validate analytic results.

Who this is for

  • Aspiring and junior Data Scientists preparing for product analytics, experimentation, or modeling roles.
  • Analysts and engineers who want reliable uncertainty estimates for decisions.

Prerequisites

  • Comfort with basic algebra and percentages.
  • Familiarity with mean, variance, and standard deviation.
  • A calculator or spreadsheet for simple computations (optional: Python/R for simulation).

Next steps

  • Practice by analyzing a small A/B test with Binomial confidence intervals.
  • Estimate the probability of breaching an SLA using Normal tails.
  • Move on to Sampling, CLT, and Hypothesis Testing to connect distributions with inference.

Mini challenge

The session length S of a daily active user (DAU) is modeled as a mixture: 70% are short-session users with S ~ Exponential(λ = 1/10 per minute, so mean 10 minutes), and 30% are power users with S ~ Exponential(λ = 1/40 per minute, so mean 40 minutes). Compute:

  • The expected session length E[S].
  • P(S > 30 min).

Reasoning

  • By the law of total expectation, E[S] = 0.7·10 + 0.3·40 = 7 + 12 = 19 minutes.
  • For an Exponential with rate λ, P(X > t) = e^{−λt}, so P(S > 30) = 0.7·e^{−30/10} + 0.3·e^{−30/40} = 0.7·e^{−3} + 0.3·e^{−0.75} ≈ 0.7·0.0498 + 0.3·0.4724 ≈ 0.0349 + 0.1417 ≈ 0.1766.
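
Both answers can be checked analytically and by simulation. In the sketch below the seed and the 100,000 draws are arbitrary assumptions. Note that `random.expovariate` takes the rate λ, which is 1 divided by the mean.

```python
import random
from math import exp

weights = (0.7, 0.3)     # short-session users, power users
means = (10.0, 40.0)     # mean session length in minutes; rate = 1 / mean

# Analytic answers: weight each component by its share of users.
expected_s = sum(w * m for w, m in zip(weights, means))
p_over_30 = sum(w * exp(-30 / m) for w, m in zip(weights, means))

# Monte Carlo check: pick a user type, then draw that type's session length.
rng = random.Random(7)
n_draws = 100_000
draws = []
for _ in range(n_draws):
    m = means[0] if rng.random() < weights[0] else means[1]
    draws.append(rng.expovariate(1 / m))
sim_mean = sum(draws) / n_draws
sim_tail = sum(d > 30 for d in draws) / n_draws

print(f"E[S]: analytic = {expected_s:.2f}, simulated = {sim_mean:.2f}")
print(f"P(S > 30): analytic = {p_over_30:.4f}, simulated = {sim_tail:.4f}")
```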
