Why this matters
Data scientists use expectation (average outcome), variance (uncertainty), and covariance (how two quantities move together) to make decisions and quantify risk. You will apply these when:
- Estimating expected revenue or click-through from a campaign.
- Summarizing model predictions and their uncertainty.
- Combining features in linear models and understanding error propagation.
- Diagnosing relationships between variables (are they moving together or in opposite directions?).
Concept explained simply
- Expectation E[X]: the long-run average of X if you repeated the process many times.
- Variance Var(X): how spread out X is around its average. Larger variance = more uncertainty.
- Covariance Cov(X, Y): whether X and Y move together. Positive means they tend to increase together; negative means when one increases, the other tends to decrease. Zero means no linear relationship, although X and Y can still be dependent in a nonlinear way.
- Correlation ρ(X, Y): a standardized covariance in [-1, 1].
Mental model
- Expectation: the balance point of the distribution.
- Variance: average squared distance from the balance point (units squared).
- Covariance: a signed measure of co-movement; think of two dancers moving in sync (positive), in opposite directions (negative), or with no consistent linear pattern (near zero).
Key formulas and properties
- Linearity of expectation: E[aX + b] = a E[X] + b; and E[aX + bY + c] = aE[X] + bE[Y] + c.
- Variance: Var(X) = E[X^2] − (E[X])^2.
- Scaling: Var(aX + b) = a^2 Var(X).
- Sum of variables: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y). More generally, Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).
- Independence: if X and Y are independent, then Cov(X, Y) = 0, so Var(X + Y) = Var(X) + Var(Y).
- Covariance scaling: Cov(aX + b, cY + d) = ac Cov(X, Y).
- Correlation: ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y), where σ = standard deviation.
- Law of total expectation: E[X] = E[ E[X | Y] ].
- Law of total variance: Var(X) = E[ Var(X | Y) ] + Var( E[X | Y] ).
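The sum rule and the covariance scaling rule above can be checked exactly on a small example. The sketch below assumes a made-up joint PMF for (X, Y) and arbitrary constants; the helper names `expect` and `cov` are illustrative, not part of any library.

```python
from fractions import Fraction as F

# Hypothetical joint PMF over (x, y) pairs, chosen only for illustration.
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}

def expect(g):
    """Return E[g(X, Y)] under the joint PMF above."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def cov(g, h):
    """Return Cov(g(X, Y), h(X, Y)) = E[gh] - E[g]E[h]."""
    return expect(lambda x, y: g(x, y) * h(x, y)) - expect(g) * expect(h)

X = lambda x, y: x
Y = lambda x, y: y

# Var(X) is Cov(X, X), so one helper covers both variance and covariance.
var_x, var_y, cov_xy = cov(X, X), cov(Y, Y), cov(X, Y)

# Sum rule: Var(X + Y) computed directly vs. via the formula.
total = lambda x, y: x + y
var_sum_direct = cov(total, total)
var_sum_formula = var_x + var_y + 2 * cov_xy

# Covariance scaling with arbitrary constants a=2, b=1, c=-3, d=4.
cov_scaled = cov(lambda x, y: 2 * x + 1, lambda x, y: -3 * y + 4)

print(var_sum_direct == var_sum_formula)  # True
print(cov_scaled == 2 * -3 * cov_xy)      # True
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` rather than a floating-point tolerance.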
Worked examples
Example 1: Discrete — coin flips
Let X be the number of heads in 2 fair coin flips (Binomial n=2, p=0.5).
- E[X] = np = 2 × 0.5 = 1.
- Var(X) = np(1 − p) = 2 × 0.5 × 0.5 = 0.5.
Interpretation: on average 1 head, with a standard deviation of √0.5 ≈ 0.71 heads.
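As a cross-check on the Binomial formulas, the same numbers can be obtained by listing all four equally likely outcomes of two flips. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product([0, 1], repeat=2))  # 1 = heads, 0 = tails
prob = Fraction(1, len(outcomes))           # each of the 4 sequences is equally likely

# X is the number of heads in a sequence, i.e. the sum of its entries.
mean_heads = sum(prob * sum(o) for o in outcomes)
var_heads = sum(prob * (sum(o) - mean_heads) ** 2 for o in outcomes)

print(mean_heads, var_heads)  # 1 1/2
```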
Example 2: Continuous — Uniform(0,1)
- E[X] = 0.5.
- Var(X) = 1/12 ≈ 0.0833.
Interpretation: outcomes are evenly spread from 0 to 1, with a standard deviation of √(1/12) ≈ 0.289.
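These values come from integrating x and x^2 over [0, 1], since the Uniform(0,1) density equals 1 there. The sketch below approximates those integrals with a midpoint rule; the grid size `n` is an arbitrary choice, not something prescribed by the lesson.

```python
n = 10_000  # number of grid cells (assumed; larger n gives a closer approximation)
midpoints = [(i + 0.5) / n for i in range(n)]

# With density 1 on [0, 1], E[g(X)] is the integral of g over [0, 1],
# approximated here by averaging g at the cell midpoints.
mean_x = sum(midpoints) / n
second_moment = sum(x * x for x in midpoints) / n

# Variance identity: Var(X) = E[X^2] - (E[X])^2.
var_x = second_moment - mean_x**2

print(round(mean_x, 4), round(var_x, 4))  # 0.5 0.0833
```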
Example 3: Negative covariance — dice that sum to 7
Let X be a fair die (1–6), and Y = 7 − X. Then:
- E[X] = 3.5, E[Y] = 3.5.
- E[XY] = E[X(7 − X)] = 7E[X] − E[X^2] = 49/2 − 91/6 = 28/3.
- Cov(X, Y) = E[XY] − E[X]E[Y] = 28/3 − (7/2)(7/2) = −35/12 ≈ −2.9167.
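Interpretation: when X is high, Y must be low. Because Y is an exact decreasing linear function of X, the relationship is perfectly negative: Var(X) = Var(Y) = 35/12, so ρ(X, Y) = −1.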
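The fractions in this example are easy to get wrong by hand, so here is a minimal sketch that recomputes them exactly by summing over the six faces. Variable names are illustrative.

```python
import math
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: each face has probability 1/6

e_x = sum(p * x for x in faces)
e_y = sum(p * (7 - x) for x in faces)
e_xy = sum(p * x * (7 - x) for x in faces)

# Cov(X, Y) = E[XY] - E[X]E[Y]
cov_xy = e_xy - e_x * e_y

var_x = sum(p * x * x for x in faces) - e_x**2
var_y = sum(p * (7 - x) ** 2 for x in faces) - e_y**2

# Correlation needs a square root, so switch to floats here.
rho = float(cov_xy) / math.sqrt(float(var_x) * float(var_y))

print(e_xy, cov_xy)  # 28/3 -35/12
```

The correlation `rho` comes out at −1 up to floating-point rounding, consistent with Y being an exact linear function of X.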
Example 4: Variance of a linear combination (with covariance)
Suppose S = 2F1 + 0.5F2 where Var(F1)=4, Var(F2)=1, Cov(F1,F2)=0.6.
- Var(S) = 2^2 Var(F1) + 0.5^2 Var(F2) + 2·2·0.5·Cov(F1,F2) = 4×4 + 0.25×1 + 2×2×0.5×0.6 = 16 + 0.25 + 1.2 = 17.45.
Interpretation: covariance contributes to the overall uncertainty of the score.
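The same calculation generalizes to any number of features as a double sum over weights and covariance entries. The function below is a sketch under that assumption; `linear_combo_variance` is a hypothetical helper name, and the covariance matrix is built from the numbers in this example.

```python
def linear_combo_variance(weights, cov_matrix):
    """Var(sum_i w_i F_i) = sum_i sum_j w_i w_j Cov(F_i, F_j).

    cov_matrix[i][i] holds Var(F_i); off-diagonal entries hold covariances.
    """
    n = len(weights)
    return sum(
        weights[i] * weights[j] * cov_matrix[i][j]
        for i in range(n)
        for j in range(n)
    )

# Example 4: Var(F1)=4, Var(F2)=1, Cov(F1,F2)=0.6, S = 2*F1 + 0.5*F2.
cov = [[4.0, 0.6],
       [0.6, 1.0]]
var_s = linear_combo_variance([2.0, 0.5], cov)

print(round(var_s, 2))  # 17.45
```

The off-diagonal entry is counted twice in the double sum (once as (i, j) and once as (j, i)), which is where the factor 2 in the covariance term comes from.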
Practice: do it now
Use this checklist when solving EV/Var/Cov problems:
- Identify the random variables and what each represents.
- Write down known parameters (means, variances, probabilities).
- Choose the formula: linearity, variance identity, or covariance.
- Compute step-by-step; keep units consistent (variance is in squared units).
- Interpret results in plain language.
Exercise 1: A/B revenue variance
Each visitor converts with probability p = 0.04. Each conversion yields 120 revenue units. Let X ~ Bernoulli(0.04), R = 120X.
- Compute E[R] and Var(R).
- For 1000 independent visitors, compute expected total revenue and its standard deviation.
Try it before viewing the solution.
Exercise 2: Linear score with covariance
Let F1 and F2 be features with E[F1]=4, E[F2]=3, Var(F1)=1.5, Var(F2)=2.0, Cov(F1,F2)=−0.8. Define S = 2F1 + F2.
- Compute E[S].
- Compute Var(S).
Interpret what negative covariance does to the uncertainty of S.
Common mistakes and how to self-check
- Forgetting that expectation is linear even when variables are dependent. Fix: always apply linearity first.
- Dropping the covariance term when variables are not independent: 2 Cov(X,Y) in Var(X+Y), or 2ab Cov(X,Y) in Var(aX+bY). Fix: check independence before simplifying.
- Confusing standard deviation with variance. Fix: SD = sqrt(Var).
- Using E[XY] = E[X]E[Y] without independence. Fix: verify independence or compute E[XY] directly.
- Ignoring units: variance has squared units. Fix: interpret SD for intuitive scale.
Self-check routine
- State assumptions (independence?) explicitly.
- Re-derive using both Var(X)=E[X^2]−E[X]^2 and linear-combination formulas; answers must match.
- Sanity check: does variance become zero if the variable is constant?
- Sign check: is covariance sign consistent with the story?
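Two of these checks (computing variance two ways, and the constant-variable sanity check) can be automated for any discrete distribution. The sketch below assumes a PMF stored as a `{value: probability}` dictionary; the function names and the example PMF are hypothetical.

```python
import math

def variance_by_definition(pmf):
    """Var(X) = E[(X - E[X])^2] for a PMF given as {value: probability}."""
    mean = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())

def variance_by_identity(pmf):
    """Var(X) = E[X^2] - (E[X])^2 for the same PMF format."""
    mean = sum(p * x for x, p in pmf.items())
    return sum(p * x * x for x, p in pmf.items()) - mean**2

example = {0: 0.7, 1: 0.2, 2: 0.1}  # hypothetical PMF, for illustration only
constant = {5: 1.0}                 # a variable that always equals 5

# Both routes should agree up to floating-point rounding.
agree = math.isclose(variance_by_definition(example), variance_by_identity(example))

# A constant has no spread, so its variance should be zero.
constant_var = variance_by_definition(constant)

print(agree, constant_var)  # True 0.0
```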
Practical projects
- Campaign planning: Build a simple spreadsheet that takes conversion rate p and average order value A and outputs expected revenue and its SD for N visitors.
- Feature combination risk: Given feature means, variances, and covariance, compute the mean and variance of a linear score S = w1F1 + w2F2. Explore how changing covariance changes SD(S).
- Scenario analysis: Use the law of total variance by splitting users into segments (e.g., new vs returning) with different p; compute the overall variance as E[ Var(X | segment) ] + Var( E[X | segment] ).
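For the scenario-analysis project, a starting point might look like the sketch below. It assumes two segments with made-up traffic shares and conversion probabilities, and treats each visitor's conversion X as Bernoulli(p) within a segment; none of these numbers come from the lesson.

```python
import math

# Hypothetical segments: name -> (share of traffic, conversion probability).
segments = {"new": (0.7, 0.02), "returning": (0.3, 0.08)}

# Law of total expectation: E[X] = sum over segments of share * p.
p_bar = sum(w * p for w, p in segments.values())

# E[Var(X | segment)]: within-segment Bernoulli variance p(1 - p), averaged by share.
within = sum(w * p * (1 - p) for w, p in segments.values())

# Var(E[X | segment]): spread of the segment means around the overall mean.
between = sum(w * (p - p_bar) ** 2 for w, p in segments.values())

total_var = within + between

# Cross-check: overall X is Bernoulli(p_bar), so Var(X) = p_bar * (1 - p_bar).
print(math.isclose(total_var, p_bar * (1 - p_bar)))  # True
```

The cross-check works because a mixture of Bernoulli variables is itself Bernoulli with the averaged probability, which gives an independent route to the same variance.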
Who this is for
- Aspiring and practicing data scientists who need strong statistical fundamentals.
- Analysts and ML engineers interpreting experiments and model outputs.
Prerequisites
- Basic probability (events, distributions, independence).
- Algebra with sums and squares; comfortable with averages.
Learning path
- Master expectation, variance, covariance basics (this lesson).
- Apply to common distributions (Bernoulli, Binomial, Normal).
- Use conditional expectation/variance in segmentation and Bayesian updates.
- Connect to correlation, regression, and error propagation.
Next steps
- Complete the exercises above.
- Take the Quick Test below to check understanding.
- Build one Practical Project from the list and write a short interpretation of results.
Mini challenge
A product’s weekly revenue R is 5 times the number of conversions C. Conversions C ~ Binomial(n=200, p=0.03). Estimate E[R] and SD(R). Hint: use Var(aX)=a^2 Var(X) and Var(Binomial)=np(1−p).
Quick solution sketch
- E[C]=200×0.03=6; Var(C)=200×0.03×0.97=5.82.
- R=5C ⇒ E[R]=5×6=30; Var(R)=25×5.82=145.5 ⇒ SD(R)=√145.5≈12.06.
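The sketch above can be reproduced in a few lines, which also makes it easy to try other values of n, p, or the revenue multiplier. Variable names are illustrative.

```python
import math

n, p = 200, 0.03       # Binomial parameters for conversions C
revenue_per_conv = 5   # R = 5 * C

e_c = n * p                # E[C] = np
var_c = n * p * (1 - p)    # Var(C) = np(1 - p)

e_r = revenue_per_conv * e_c            # linearity: E[aC] = a E[C]
var_r = revenue_per_conv**2 * var_c     # scaling: Var(aC) = a^2 Var(C)
sd_r = math.sqrt(var_r)

print(round(e_r, 2), round(var_r, 2), round(sd_r, 2))  # 30.0 145.5 12.06
```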