Expectation, Variance, and Covariance: Probability for Data Scientists

Why this matters

Data Scientists use expectation (average outcome), variance (uncertainty), and covariance (how two quantities move together) to make decisions and quantify risk. You will apply these when:

Estimating expected revenue or click-through from a campaign.
Summarizing model predictions and their uncertainty.
Combining features in linear models and understanding error propagation.
Diagnosing relationships between variables (are they moving together or in opposite directions?).

Concept explained simply

Expectation E[X]: the long-run average of X if you repeated the process many times.
Variance Var(X): how spread out X is around its average. Larger variance = more uncertainty.
Covariance Cov(X, Y): whether X and Y move together. Positive means they increase together; negative means when one increases, the other tends to decrease. Zero means no linear relationship.
Correlation ρ(X, Y): a standardized covariance in [-1, 1].

Mental model

Expectation: the balance point of the distribution.
Variance: average squared distance from the balance point (units squared).
Covariance: a signed measure of co-movement; think of two dancers moving in sync (positive), opposite (negative), or independently (near zero).

Key formulas and properties

Linearity of expectation: E[aX + b] = a E[X] + b; and E[aX + bY + c] = aE[X] + bE[Y] + c.
Variance: Var(X) = E[X^2] − (E[X])^2.
Scaling: Var(aX + b) = a^2 Var(X).
Sum of variables: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
Independence: if X and Y are independent, then Cov(X, Y) = 0, so Var(X + Y) = Var(X) + Var(Y). The converse does not hold: zero covariance does not imply independence.
Covariance scaling: Cov(aX + b, cY + d) = ac Cov(X, Y).
Correlation: ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y), where σ = standard deviation.
Law of total expectation: E[X] = E[ E[X | Y] ].
Law of total variance: Var(X) = E[ Var(X | Y) ] + Var( E[X | Y] ).
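
The identities above can be checked exactly on a small example. The sketch below is only an illustration: the joint distribution of (X, Y) is made up, and the helper names (expect, var, cov) are hypothetical. It uses exact fractions so the two sides of each identity can be compared without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y); the probabilities sum to 1.
joint = {
    (0, 0): F(1, 4),
    (0, 1): F(1, 4),
    (1, 0): F(1, 8),
    (1, 1): F(3, 8),
}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def var(g):
    """Var(g) = E[g^2] - (E[g])^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

def cov(g, h):
    """Cov(g, h) = E[gh] - E[g]E[h]."""
    return expect(lambda x, y: g(x, y) * h(x, y)) - expect(g) * expect(h)

X = lambda x, y: x
Y = lambda x, y: y

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
lhs_sum = var(lambda x, y: x + y)
rhs_sum = var(X) + var(Y) + 2 * cov(X, Y)

# Cov(aX + b, cY + d) = ac Cov(X, Y), here with a=2, b=1, c=3, d=-4
lhs_scale = cov(lambda x, y: 2 * x + 1, lambda x, y: 3 * y - 4)
rhs_scale = 2 * 3 * cov(X, Y)

print(lhs_sum == rhs_sum, lhs_scale == rhs_scale)
```

X and Y are dependent in this example (their covariance is 1/16, not zero), which is why the 2 Cov(X, Y) term cannot be dropped.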

Worked examples

Example 1: Discrete — coin flips

Let X be the number of heads in 2 fair coin flips (Binomial n=2, p=0.5).

E[X] = np = 2 × 0.5 = 1.
Var(X) = np(1 − p) = 2 × 0.5 × 0.5 = 0.5.

Interpretation: on average 1 head, with a standard deviation of √0.5 ≈ 0.71 heads.
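
As a cross-check on the Binomial formulas, the same numbers can be obtained by listing all four equally likely outcomes of two flips. This is a minimal sketch; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All outcomes of 2 fair flips, with 1 = heads and 0 = tails.
outcomes = list(product([0, 1], repeat=2))
prob = Fraction(1, len(outcomes))  # each outcome has probability 1/4

mean = sum(prob * sum(o) for o in outcomes)                # E[X]
second_moment = sum(prob * sum(o) ** 2 for o in outcomes)  # E[X^2]
variance = second_moment - mean ** 2                       # E[X^2] - (E[X])^2

print(mean, variance)
```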

Example 2: Continuous — Uniform(0,1)

E[X] = 0.5.
Var(X) = 1/12 ≈ 0.0833.

Interpretation: outcomes are evenly spread from 0 to 1; the standard deviation is √(1/12) ≈ 0.29.
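
For readers who want to see where these values come from, the derivation below integrates against the Uniform(0,1) density f(x) = 1 on [0, 1] and then applies Var(X) = E[X^2] − (E[X])^2.

```latex
\begin{align*}
E[X] &= \int_0^1 x \, dx = \tfrac{1}{2}, \\
E[X^2] &= \int_0^1 x^2 \, dx = \tfrac{1}{3}, \\
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}.
\end{align*}
```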

Example 3: Negative covariance — dice that sum to 7

Let X be a fair die (1–6), and Y = 7 − X. Then:

E[X] = 3.5, E[Y] = 3.5.
E[XY] = E[X(7 − X)] = 7E[X] − E[X^2] = 49/2 − 91/6 = 28/3, using E[X^2] = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6.
Cov(X, Y) = E[XY] − E[X]E[Y] = 28/3 − (7/2)(7/2) = −35/12 ≈ −2.9167.

Interpretation: when X is high, Y must be low. Because Y is an exact linear function of X with negative slope, the relationship is perfectly negative (ρ = −1).
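
The fractions in this example are easy to get wrong by hand, so here is a small sketch that recomputes them exactly over the six die faces. It also computes the correlation, relying on the fact that Var(Y) = Var(7 − X) = Var(X), so σ_X σ_Y = Var(X). Variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: each face has probability 1/6

e_x = sum(p * x for x in faces)             # E[X]
e_y = sum(p * (7 - x) for x in faces)       # E[Y] with Y = 7 - X
e_xy = sum(p * x * (7 - x) for x in faces)  # E[XY]
cov_xy = e_xy - e_x * e_y                   # Cov(X, Y)

var_x = sum(p * x * x for x in faces) - e_x ** 2
var_y = sum(p * (7 - x) ** 2 for x in faces) - e_y ** 2

# Var(X) == Var(Y) here, so sigma_X * sigma_Y equals Var(X) exactly.
rho = cov_xy / var_x

print(e_xy, cov_xy, rho)
```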

Example 4: Variance of a linear combination (with covariance)

Suppose S = 2F1 + 0.5F2 where Var(F1)=4, Var(F2)=1, Cov(F1,F2)=0.6.

Var(S) = 2^2 Var(F1) + 0.5^2 Var(F2) + 2·2·0.5·Cov(F1,F2) = 4×4 + 0.25×1 + 2×2×0.5×0.6 = 16 + 0.25 + 1.2 = 17.45.

Interpretation: the positive covariance adds 1.2 to the variance of the score; if F1 and F2 were uncorrelated, Var(S) would be 16.25.
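
The same calculation generalizes to any number of features: Var(Σ w_i F_i) = Σ_i Σ_j w_i w_j Cov(F_i, F_j), where Cov(F_i, F_i) = Var(F_i). The sketch below applies that double sum to the numbers from this example; the function name linear_combo_variance is hypothetical, not from any library.

```python
def linear_combo_variance(weights, cov_matrix):
    """Variance of sum_i w_i * F_i, given the covariance matrix of the F_i.

    cov_matrix[i][j] holds Cov(F_i, F_j); the diagonal holds the variances.
    """
    n = len(weights)
    return sum(
        weights[i] * weights[j] * cov_matrix[i][j]
        for i in range(n)
        for j in range(n)
    )

# Numbers from Example 4: S = 2*F1 + 0.5*F2.
weights = [2.0, 0.5]
cov_matrix = [
    [4.0, 0.6],  # Var(F1),      Cov(F1, F2)
    [0.6, 1.0],  # Cov(F2, F1),  Var(F2)
]

var_s = linear_combo_variance(weights, cov_matrix)
print(round(var_s, 4))
```

The off-diagonal entries are each counted once in the double sum, which is where the factor of 2 in the two-variable formula comes from.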

Practice: do it now

Use this checklist when solving EV/Var/Cov problems:

Identify the random variables and what each represents.
Write down known parameters (means, variances, probabilities).
Choose the formula: linearity, variance identity, or covariance.
Compute step-by-step; keep units consistent (variance is in squared units).
Interpret results in plain language.

Exercise 1: A/B revenue variance

Each visitor converts with probability p = 0.04. Each conversion yields 120 revenue units. Let X ~ Bernoulli(0.04), R = 120X.

Compute E[R] and Var(R).
For 1000 independent visitors, compute expected total revenue and its standard deviation.

Try it before viewing the solution.

Exercise 2: Linear score with covariance

Let F1 and F2 be features with E[F1]=4, E[F2]=3, Var(F1)=1.5, Var(F2)=2.0, Cov(F1,F2)=−0.8. Define S = 2F1 + F2.

Compute E[S].
Compute Var(S).

Interpret what negative covariance does to the uncertainty of S.

Common mistakes and how to self-check

Forgetting that expectation is linear even when variables are dependent. Fix: always apply linearity first.
Dropping the covariance term when variables are not independent: 2 Cov(X, Y) in Var(X + Y), or 2ab Cov(X, Y) in Var(aX + bY). Fix: check independence before simplifying.
Confusing standard deviation with variance. Fix: SD = sqrt(Var).
Using E[XY] = E[X]E[Y] without independence. Fix: verify independence or compute E[XY] directly.
Ignoring units: variance has squared units. Fix: interpret SD for intuitive scale.

Self-check routine

State assumptions (independence?) explicitly.
Re-derive using both Var(X) = E[X^2] − (E[X])^2 and the linear-combination formulas; answers must match.
Sanity check: does variance become zero if the variable is constant?
Sign check: is covariance sign consistent with the story?

Practical projects

Campaign planning: Build a simple spreadsheet that takes conversion rate p and average order value A and outputs expected revenue and its SD for N visitors.
Feature combination risk: Given feature means, variances, and covariance, compute the mean and variance of a linear score S = w1F1 + w2F2. Explore how changing covariance changes SD(S).
Scenario analysis: Use the law of total variance by splitting users into segments (e.g., new vs returning) with different p; compute the overall variance as E[Var(X | segment)] + Var(E[X | segment]).
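
As a starting point for the scenario-analysis project, here is a minimal sketch of the law of total variance for a per-user conversion indicator X. The segment shares and conversion probabilities are made-up numbers chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical segments: (share of users, conversion probability p).
segments = {
    "new": (Fraction(3, 5), Fraction(1, 50)),         # 60% of users, p = 0.02
    "returning": (Fraction(2, 5), Fraction(7, 100)),  # 40% of users, p = 0.07
}

# E[X] = E[E[X | segment]]
overall_mean = sum(share * p for share, p in segments.values())

# E[Var(X | segment)]: average within-segment Bernoulli variance p(1 - p).
within = sum(share * p * (1 - p) for share, p in segments.values())

# Var(E[X | segment]): spread of the segment means around the overall mean.
between = sum(share * (p - overall_mean) ** 2 for share, p in segments.values())

total_var = within + between
print(float(overall_mean), float(within), float(between), float(total_var))
```

Because X is still a 0/1 variable overall, the total should equal p(1 − p) with p = E[X], which gives a built-in sanity check on the decomposition.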

Who this is for

Aspiring and practicing Data Scientists needing strong statistical fundamentals.
Analysts and ML engineers interpreting experiments and model outputs.

Prerequisites

Basic probability (events, distributions, independence).
Algebra with sums and squares; comfortable with averages.

Learning path

Master expectation, variance, covariance basics (this lesson).
Apply to common distributions (Bernoulli, Binomial, Normal).
Use conditional expectation/variance in segmentation and Bayesian updates.
Connect to correlation, regression, and error propagation.

Next steps

Complete the exercises above.
Take the Quick Test to check your understanding.
Build one Practical Project from the list and write a short interpretation of results.

Mini challenge

A product’s weekly revenue R is 5 times the number of conversions C. Conversions C ~ Binomial(n=200, p=0.03). Estimate E[R] and SD(R). Hint: use Var(aX)=a^2 Var(X) and Var(Binomial)=np(1−p).

Solution sketch

E[C]=200×0.03=6; Var(C)=200×0.03×0.97=5.82.
R=5C ⇒ E[R]=5×6=30; Var(R)=25×5.82=145.5 ⇒ SD(R)=√145.5≈12.06.
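
The arithmetic can be double-checked with a few lines of code. This sketch simply applies E[aC] = aE[C] and SD(aC) = |a|·SD(C) to the Binomial mean and variance; the variable names are illustrative.

```python
import math

n, p = 200, 0.03  # C ~ Binomial(n, p)
a = 5             # R = a * C

mean_c = n * p           # E[C] = np
var_c = n * p * (1 - p)  # Var(C) = np(1 - p)

mean_r = a * mean_c               # E[R] = a E[C]
sd_r = abs(a) * math.sqrt(var_c)  # SD(R) = |a| SD(C)

print(round(mean_r, 2), round(sd_r, 2))
```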
