Why this matters
Markov chains help Data Scientists model how systems move between states over time when the next step depends only on the current state (not the full history). You will use them to:
- Estimate user retention and churn across lifecycle states (Active → Dormant → Churned).
- Analyze conversion funnels (Visitor → Evaluating → Paid).
- Model regime changes in time series (e.g., Bull ↔ Bear markets).
- Understand random walks behind algorithms like PageRank and the intuition for MCMC.
Who this is for
- Data Scientists and Analysts who work with sequential behavior or state transitions.
- ML engineers wanting intuition for stochastic processes and MCMC foundations.
Prerequisites
- Basic probability (events, conditional probability).
- Matrix basics: vectors, matrix multiplication, and sums.
- Comfort with simple arithmetic and decimals.
Concept explained simply
A Markov chain is a set of states with probabilities of moving from one state to another in one step. Put those probabilities in a square matrix P (rows sum to 1). If you start with a probability vector over states (like 100% in one state), multiplying by P moves you one step forward.
- States: S = {s1, s2, ..., sk}.
- Transition matrix: P where P[i,j] = Pr(next = j | current = i).
- One step: row vector v(0) → v(1) = v(0) P.
- n steps: v(n) = v(0) P^n.
- Stationary distribution: a row vector π with πP = π (long-run proportions, under suitable conditions).
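A minimal NumPy sketch of the one-step and n-step operations (the matrix and starting vector here are illustrative, not from a specific dataset):

```python
import numpy as np

# Row-stochastic transition matrix: P[i, j] = Pr(next = j | current = i)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

v0 = np.array([1.0, 0.0])                 # start with 100% probability in state 0

v1 = v0 @ P                               # one step: v(1) = v(0) P
v3 = v0 @ np.linalg.matrix_power(P, 3)    # n steps: v(n) = v(0) P^n

print(v1)   # [0.7 0.3]
print(v3)
```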
Mental model
Imagine a token sitting on a board of labeled squares (states). Each turn, you roll a loaded die whose sides depend on the square you are on. The die tells you which square you move to next. The transition matrix is the collection of those dice.
Core pieces to remember
- Each row of P must sum to 1 and all entries are between 0 and 1.
- n-step transitions come from P^n (matrix power).
- Stationary distribution π solves πP = π and sum(π) = 1.
- Absorbing state: row with a 1 on the diagonal and 0 elsewhere (once entered, you stay).
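These checks are easy to automate. A minimal sketch (the helper name check_transition_matrix is ours for illustration, not a library function):

```python
import numpy as np

def check_transition_matrix(P, tol=1e-9):
    """Sanity checks for a row-stochastic transition matrix."""
    P = np.asarray(P, dtype=float)
    assert P.ndim == 2 and P.shape[0] == P.shape[1], "P must be square"
    assert np.all((P >= 0) & (P <= 1)), "entries must be in [0, 1]"
    assert np.allclose(P.sum(axis=1), 1.0, atol=tol), "rows must sum to 1"
    # Absorbing states: a 1 on the diagonal (so the rest of the row is 0)
    return [i for i in range(P.shape[0]) if P[i, i] == 1.0]

P = [[0.5, 0.4, 0.1],
     [0.2, 0.5, 0.3],
     [0.0, 0.0, 1.0]]
print(check_transition_matrix(P))   # [2] -> state index 2 is absorbing
```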
Worked examples
Example 1: Weather (2 states)
States: S = {Sunny (S), Rain (R)}. Transition matrix:
P = [ [0.7, 0.3],
[0.4, 0.6] ]
Q1. Two-step probability Sunny → Sunny?
Compute P^2 and read entry (S,S):
P^2 = P × P
= [ [0.7*0.7 + 0.3*0.4, 0.7*0.3 + 0.3*0.6],
[0.4*0.7 + 0.6*0.4, 0.4*0.3 + 0.6*0.6] ]
= [ [0.49 + 0.12, 0.21 + 0.18],
[0.28 + 0.24, 0.12 + 0.36] ]
= [ [0.61, 0.39],
[0.52, 0.48] ]
Answer: 0.61.
Q2. Stationary distribution π?
Let π = [x, 1−x] and solve πP = π. The first component gives x = 0.7x + 0.4(1−x) = 0.3x + 0.4 ⇒ 0.7x = 0.4 ⇒ x = 4/7 ≈ 0.5714. So π ≈ [0.5714, 0.4286].
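Both answers are easy to verify numerically. A sketch using NumPy (the eigenvector route shown for π is one of several ways to solve πP = π):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Q1: two-step Sunny -> Sunny is entry (0, 0) of P^2
P2 = np.linalg.matrix_power(P, 2)
print(P2[0, 0])   # 0.61

# Q2: stationary distribution = left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.isclose(vals, 1.0))])
pi = pi / pi.sum()                 # normalize so sum(pi) = 1
print(pi)                          # approx [0.5714, 0.4286]
```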
Example 2: Conversion funnel (3 states, absorbing)
States: V (Visitor), E (Evaluating), P (Paid, absorbing).
P = [ [0.5, 0.4, 0.1],
[0.2, 0.5, 0.3],
[0.0, 0.0, 1.0] ]
Starting from V, the probability of being Paid after 2 steps is entry (V, P) of P^2:
P^2 (row V to col P) = 0.5*0.1 + 0.4*0.3 + 0.1*1 = 0.05 + 0.12 + 0.10 = 0.27
Answer: 0.27.
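The same entry can be read off numerically; a quick sketch:

```python
import numpy as np

P = np.array([[0.5, 0.4, 0.1],    # V: Visitor
              [0.2, 0.5, 0.3],    # E: Evaluating
              [0.0, 0.0, 1.0]])   # P: Paid (absorbing)

P2 = np.linalg.matrix_power(P, 2)
print(P2[0, 2])   # 0.27: Paid after 2 steps, starting from Visitor
```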
Example 3: Brand switching (stationary distribution)
States: A, B. Transition matrix:
P = [ [0.8, 0.2],
[0.3, 0.7] ]
Find π = [x, 1−x] with πP = π: x = 0.8x + 0.3(1−x) = 0.8x + 0.3 − 0.3x = 0.5x + 0.3 ⇒ 0.5x = 0.3 ⇒ x = 0.6. So π = [0.6, 0.4].
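Instead of solving equations by hand, you can also iterate v ← vP until it stops changing (power iteration). A minimal sketch under that approach:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

v = np.array([1.0, 0.0])          # any starting distribution works here
for _ in range(100):              # iterate v <- v P until convergence
    v_next = v @ P
    if np.allclose(v, v_next, atol=1e-12):
        break
    v = v_next
print(v)   # approx [0.6, 0.4]
```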
How to compute (by hand or spreadsheet)
- Write states in a consistent order and build P with rows summing to 1.
- One-step forecast from current distribution v: compute v × P.
- n-step forecast: compute P^n (small n by hand; larger n with a spreadsheet or code) and use v × P^n.
- Stationary distribution: solve πP = π with sum(π)=1 (for 2×2, use one equation + normalization).
- Absorbing checks: look for rows like [0, ..., 1, ..., 0] with the 1 on the diagonal.
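In practice, the first two steps usually start from data. A minimal sketch of estimating P from transition counts and forecasting (the counts below are made up for illustration):

```python
import numpy as np

# Hypothetical transition counts between states A and B (rows = from, cols = to)
counts = np.array([[80, 20],
                   [30, 70]], dtype=float)

P = counts / counts.sum(axis=1, keepdims=True)   # normalize rows to sum to 1
print(P)                                         # row-stochastic estimate

v = np.array([0.5, 0.5])                         # current distribution over states
print(v @ np.linalg.matrix_power(P, 3))          # 3-step forecast
```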
Exercises
Do these now.
- Exercise 1: Two-step probability in a 2-state chain. Compute the probability of being in A after 2 steps when starting from A for the given matrix.
- Exercise 2: Stationary distribution of a 2×2 chain. Solve for π.
Self-check checklist
- [ ] All rows of P sum to 1 (within rounding tolerance).
- [ ] Probabilities are in [0, 1].
- [ ] Matrix shapes match when multiplying (row vector × matrix).
- [ ] For stationary distribution, verify πP ≈ π and sum(π)=1.
Common mistakes and how to self-check
Mistake: Mixing up row vs column stochastic matrices
In this lesson we use row-stochastic matrices (rows sum to 1) with row vectors multiplied on the left. If you accidentally build a column-stochastic matrix, v × P gives wrong answers. Self-check: verify row sums equal 1 before multiplying.
Mistake: Confusing one-step with n-step probabilities
Entries of P are one-step probabilities; for a 2-step question, use P^2. Self-check: recompute the entry as a sum of products over intermediate states (as in Example 1) and confirm the rows of P^n still sum to 1.
Mistake: Assuming stationary distribution always exists and is unique
Uniqueness and convergence require conditions (e.g., finite, irreducible, aperiodic). Self-check: an absorbing state makes the chain reducible, and long-run probability mass concentrates on the absorbing state(s).
Mistake: Rounding too early
Round only at the end. Self-check: recompute with more precision and ensure rows still sum to ~1.
Practical projects
- Simulate a 2-state weather chain for 1000 steps; compare empirical frequencies to the stationary distribution (see the sketch after this list).
- Model a 3-state product funnel (Visitor, Trial, Paid). Estimate P from counts and forecast 7-day conversion.
- Build a simple word-level Markov text generator (order-1). Show a 50-word sample.
- What-if analysis: increase E→P by +0.05 (reducing another entry in that row so it still sums to 1) and re-forecast the 3-step paid probability.
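A sketch of the first project, reusing the weather chain from Example 1 (the seed and step count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

state, visits = 0, np.zeros(2)    # start Sunny; count visits per state
for _ in range(1000):
    state = rng.choice(2, p=P[state])   # sample next state from current row
    visits[state] += 1

print(visits / visits.sum())      # empirical frequencies
print([4/7, 3/7])                 # stationary distribution for comparison
```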
Mini challenge
Design a 3-state chain with long-run distribution approximately [0.2, 0.5, 0.3]. Propose a row-stochastic P that makes this plausible, and verify numerically that πP ≈ π.
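One way to do the numerical verification (the candidate P below is a deliberately degenerate placeholder, where every row equals π, so πP = π holds trivially; substitute your own design):

```python
import numpy as np

pi = np.array([0.2, 0.5, 0.3])            # target long-run distribution

# Candidate P: replace with your own row-stochastic design
P = np.tile(pi, (3, 1))                   # every row equal to pi (degenerate)

print(np.allclose(pi @ P, pi))            # True if pi is stationary for P
print(np.allclose(P.sum(axis=1), 1.0))    # rows must still sum to 1
```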
Learning path
- Step 1: Define states clearly and collect transition counts.
- Step 2: Normalize counts into a transition matrix P.
- Step 3: Answer one-step and n-step questions with P^n.
- Step 4: Find stationary distribution and interpret business meaning.
- Step 5: Explore absorbing states and expected long-term behavior.
- Step 6: Connect to PageRank and the intuition for MCMC.
Next steps
- Estimate transition matrices from real data and add confidence intervals.
- Study mixing time and convergence diagnostics.
- Extend to Markov Decision Processes (MDPs) with actions and rewards.
Quick Test
Take the Quick Test to confirm your understanding. Available to everyone; only logged-in users get saved progress.