Why this matters
Markov chains help Data Scientists model how systems move between states over time when the next step depends only on the current state (not the full history). You will use them to:
- Estimate user retention and churn across lifecycle states (Active → Dormant → Churned).
- Analyze conversion funnels (Visitor → Evaluating → Paid).
- Model regime changes in time series (e.g., Bull ↔ Bear markets).
- Understand random walks behind algorithms like PageRank and the intuition for MCMC.
Who this is for
- Data Scientists and Analysts who work with sequential behavior or state transitions.
- ML engineers wanting intuition for stochastic processes and MCMC foundations.
Prerequisites
- Basic probability (events, conditional probability).
- Matrix basics: vectors, matrix multiplication, and sums.
- Comfort with simple arithmetic and decimals.
Concept explained simply
A Markov chain is a set of states with probabilities of moving from one state to another in one step. Put those probabilities in a square matrix P (rows sum to 1). If you start with a probability vector over states (like 100% in one state), multiplying by P moves you one step forward.
- States: S = {s1, s2, ..., sk}.
- Transition matrix: P where P[i,j] = Pr(next = j | current = i).
- One step: row vector v(0) → v(1) = v(0) P.
- n steps: v(n) = v(0) P^n.
- Stationary distribution: a row vector π with πP = π (long-run proportions, under suitable conditions).
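A minimal NumPy sketch of the one-step and n-step operations (the matrix and starting vector here are illustrative, not from a specific dataset):

```python
import numpy as np

# Row-stochastic transition matrix: P[i, j] = Pr(next = j | current = i)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

v0 = np.array([1.0, 0.0])                 # start with 100% probability in state 0

v1 = v0 @ P                               # one step: v(1) = v(0) P
v3 = v0 @ np.linalg.matrix_power(P, 3)    # n steps: v(n) = v(0) P^n

print(v1)   # [0.7 0.3]
print(v3)
```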
Mental model
Imagine a token sitting on a board of labeled squares (states). Each turn, you roll a loaded die whose sides depend on the square you are on. The die tells you which square you move to next. The transition matrix is the collection of those dice.
Core pieces to remember
- Each row of P must sum to 1 and all entries are between 0 and 1.
- n-step transitions come from P^n (matrix power).
- Stationary distribution π solves πP = π and sum(π) = 1.
- Absorbing state: row with a 1 on the diagonal and 0 elsewhere (once entered, you stay).
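These checks are easy to automate. A minimal sketch (the helper name check_transition_matrix is ours for illustration, not a library function):

```python
import numpy as np

def check_transition_matrix(P, tol=1e-9):
    """Sanity checks for a row-stochastic transition matrix."""
    P = np.asarray(P, dtype=float)
    assert P.ndim == 2 and P.shape[0] == P.shape[1], "P must be square"
    assert np.all((P >= 0) & (P <= 1)), "entries must be in [0, 1]"
    assert np.allclose(P.sum(axis=1), 1.0, atol=tol), "rows must sum to 1"
    # Absorbing states: a 1 on the diagonal (so the rest of the row is 0)
    return [i for i in range(P.shape[0]) if P[i, i] == 1.0]

P = [[0.5, 0.4, 0.1],
     [0.2, 0.5, 0.3],
     [0.0, 0.0, 1.0]]
print(check_transition_matrix(P))   # [2] -> state index 2 is absorbing
```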
Worked examples
Example 1: Weather (2 states)
States: S = {Sunny (S), Rain (R)}. Transition matrix:
P = [ [0.7, 0.3],
[0.4, 0.6] ]
Q1. Two-step probability Sunny → Sunny?
Compute P^2 and read entry (S,S):
P^2 = P × P
= [ [0.7*0.7 + 0.3*0.4, 0.7*0.3 + 0.3*0.6],
[0.4*0.7 + 0.6*0.4, 0.4*0.3 + 0.6*0.6] ]
= [ [0.49 + 0.12, 0.21 + 0.18],
[0.28 + 0.24, 0.12 + 0.36] ]
= [ [0.61, 0.39],
[0.52, 0.48] ]
Answer: 0.61.
Q2. Stationary distribution π?
Let π = [x, 1−x] and solve πP = π. The first component gives x = 0.7x + 0.4(1−x) = 0.3x + 0.4 ⇒ 0.7x = 0.4 ⇒ x = 4/7 ≈ 0.5714. So π ≈ [0.5714, 0.4286].
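Both answers are easy to verify numerically. A sketch using NumPy (the eigenvector route shown for π is one of several ways to solve πP = π):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Q1: two-step Sunny -> Sunny is entry (0, 0) of P^2
P2 = np.linalg.matrix_power(P, 2)
print(P2[0, 0])   # 0.61

# Q2: stationary distribution = left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.isclose(vals, 1.0))])
pi = pi / pi.sum()                 # normalize so sum(pi) = 1
print(pi)                          # approx [0.5714, 0.4286]
```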
Example 2: Conversion funnel (3 states, absorbing)
States: V (Visitor), E (Evaluating), P (Paid, absorbing).
P = [ [0.5, 0.4, 0.1],
[0.2, 0.5, 0.3],
[0.0, 0.0, 1.0] ]
Starting from V, the probability of being Paid after 2 steps is entry (V, P) of P^2:
P^2 (row V to col P) = 0.5*0.1 + 0.4*0.3 + 0.1*1 = 0.05 + 0.12 + 0.10 = 0.27
Answer: 0.27.
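The same entry can be read off numerically; a quick sketch:

```python
import numpy as np

P = np.array([[0.5, 0.4, 0.1],    # V: Visitor
              [0.2, 0.5, 0.3],    # E: Evaluating
              [0.0, 0.0, 1.0]])   # P: Paid (absorbing)

P2 = np.linalg.matrix_power(P, 2)
print(P2[0, 2])   # 0.27: Paid after 2 steps, starting from Visitor
```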
Example 3: Brand switching (stationary distribution)
States: A, B. Transition matrix:
P = [ [0.8, 0.2],
[0.3, 0.7] ]
Find π = [x, 1−x] with πP = π: x = 0.8x + 0.3(1−x) = 0.8x + 0.3 − 0.3x = 0.5x + 0.3 ⇒ 0.5x = 0.3 ⇒ x = 0.6. So π = [0.6, 0.4].
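Instead of solving equations by hand, you can also iterate v ← vP until it stops changing (power iteration). A minimal sketch under that approach:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

v = np.array([1.0, 0.0])          # any starting distribution works here
for _ in range(100):              # iterate v <- v P until convergence
    v_next = v @ P
    if np.allclose(v, v_next, atol=1e-12):
        break
    v = v_next
print(v)   # approx [0.6, 0.4]
```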
How to compute (by hand or spreadsheet)
- Write states in a consistent order and build P with rows summing to 1.
- One-step forecast from current distribution v: compute v × P.
- n-step forecast: compute P^n (small n by hand; larger n with a spreadsheet or code) and use v × P^n.
- Stationary distribution: solve πP = π with sum(π)=1 (for 2×2, use one equation + normalization).
- Absorbing checks: look for rows like [0, ..., 1, ..., 0] with the 1 on the diagonal.
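In practice, the first two steps usually start from data. A minimal sketch of estimating P from transition counts and forecasting (the counts below are made up for illustration):

```python
import numpy as np

# Hypothetical transition counts between states A and B (rows = from, cols = to)
counts = np.array([[80, 20],
                   [30, 70]], dtype=float)

P = counts / counts.sum(axis=1, keepdims=True)   # normalize rows to sum to 1
print(P)                                         # row-stochastic estimate

v = np.array([0.5, 0.5])                         # current distribution over states
print(v @ np.linalg.matrix_power(P, 3))          # 3-step forecast
```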
Exercises
Do these now.
- Exercise 1: Two-step probability in a 2-state chain. Compute the probability of being in A after 2 steps when starting from A for the given matrix.
- Exercise 2: Stationary distribution of a 2×2 chain. Solve for π.
Self-check checklist
- [ ] All rows of P sum to 1 (within rounding tolerance).
- [ ] Probabilities are in [0, 1].
- [ ] Matrix shapes match when multiplying (row vector × matrix).
- [ ] For stationary distribution, verify πP ≈ π and sum(π)=1.
Common mistakes and how to self-check
Mistake: Mixing up row vs column stochastic matrices
In this lesson we use row-stochastic matrices (rows sum to 1) with row vectors multiplied on the left. If you accidentally build a column-stochastic matrix, v × P gives wrong answers. Self-check: verify row sums equal 1 before multiplying.
Mistake: Confusing one-step with n-step probabilities
Entries of P are one-step probabilities; for a 2-step question, use P^2. Self-check: recompute the entry as a sum of products over intermediate states (as in Example 1) and confirm the rows of P^n still sum to 1.
Mistake: Assuming stationary distribution always exists and is unique
Uniqueness and convergence require conditions (e.g., finite, irreducible, aperiodic). Self-check: an absorbing state makes the chain reducible, and long-run probability mass concentrates on the absorbing state(s).
Mistake: Rounding too early
Round only at the end. Self-check: recompute with more precision and ensure rows still sum to ~1.
Practical projects
- Simulate a 2-state weather chain for 1000 steps; compare empirical frequencies to the stationary distribution (see the sketch after this list).
- Model a 3-state product funnel (Visitor, Trial, Paid). Estimate P from counts and forecast 7-day conversion.
- Build a simple word-level Markov text generator (order-1). Show a 50-word sample.
- What-if analysis: increase E→P by +0.05 (reducing another entry in that row so it still sums to 1) and re-forecast the 3-step paid probability.
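A sketch of the first project, reusing the weather chain from Example 1 (the seed and step count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

state, visits = 0, np.zeros(2)    # start Sunny; count visits per state
for _ in range(1000):
    state = rng.choice(2, p=P[state])   # sample next state from current row
    visits[state] += 1

print(visits / visits.sum())      # empirical frequencies
print([4/7, 3/7])                 # stationary distribution for comparison
```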
Mini challenge
Design a 3-state chain with long-run distribution approximately [0.2, 0.5, 0.3]. Propose a row-stochastic P that makes this plausible, and verify numerically that πP ≈ π.
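One way to do the numerical verification (the candidate P below is a deliberately degenerate placeholder, where every row equals π, so πP = π holds trivially; substitute your own design):

```python
import numpy as np

pi = np.array([0.2, 0.5, 0.3])            # target long-run distribution

# Candidate P: replace with your own row-stochastic design
P = np.tile(pi, (3, 1))                   # every row equal to pi (degenerate)

print(np.allclose(pi @ P, pi))            # True if pi is stationary for P
print(np.allclose(P.sum(axis=1), 1.0))    # rows must still sum to 1
```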
Learning path
- Step 1: Define states clearly and collect transition counts.
- Step 2: Normalize counts into a transition matrix P.
- Step 3: Answer one-step and n-step questions with P^n.
- Step 4: Find stationary distribution and interpret business meaning.
- Step 5: Explore absorbing states and expected long-term behavior.
- Step 6: Connect to PageRank and the intuition for MCMC.
Next steps
- Estimate transition matrices from real data and add confidence intervals.
- Study mixing time and convergence diagnostics.
- Extend to Markov Decision Processes (MDPs) with actions and rewards.
Quick Test
Take the Quick Test to confirm your understanding. Available to everyone; only logged-in users get saved progress.