
Simulation And Monte Carlo

Learn Simulation And Monte Carlo for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

Simulation and Monte Carlo methods let you answer “what might happen?” when formulas are hard or assumptions are shaky. As a Data Scientist, you will:

  • Estimate uncertainty of metrics with bootstrapping (e.g., conversion rate, revenue).
  • Forecast risk and ranges (e.g., demand, churn, cost overruns).
  • Evaluate A/B test power and expected lift before launching experiments.
  • Approximate complex probabilities when closed-form math is impractical.

Concept explained simply

Monte Carlo is about answering questions by random sampling. Instead of solving a messy equation, you simulate many possible worlds and summarize the results.

Mental model: “Ask the universe many times”

Imagine a box that can generate realistic outcomes for your problem (like daily conversions or demand). Each press of a button gives one outcome. Press it thousands of times, then compute the average, quantiles, or any metric you care about. That is Monte Carlo.

Core ingredients you need

  • Random number generator (RNG): produces pseudo-random samples. Set a seed for reproducibility.
  • Distribution model: choose a distribution that matches your data (e.g., Binomial for conversions, Normal/Lognormal for times or revenue, Poisson for counts).
  • Sampling loop: repeat many times, compute your statistic each time.
  • Law of Large Numbers: more samples generally mean better estimates. Standard error shrinks about as 1/sqrt(N).
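
A minimal sketch that puts these ingredients together: a seeded RNG, a distribution model, a vectorized sampling loop, and a standard-error check. The Poisson rate 4.2 and the sample size are illustrative assumptions, not values from this lesson.
Python snippet (illustrative)
import numpy as np
rng = np.random.default_rng(7)             # RNG with a fixed, logged seed
N = 50_000                                 # size of the sampling loop
samples = rng.poisson(lam=4.2, size=N)     # distribution model (illustrative rate)
estimate = samples.mean()                  # statistic of interest
se = samples.std(ddof=1) / np.sqrt(N)      # standard error shrinks roughly as 1/sqrt(N)
print(estimate, se)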

Worked examples

Example 1: Estimating π by dart throwing

Idea: Throw random points into a 1x1 square. The fraction that falls inside the quarter circle of radius 1 approximates π/4.

  1. Sample x, y uniformly in [0, 1].
  2. Count hits where x^2 + y^2 ≤ 1.
  3. Estimate π ≈ 4 × (hits / N).
Python snippet
import numpy as np
rng = np.random.default_rng(42)   # fixed seed for reproducibility
N = 100_000                       # 100k sample points
x = rng.random(N)                 # uniform x-coordinates in [0, 1)
y = rng.random(N)                 # uniform y-coordinates in [0, 1)
hits = (x*x + y*y) <= 1.0         # True where the point lands inside the quarter circle
pi_hat = 4 * hits.mean()          # hits/N estimates pi/4
print(pi_hat)

Example 2: A/B test power check

Question: With 20,000 users per group, baseline p=3% vs. variant p=3.6%, what power do we have at α=0.05?

  1. Simulate Binomial conversions for A and B per trial.
  2. Compute difference in rates, run a simple z-test (approx) or compare via bootstrap CI.
  3. Power ≈ fraction of trials where you detect a significant improvement.
Python snippet (approx power via z-test)
import numpy as np
from math import sqrt
rng = np.random.default_rng(0)
trials = 5000              # number of simulated experiments
n = 20_000                 # users per group
pA, pB = 0.03, 0.036       # true conversion rates for control and variant
alpha = 0.05
zcrit = 1.96               # two-sided critical value at alpha = 0.05
hits = 0
for _ in range(trials):
    cA = rng.binomial(n, pA)            # simulated conversions in A
    cB = rng.binomial(n, pB)            # simulated conversions in B
    ra, rb = cA/n, cB/n
    p_pool = (cA + cB) / (2*n)          # pooled rate under the null
    se = sqrt(p_pool*(1-p_pool)*(2/n))
    z = (rb - ra) / se
    if z > zcrit:                       # count only significant improvements
        hits += 1
power = hits / trials
print(power)

Example 3: Revenue range with uncertainty

Suppose daily sessions ~ 40,000 (± noise), conversion rate ~ 2.5%, average order value ~ $35 (lognormal). What is the likely daily revenue?

  1. Model sessions as Normal(40,000, sd=3,000) clipped at 0.
  2. Conversions ~ Binomial(sessions, 0.025).
  3. Order values ~ Lognormal; total revenue = sum of simulated order values.
  4. Run many trials; take median and 95% interval.
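A minimal sketch of these four steps. The lognormal parameters (mean=3.43, sigma=0.5) are illustrative assumptions chosen so the average order value comes out near $35; they are not fitted to real data.
Python snippet (illustrative sketch)
import numpy as np
rng = np.random.default_rng(123)
trials = 10_000
revenues = np.empty(trials)
for t in range(trials):
    sessions = max(int(rng.normal(40_000, 3_000)), 0)               # step 1: Normal sessions, clipped at 0
    conversions = rng.binomial(sessions, 0.025)                      # step 2: Binomial conversions
    orders = rng.lognormal(mean=3.43, sigma=0.5, size=conversions)   # step 3: lognormal order values (assumed params)
    revenues[t] = orders.sum()
print(np.median(revenues), np.percentile(revenues, [2.5, 97.5]))     # step 4: median and 95% interval
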
Notes
  • Heavy-tailed revenues: prefer quantiles over mean to summarize.
  • Check that simulated distributions roughly match observed data.

Variance reduction (go faster, get tighter)

  • Antithetic variates: for each u ~ Uniform(0,1), also use 1-u. Reduces variance when the function is monotonic in u (see the sketch after this list).
  • Control variates: use a correlated variable with known expectation to adjust estimates.
  • Stratified/Latin Hypercube sampling: force coverage across the range instead of purely random draws.
  • Importance sampling: sample more where the action is (rare events), then reweight.
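
To make the first idea concrete, here is a minimal sketch of antithetic variates using E[e^U] with U ~ Uniform(0,1) as a toy target (an arbitrary monotone function chosen for illustration, not from this lesson). Both estimators use the same number of function evaluations, so the smaller standard error comes from the pairing alone.
Python snippet (illustrative)
import numpy as np
rng = np.random.default_rng(1)
N = 100_000                                  # total function evaluations in each estimator
u = rng.random(N)
plain = np.exp(u)                            # plain Monte Carlo: N independent draws
v = rng.random(N // 2)
anti = (np.exp(v) + np.exp(1 - v)) / 2       # antithetic: N/2 pairs (v, 1-v), also N evaluations
print(plain.mean(), plain.std(ddof=1) / np.sqrt(plain.size))   # estimate of e - 1 and its SE
print(anti.mean(), anti.std(ddof=1) / np.sqrt(anti.size))      # same target, noticeably smaller SE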

Reproducibility and quality checks

  • Set and log RNG seeds for each run.
  • Track sample size N and the standard error of your estimate.
  • Run convergence checks: increase N until your key metric stabilizes within a tolerance.
  • Validate model inputs against real data (means, variances, quantiles).
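
One possible implementation of the convergence check, reusing the π estimator from Example 1; the starting N, doubling rule, tolerance, and cap are assumed values you would tune for your own metric.
Python snippet (illustrative)
import numpy as np

def pi_estimate(n, rng):
    x, y = rng.random(n), rng.random(n)
    return 4 * ((x*x + y*y) <= 1.0).mean()

rng = np.random.default_rng(42)      # set and log the seed
N, tol = 10_000, 0.005               # starting sample size and tolerance (assumed values)
prev = pi_estimate(N, rng)
while N < 10_000_000:                # safety cap on sample size
    N *= 2                           # double N and re-estimate
    curr = pi_estimate(N, rng)
    if abs(curr - prev) < tol:       # key metric stable within tolerance: stop
        break
    prev = curr
print(N, curr)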

Exercises

These mirror the exercises below. Do them here, then verify your outputs against the expected ranges.

Exercise 1 — Monte Carlo π

Write a simulation to estimate π using random points in a unit square. Report your estimate and the absolute error |π_est - 3.14159|. Use at least N=100,000 and a fixed seed.

  • [ ] Use vectorized sampling for x and y.
  • [ ] Compute hits where x^2 + y^2 ≤ 1.
  • [ ] π ≈ 4 × hits/N; report error.
Exercise 2 — Bootstrap CI for mean

Given daily signups [12, 15, 7, 9, 14, 11, 10, 13, 8, 16], bootstrap the mean with 10,000 resamples. Compute a 95% percentile CI and check if target=12 is inside.

  • [ ] Resample with replacement 10,000 times.
  • [ ] Compute mean per resample.
  • [ ] Report 2.5% and 97.5% quantiles and inclusion of 12.

Self-check checklist

  • [ ] You set and recorded a random seed.
  • [ ] Your estimate changes less than your chosen tolerance when doubling N.
  • [ ] You reported both point estimate and uncertainty (SE or CI).
  • [ ] You sanity-checked inputs (distributions match data).

Common mistakes and how to self-check

  • Mistake: No seed → impossible to reproduce. Fix: set a seed and log it.
  • Mistake: Too few samples → noisy results. Fix: increase N until CI stabilizes.
  • Mistake: Wrong distribution. Fix: compare simulated and real histograms/quantiles.
  • Mistake: Using mean for heavy-tailed outcomes. Fix: use median and percentile CIs.
  • Mistake: Ignoring dependencies. Fix: model correlations explicitly or simulate joint draws.

Practical projects

  • Risk dashboard: simulate monthly revenue distribution with uncertainty in traffic, conversion, and order value. Output median, 5th, 95th percentiles.
  • Experiment planner: simulate A/B test outcomes for various sample sizes; plot power vs. sample size for a target MDE.
  • Rare-event estimator: estimate probability of a checkout failure that occurs 1 in 10,000 sessions using importance sampling.

Who this is for

  • Aspiring and practicing Data Scientists who need practical uncertainty estimation.
  • Analysts and ML engineers who plan experiments or forecast ranges.

Prerequisites

  • Basic probability (distributions, expectation, variance).
  • Comfort with Python, R, or a similar language for sampling and arrays.
  • Familiarity with NumPy/Pandas or base R data workflows.

Learning path

  1. Warm-up: simulate from common distributions (Uniform, Normal, Binomial, Poisson).
  2. Core Monte Carlo: estimate means, probabilities, and quantiles; monitor standard error.
  3. Bootstrapping: build CIs and test hypotheses without strict parametric assumptions.
  4. Variance reduction: antithetic, control variates, stratified sampling.
  5. Applied projects: experiment power, risk forecasting, rare events.

Next steps

  • Pick one project above and implement end-to-end with clear inputs and outputs.
  • Add convergence plots showing estimate vs. N.
  • Document assumptions and include a short “model validation” note.

Mini challenge

Simulate the probability that the weekly sum of conversions exceeds 1,000 given daily traffic and conversion-rate uncertainty. Report the probability and a 95% interval for the total conversions.

Quick Test

Take the quick test to check understanding. Everyone can take it; only logged-in users will have progress saved.

Practice Exercises

2 exercises to complete

Instructions

Estimate π by simulating N random points uniformly in the unit square. Count points inside the quarter circle (x^2 + y^2 ≤ 1) and compute π ≈ 4 × hits/N. Use N ≥ 100,000 and a fixed random seed. Report π_est and absolute error vs. 3.14159.

Expected Output
pi_est close to 3.1416 (absolute error typically under 0.01 for N around 1e5, and under 0.005 for N around 1e6).

Simulation And Monte Carlo — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

