Basic Distributions

Learn basic distributions for free with explanations, exercises, and a quick test. Aimed at Data Analysts.

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

Distributions describe the shape of your data. As a Data Analyst, this helps you:

  • Pick the right summary stats (e.g., median instead of mean for skewed data).
  • Estimate probabilities (e.g., how often a queue will be empty or overflow).
  • Choose appropriate visualizations (histogram vs. bar chart) and transformations (log for right-skew).
  • Validate assumptions for A/B tests and forecasting.

Concept explained simply

A distribution tells you how likely different values are. Think of it as the "shape" a random process tends to produce over many repeats.

  • Discrete vs. continuous: Counts (0,1,2,...) vs. measurements (any real value).
  • Parameters: A small set of numbers that summarizes the shape (e.g., mean, variance, rate).
  • Skew and tails: Right-skewed means a long tail to the right; heavy tails mean extreme values occur more often than a Normal distribution would predict.

Common distributions at a glance

  • Bernoulli(p): Single yes/no outcome (clicked vs. not).
  • Binomial(n, p): Number of successes out of n independent trials.
  • Poisson(lambda): Count of events in a fixed interval when events happen independently at a constant average rate.
  • Uniform(a, b): All values in [a, b] equally likely.
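  • Normal(mu, sigma): Bell curve with mean mu and standard deviation sigma; fits many natural aggregates because the Central Limit Theorem (CLT) pushes sums and means of many independent values toward Normal.
  • Log-normal(mu_log, sigma_log): Positive, right-skewed amounts; log of data is roughly Normal.
  • Exponential(rate): Time between events in a Poisson process; memoryless (time already waited does not change the distribution of the remaining wait).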
  • t-distribution(df): Like Normal but with heavier tails; used when estimating means with small samples and unknown variance.

Mental model: Generative stories

  • Bernoulli: Flip a biased coin once. Heads probability = p.
  • Binomial: Flip the same coin n times; count heads.
  • Poisson: Events pop up randomly at average rate lambda per interval; count how many occur in an interval.
  • Exponential: Wait time until the next random event from a Poisson process.
  • Normal: Many small independent effects add up (measurement error, heights of people, average of many samples).
  • Log-normal: Multiply small independent effects (e.g., price growth factors) — taking logs turns products into sums.
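
The generative stories above can be turned into a small simulation. This is a minimal sketch using only Python's standard library. The function names, the seed, and the parameter values (10 flips at p = 0.3, lambda = 3.2, 12 small effects) are illustrative assumptions, not part of the lesson.

```python
import random

rng = random.Random(42)  # fixed seed so reruns give the same draws

def bernoulli(p):
    """Flip a biased coin once: 1 (heads) with probability p, else 0."""
    return 1 if rng.random() < p else 0

def binomial(n, p):
    """Flip the same coin n times and count the heads."""
    return sum(bernoulli(p) for _ in range(n))

def exponential(rate):
    """Wait time until the next event of a Poisson process."""
    return rng.expovariate(rate)

def poisson(lam):
    """Count events in one interval by adding exponential waits until it ends."""
    count, elapsed = 0, exponential(lam)
    while elapsed <= 1.0:
        count += 1
        elapsed += exponential(lam)
    return count

def normal_like(effects=12):
    """Add many small independent effects, each uniform on [-0.5, 0.5]."""
    return sum(rng.uniform(-0.5, 0.5) for _ in range(effects))

draws = 5000
binomial_mean = sum(binomial(10, 0.3) for _ in range(draws)) / draws
poisson_mean = sum(poisson(3.2) for _ in range(draws)) / draws
normal_mean = sum(normal_like() for _ in range(draws)) / draws

print(f"Binomial(10, 0.3) sample mean: {binomial_mean:.2f} (theory: 3.0)")
print(f"Poisson(3.2) sample mean:      {poisson_mean:.2f} (theory: 3.2)")
print(f"Sum of 12 small effects mean:  {normal_mean:.2f} (theory: 0.0)")
```

Building the Poisson count from exponential waits mirrors the link between the two stories: the count in an interval and the gaps between events describe the same process.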

Worked examples

Example 1: Support emails per hour

Scenario: Over many hours, you record counts of support emails: 0,1,2,... You suspect a Poisson process.

  1. Estimate lambda: the average count per hour (mean of the data). Suppose the mean is 3.2 emails/hour.
  2. Quick check: For a Poisson distribution the variance equals the mean, so a sample variance near 3.2 supports the Poisson model.
  3. Probability of zero emails in an hour: P(X=0) = exp(-lambda) = exp(-3.2) ≈ 0.041.
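
The three steps can be sketched in a few lines of Python. The hourly counts below are invented for illustration, and `poisson_pmf` is a hypothetical helper name; only the lambda of 3.2 comes from the example.

```python
import math
from statistics import mean, variance

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with average rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical hourly email counts, invented for illustration (not real data).
hourly_counts = [3, 2, 5, 4, 1, 3, 6, 2, 3, 4, 0, 5, 3, 2, 4]

lam_hat = mean(hourly_counts)                    # step 1: lambda = sample mean
dispersion = variance(hourly_counts) / lam_hat   # step 2: near 1 supports Poisson

p_zero = poisson_pmf(0, 3.2)                     # step 3: lambda from the example

print(f"lambda_hat = {lam_hat:.2f}, variance / mean = {dispersion:.2f}")
print(f"P(X = 0 | lambda = 3.2) = {p_zero:.3f}")
```

A variance-to-mean ratio well above 1 would point to overdispersion rather than a plain Poisson process.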

Example 2: Transaction amounts (skewed)

Scenario: Purchase amounts are positive and right-skewed; a few very large orders exist.

  1. Hypothesis: Log-normal. Take log(amount) and plot a histogram.
  2. If log(amount) looks roughly bell-shaped, use Normal summaries on the log scale.
  3. Median on the original scale equals exp(mean of log(amount)). If mean log = 2.1, median ≈ exp(2.1) ≈ 8.17.
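
As a rough illustration of the log-scale workflow, the sketch below summarizes a handful of amounts on the log scale and back-transforms the mean. The amounts are made up for this example; only the mean log of 2.1 comes from the text.

```python
import math
from statistics import mean, median, stdev

# Hypothetical purchase amounts, invented for illustration (not real data).
amounts = [4.2, 5.1, 6.3, 7.0, 7.9, 8.4, 9.6, 11.2, 14.8, 22.5, 61.0]

logs = [math.log(a) for a in amounts]      # step 1: move to the log scale
mu_log, sigma_log = mean(logs), stdev(logs)  # step 2: Normal summaries of the logs

median_est = math.exp(mu_log)              # step 3: back-transform the log mean

print(f"mean of logs = {mu_log:.2f}, SD of logs = {sigma_log:.2f}")
print(f"estimated median = {median_est:.2f}, sample median = {median(amounts):.2f}")
print(f"raw mean = {mean(amounts):.2f} (pulled up by the large orders)")

# The number used in the example: a mean log of 2.1.
example_median = math.exp(2.1)
print(f"exp(2.1) = {example_median:.2f}")
```

The back-transformed value is the geometric mean of the amounts, so it never exceeds the raw mean; the gap between the two grows with the skew.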

Example 3: Email campaign conversion

Scenario: Each of 1,000 recipients independently converts with probability p.

  1. Model: Binomial(n=1000, p).
  2. Expected conversions: n * p. If p = 0.03, expected = 30.
  3. Standard deviation: sqrt(n p (1-p)) ≈ sqrt(1000 * 0.03 * 0.97) ≈ 5.4.
  4. Rough 95% range: expected ± 2*sd → 30 ± 10.8 → about 19 to 41.
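
The arithmetic in these steps can be checked with a short helper. This is a sketch: `binomial_summary` is a hypothetical name, and the ± 2 SD range relies on the Normal approximation to the Binomial, which is reasonable here because the expected count is well away from 0 and n.

```python
import math

def binomial_summary(n, p, z=2.0):
    """Return (expected, sd, low, high) for a Binomial(n, p) count.

    The low/high range is expected ± z * sd, a Normal approximation.
    """
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return expected, sd, expected - z * sd, expected + z * sd

expected, sd, low, high = binomial_summary(n=1000, p=0.03)
print(f"expected = {expected:.0f}, sd = {sd:.1f}, rough 95% range = {low:.0f} to {high:.0f}")
```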

Exercises

Complete the tasks below.

Exercise 1: Match scenarios to distributions

For each scenario, select the most suitable distribution and estimate its key parameter(s):

  A. Number of app crashes per day (rare, independent events).
  B. Whether a user clicks a button in a single app session.
  C. Number of clicks out of 200 ad impressions with stable probability.
  D. Time between consecutive signups on a landing page.
  E. Daily revenue values that are positive and heavily right-skewed.

Estimate parameters using these summaries (hypothetical):

  • Average crashes/day: 1.4
  • Click probability per impression: 0.05
  • Impressions in C: n = 200
  • Median time between signups: about 2 minutes
  • Mean of log(daily revenue): 3.2; SD of log(daily revenue): 0.6

  • I picked one distribution per scenario.
  • I estimated parameters using the given numbers.
  • My choices match the data type (count, yes/no, time, positive skew).

Exercise 2: Compute simple probabilities

Use the distribution formulas or standard approximations:

  1. Poisson with lambda = 3 per hour: probability of zero events in the next hour?
  2. Normal with mean = 48 and SD = 12 hours: probability a ticket resolves in over 72 hours?
  3. Log-normal where log(amount) ~ Normal(mean = 2.0, SD = 0.5): what is the median amount?
  4. Small sample mean: n = 25, sample mean = 68, sample SD = 10. 95% CI for the population mean?

  • I wrote the formula used for each calculation.
  • I showed a rounded numeric answer.
  • For (4), I used the t-multiplier (not Normal) since SD is estimated.

Common mistakes and self-check

  • Using Normal on raw, right-skewed amounts. Self-check: Plot histogram and try log; if log looks bell-shaped, prefer log-normal summaries.
  • Treating counts with mean near 0 as Normal. Self-check: If mean is small and variance ≈ mean, Poisson often fits better.
  • Forgetting independence. Self-check: If events cluster (dependence), the observed variance will exceed the mean (overdispersion) and a Poisson model will understate the variability.
  • Using z-interval with small n and unknown sigma. Self-check: Use t with df = n-1 when sigma is unknown and n is small.
  • Confusing median and mean under log-normal. Self-check: Median = exp(mean of log data). The mean on the raw scale is larger: exp(mu_log + sigma_log^2 / 2).
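
The last mistake is easy to see in a simulation. The sketch below draws log-normal amounts with the standard library and compares the sample median and mean against the two formulas. The parameters (mu_log = 2.1, sigma_log = 0.8), the seed, and the sample size are assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, median

rng = random.Random(0)           # fixed seed for reproducibility
mu_log, sigma_log = 2.1, 0.8     # hypothetical log-scale parameters

amounts = [rng.lognormvariate(mu_log, sigma_log) for _ in range(50_000)]

sample_median = median(amounts)
sample_mean = mean(amounts)
theory_median = math.exp(mu_log)                       # exp(mean of logs)
theory_mean = math.exp(mu_log + sigma_log ** 2 / 2)    # larger than the median

print(f"median: sample {sample_median:.2f} vs exp(mu_log) {theory_median:.2f}")
print(f"mean:   sample {sample_mean:.2f} vs exp(mu_log + sigma_log^2/2) {theory_mean:.2f}")
```

The wider the log-scale spread, the further the mean sits above the median, which is why the median is usually the safer headline number for skewed amounts.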

Practical projects

  • Helpdesk volume model: Collect hourly ticket counts for two weeks. Fit Poisson (estimate lambda). Compare observed variance to lambda; note any hours with overdispersion.
  • Revenue shape check: Take 3 months of order amounts. Plot raw and log histograms. Report the median and IQR on the raw scale, and the mean and SD on the log scale. Present one slide with a recommendation.
  • CTR stability: For daily impressions and clicks, model clicks ~ Binomial(n, p). Compute daily p-hats and a 95% interval for p. Flag days outside the interval and explain practical reasons.
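
One possible starting point for the CTR stability project is sketched below. It assumes a pooled click probability across all days and a Normal-approximation band of pooled p ± 1.96 standard errors for each day's p-hat; other interval methods are equally valid. The daily figures and the `flag_days` helper are invented for illustration.

```python
import math

# Hypothetical (impressions, clicks) per day, invented for illustration.
daily = [(1200, 58), (1100, 52), (1300, 90), (1250, 61), (1150, 31)]

total_impressions = sum(n for n, _ in daily)
total_clicks = sum(k for _, k in daily)
p_pooled = total_clicks / total_impressions  # one stable p, as Binomial assumes

def flag_days(days, p, z=1.96):
    """Return indexes of days whose p-hat falls outside p ± z * SE."""
    flagged = []
    for i, (n, k) in enumerate(days):
        p_hat = k / n
        se = math.sqrt(p * (1 - p) / n)  # binomial standard error for that day
        if abs(p_hat - p) > z * se:
            flagged.append(i)
    return flagged

flagged = flag_days(daily, p_pooled)
print(f"pooled p = {p_pooled:.4f}, flagged day indexes = {flagged}")
```

Flagged days are prompts for investigation (a campaign launch, a tracking outage), not proof that the Binomial model is wrong.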

Mini challenge

You observe: (i) 60% of minutes have 0 signups, 30% have 1, 10% have 2, almost none have 3+. Suggest a distribution and estimate its main parameter. Then, propose a quick check to validate your choice.

Hint

Counts per minute with many zeros often suggest a Poisson distribution with lambda equal to the average count per minute; check whether the mean and variance are close.
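
The check in the hint can be sketched directly from the observed shares. The dictionary layout and the `poisson_pmf` helper are assumptions made for this example; the shares are the ones stated in the challenge, with the "almost none" 3+ bucket treated as zero.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with average rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Observed share of minutes with 0, 1, and 2 signups (3+ treated as zero).
observed = {0: 0.60, 1: 0.30, 2: 0.10}

mean_count = sum(k * share for k, share in observed.items())
var_count = sum((k - mean_count) ** 2 * share for k, share in observed.items())

print(f"mean = {mean_count:.2f}, variance = {var_count:.2f}")
for k, share in observed.items():
    print(f"k = {k}: observed {share:.2f}, Poisson {poisson_pmf(k, mean_count):.2f}")
```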

Who this is for

  • Entry-level and aspiring Data Analysts who want to interpret data distributions clearly.
  • Professionals switching from reporting to analytical modeling.

Prerequisites

  • Basic arithmetic and percentages.
  • Comfort with averages, variance, and standard deviation.
  • Ability to read histograms and bar charts.

Learning path

  • Before: Descriptive statistics (mean, median, variance), data types (categorical vs numeric).
  • This subskill: Recognize and use basic distributions in EDA.
  • After: Sampling and the Central Limit Theorem, hypothesis testing, regression assumptions.

Next steps

  • Re-check your last project: Which distributions did you assume implicitly? Were they appropriate?
  • Build a small notebook/template to test Poisson vs. overdispersion and to visualize log-normal candidates.
  • Take the Quick Test below.

Practice Exercises

Instructions

For each scenario, pick the best distribution and estimate parameters using the provided summaries.

  A. Number of app crashes per day (rare, independent events). Avg = 1.4
  B. Whether a user clicks a button in a single app session. p unknown
  C. Number of clicks out of 200 impressions with stable probability. n=200, p=0.05
  D. Time between consecutive signups. Median wait ≈ 2 minutes
  E. Daily revenue is positive and heavily right-skewed. mean(log) = 3.2, sd(log) = 0.6

Expected Output

A: Poisson(lambda=1.4); B: Bernoulli(p); C: Binomial(n=200, p=0.05); D: Exponential(rate ≈ ln(2)/median ≈ 0.3466 per minute); E: Log-normal(mu_log=3.2, sigma_log=0.6)
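
The rate in D follows from the Exponential survival function: half of all waits exceed the median, so exp(-rate * median) = 0.5 and rate = ln(2) / median. A short sketch of that conversion, with `exponential_rate_from_median` as a hypothetical helper name:

```python
import math

def exponential_rate_from_median(median_wait):
    """Solve exp(-rate * median) = 0.5 for the rate."""
    return math.log(2) / median_wait

rate = exponential_rate_from_median(2.0)   # median wait of about 2 minutes
mean_wait = 1 / rate                       # the Exponential mean is 1 / rate

print(f"rate = {rate:.4f} per minute, mean wait = {mean_wait:.2f} minutes")
print(f"P(wait > median) = {math.exp(-rate * 2.0):.2f}")
```

The mean wait comes out longer than the median because the Exponential distribution is right-skewed.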

Basic Distributions — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
