
Distributions, Histograms, and Density Plots

Learn distributions, histograms, and density plots for free with explanations, exercises, and a quick test (for Data Analysts).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

Understanding distributions shows how your data is actually shaped, not just its average. As a Data Analyst, you will:

  • Check if a metric is skewed before reporting a mean (e.g., order values, session duration).
  • Pick the right summary (median vs mean) and detect outliers quickly.
  • Compare groups (A/B variants, regions) to see real shifts, not noise.
  • Communicate uncertainty and typical ranges clearly to stakeholders.

Who this is for

Beginner to intermediate analysts who need to visualize data distributions and interpret them for real decisions.

Prerequisites

  • Basic descriptive statistics: mean, median, percentiles.
  • Comfort loading data into a tool (spreadsheets, Python, or a SQL client).
  • Basic plotting knowledge helps but is not required.

Concept explained simply

A histogram counts how many values fall into each numeric interval (bin). A density plot is a smooth curve, usually a kernel density estimate (KDE), showing how common values are across the range. Both show shape: center, spread, skew, and multimodality (multiple peaks).
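A minimal NumPy sketch of the two scales, using a small made-up sample (the values are illustrative only): `np.histogram` returns the count in each bin, and `density=True` rescales the bar heights so that height × bin width sums to 1, which is the scale a density curve uses.

```python
import numpy as np

# Small illustrative sample (hypothetical values, e.g. minutes)
values = np.array([1.2, 2.5, 3.1, 3.4, 3.9, 4.2, 4.8, 5.5, 7.0, 12.5])

# Histogram: count how many values fall into each of 5 equal-width bins
counts, edges = np.histogram(values, bins=5)

# Same bins, but heights rescaled so the total bar area equals 1
dens, _ = np.histogram(values, bins=edges, density=True)
widths = np.diff(edges)

print("bin edges:", edges)
print("counts:   ", counts)
print("area under density histogram:", np.sum(dens * widths))
```

Counts answer "how many"; the density scale answers "how concentrated", which is what lets groups of different sizes share one axis.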

Mental model

Imagine lining up all your data points on a number line and grouping them into equal-width boxes. The height of each box shows how many fall there. That’s a histogram. If you drape a flexible ribbon over the tops of those boxes and smooth it, you get a density curve. The ribbon highlights the general pattern without sharp bin edges.
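The ribbon is an analogy: a kernel density estimate (KDE) actually places a small smooth bump (here a Gaussian) on every data point and averages the bumps, so no bins are involved. A minimal NumPy sketch, assuming a hand-picked bandwidth `h=1.0` (a hypothetical value; real tools choose it with a rule such as Scott's or Silverman's). The function name `gaussian_kde` here is our own helper, not a library call.

```python
import numpy as np

def gaussian_kde(data, grid, h):
    """Gaussian kernel density estimate of `data`, evaluated at each point in `grid`.

    h is the bandwidth: the standard deviation of each kernel bump.
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every data point: shape (len(grid), len(data))
    z = (grid[:, None] - data[None, :]) / h
    # Standard normal kernel evaluated at each distance
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the bumps and divide by h so the curve integrates to 1
    return kernels.sum(axis=1) / (len(data) * h)

values = np.array([1.2, 2.5, 3.1, 3.4, 3.9, 4.2, 4.8, 5.5, 7.0, 12.5])
grid = np.linspace(-5, 20, 1001)
curve = gaussian_kde(values, grid, h=1.0)

# Riemann-sum check: the area under the curve should be very close to 1
area = curve.sum() * (grid[1] - grid[0])
print(f"area under KDE: {area:.4f}")
```

A larger `h` widens every bump (smoother, possibly oversmoothed); a smaller `h` makes the curve spiky, which is the bandwidth trade-off discussed next.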

Key choices: bins and bandwidth

  • Histogram bins: too few bins hide detail; too many bins add noise. Try 8–30 bins depending on sample size. For n ~ 200–2000, start with 15–25 bins.
  • Density bandwidth: too small shows spiky noise; too large oversmooths. Start with the defaults, then adjust until the main shape is clear without jaggedness.
Practical defaults
  • Histogram: start with 20 bins; adjust up/down and pick the clearest story.
  • Density: start with the Scott or Silverman rule of thumb (most tools default to one of these) and tweak slightly.
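To compare rule-of-thumb choices on your own data, NumPy's `np.histogram_bin_edges` accepts named rules such as `'sturges'`, `'fd'` (Freedman–Diaconis), and `'auto'` (the smaller bin width of those two), and Silverman's rule of thumb for a Gaussian-kernel bandwidth is short enough to write out. A sketch on a hypothetical right-skewed sample; the bin counts you get will differ for your data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical right-skewed sample (e.g. order values); substitute your own column
data = rng.lognormal(mean=3.5, sigma=0.6, size=1000)

# Number of bins each NumPy rule would use on this sample
for rule in ["sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(data, bins=rule)
    print(f"{rule:>8}: {len(edges) - 1} bins")

def silverman_bandwidth(x):
    """Silverman's rule of thumb for a Gaussian kernel: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    return 0.9 * min(sd, iqr / 1.34) * len(x) ** (-1 / 5)

h = silverman_bandwidth(data)
print(f"Silverman bandwidth: {h:.2f}")
```

Sturges' rule grows slowly with n (about log2(n) + 1 bins), while Freedman–Diaconis sets the width from the IQR, which a long tail barely moves; treat either result as a starting point, not a final answer.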

How to read histograms vs density plots

  • Center: where most values cluster (peak area). The median splits the area under the curve in half; the mean is pulled toward the long tail.
  • Spread: how widely the data stretches; the IQR (interquartile range) covers the middle 50%.
  • Skew: long tail to the right (right-skew) or left (left-skew).
  • Peaks: one peak (unimodal) or several peaks (multimodal) may indicate subgroups.
  • Outliers: isolated bars/long tails that may need investigation.
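The visual reading can be paired with a few numbers. A minimal sketch (the helper `shape_summary` is our own, and the exponential sample is hypothetical) that reports center, spread, and a moment-based skewness, where a positive value indicates a longer right tail:

```python
import numpy as np

def shape_summary(x):
    """Numeric companions to a histogram: center, spread, and skew."""
    x = np.asarray(x, dtype=float)
    q25, median, q75 = np.percentile(x, [25, 50, 75])
    mean = x.mean()
    sd = x.std()
    # Moment-based sample skewness: > 0 means a longer right tail, < 0 a longer left tail
    skew = np.mean((x - mean) ** 3) / sd**3
    return {"median": median, "mean": mean, "iqr": q75 - q25, "skew": skew}

rng = np.random.default_rng(1)
right_skewed = rng.exponential(scale=10.0, size=5000)  # hypothetical data with a long right tail
print(shape_summary(right_skewed))
```

On right-skewed data like this, the mean lands above the median, matching the "mean pulls toward the tail" reading above.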

Worked examples

Example 1: E-commerce order values

Scenario: 5,000 orders; most are small, a few are very large.

  • Plot: Histogram with density overlay.
  • Observation: Strong right-skew. Median around $32; mean around $58 due to a tail of high spenders.
  • Decision: Report median for “typical order” and show 90th percentile to acknowledge big spenders.
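A sketch of the Example 1 report on simulated data. The orders are a hypothetical lognormal stand-in (not real orders), with parameters chosen so the median is near $32 and the mean near $58, roughly matching the scenario.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical order values: lognormal with median ~$32 and mean ~$58
# (for a lognormal, mean = median * exp(sigma^2 / 2) = 32 * exp(1.09**2 / 2), about 58)
orders = rng.lognormal(mean=np.log(32), sigma=1.09, size=5000)

median = np.median(orders)
mean = orders.mean()
p90 = np.percentile(orders, 90)

print(f"Typical order (median): ${median:,.0f}")
print(f"Mean (pulled up by the tail): ${mean:,.0f}")
print(f"90th percentile: ${p90:,.0f}")
```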

Example 2: Session duration (minutes)

Scenario: 2,000 sessions.

  • Plot: Histogram with 20 bins, density on top.
  • Observation: Peak at 3–5 minutes, gradual right tail up to ~40 minutes.
  • Decision: Use median duration in dashboards; share the distribution with the UX team so they can distinguish typical users from power users.

Example 3: A/B test spend lift (user-level deltas)

Scenario: Distribution of per-user spend change vs baseline.

  • Plot: Overlaid densities for Variant A and B.
  • Observation: Both right-skewed. B shows a slight overall shift right but larger variance.
  • Decision: Report median lift and compare quantiles; caution stakeholders about volatile tail behavior.
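Comparing quantiles rather than means alone can be sketched as below. Both samples are simulated stand-ins (B is given a slightly higher center and a wider spread), not real test data.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical per-user spend changes for two variants (both right-skewed;
# B has a slightly higher center and a wider spread)
a = rng.lognormal(mean=2.0, sigma=0.6, size=4000)
b = rng.lognormal(mean=2.05, sigma=0.8, size=4000)

qs = [10, 25, 50, 75, 90]
qa = np.percentile(a, qs)
qb = np.percentile(b, qs)

# Quantile-by-quantile comparison shows where the distributions differ
for q, va, vb in zip(qs, qa, qb):
    print(f"p{q:>2}: A={va:6.2f}  B={vb:6.2f}  diff={vb - va:+6.2f}")

iqr_a = qa[3] - qa[1]
iqr_b = qb[3] - qb[1]
print(f"Median lift: {qb[2] - qa[2]:+.2f}; IQR A={iqr_a:.2f}, B={iqr_b:.2f}")
```

A small median lift combined with a much larger p90 gap is the "volatile tail" pattern worth flagging to stakeholders.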

How to make these plots in common tools

Spreadsheets (Excel/Google Sheets)
  1. Select your numeric column.
  2. Insert → Histogram (or use FREQUENCY/BIN ranges to build manually).
  3. Adjust bin width via Axis → Bins (set bin width or number of bins).
  4. To approximate a density curve, add a smoothed line on top: convert counts to a density scale (count ÷ (total count × bin width)) and plot a moving average of those values at the bin midpoints.
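To check what the spreadsheet's smoothed line should look like, the same steps can be mimicked in NumPy. The gamma-distributed sample is hypothetical, and the 3-bin window is an arbitrary choice, not a standard.

```python
import numpy as np

rng = np.random.default_rng(3)
values = rng.gamma(shape=2.0, scale=3.0, size=2000)  # hypothetical session durations

counts, edges = np.histogram(values, bins=20)
width = edges[1] - edges[0]

# Counts on a density scale: bar areas sum to 1
density = counts / (counts.sum() * width)

# Centered 3-bin moving average; np.convolve pads with zeros,
# so the first and last bins are pulled down slightly
kernel = np.ones(3) / 3
smoothed = np.convolve(density, kernel, mode="same")

# Bin midpoints are the x positions for the smoothed line
mids = (edges[:-1] + edges[1:]) / 2
for m, d, s in zip(mids[:5], density[:5], smoothed[:5]):
    print(f"{m:6.2f}  {d:.4f}  {s:.4f}")
```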
Python (pandas + seaborn/matplotlib)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumes df is a pandas DataFrame and df['value'] is your numeric column
ax = sns.histplot(df['value'], bins=20, stat='density', kde=True, color='#4e79a7')
ax.set_xlabel('Value')
ax.set_ylabel('Density')
plt.show()

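Notes: stat='density' scales the histogram so its total area is 1, matching the scale of the density curve. Change bins to adjust the histogram. To adjust the KDE bandwidth, pass kde_kws={'bw_adjust': 1.2} to histplot, or draw the curve separately with sns.kdeplot(df['value'], bw_adjust=1.2); values above 1 smooth more, values below 1 smooth less.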

SQL (binning counts)
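-- Generic binning by width (example: 5-unit bins).
-- Dividing by 5.0 avoids integer division, which truncates toward zero
-- (wrong bins for negative values) in many engines when value is an integer.
-- GROUP BY 1 refers to the first column; engines without ordinal GROUP BY
-- (e.g. SQL Server) need the full expression instead.
SELECT FLOOR(value / 5.0) * 5 AS bin_start,
       COUNT(*) AS n
FROM your_table
WHERE value IS NOT NULL
GROUP BY 1
ORDER BY 1;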

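Export the results and chart them as a column chart. Bins with no rows do not appear in the output, so add them back with a count of 0; otherwise the chart silently closes the gap. To overlay two groups, compute counts per group, then divide each group's counts by that group's total so the shapes are comparable.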
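If you pull raw values into Python instead, the same floor-based binning and per-group normalization can be sketched with pandas. The DataFrame and its `group`/`value` columns are hypothetical; `reindex` restores empty bins with a count of 0.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Hypothetical data: one numeric column plus a group label
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], [300, 500]),
    "value": np.concatenate([rng.gamma(2.0, 4.0, 300), rng.gamma(2.5, 4.0, 500)]),
})

width = 5
# Same rule as the SQL: FLOOR(value / 5) * 5
df["bin_start"] = np.floor(df["value"] / width) * width

# Rows = bins, columns = groups, cells = counts
counts = df.groupby(["group", "bin_start"]).size().unstack("group", fill_value=0)

# Include bins that have no rows at all so the x-axis has no gaps
all_bins = np.arange(counts.index.min(), counts.index.max() + width, width)
counts = counts.reindex(all_bins, fill_value=0)

# Relative frequency: each group's counts divided by that group's total
shares = counts / counts.sum()
print(shares.head())
```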

Common mistakes and self-check

  • Mistake: Using mean only on a skewed metric. Fix: Show distribution; add median and percentiles.
  • Mistake: Too few or too many bins. Fix: Try several; pick the one that makes primary shape clear without jagged noise.
  • Mistake: Comparing groups with different scales (counts). Fix: Use density/relative frequency to compare shapes fairly.
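  • Mistake: Ignoring outliers. Fix: Investigate them; for display, cap the x-axis and report how many values fall beyond the cap.
  • Mistake: Over-interpreting tiny bumps. Fix: Check that a bump persists across several bin widths or bandwidths and has enough observations behind it; with small samples, smooth more or aggregate.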
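One way to apply the outlier fix: cap the axis at a high percentile for display and report how many values lie beyond it. The data is hypothetical, and the 99th-percentile cap is an assumption to adjust, not a rule.

```python
import numpy as np

rng = np.random.default_rng(11)
# Hypothetical metric with a few extreme values appended
values = np.concatenate([rng.gamma(2.0, 5.0, 990), [400.0, 520.0, 610.0, 900.0]])

cap = np.percentile(values, 99)
n_beyond = int(np.sum(values > cap))

# Histogram limited to the capped range; values above the cap are excluded here
# and reported separately rather than silently dropped
counts, edges = np.histogram(values, bins=20, range=(0, cap))

print(f"x-axis capped at {cap:.1f}; {n_beyond} values ({n_beyond / len(values):.1%}) lie beyond the cap")
```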
Self-check checklist
  • Did I label axes with units?
  • Is the typical value (median) clear?
  • Is the bin width/bandwidth reasonable after trying alternatives?
  • Did I annotate any notable peaks, tails, or outliers?
  • If comparing groups, did I normalize and use the same axis?

Exercises

Do these hands-on tasks, then take the Quick Test.

Exercise 1: Build a histogram + density for session duration

Dataset (synthetic): 120 values in minutes. Approx pattern: 10 values near 0.5–1.5, 70 values in 2–8, 35 values in 8–20, 5 values in 20–45.

  • Task: Plot a histogram with 15–25 bins and overlay a density curve. Report median and 90th percentile.
  • Deliverable: A plot image and 2 numbers (median, p90).
Need a hint?
  • Ensure stat='density' when overlaying density in Python.
  • In spreadsheets, adjust bin width until the main peak is clear.
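If you want concrete values to practice on, this sketch draws the 120 durations uniformly within each band of the description. Uniform draws within bands are an assumption; the exercise gives only approximate ranges, so your median and p90 will depend on how you fill them in.

```python
import numpy as np

rng = np.random.default_rng(2025)
# Assumption: values are spread uniformly within each band (low, high, count)
bands = [(0.5, 1.5, 10), (2, 8, 70), (8, 20, 35), (20, 45, 5)]
durations = np.concatenate([rng.uniform(lo, hi, n) for lo, hi, n in bands])

median = np.median(durations)
p90 = np.percentile(durations, 90)
print(f"n={durations.size}, median={median:.1f} min, p90={p90:.1f} min")
```

Whatever the within-band spread, the median falls in the 2–8 band and p90 in the 8–20 band, because those bands hold the 60th/61st and the 108th/109th sorted values.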

Exercise 2: Compare two distributions (pre vs post)

Dataset (synthetic): Pre: mean ~7, right-skewed; Post: slightly shifted right, with more spread. Sample size: 400 each.

  • Task: Plot overlaid densities or side-by-side histograms with the same bins. State whether the typical value improved and whether variance changed.
  • Deliverable: A comparative plot and a 2–3 sentence interpretation.
Need a hint?
  • Normalize to densities to compare shapes.
  • Report median and IQR for robustness.
Checklist to submit
  • Axes labeled with units.
  • Bin/bandwidth choices justified.
  • Median/IQR or percentiles reported.
  • Clear conclusion: shift, no shift, or inconclusive.
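A sketch of the numeric side of Exercise 2 with simulated samples. The gamma distributions are a hypothetical stand-in for the right-skewed data described; the parameters are chosen so pre has a mean near 7 and post is shifted right with a larger scale.

```python
import numpy as np

rng = np.random.default_rng(8)
# Hypothetical right-skewed samples: pre mean ~7; post with a larger scale (shifted right, more spread)
pre = rng.gamma(shape=3.0, scale=7 / 3, size=400)
post = rng.gamma(shape=3.0, scale=2.8, size=400)

def median_iqr(x):
    """Robust summaries: median and interquartile range."""
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return q50, q75 - q25

pre_med, pre_iqr = median_iqr(pre)
post_med, post_iqr = median_iqr(post)

print(f"Pre:  median={pre_med:.2f}, IQR={pre_iqr:.2f}")
print(f"Post: median={post_med:.2f}, IQR={post_iqr:.2f}")
print(f"Median shift: {post_med - pre_med:+.2f}; IQR ratio (post/pre): {post_iqr / pre_iqr:.2f}")
```

With 400 values per group, small differences in median or IQR can be sampling noise, so "inconclusive" remains a legitimate conclusion.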

Mini challenge

You have two product categories with revenue per order. Category A shows two peaks; Category B shows a single peak with a long right tail. In 3 bullet points, explain what this suggests about customer segments and how you’d present typical revenue to stakeholders for each category.

Tip

Multimodality often signals subgroups (e.g., low-cost vs premium). Right-skew suggests median and upper percentiles tell a clearer story than the mean.

Practical projects

  • Customer spend distribution dashboard: histogram + density, median, p90, and outlier notes.
  • Engagement shape report: compare session durations by acquisition channel with normalized densities.
  • A/B distribution shift study: visualize control vs variant; report median shift and changes in IQR.

Learning path

  1. Distributions (this lesson): shapes, skew, peaks, outliers.
  2. Boxplots and percentiles: fast comparisons across many groups.
  3. Robust summaries: median, IQR, trimmed mean, Winsorization.
  4. Transformation tactics: log scale for heavy tails; when and why.
  5. Inference basics: how distribution shape affects tests and confidence intervals.

Next steps

  • Recreate two plots from recent work data and present 1-slide insights.
  • Adopt a standard: always pair a key metric with its distribution in reports.
  • Move on to boxplots and outlier detection to compare many groups quickly.

Practice Exercises


Instructions

Use the synthetic dataset description: 120 session durations (minutes) roughly distributed as: 10 values near 0.5–1.5, 70 values in 2–8, 35 values in 8–20, 5 values in 20–45.

  • Create a histogram with 15–25 bins and overlay a density curve.
  • Report the median and the 90th percentile.
  • Briefly describe skewness and any outliers.
Implementation notes

Spreadsheets: Insert → Histogram; set the bin width so the main peak is well defined. Python: use sns.histplot(..., stat='density', kde=True).

Expected Output
A plot showing a clearly right-skewed distribution; a median around 6–8 min; a p90 around 18–22 min; and a note about the long right tail.

Quick Test: Distributions, Histograms, and Density Plots

Test your knowledge with 10 questions. Pass with 70% or higher.
