Why this matters
Business Analysts often need to summarize large amounts of numeric data quickly: order values, resolution times, session lengths, lead ages, or delivery delays. A histogram shows the shape of these numbers at a glance. It helps you spot typical ranges, skew, outliers, and subpopulations. Stakeholders use that picture to set thresholds, define SLAs, and choose realistic targets.
Concept explained simply
A histogram groups a numeric variable into consecutive intervals (bins) and shows how many observations fall into each. Bars touch because the bins are contiguous ranges with no gaps between them. The vertical axis can show counts or percentages.
- Bar chart: categories (e.g., product lines), bars are separated.
- Histogram: numeric ranges (e.g., delivery times), bars touch.
Mental model
Imagine pouring all your numbers into labeled boxes lined up on a shelf: 0–5, 5–10, 10–15, and so on. The more numbers land in a box, the taller its bar.
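The box picture can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bin_counts` and the sample values are made up, and it assumes each box includes its lower edge but not its upper edge (so a value of exactly 5 lands in 5–10). Some tools use the opposite convention.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bin i covers [start + i*width, start + (i+1)*width): lower edge included,
    upper edge excluded (an assumed convention). Values outside the overall
    range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# Made-up values, for illustration only.
values = [1, 4, 5, 7, 9, 10, 12, 14]
counts = bin_counts(values, start=0, width=5, n_bins=3)

# Print one text "bar" per box: 0-5, 5-10, 10-15.
for i, c in enumerate(counts):
    print(f"{i * 5}-{(i + 1) * 5}: {'#' * c}")
```

Each `#` stands for one observation, so the longest row corresponds to the tallest bar.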
How to create a good histogram
Define the question. Examples: What is a typical first-response time? Are most orders under $50? Are there long delays?
Inspect the data. Check the min, max, and any obvious outliers. Note whether values are bounded at 0 (many time and cost variables are).
Choose the bins. Pick a start, end, and width. Aim for 8–20 bins for readability. A common starting heuristic is about sqrt(n) bins; adjust until the shape is stable. Use round, human-friendly edges when possible.
Choose the y-axis scale. Use counts when a single group is shown. Use percentages when comparing groups with different sample sizes.
Format the chart. Bars should touch. Label axes with units (e.g., minutes, dollars). Consider marking the median or a target line.
Handle skew and comparisons. If the distribution is very skewed, try a different bin width or a log-scaled x-axis. Avoid overlaying multiple histograms if they become hard to read; prefer small multiples (side-by-side panels) with identical bins and axes.
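The bin-choice guidance can be turned into a rough helper. The sketch below is one possible reading of it, under stated assumptions: it starts from the sqrt(n) heuristic, clamps the result to the 8–20 range mentioned above, and snaps the width to a round value. The function `plan_bins` and the `nice_widths` menu are hypothetical names, not part of any library.

```python
import math


def plan_bins(n, lo, hi, nice_widths=(1, 2, 5, 10, 20, 25, 50, 100)):
    """Suggest (start, end, width, n_bins) with round, human-friendly edges.

    n is the number of observations; lo and hi are the data min and max.
    nice_widths is a hypothetical menu of friendly widths to choose from.
    """
    # sqrt(n) heuristic, clamped to the 8-20 bins suggested for readability.
    target = min(20, max(8, round(math.sqrt(n))))
    raw_width = (hi - lo) / target
    # Pick the friendly width closest to the raw width.
    width = min(nice_widths, key=lambda w: abs(w - raw_width))
    # Extend the range outward to multiples of the width.
    start = math.floor(lo / width) * width
    end = math.ceil(hi / width) * width
    n_bins = round((end - start) / width)
    return start, end, width, n_bins


print(plan_bins(400, 3, 97))     # 400 values between 3 and 97
print(plan_bins(3200, 0, 200))   # 3,200 values between 0 and 200
```

Treat the output as a starting point only; the final check is still whether the shape stays stable when you nudge the width.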
Worked examples
Example 1 — E‑commerce order values
Data: 3,200 orders; min $0, max $600; median ~$42; long right tail.
- Bins: $0–$200 with width $10 (20 bins), plus an overflow bin $200–$600 to show the tail.
- Scale: Percent, to compare to other weeks later.
- Annotation: Vertical line at median ($42) and a note: “10% of orders > $90”.
Insight: Most orders cluster between $20 and $60; a small fraction drives high revenue in the tail.
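A minimal sketch of the overflow-bin idea from this example, using only the Python standard library. The order values are made up for illustration and are not the 3,200 orders described above; `percent_histogram` is a hypothetical helper name. It assumes bins include their lower edge, with the final bin also including its upper edge so the maximum is not dropped.

```python
def percent_histogram(values, edges):
    """Return the percent of values in each bin defined by consecutive edges.

    Bins are [edges[i], edges[i+1]); the last bin also includes its upper edge.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = len(values)
    return [100 * c / total for c in counts]


# Twenty $10-wide bins from $0 to $200, then one wide overflow bin up to $600.
edges = list(range(0, 201, 10)) + [600]

# Made-up order values, for illustration only.
orders = [12, 25, 38, 42, 42, 47, 55, 61, 95, 240, 600, 18]
shares = percent_histogram(orders, edges)

print(f"$40-$50 bin: {shares[4]:.1f}% of orders")
print(f"$200-$600 overflow bin: {shares[-1]:.1f}% of orders")
```

Because the overflow bin is much wider than the others, label it explicitly on the chart so readers do not compare its height with the $10 bins at face value.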
Example 2 — Time to first response (support tickets)
Data: minutes; many tickets get a first response quickly, some wait hours (right-skewed).
- Bins: 0–240 minutes, width 5 minutes (48 bins).
- Skew handling: Try a log x-axis if long tail hides structure near zero.
- Comparison: Two teams shown as small multiples with same bins and percent scale.
Insight: Team B has a thinner tail; fewer tickets over 120 minutes.
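If you try the log x-axis, the bin edges themselves need to be spaced evenly on the log scale. The sketch below shows one way to compute such edges; `log_edges` is a hypothetical helper, and it assumes all values are positive (a value of exactly 0 minutes cannot be placed on a log axis and would need separate handling). Reusing the same `edges` list for both teams keeps the small multiples comparable.

```python
import math


def log_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges spaced evenly on a log10 scale from lo to hi."""
    if lo <= 0 or hi <= lo:
        raise ValueError("log-spaced bins need 0 < lo < hi")
    step = (math.log10(hi) - math.log10(lo)) / n_bins
    return [10 ** (math.log10(lo) + i * step) for i in range(n_bins + 1)]


# Eight bins from 1 to 256 minutes: each edge is double the previous one,
# which comfortably covers a 0-240 minute range.
edges = log_edges(1, 256, 8)
print([round(e) for e in edges])
```

Rounded for display, the edges come out as successive doublings (1, 2, 4, and so on), which are easier to explain to stakeholders than arbitrary decimals.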
Example 3 — App sessions per user per day
Data: integers with many zeros.
- Bins: 0–20 sessions, width 1 (discrete-friendly). Combine 20+ into a final bin.
- Highlight: The 0 bin (inactive users) and 1–2 bins (light users).
Insight: A zero-inflated distribution; activation campaigns should target the 0 and 1 session groups.
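For integer data like this, one bin per value plus a pooled top bin can be sketched as follows. The session counts are made up for illustration, `session_histogram` is a hypothetical name, and the code assumes non-negative whole numbers.

```python
from collections import Counter


def session_histogram(sessions, cap=20):
    """Count users per integer session value, pooling values >= cap into one bin."""
    counts = Counter(min(s, cap) for s in sessions)
    labels = [str(k) for k in range(cap)] + [f"{cap}+"]
    return {label: counts.get(k, 0) for k, label in enumerate(labels)}


# Made-up sessions per user per day, for illustration only.
sessions = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 20, 37]
hist = session_histogram(sessions)

print("inactive users (0 sessions):", hist["0"])
print("heavy users (20+ sessions):", hist["20+"])
```

Keeping empty bins in the result (with a count of 0) matters here: it preserves the gaps between light and heavy users instead of silently closing them up.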
Who this is for
- Business Analysts and Data Analysts summarizing numeric metrics.
- Product/Operations/Marketing stakeholders needing quick shape insights.
Prerequisites
- Basic familiarity with numeric data (continuous vs. discrete).
- Understanding of mean, median, outliers.
- Comfort creating simple charts in a spreadsheet or BI tool.
Common mistakes and how to self-check
- Using a bar chart for numeric data. Bars shouldn’t be separated; use a histogram with touching bars.
- Too many or too few bins. If the chart looks spiky (too many bins) or overly smooth (too few), adjust the width until patterns are stable and readable.
- Different bin edges across panels. For comparisons, keep the same start, end, and width.
- Counts vs. percentages mismatch. When group sizes differ, use percent on the y-axis.
- Unclear labels and units. Always label axes and note if using log scale or overflow bins.
- Clipped tails without notice. If you cut off the axis, clearly state it and why.
Self-check
- Are bins contiguous and labeled with ranges or readable ticks?
- Are axes labeled with units (minutes, $, sessions)?
- If comparing groups, are binning and scales identical and y-axis set to percent?
- Does the chosen bin width reveal stable structure without noise?
Exercises
Do these hands-on tasks. Then compare with the solutions below each exercise.
Exercise 1 — Plan your bins
You have 5,000 ride-sharing trip durations (minutes), ranging from 0.5 to 120. You want a single histogram for an internal report.
- Propose: start, end, bin width, estimated number of bins.
- Choose counts vs. percent. Explain your choice.
- Note any annotation you would add (e.g., service target at 15 minutes).
Solution
Start at 0, end at 120, width 5 minutes → about 24 bins. Use percent to make later comparisons easier. Add a line at the 15-minute target and annotate the share over 30 minutes. If early bins are too tall to see shape, try width 3 minutes or a log x-axis for exploration (but keep the final chart linear for interpretability unless stakeholders are comfortable with logs).
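The bin arithmetic in this solution is easy to check. The short sketch below recomputes the bin count for both candidate widths and, for comparison, what the sqrt(n) heuristic alone would suggest for 5,000 trips.

```python
import math

start, end, n_trips = 0, 120, 5000

# Number of bins for the two widths discussed in the solution.
bins_by_width = {w: math.ceil((end - start) / w) for w in (5, 3)}
for width, n_bins in bins_by_width.items():
    print(f"width {width} min -> {n_bins} bins")

# The unclamped sqrt(n) heuristic, for comparison.
sqrt_rule = round(math.sqrt(n_trips))
print(f"sqrt(n) heuristic -> about {sqrt_rule} bins")
```

For a sample this large the sqrt(n) figure is far above what reads comfortably on a report chart, which is one reason to favor round 5-minute bins here.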
Exercise 2 — Build one from sample data
Copy these 40 values (minutes) into your tool and create a histogram.
Data: 2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 12, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 25, 26, 28, 30, 32, 35, 38, 40, 45, 50, 55, 60, 75, 90, 110
- Bins: 0–120, width 5 minutes.
- Label axes and add a vertical line at 15 minutes.
Solution
You should see a concentration between 0 and 20 minutes (22 of the 40 values), with a sparse right tail. The 15-minute line sits near the upper part of the main cluster, just below the median of 17.5 minutes. Most observations (28 of 40) fall under 30 minutes; three exceed 60, with a maximum of 110.
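If you want to check your chart against exact numbers, the following sketch bins the 40 values with the standard library only. It assumes each bin includes its lower edge and excludes its upper edge; a tool that uses the opposite convention will move values such as 5, 10, and 15 into the bin below, so small differences from your chart are expected.

```python
import statistics

data = [2, 3, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 12, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 22, 24, 25, 26, 28, 30, 32, 35, 38, 40, 45, 50, 55, 60,
        75, 90, 110]

start, width, n_bins = 0, 5, 24   # bins 0-5, 5-10, ..., 115-120
counts = [0] * n_bins
for v in data:
    # Lower edge included, upper edge excluded (an assumed convention);
    # a value of exactly 120 would be placed in the last bin.
    i = min(int((v - start) // width), n_bins - 1)
    counts[i] += 1

# Text version of the histogram: one '#' per observation.
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo:>3}-{lo + width:<3} {'#' * c}")

median = statistics.median(data)
share_under_30 = 100 * sum(v < 30 for v in data) / len(data)
print(f"median: {median} min, under 30 min: {share_under_30:.0f}%")
```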
Exercise 3 — Interpret a histogram
Imagine that a histogram of delivery delays (days) from 0 to 20, with a bin width of 1 day, shows a strong spike at 0–2 days, a gentle decline to 7 days, and a thin tail to 20 days. About 12% of deliveries are over 7 days.
- Where is the median likely to be (roughly)?
- Is the distribution symmetric or skewed?
- What operational follow-up would you suggest?
Solution
The median is likely around 2–3 days given the strong early spike and gradual decline. The distribution is right-skewed. Suggest investigating the causes of the >7 day tail (supplier, region, product type) and setting a KPI to reduce that tail share from 12% to a lower target.
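One way to reason about the median from a percent-scale histogram is to add up bin shares from the left until the running total reaches 50%. The sketch below does this with hypothetical shares chosen only to match the description (early spike, gentle decline, 12% beyond 7 days); they are not real data, and `median_bin` is a made-up helper name.

```python
def median_bin(shares):
    """Return the index of the first bin where the cumulative share reaches 50%."""
    cumulative = 0.0
    for i, share in enumerate(shares):
        cumulative += share
        if cumulative >= 50:
            return i
    return None


# Hypothetical percent shares for day bins 0-1, 1-2, ..., 6-7,
# followed by one pooled share for the 7-20 day tail.
shares = [18, 20, 16, 11, 9, 8, 6, 12]

i = median_bin(shares)
print(f"median falls in the {i}-{i + 1} day bin")
```

With these assumed shares the running total passes 50% in the third bin, which agrees with the rough 2–3 day estimate.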
Exercise checklist
- Picked human-friendly bin edges and widths.
- Chose count vs. percent appropriately.
- Used the same binning when comparing groups.
- Labeled axes and units clearly.
- Considered skew, tails, and annotations (median/targets).
Practical projects
- Analyze order values for last quarter. Produce a histogram, annotate the median and the 90th percentile, and write 3 insights for pricing/discount strategy.
- Support SLA review. Build percent-scale histograms for two teams with identical binning. Recommend a realistic SLA threshold that covers at least 80% of tickets.
- User engagement snapshot. Create small-multiple histograms of daily sessions by user segment (new vs. returning) and highlight differences in low-activity bins.
Learning path
- Before: Basic chart types, numeric data types, summary statistics.
- Now: Histograms for distribution shape and quick comparisons.
- Next: Density plots, box plots, and cumulative distributions for deeper comparisons.
Next steps
- Try different bin widths to test stability of the pattern.
- Create small multiples for 2–3 cohorts using identical bins and percent y-axis.
- Share one histogram with concise annotations: median line, tail percentage, and a clear takeaway sentence.
Mini challenge
You have two marketing campaigns with 1,000 and 350 orders respectively. You want to compare order value distributions. Design the comparison:
- Choose bin edges and width.
- Pick y-axis scale and explain why.
- Decide: overlay or small multiples?
- List one annotation that helps a decision-maker.
Suggested approach
Use identical bins (e.g., $0–$200, width $10, plus $200+). Use percent y-axis due to different sample sizes. Prefer small multiples over overlay for clarity. Annotate each panel with median and share of orders >$90.
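To see why the percent scale matters when sample sizes differ, compare raw counts with shares. The bin counts below are made up for illustration and use four coarse bins only to keep the example short; `to_percent` is a hypothetical helper.

```python
def to_percent(counts):
    """Convert bin counts to percent of the group total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


labels = ["$0-$50", "$50-$100", "$100-$200", "$200+"]

# Made-up counts over identical bins, for illustration only.
campaign_a = [500, 300, 150, 50]   # 1,000 orders
campaign_b = [140, 105, 70, 35]    # 350 orders

pct_a = to_percent(campaign_a)
pct_b = to_percent(campaign_b)

for label, a, b in zip(labels, pct_a, pct_b):
    print(f"{label:>10}: A {a:5.1f}%   B {b:5.1f}%")
```

In raw counts campaign A is larger in every bin, simply because it has more orders. On the percent scale, campaign B in this made-up data puts a larger share of its orders in the two highest bins, a difference the count view would hide.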
Quick Test
Take the quick test below to check your understanding.