Who this is for
- Aspiring and junior Data Analysts who summarize datasets for reports and dashboards.
- Professionals switching into analytics who need a reliable way to describe "typical" values.
- Students preparing for analytics interviews or case studies.
Prerequisites
- Basic arithmetic (sum, division) and comfort with sorting numbers.
- Familiarity with data types: numeric vs categorical.
- Optional: spreadsheet basics (e.g., the AVERAGE, MEDIAN, and MODE functions) help but are not required.
Why this matters
Central tendency helps you answer: "What is typical here?" Data Analysts use it to:
- Summarize monthly revenue or daily active users in executive dashboards.
- Choose a fair price or "typical" delivery time shown to customers.
- Detect when outliers are skewing reports (e.g., one huge order distorting averages).
- Compare performance across versions, branches, or time periods quickly.
Concept explained simply
Mean (average)
Add up all values and divide by how many values there are. Mental model: a balance point. If the values were weights placed along a line, the mean is where the seesaw would balance.
Formula: mean = (sum of values) / (count of values).
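The formula translates almost word for word into code. This is a minimal Python sketch; the function name `mean` and the sample values are illustrative assumptions, not part of the lesson's datasets.

```python
def mean(values):
    """Arithmetic mean: sum of values divided by count of values."""
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


# Illustrative values (an assumption), chosen so the arithmetic is easy to check by hand.
sample = [4, 8, 6, 2]
sample_mean = mean(sample)  # (4 + 8 + 6 + 2) / 4 = 20 / 4
print(sample_mean)
```

The empty-list check matters in practice: dividing by a count of zero would otherwise fail with a less helpful error.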
Median (middle)
Order the values. The median is the middle one. If there are two middle values (even count), take their average. Mental model: a robust midpoint that ignores how big the extremes are.
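The sort-then-pick-the-middle procedure can be sketched in Python as follows. The function name `median` and the two small inputs are illustrative assumptions.

```python
def median(values):
    """Median: middle of the sorted values, or the average of the two middles."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)  # sorted() returns a new list; the input is left untouched
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middles


odd_case = median([7, 1, 3])      # sorted: [1, 3, 7]
even_case = median([8, 2, 4, 6])  # sorted: [2, 4, 6, 8]
print(odd_case, even_case)
```

Note that the inputs are deliberately unsorted: the sort happens inside the function, so callers cannot forget it.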
Mode (most frequent)
The value that appears most often. Works for numbers and categories. Mental model: the peak of the distribution — the most common choice.
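Counting frequencies and keeping every value that ties for the top count is one way to find the mode. The sketch below uses Python's `collections.Counter`; the function name `modes` and the sample inputs are assumptions for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps first-seen order, so ties come back in the order they appeared.
    return [value for value, count in counts.items() if count == top]


numeric_modes = modes([1, 2, 2, 3, 3])             # 2 and 3 both appear twice
category_modes = modes(["card", "cash", "card"])   # works for categories too
print(numeric_modes, category_modes)
```

If every value appears exactly once, this sketch returns all of them, which is a signal that the data has no clear mode.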
Quick rules of thumb
- Symmetric, single-peaked data (no heavy outliers): mean ≈ median ≈ mode. Report the mean.
- Skewed data (e.g., incomes, time-to-resolution): report the median.
- Categorical data (e.g., colors, payment methods): report the mode.
- If two values tie for the highest frequency, the data is bimodal; if three or more tie, it is multimodal.
Mental model
- Mean: pulled by outliers like a magnet; one extreme value can move it a lot.
- Median: stands firm in the middle; resistant to extremes.
- Mode: reflects the crowd’s favorite; there may be several favorites.
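To see the "magnet" effect in numbers, the sketch below adds one extreme value to a small list and measures how far each summary moves. The data is made up for illustration; `statistics` is Python's standard library module.

```python
import statistics

# Illustrative data (an assumption), not taken from the lesson's examples.
base = [10, 11, 12, 13, 14]
with_outlier = base + [100]

# How far does each measure move when the single extreme value is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(base)
median_shift = statistics.median(with_outlier) - statistics.median(base)

print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift:.2f}")
```

The mean goes from 12 to 160/6, while the median only moves from 12 to 12.5.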
Worked examples
Example 1: Clean numeric data
Data: [2, 3, 5, 5, 9]
Solution
- Mean = (2+3+5+5+9)/5 = 24/5 = 4.8
- Median = 5 (middle of ordered list)
- Mode = 5 (most frequent)
- Use: the mean and the median are both fine here; the distribution looks fairly balanced.
Example 2: Skew due to an outlier
Monthly incomes (in thousands): [40, 45, 50, 50, 55, 60, 60, 62, 65, 200]
Solution
- Mean = 687/10 = 68.7
- Median = average of 5th and 6th = (55 + 60)/2 = 57.5
- Mode = 50 and 60 (bimodal)
- Report the median (57.5) for a better "typical" income under skew.
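If you work in Python, the standard library's `statistics` module can confirm these figures. This is a quick check rather than a required tool; `multimode` needs Python 3.8 or later.

```python
import statistics

incomes = [40, 45, 50, 50, 55, 60, 60, 62, 65, 200]

mean_income = statistics.mean(incomes)        # 687 / 10
median_income = statistics.median(incomes)    # average of the 5th and 6th sorted values
income_modes = statistics.multimode(incomes)  # every value tied for the top frequency

print(mean_income, median_income, income_modes)
```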
Example 3: Categorical data (mode only)
Favorite color survey: [Red, Blue, Blue, Green, Blue, Red]
Solution
- Mode = Blue (appears most)
- The mean and median are not meaningful for unordered categories such as colors.
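For categorical data, a frequency count gives you the mode and also the share of responses behind it, which is often worth reporting alongside. A small Python sketch using the survey above (variable names are illustrative):

```python
from collections import Counter

colors = ["Red", "Blue", "Blue", "Green", "Blue", "Red"]

counts = Counter(colors)
# most_common(1) returns a list holding the single (value, count) pair with the top count.
mode_color, mode_count = counts.most_common(1)[0]
mode_share = mode_count / len(colors)

print(f"{mode_color}: {mode_count} of {len(colors)} responses ({mode_share:.0%})")
```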
Example 4: Weighted mean (ratings)
Ratings counts: 5★: 120, 4★: 60, 3★: 15, 2★: 5, 1★: 0
Solution
- Weighted sum = 5*120 + 4*60 + 3*15 + 2*5 + 1*0 = 600 + 240 + 45 + 10 + 0 = 895
- Total weight = 120 + 60 + 15 + 5 + 0 = 200
- Weighted mean = 895/200 = 4.475 ≈ 4.48
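The same calculation can be sketched in Python. The dictionary layout (stars mapped to counts) is an assumption about how the data might be stored. `Decimal` is used for the final rounding because a binary float cannot represent 4.475 exactly, so the built-in `round()` is not guaranteed to match the by-hand result.

```python
from decimal import Decimal, ROUND_HALF_UP

# Star rating -> number of reviews with that rating (layout is illustrative).
rating_counts = {5: 120, 4: 60, 3: 15, 2: 5, 1: 0}

weighted_sum = sum(stars * count for stars, count in rating_counts.items())
total_weight = sum(rating_counts.values())  # total reviews, not the number of rating levels
weighted_mean = weighted_sum / total_weight

# Round half up to two decimals, matching the hand calculation.
rounded = (Decimal(weighted_sum) / Decimal(total_weight)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

print(weighted_sum, total_weight, rounded)
```

Dividing by `len(rating_counts)` (5 rating levels) instead of `total_weight` (200 reviews) is the classic weighted-average bug.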
How to compute quickly
- Check data type and quality: numeric vs categorical; handle missing values; note outliers.
- For mean: sum values; divide by count. For weighted mean: multiply each value by its weight, sum results, divide by total weight.
- For median: sort values; pick middle (or average the two middles).
- For mode: count frequencies; pick the highest. Multiple modes are possible.
Handling missing values
- Exclude true missing values (NA/null) from both sum and count when computing mean.
- Document what you excluded; report the final sample size (n).
- If missingness is systematic, note the possible bias.
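One way to apply these rules in Python is to filter out missing entries first, then report the mean together with how many values were used and dropped. The function name `clean_mean` and the sample list are hypothetical; the sketch treats both `None` and float NaN as missing.

```python
import math


def clean_mean(values):
    """Mean over present values only; returns (mean, n_used, n_missing)."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    n_missing = len(values) - len(present)
    if not present:
        raise ValueError("no non-missing values to average")
    # Missing entries are excluded from both the sum and the count.
    return sum(present) / len(present), len(present), n_missing


# Illustrative data (an assumption): two of the five entries are missing.
raw = [12, None, 15, float("nan"), 9]
avg, n_used, n_missing = clean_mean(raw)
print(f"mean = {avg} (n = {n_used}, excluded = {n_missing})")
```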
Exercises
Try these now, then check your work against the solutions.
Exercise 1: Tickets in a week
Dataset: [3, 5, 4, 6, 50, 5, 4]
- Compute mean, median, and mode.
- Which would you report as the "typical" number of tickets per day? Why?
Solution
- Mean = (3+5+4+6+50+5+4)/7 = 77/7 = 11.0
- Sorted = [3,4,4,5,5,6,50]; Median = 5
- Mode = 4 and 5 (bimodal)
- Report median (5) because the outlier 50 skews the mean.
Exercise 2: Weighted product rating
Ratings counts: 5★: 120, 4★: 60, 3★: 15, 2★: 5, 1★: 0
- Compute the weighted average rating to two decimals.
Solution
- Weighted mean = 895/200 = 4.475 ≈ 4.48
Self-check checklist
- I sorted data before computing the median.
- I verified whether outliers are present and decided accordingly (mean vs median).
- For weighted averages, I divided by total weight (not count of categories).
- I reported the sample size (n) when sharing results.
Common mistakes and how to self-check
- Mistake: Using the mean on heavily skewed data. Fix: Inspect distribution; prefer median.
- Mistake: Forgetting to sort before taking the median. Fix: Always sort first.
- Mistake: Dividing weighted sums by the number of groups. Fix: Divide by total weight.
- Mistake: Reporting mode for continuous data with no repeats. Fix: State "no clear mode" or bin sensibly.
- Mistake: Not handling missing values consistently. Fix: Exclude missing from both sum and count; report n.
Practical projects
- Support analytics: Determine the typical resolution time across tickets; compare mean vs median and explain the difference.
- E-commerce: Compute median order value by traffic source; flag sources with strong skew.
- App ratings: Build a simple report that shows mode rating and weighted average rating over time.
- Logistics: Compare median delivery time before and after a route change; quantify the percent change.
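As a starting point for the logistics project, here is a sketch of the before/after median comparison. Both lists are hypothetical delivery times in minutes, invented only to show the calculation.

```python
import statistics

# Hypothetical delivery times in minutes (assumed data, each with one slow outlier).
before = [30, 32, 35, 31, 90]
after = [26, 28, 27, 29, 85]

median_before = statistics.median(before)
median_after = statistics.median(after)

# Percent change relative to the "before" median; negative means faster deliveries.
pct_change = (median_after - median_before) / median_before * 100

print(f"median {median_before} -> {median_after} min ({pct_change:+.1f}%)")
```

Using the median here keeps the one slow delivery in each period from dominating the comparison.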
Mini challenge
You have delivery times (minutes): [18, 19, 20, 21, 22, 120]. Your manager asks for a single typical value for the homepage. Which measure do you use and what value do you report?
Hint
There is an outlier (120). Which measure resists outliers best?
Learning path
- Master central tendency (this lesson): compute mean, median, mode; choose the right one based on data shape.
- Then learn variability: range, IQR, variance, standard deviation — they give context to the typical value.
- Move to distributions and skewness/kurtosis to better choose summary measures.
- Apply in dashboards: include both typical value and variability to avoid misleading summaries.
Next steps
- Take the Quick Test below to lock in the concepts.
- Apply to a small dataset you own (sales, usage, or survey). Report mean vs median and explain differences in one paragraph.
Quick Test
Ready? Take the quick test below.