Why this matters
Scatter plots are one of the fastest ways to show how two numeric variables move together. As a Business Analyst, you will use them to:
- Validate hypotheses like “Do discounts increase order volume?”
- Spot drivers of KPIs (e.g., time-on-site vs. conversions)
- Find outliers that distort averages
- Communicate patterns clearly to non-technical stakeholders
Who this is for
- Aspiring and practicing Business Analysts
- Product, marketing, and ops analysts who need quick relationship insights
- Anyone building evidence-based stories from data
Prerequisites
- Basic comfort with numeric data (mean, median)
- Know how to create a basic chart in Excel/Sheets or a BI tool
- Optional: Familiarity with correlation (r) and trend lines
Concept explained simply
A scatter plot places each observation as a dot using two numeric variables: X on the horizontal axis, Y on the vertical axis.
- If dots rise from left to right, X and Y tend to increase together (positive relationship).
- If dots fall from left to right, Y tends to decrease as X increases (negative relationship).
- If dots look like a cloud without a direction, there’s likely no linear relationship.
Mental model
Think of a scatter plot as a map of pairs. Each dot = one record. Clusters are neighborhoods, the trend line is the main road, and outliers are houses far from everyone else. Your job: name the road (trend), point out the neighborhoods (segments), and mention the outliers (risks or opportunities).
How to build a reliable scatter plot
Optional add-ons (when helpful)
- Display correlation coefficient (r) for linear strength (–1 to +1).
- Use log scales when values span orders of magnitude.
- Facet small multiples by segment to compare patterns side by side.
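The first two add-ons can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed method: the function name `pearson_r` and all the numbers are made up for the example, and most BI tools and spreadsheets compute the same values for you.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (r) for two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative (made-up) pairs: one rising pattern, one falling pattern.
rising = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
falling = pearson_r([1, 2, 3, 4, 5], [9, 7, 6, 4, 1])
print(f"rising r = {rising:.2f}, falling r = {falling:.2f}")

# Illustrative (made-up) values spanning several orders of magnitude.
# Taking log10 is what a log axis does: it spreads small values apart
# and pulls very large values closer together.
revenue = [120, 950, 8_400, 76_000, 910_000]
log_revenue = [math.log10(v) for v in revenue]
print([round(v, 2) for v in log_revenue])
```

Keep in mind that r only measures linear strength; a clearly curved pattern can still produce a modest r.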
Worked examples
Example 1: Marketing spend vs sign-ups
Data: Weekly ad spend (X) and sign-ups (Y). The dots rise left-to-right; the trend line slope is positive.
Insight: “Greater spend generally yields more sign-ups, but the curve flattens after $40k, which suggests diminishing marginal returns. Recommend testing channel mix at high spend.”
Example 2: Delivery distance vs delivery time
Data: Distance in km (X) vs delivery time in minutes (Y). A strong positive pattern with a few dots far above the line.
Insight: “Time increases with distance, but outliers suggest operational issues (traffic spikes or courier gaps) on specific routes. Investigate those orders.”
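One way to find the dots far above the line is to fit a straight line and flag points whose residual (actual minus predicted) is unusually large. The sketch below assumes a simple least-squares line and a cutoff of two standard deviations of the residuals; both are illustrative choices, and the delivery records are made up.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up deliveries: (distance_km, delivery_minutes)
deliveries = [(1, 12), (2, 15), (3, 19), (4, 22), (5, 26),
              (6, 55), (7, 33), (8, 36), (9, 40), (10, 43)]

xs = [d for d, _ in deliveries]
ys = [t for _, t in deliveries]
slope, intercept = fit_line(xs, ys)

# Residual = how far each delivery sits above (+) or below (-) the trend line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
cutoff = 2 * statistics.pstdev(residuals)  # assumed threshold, tune per dataset

flagged = [point for point, res in zip(deliveries, residuals) if res > cutoff]
print("Deliveries well above the trend line:", flagged)
```

The flagged records are candidates for investigation, not proof of a problem; check the route and order details before drawing conclusions.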
Example 3: Session duration vs purchases, colored by device
Data: Session duration (X) vs purchases per user (Y), color by device (mobile/desktop).
Insight: “Longer sessions correlate with more purchases. At the same session duration, desktop users purchase more than mobile users. Prioritize mobile UX to close the gap.”
Learning path
- Start: Make a basic scatter plot with two numeric variables
- Next: Add trend line, interpret slope and outliers
- Then: Use color/size for a third variable and fix overplotting
- Stretch: Compare segments via faceting; experiment with log scales
Exercises
Complete the exercises below.
Exercise 1: Build and interpret a scatter plot
Dataset (10 rows):
Ad_Spend_USD (thousands): 5, 8, 10, 15, 20, 25, 30, 35, 40, 45
Signups: 50, 70, 85, 110, 150, 155, 180, 205, 210, 215
Channel (S=Search, D=Display): S, S, S, S, S, D, D, D, D, D
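After you have drawn the chart, you can check your reading of it numerically. The sketch below uses the dataset above and computes a least-squares slope and correlation overall and per channel; the helper name `slope_and_r` is just an illustrative choice.

```python
import math

ad_spend = [5, 8, 10, 15, 20, 25, 30, 35, 40, 45]            # thousands of USD
signups = [50, 70, 85, 110, 150, 155, 180, 205, 210, 215]
channel = ["S"] * 5 + ["D"] * 5                              # S=Search, D=Display

def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson r) for two numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

overall_slope, overall_r = slope_and_r(ad_spend, signups)
print(f"Overall: slope = {overall_slope:.2f} sign-ups per $1k, r = {overall_r:.2f}")

# Repeat the fit per channel to see whether the pattern differs by segment.
by_channel = {}
for code in ("S", "D"):
    xs = [x for x, c in zip(ad_spend, channel) if c == code]
    ys = [y for y, c in zip(signups, channel) if c == code]
    by_channel[code] = slope_and_r(xs, ys)
    print(f"Channel {code}: slope = {by_channel[code][0]:.2f} sign-ups per $1k")
```

With only five points per channel, treat the per-channel slopes as a prompt for further testing rather than a firm estimate. Channel and spend level are also confounded in this dataset, because all Display weeks have higher spend than all Search weeks.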
Exercise 2: Fix overplotting and scaling
Scenario: You plotted Order_Value (USD) vs Items_Per_Order. Most orders cluster at an Order_Value of $10–$30 with 1–3 items, and many points are stacked on top of each other. A few orders are $500–$800 with 1–2 items.
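Two common fixes for this scenario are jitter (a small random nudge so stacked points separate) and a log scale for the skewed variable. The sketch below shows what each transformation does to the numbers before plotting. The orders, the jitter amount of 0.15, and the fixed random seed are all illustrative assumptions.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the jitter is reproducible

# Made-up orders: (items_per_order, order_value_usd)
orders = [(1, 12), (1, 12), (2, 18), (2, 18), (2, 18),
          (3, 27), (3, 27), (1, 520), (2, 780)]

def jitter(value, amount, rng):
    """Nudge a value by a small random offset in [-amount, +amount]."""
    return value + rng.uniform(-amount, amount)

# Jitter the discrete item counts so identical points no longer overlap.
jittered_items = [jitter(items, 0.15, rng) for items, _ in orders]

# Log-transform order value so the few $500-$800 orders stop squashing the rest.
log_values = [math.log10(value) for _, value in orders]

raw_range = max(v for _, v in orders) / min(v for _, v in orders)
log_range = max(log_values) - min(log_values)
print(f"Raw values span a {raw_range:.0f}x range; log10 values span {log_range:.2f} units")
```

Jitter changes where points are drawn, not the underlying data, so say so in the chart note. Transparency is a third option, but it is a plotting setting rather than a data transformation.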
Exercise checklist
- I set the suspected driver on the X-axis and outcome on the Y-axis.
- I added a trend line and looked for slope direction and fit.
- I checked and handled overplotting (jitter/transparency) and scaling.
- I annotated at least one outlier and stated a clear takeaway.
Common mistakes and self-check
- Confusing correlation with causation. Self-check: Can another factor explain both X and Y? Does the pattern persist across segments and time periods?
- Truncated or inconsistent axes. Self-check: Do axes start at sensible baselines or clearly indicate truncation?
- Overplotting hides structure. Self-check: Did you try transparency, jitter, or smaller points?
- Ignoring non-linear patterns. Self-check: Would a curve (for example, a polynomial fit) capture the shape better than a straight line?
- Unlabeled outliers. Self-check: Are exceptional points identified and either explained or flagged for investigation?
Practical projects
- Customer value drivers: Plot visit frequency vs annual spend; color by segment; recommend a retention action.
- Operational efficiency: Plot staff hours vs tickets resolved; identify diminishing returns and staffing guidance.
- Pricing impact: Plot discount % vs conversion rate; use the trend line to estimate the expected lift at 5% and 10% discounts.
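For the pricing project, estimating lift from a trend line amounts to fitting a line and comparing its predictions at different discount levels. The sketch below assumes a straight-line fit and uses made-up discount and conversion figures; swap in your own data, and check the scatter plot first to confirm a line is a reasonable shape.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up observations: discount in %, conversion rate in %
discount_pct = [0, 2, 4, 6, 8, 10, 12]
conversion_pct = [2.0, 2.3, 2.5, 2.9, 3.1, 3.4, 3.6]

slope, intercept = fit_line(discount_pct, conversion_pct)

def predict(discount):
    """Conversion rate (%) the fitted line predicts for a given discount (%)."""
    return slope * discount + intercept

baseline = predict(0)
lift_5 = predict(5) - baseline    # expected lift in percentage points
lift_10 = predict(10) - baseline

print(f"Estimated lift at 5% discount: {lift_5:.2f} points")
print(f"Estimated lift at 10% discount: {lift_10:.2f} points")
```

A straight line always predicts that doubling the discount doubles the lift, so this estimate is only as good as the linear assumption. The fit describes an association in past data; it does not show that discounts caused the change.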
Mini challenge
You have these pairs (X=Response time in seconds, Y=CSAT out of 5):
(2, 4.8), (3, 4.6), (5, 4.2), (8, 3.9), (12, 3.2), (20, 2.9)
- Would you expect a positive or negative slope? Why?
- Write a two-line executive summary of the relationship and one action.
Next steps
- Apply a scatter plot to one of your current KPIs this week.
- Share one annotated chart with a teammate for feedback.
- Take the quick test below to confirm understanding.