Showing Relationships With Scatter Plots

Learn how to show relationships with scatter plots for free, with explanations, exercises, and a quick test (for Business Analysts).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

Scatter plots are one of the fastest ways to show how two numeric variables move together. As a Business Analyst, you will:

  • Validate hypotheses like “Do discounts increase order volume?”
  • Spot drivers of KPIs (e.g., time-on-site vs. conversions)
  • Find outliers that distort averages
  • Communicate patterns clearly to non-technical stakeholders

Who this is for

  • Aspiring and practicing Business Analysts
  • Product, marketing, and ops analysts who need quick relationship insights
  • Anyone building evidence-based stories from data

Prerequisites

  • Basic comfort with numeric data (mean, median)
  • Know how to create a basic chart in Excel/Sheets or a BI tool
  • Optional: Familiarity with correlation (r) and trend lines

Concept explained simply

A scatter plot places each observation as a dot using two numeric variables: X on the horizontal axis, Y on the vertical axis.

  • If dots rise from left to right, X and Y tend to increase together (positive relationship).
  • If dots fall from left to right, Y tends to decrease as X increases (negative relationship).
  • If dots look like a cloud without a direction, there’s likely no linear relationship.

Mental model

Think of a scatter plot as a map of pairs. Each dot = one record. Clusters are neighborhoods, the trend line is the main road, and outliers are houses far from everyone else. Your job: name the road (trend), point out the neighborhoods (segments), and mention the outliers (risks or opportunities).

How to build a reliable scatter plot

Step 1: Pick two numeric variables with a plausible relationship (cause/effect or association).
Step 2: Put the suspected driver on X and the outcome on Y.
Step 3: Add a trend line (linear to start) and show its equation if available.
Step 4: Consider color by category (e.g., region) or size for a third numeric variable.
Step 5: Check for overplotting (use jitter or transparency) and axis scale issues.
Step 6: Annotate notable outliers and the main takeaway in one sentence.

Optional add-ons (when helpful)

  • Display correlation coefficient (r) for linear strength (–1 to +1).
  • Use log scales when values span orders of magnitude.
  • Facet small multiples by segment to compare patterns side by side.

Worked examples

Example 1: Marketing spend vs sign-ups

Data: Weekly ad spend (X) and sign-ups (Y). The dots rise left-to-right; the trend line slope is positive.

Insight: “Greater spend generally yields more sign-ups, but marginal returns flatten above roughly $40k, where the points visibly curve. Recommend testing channel mix at high spend.”

Example 2: Delivery distance vs delivery time

Data: Distance in km (X) vs delivery time in minutes (Y). A strong positive pattern with a few dots far above the line.

Insight: “Time increases with distance, but outliers suggest operational issues (traffic spikes or courier gaps) on specific routes. Investigate those orders.”

Example 3: Session duration vs purchases, colored by device

Data: Session duration (X) vs purchases per user (Y), color by device (mobile/desktop).

Insight: “Longer sessions correlate with more purchases. Desktop users convert more at equal duration. Prioritize mobile UX to close the gap.”

Learning path

  • Start: Make a basic scatter plot with two numeric variables
  • Next: Add trend line, interpret slope and outliers
  • Then: Use color/size for a third variable and fix overplotting
  • Stretch: Compare segments via faceting; experiment with log scales

Exercises

Complete the exercises below. Anyone can do them; progress is saved for logged-in users.

Exercise 1: Build and interpret a scatter plot

Dataset (10 rows):

Ad_Spend_USD: 5, 8, 10, 15, 20, 25, 30, 35, 40, 45 (thousands)
Signups:      50, 70, 85, 110, 150, 155, 180, 205, 210, 215
Channel:      S,  S,  S,  S,  S,  D,   D,   D,   D,   D   (S=Search, D=Display)
Task A: Plot Ad_Spend on X and Signups on Y. Color by Channel.
Task B: Add a linear trend line. Note the slope direction.
Task C: Describe the relationship in one sentence and flag any outliers.

Exercise 2: Fix overplotting and scaling

Scenario: You plotted Order_Value (USD) vs Items_Per_Order. Most orders cluster at an Order_Value of $10–$30 with 1–3 items, so many points are stacked on top of each other. A few orders are $500–$800 with 1–2 items.

Task A: Reduce overplotting (choose: jitter, transparency, or small multiples). Explain your choice.
Task B: Decide if a log scale is appropriate on Order_Value and justify.
Task C: Write a one-sentence takeaway that a manager can act on.
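
If you want to experiment before answering, here is one possible sketch of jitter and a log transform in plain Python. The order data is made up to resemble the scenario, and the jitter width of 0.2 and the helper names jitter and log10_values are assumptions for illustration.

```python
import math
import random


def jitter(values, width=0.2, seed=0):
    """Add small uniform noise so stacked points spread out.

    Use jittered values for display only, never for calculations.
    """
    rng = random.Random(seed)  # fixed seed keeps the chart reproducible
    return [v + rng.uniform(-width, width) for v in values]


def log10_values(values):
    """Log-transform strictly positive values to compress a long right tail."""
    return [math.log10(v) for v in values]


# Made-up orders: many small ones, a few very large ones.
items = [1, 1, 2, 2, 2, 3, 3, 1, 2]
order_value = [10, 12, 18, 20, 25, 28, 30, 500, 800]

items_jittered = jitter(items)
order_value_log = log10_values(order_value)

# On the raw scale the largest order is many times the smallest;
# on the log scale the whole range spans fewer than two units.
print(max(order_value) / min(order_value))
print(max(order_value_log) - min(order_value_log))
```

Jitter suits Items_Per_Order because it takes only a few whole-number values. A log scale works only when every Order_Value is greater than zero.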

Exercise checklist

  • I set the suspected driver on the X-axis and outcome on the Y-axis.
  • I added a trend line and looked for slope direction and fit.
  • I checked and handled overplotting (jitter/transparency) and scaling.
  • I annotated at least one outlier and stated a clear takeaway.

Common mistakes and self-check

  • Confusing correlation with causation. Self-check: Can another factor explain both X and Y? Does the pattern persist across segments and time periods?
  • Truncated or inconsistent axes. Self-check: Do axes start at sensible baselines or clearly indicate truncation?
  • Overplotting hides structure. Self-check: Did you try transparency, jitter, or smaller points?
  • Ignoring non-linear patterns. Self-check: Would a curved fit (e.g., polynomial) capture the shape better than a straight line?
  • Unlabeled outliers. Self-check: Are exceptional points identified, and either explained or flagged for investigation?

Practical projects

  • Customer value drivers: Plot visit frequency vs annual spend; color by segment; recommend a retention action.
  • Operational efficiency: Plot staff hours vs tickets resolved; identify diminishing returns and staffing guidance.
  • Pricing impact: Plot discount % vs conversion rate; estimate the expected lift at 5% and 10% discounts using the trend line.

Mini challenge

You have these pairs (X=Response time in seconds, Y=CSAT out of 5):

(2, 4.8), (3, 4.6), (5, 4.2), (8, 3.9), (12, 3.2), (20, 2.9)
  • Would you expect a positive or negative slope? Why?
  • Write a two-line executive summary of the relationship and one action.

Next steps

  • Apply a scatter plot to one of your current KPIs this week.
  • Share one annotated chart with a teammate for feedback.
  • Take the quick test below to confirm understanding. Note: Anyone can take it; logged-in users have results saved.

Exercise 1: Expected output

A chart with an upward trend; a concise note stating a positive relationship with slight flattening at higher spend; a mention of any points that deviate from the line.

Showing Relationships With Scatter Plots — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.
