Relationships Scatter Plots

Learn how to show relationships with scatter plots for free, with explanations, exercises, and a quick test (for Data Analysts).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

As a Data Analyst, you often need to show how two numeric variables move together: ad spend vs conversions, price vs demand, support wait time vs CSAT, sessions vs revenue, or feature usage vs retention. Scatter plots are the go-to chart for this: they reveal the direction and strength of a relationship, outliers, and whether returns are diminishing. Done well, a simple scatter plot can drive decisions like budget allocation, pricing changes, or where to investigate data quality issues.

Who this is for

  • Beginner to intermediate analysts who need to visualize relationships between two numeric variables.
  • Anyone preparing analyses for stakeholders and dashboards.
  • Students practicing data storytelling and model sanity checks.

Prerequisites

  • Basic understanding of numeric variables and axes.
  • Comfort with a plotting tool (Excel/Google Sheets, Tableau/Power BI, Python matplotlib/Seaborn, or R ggplot2).
  • Know what correlation means at a high level (direction and strength).

Concept explained simply

A scatter plot places each observation as a dot by its x-value and y-value. Patterns of dots tell you about the relationship between the variables.

  • Direction: positive (up-right), negative (down-right), or none (cloud).
  • Shape: linear, curved (e.g., diminishing returns), clustered groups, or fan-shaped spread (heteroscedasticity).
  • Strength: tighter band = stronger relationship; quantify with correlation (Pearson for linear, Spearman for monotonic).
  • Context: encode a third variable via color/shape; a fourth via size. But keep it readable.
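
To make the Pearson vs Spearman distinction concrete, here is a minimal sketch in plain Python. The data is hypothetical (a curve that flattens as x grows), and the helper names pearson, ranks, and spearman are illustrative, not from any library. The rank helper assumes no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: strength of a linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1..n; assumes no tied values to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y always rises with x, but flattens (diminishing returns).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.log(v) for v in x]

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Because y never decreases as x grows, Spearman comes out at 1, while Pearson lands below 1 because the points bend away from a straight line. A gap like that is a hint to look for curvature.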

Mental model: dots, patterns, and questions

Think of each dot as a story. Where dots line up, a rule might exist. Where dots stray, an exception or new factor may be at play. Ask:

  • Is there a clear trend? If yes, how strong?
  • Are there groups that behave differently?
  • Any outliers that deserve investigation?
  • Does variance change with x (fan shape)?
  • Would a curved line fit better than a straight one?

Design and techniques that work

  • Overplotting fix: make markers semi-transparent (alpha of roughly 0.2–0.5), jitter small integers, or use small markers. For very dense data, try binning (hex/rect) or plotting a random sample of points.
  • Trendline: add a linear fit to show direction and slope; judge strength from how tightly the points hug the line, or from the correlation. If the pattern is curved, use a polynomial or LOESS trendline for exploration.
  • Scale: scatter plot axes do not need to start at zero. Use ranges that cover all the data and any reference values you compare against. Consider log scales for variables spanning orders of magnitude.
  • Encodings: use color for categorical groups; add shape so groups stay distinguishable for colorblind readers; use size only when it adds clear meaning (e.g., revenue). Keep legends concise.
  • Labels: label axes with units; add a short subtitle that states the question; annotate key outliers or thresholds.
  • Ethics: correlation is not causation. Use language like “associated with” unless you have causal evidence.
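
Two of the overplotting fixes above, jitter and rectangular binning, can be sketched without any plotting library. The ratings data is hypothetical, and jitter and bin_counts are illustrative helper names; the jitter width and bin size are assumptions you would tune for your own data.

```python
import random
from collections import Counter

def jitter(values, width=0.2, seed=0):
    """Add uniform noise in [-width, +width] so stacked integers separate."""
    rng = random.Random(seed)  # fixed seed keeps the chart reproducible
    return [v + rng.uniform(-width, width) for v in values]

def bin_counts(xs, ys, x_step, y_step):
    """Count points per rectangular cell, keyed by the cell's lower-left corner."""
    cells = Counter()
    for x, y in zip(xs, ys):
        cells[(x // x_step * x_step, y // y_step * y_step)] += 1
    return cells

# Hypothetical survey data: integer ratings stack on top of each other.
ratings_x = [1, 1, 1, 2, 2, 3, 3, 3, 3, 5]
ratings_y = [2, 2, 2, 3, 3, 4, 4, 4, 4, 5]

jittered_x = jitter(ratings_x)
print(len(set(ratings_x)), "distinct x values before jitter,",
      len(set(jittered_x)), "after")

counts = bin_counts(ratings_x, ratings_y, x_step=2, y_step=2)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Plot the jittered values instead of the raw ones when points stack. For dense data, plot the cell counts as a heatmap so the reader sees concentration rather than an ink blob.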

Worked examples

Example 1 — Marketing spend vs sign-ups

Data: spend (k$) on x, sign-ups on y for 20 campaigns. Pattern: steep increase at low spend, flattening after ~60k. A linear trendline shows positive slope but residuals curve, suggesting diminishing returns. Action: propose a cap per campaign and shift extra budget to underfunded, efficient ranges.
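
The "residuals curve" observation can be checked numerically. The sketch below uses made-up numbers shaped like this example (they are not the 20 campaigns described above) and a hand-rolled least-squares helper named fit_line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical campaigns: sign-ups climb fast, then flatten at higher spend.
spend_k = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
signups = [100, 180, 240, 285, 315, 335, 345, 350, 352, 353]

slope, intercept = fit_line(spend_k, signups)
residuals = [y - (intercept + slope * x) for x, y in zip(spend_k, signups)]
signs = "".join("+" if r > 0 else "-" for r in residuals)

print(f"slope: {slope:.2f} sign-ups per k$")
print("residual signs from low to high spend:", signs)
```

A run of negative residuals at both ends with positives in the middle means the straight line overshoots at low and high spend and undershoots in between. That is the signature of a flattening curve.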

Example 2 — Price vs units sold

Pattern: down-right slope (negative). Two clusters appear (Region A and B). Aggregating both hides a stronger negative trend within each region (Simpson's paradox risk). Action: facet by region or color points; analyze separately; avoid one-size-fits-all pricing.
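
To see how pooling can mask the within-region trend, compare the correlation per region with the pooled one. The prices and unit counts below are invented for illustration; only the pattern (two offset clusters) follows the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Region B charges more and also sells more overall.
region_a = {"price": [10, 11, 12, 13, 14], "units": [100, 94, 91, 84, 80]}
region_b = {"price": [13, 14, 15, 16, 17], "units": [112, 105, 101, 96, 90]}

r_a = pearson(region_a["price"], region_a["units"])
r_b = pearson(region_b["price"], region_b["units"])
r_pooled = pearson(region_a["price"] + region_b["price"],
                   region_a["units"] + region_b["units"])

print(f"Region A: r = {r_a:.2f}")
print(f"Region B: r = {r_b:.2f}")
print(f"Pooled:   r = {r_pooled:.2f}")
```

Within each region the relationship is strongly negative, yet the pooled correlation is only weakly negative because the two clusters sit at different price and volume levels. That gap is the reason to color or facet by region before drawing conclusions.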

Example 3 — Wait time vs CSAT

Pattern: gentle negative slope; strong outliers for very low CSAT at moderate wait times. Investigation shows days with an IVR outage. Action: annotate outage dates; present both the general trend and the outlier explanation.

How to build a great scatter plot (step-by-step)

  1. Question first: what relationship are you testing? Write a one-line subtitle capturing it.
  2. Select variables: both should be numeric. If one is categorical, consider a jittered dot plot or a box plot instead.
  3. Plot points: start with small circles, moderate transparency.
  4. Add context: encode group by color/shape; avoid using both unless necessary.
  5. Add trendline: begin with linear; if curvature is visible, try LOESS for exploration.
  6. Check the residual pattern: look for curves (model mismatch), fanning (heteroscedasticity), or clusters (hidden groups).
  7. Tune scales: consider log scale for skewed, multiplicative data.
  8. Annotate: mark key outliers and add a brief takeaway above or below the chart.
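
The steps above can be sketched with matplotlib, one of the tools listed in the prerequisites. This is one possible layout, not a prescribed one: the data, axis labels, segment names, and the helper names linear_fit and build_scatter are all hypothetical. The plotting function imports matplotlib only when called, so the trendline helper also runs on its own.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for the step 5 trendline."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def build_scatter(xs, ys, groups, outlier=None, path="scatter.png"):
    """Steps 3 to 8 in one place; requires matplotlib to be installed."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    # Steps 3-4: small, semi-transparent points, one color per group.
    for g in sorted(set(groups)):
        gx = [x for x, k in zip(xs, groups) if k == g]
        gy = [y for y, k in zip(ys, groups) if k == g]
        ax.scatter(gx, gy, s=25, alpha=0.5, label=g)
    # Step 5: linear trendline across the observed x range.
    slope, intercept = linear_fit(xs, ys)
    ends = [min(xs), max(xs)]
    ax.plot(ends, [intercept + slope * x for x in ends], color="gray")
    # Step 7 (optional): ax.set_xscale("log") for skewed, multiplicative data.
    # Step 8: annotate a key outlier, passed in as an (x, y) pair.
    if outlier is not None:
        ax.annotate("outlier", xy=outlier,
                    xytext=(outlier[0] + 5, outlier[1] - 30),  # data units
                    arrowprops={"arrowstyle": "->"})
    ax.set_xlabel("Ad spend (k$)")
    ax.set_ylabel("Sign-ups")
    ax.set_title("Do sign-ups rise with ad spend?")  # step 1: the question
    ax.legend(title="Segment")
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path


# Hypothetical campaigns, for illustration only.
spend_k = [5, 10, 15, 20, 25, 30, 35, 40]
signups = [40, 75, 100, 150, 90, 200, 230, 260]
segment = ["New", "New", "Returning", "New",
           "Returning", "New", "Returning", "New"]

slope, intercept = linear_fit(spend_k, signups)
print(f"trendline: sign-ups = {intercept:.1f} + {slope:.2f} * spend_k")
# To draw the chart: build_scatter(spend_k, signups, segment, outlier=(25, 90))
```

Step 6 (residual checks) is left out here to keep the sketch short. The annotation offset is in data units, so adjust it to your own axis ranges.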

Quick checklist before sharing:

  • Axes labeled with units, readable ranges.
  • Legend is clear and minimal.
  • Overplotting handled (transparency/jitter/binning).
  • Trendline appropriate and not misleading.
  • Outliers investigated or explained.
  • Takeaway sentence is honest: “associated with”, not causal claims.

Common mistakes and how to self-check

  • Using line charts instead of scatter for unordered pairs. Self-check: Is there a meaningful sequence on x? If not, use scatter.
  • Declaring causation from correlation. Self-check: Could a third factor explain both variables?
  • Forcing axes to start at zero. Self-check: Does zero add meaning? If not, choose a tight but honest range.
  • Ignoring overplotting. Self-check: Zoom in; if points stack, add transparency or jitter.
  • Hiding group differences. Self-check: Color/facet by plausible groups (region, device, segment) and compare trends.
  • One-size trendline. Self-check: Are residuals curved or fan-shaped? Consider non-linear fit or transform.
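
For the fan-shape self-check, a rough numeric companion to eyeballing the chart is to compare how far points sit from the trendline at low x versus high x. This is a crude heuristic on hypothetical data, not a formal heteroscedasticity test, and spread_ratio is an illustrative name.

```python
def residuals(xs, ys):
    """Residuals from an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def spread_ratio(xs, ys):
    """Mean |residual| in the upper half of x divided by the lower half."""
    pairs = sorted(zip(xs, residuals(xs, ys)))  # order points by x
    half = len(pairs) // 2
    low = [abs(r) for _, r in pairs[:half]]
    high = [abs(r) for _, r in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

# Hypothetical data: the scatter around y = 2x widens as x grows.
x = list(range(1, 11))
noise = [0.1, -0.2, 0.2, -0.3, 0.3, -1.0, 1.5, -2.0, 2.5, -3.0]
y = [2 * xi + e for xi, e in zip(x, noise)]

print(f"spread ratio (high x / low x): {spread_ratio(x, y):.1f}")
```

A ratio near 1 suggests even spread; a ratio well above 1 suggests fanning, in which case a log scale or a transform is worth trying before trusting a single trendline.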

Exercises

Do both exercises in your preferred tool (Sheets/Excel, Python, R, BI tool) and sanity-check your work against the provided solutions.

Exercise 1 — Interpret a relationship

You receive a scatter plot description: Each point is a weekly campaign. X = marketing spend (k$). Y = sign-ups. The linear trendline slope is positive; correlation r ≈ 0.78. Three points near 55k spend have much lower sign-ups than neighbors.

  • Describe the relationship (direction, strength, form).
  • Identify likely outliers and give two possible reasons.
  • Write a single actionable recommendation.

Hints

  • Think diminishing returns and campaign quality.
  • Consider tracking issues or external events.

Expected output (what good looks like)

Clear positive relationship, moderately strong; likely diminishing returns; note low-performing ~55k points as outliers; recommend capping spend and investigating those campaigns.

Solution

Direction positive, strength moderately strong (r ~0.78), likely slight curvature (flattening). Outliers around 55k could be mis-targeted audiences or tracking/landing page issues. Recommendation: cap per-campaign spend near the elbow and shift excess to campaigns in the efficient region; investigate outliers before increasing budgets.


Exercise 2 — Build and tune a scatter plot

Use the small dataset below.

Spend_k  Signups  Channel
10       95       Search
20       166      Social
30       210      Email
40       260      Search
50       295      Affiliate
55       160      Social
60       315      Email
70       325      Search
80       330      Affiliate

  • Plot Signups vs Spend_k (x=Spend_k, y=Signups).
  • Color points by Channel; add slight transparency.
  • Add a linear trendline; report approximate correlation.
  • Annotate any outlier; give a 1–2 sentence takeaway.

Hints

  • Compute correlation across all points; expect a high positive value.
  • The 55k/160 point should stand out.
  • Use a short subtitle: “Signups rise with spend, with an outlier at 55k.”

Expected output

A readable scatter plot with color by Channel and semi-transparent markers (alpha ~0.3–0.4). Linear trendline with r ≈ 0.83 overall (about 0.95 if the outlier is excluded). The 55k/160 point annotated as an outlier. Takeaway: strong positive association with possible diminishing returns at high spend; investigate the outlier campaign.

Solution

  1. Plot points with x=Spend_k, y=Signups; size small, alpha ~0.3.
  2. Color by Channel (e.g., Search, Social, Email, Affiliate); keep a clear legend.
  3. Add linear trendline; correlation is roughly 0.83 across all nine points (the outlier pulls it down from about 0.95).
  4. Annotate (55,160) as underperforming. Takeaway: overall, signups increase with spend, but one campaign underperformed; check creative, audience, or tracking before scaling.
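
If you work in Python, steps 3 and 4 can be checked numerically. The sketch below uses the exercise dataset as given; the helper name fit_and_r and the "furthest from the trendline" rule for picking the outlier are illustrative choices, not the only reasonable ones.

```python
import math

# Dataset from Exercise 2: (Spend_k, Signups, Channel).
rows = [
    (10, 95, "Search"), (20, 166, "Social"), (30, 210, "Email"),
    (40, 260, "Search"), (50, 295, "Affiliate"), (55, 160, "Social"),
    (60, 315, "Email"), (70, 325, "Search"), (80, 330, "Affiliate"),
]

def fit_and_r(points):
    """Return slope, intercept, and Pearson r for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

points = [(x, y) for x, y, _ in rows]
slope, intercept, r_all = fit_and_r(points)

# Flag the point that sits furthest from the linear trendline.
outlier = max(points, key=lambda p: abs(p[1] - (intercept + slope * p[0])))
_, _, r_rest = fit_and_r([p for p in points if p != outlier])

print("outlier:", outlier)
print(f"r with all points: {r_all:.2f}")
print(f"r without outlier: {r_rest:.2f}")
```

On this dataset the script flags (55, 160) and reports r of about 0.83 with all nine points and about 0.95 once that campaign is set aside. Report both numbers in your takeaway rather than silently dropping the point.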

Self-check checklist:

  • Axes clearly labeled with units (k$ for spend).
  • Overplotting considered (transparency used).
  • Trendline present and appropriate.
  • Outlier identified and annotated.
  • Concise, honest takeaway included.

Mini challenge

Pick any dataset with two numeric variables you use at work or study (e.g., sessions vs revenue). Create two versions:

  • Version A: single-color, linear trendline.
  • Version B: color by a meaningful segment and use a LOESS curve.

Write one sentence on which version better supports a decision and why.

Practical projects

  • Diminishing returns report: Build a scatter plot of spend vs outcome across 3 months. Add a LOESS curve, annotate the elbow point, and recommend a budget cap.
  • Segment contrast: Create a faceted scatter plot by region or device. Compare slopes and add a short text panel with per-segment insights.
  • Outlier diary: For a product metric pair (usage vs retention), list top 5 outliers and, for each, a likely cause and next step to validate.

Learning path

  • Now: master scatter plots (this lesson, exercises, test).
  • Next: trendlines and simple regression diagnostics (residual checks).
  • Then: distributions (histograms, KDE) to understand variable shapes.
  • Later: segmentation and faceting for multi-group comparisons.
  • Finally: dashboard integration with consistent styling and annotations.

Next steps

  • Turn your best scatter into a reusable template (style, fonts, colors).
  • Create a short annotation library (outlier, elbow, cluster callouts).
  • Share one plot with a teammate and ask, “What decision would this support?” Iterate.
