Topic 6 of 13

Feature Relationships

Learn Feature Relationships for free with explanations, exercises, and a quick test (for Data Analysts).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

Feature relationships tell you which variables move together, which groups differ, and where hidden patterns or confounders exist. As a Data Analyst, you will:

  • Prioritize drivers of KPIs (conversion, churn, cost).
  • Detect non-linear and segmented patterns before modeling.
  • Explain your findings with evidence, not guesses.
Real tasks you might face
  • Which marketing channel shows the strongest lift in purchases after controlling for region?
  • Do premium users spend more because of the plan or because they use the app more?
  • Are device type and purchase independent, or is there a relationship worth acting on?

Concept explained simply

Feature relationships describe how two variables relate:

  • Numeric–numeric: Do values increase/decrease together? (e.g., ad spend vs. visits)
  • Categorical–numeric: Do group means differ? (e.g., plan type vs. revenue)
  • Categorical–categorical: Are categories independent? (e.g., device vs. purchase)
  • Time-based: Does the past predict the next value? (autocorrelation, seasonality)
Mental model: Ask the right question
  • Change together? Use scatterplot + correlation (Pearson or Spearman).
  • Groups differ? Use boxplots + mean differences (t-test/ANOVA), effect sizes.
  • Categories tied? Use contingency table + chi-square, Cramér’s V.
  • Time matters? Check trends, seasonality, and lag plots.

Core methods and when to use them

Numeric–numeric
  • Scatterplot with optional smoother (LOESS) to catch non-linearity.
  • Pearson correlation (linear). Spearman correlation (rank-based: captures monotonic relationships and is less sensitive to outliers).
  • Look for clusters and outliers; consider transformations (log, square root) when skewed.
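
The contrast between Pearson and Spearman, and the effect of a log transform, can be sketched in plain Python. The data is made up (a curve that rises and then flattens), and the helper names `pearson`, `ranks`, and `spearman` are illustrative, not taken from any library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up data with diminishing returns: y = 50 + 20 * ln(x)
x = list(range(1, 11))
y = [50 + 20 * math.log(v) for v in x]

r_raw = pearson(x, y)                          # linear fit to a curved pattern
rho = spearman(x, y)                           # monotonic, so ranks agree
r_log = pearson([math.log(v) for v in x], y)   # linear after log-transforming x

print(f"Pearson (raw x): {r_raw:.3f}")
print(f"Spearman:        {rho:.3f}")
print(f"Pearson (log x): {r_log:.3f}")
```

On this data, Pearson on the raw values falls below Spearman, while Pearson on log(x) is essentially 1 because the relationship is exactly linear in log(x). SciPy provides `scipy.stats.pearsonr` and `scipy.stats.spearmanr` for the same statistics.
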
Categorical–numeric
  • Box/violin plots; group means with confidence intervals.
  • Two groups: t-test and Cohen’s d. Multiple groups: ANOVA plus post-hoc comparisons; report practical differences, not just p-values.
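
As a rough illustration of an effect size for two groups, here is Cohen's d computed with the pooled sample standard deviation. The spend values are hypothetical, and `cohens_d` is an illustrative helper, not a library function.

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

# Hypothetical monthly spend (USD) for two plans
basic = [10, 25, 14, 30, 12, 22, 8, 23]
premium = [30, 55, 38, 60, 35, 48, 28, 42]

gap = statistics.mean(premium) - statistics.mean(basic)
d = cohens_d(premium, basic)
print(f"Mean gap: {gap:.0f} USD, Cohen's d: {d:.2f}")
```

Reporting the gap in original units alongside d keeps the practical difference visible; d alone only says how large the gap is relative to the spread.
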
Categorical–categorical
  • Contingency table; chi-square test of independence.
  • Effect size: Cramér’s V for strength (0–1).
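
A minimal sketch of the chi-square statistic and Cramér's V for a contingency table, in plain Python. The counts are hypothetical, the function names are illustrative, and no continuity correction is applied.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical counts: rows = device (mobile, desktop),
# columns = (purchased, did not purchase)
table = [[120, 380],
         [80, 420]]

chi2 = chi_square(table)
v = cramers_v(table)
print(f"chi-square = {chi2:.2f}, V = {v:.2f}")
```

This sketch stops at the statistic; a p-value needs the chi-square distribution. SciPy's `scipy.stats.chi2_contingency` returns one, and for 2x2 tables it applies a continuity correction by default, so its statistic would differ slightly from the uncorrected value here.
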
Time-aware checks
  • Line plot over time for trend/seasonality.
  • Autocorrelation (ACF) and lag plots to detect carryover effects.
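
To make the ACF idea concrete, the sketch below computes the sample autocorrelation at a few lags for a made-up series with a repeating 4-step pattern. `autocorr` is an illustrative helper using the usual estimator (lagged cross-products divided by the total sum of squares).

```python
def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom

# Made-up weekly values with a cycle that repeats every 4 steps
weekly = [10, 14, 18, 14] * 6

for lag in range(1, 5):
    print(f"lag {lag}: {autocorr(weekly, lag):+.3f}")
```

A strong positive value at lag 4 and a strong negative value at lag 2 reflect the 4-step cycle; on real data the peaks are usually less clean.
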
Hidden structure: confounding, segmentation, interaction
  • Confounding: A third variable affects both X and Y (Simpson’s paradox risk). Check segmented plots and partial correlations.
  • Interaction: The effect of X on Y differs by a grouping variable. Compare slopes or group differences across segments.
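
One way to check for hidden structure is to compare the pooled slope with the slope inside each segment. The sketch below uses made-up numbers in which a segment variable shifts both X and Y, so the pooled slope is positive while each segment's slope is negative. `slope` is an illustrative least-squares helper.

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Made-up data: each segment maps to (x values, y values)
segments = {
    "mobile": ([1, 2, 3, 4], [10, 9, 8, 7]),
    "desktop": ([6, 7, 8, 9], [20, 19, 18, 17]),
}

all_x = [x for xs, _ in segments.values() for x in xs]
all_y = [y for _, ys in segments.values() for y in ys]

pooled = slope(all_x, all_y)
by_segment = {name: slope(xs, ys) for name, (xs, ys) in segments.items()}

print(f"pooled slope: {pooled:+.2f}")
for name, s in by_segment.items():
    print(f"{name} slope: {s:+.2f}")
```

If the segment slopes had differed from each other rather than from the pooled slope, that would point to an interaction instead of confounding.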

Worked examples

Example 1 — Numeric–numeric with possible non-linearity

Scenario: Study hours vs. exam score for 40 students. Plot shows rising pattern that flattens beyond ~6 hours (diminishing returns).

  • Pearson r ≈ 0.75 (understates the curved relationship).
  • Spearman ρ ≈ 0.88 (captures monotonic increase).
  • Action: Consider a log or spline term; don’t assume a straight line.
Example 2 — Categorical–numeric: Plan vs. Monthly Spend

Plans: Basic, Standard, Premium. Mean spend: Basic 18, Standard 27, Premium 42 (USD/month).

  • ANOVA: Large between-group variance vs within-group variance → significant differences.
  • Effect sizes: Premium vs. Basic shows large Cohen’s d; Standard vs. Premium moderate-large.
  • Action: Report practical gap (e.g., +24 USD for Premium vs. Basic), not just p-values.
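
The between-group versus within-group comparison behind ANOVA can be sketched as follows. The samples are hypothetical, chosen only so the group means match the 18, 27, and 42 above, and `one_way_anova_f` is an illustrative helper.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical monthly spend (USD) per plan
basic = [10, 25, 14, 30, 12, 22, 8, 23]
standard = [18, 35, 22, 40, 20, 30, 16, 35]
premium = [30, 55, 38, 60, 35, 48, 28, 42]

f_stat = one_way_anova_f([basic, standard, premium])
gap = sum(premium) / len(premium) - sum(basic) / len(basic)
print(f"F = {f_stat:.2f}, Premium vs. Basic gap = {gap:.0f} USD")
```

A large F says the group means differ by more than within-group noise would suggest; the gap in USD is what stakeholders can act on.
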
Example 3 — Categorical–categorical: Device vs. Purchase

Contingency table suggests mobile users purchase more often than desktop users.

  • Chi-square test rejects independence.
  • Cramér’s V ≈ 0.22 → small-to-moderate association.
  • Action: Optimize mobile checkout; still investigate confounders (age, traffic source).
Example 4 — Simpson’s paradox and interaction

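Overall, the promo seems to reduce sales, but within Region A it helps and within Region B it hurts. Opposite effects by region are an interaction; when the combined data points the opposite way from the segments (for example, because promo exposure is uneven across regions), that reversal is Simpson’s paradox. Segment before concluding.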
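
A variation on this scenario shows the textbook form of the reversal: with the made-up counts below, the promo has the higher purchase rate inside each region yet the lower rate once regions are pooled, because most promo traffic sits in the region with lower rates overall.

```python
# Made-up counts: (purchases, visitors) per region and promo arm
data = {
    "Region A": {"promo": (81, 87), "no promo": (234, 270)},
    "Region B": {"promo": (192, 263), "no promo": (55, 80)},
}

# Purchase rate within each region
by_region = {
    region: {arm: bought / seen for arm, (bought, seen) in arms.items()}
    for region, arms in data.items()
}

# Purchase rate after pooling both regions
pooled = {}
for arm in ("promo", "no promo"):
    bought = sum(data[region][arm][0] for region in data)
    seen = sum(data[region][arm][1] for region in data)
    pooled[arm] = bought / seen

for region, rates in by_region.items():
    print(region, {arm: round(rate, 3) for arm, rate in rates.items()})
print("Pooled", {arm: round(rate, 3) for arm, rate in pooled.items()})
```

Comparing the pooled rates alone would suggest dropping the promo; the per-region rates suggest the opposite.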

How to analyze feature relationships (quick steps)

  1. Define the question: trend, difference, independence, or time effect?
  2. Plot first: scatter/box/contingency heatmap/line-over-time.
  3. Quantify: correlation, mean differences, chi-square, effect sizes.
  4. Probe pitfalls: outliers, non-linearity, confounders, interactions, time dependence.
  5. Summarize with a 1–2 line insight + simple chart.
Self-check before you report
  • Did I visualize and quantify?
  • Did I check for non-linearity/outliers?
  • Did I segment by key groups (e.g., region, device)?
  • Did I distinguish “significant” from “useful” (effect size)?

Exercises

Do these hands-on tasks. You can use a spreadsheet or any analytics tool.

  • Exercise 1: Correlation on a mini numeric–numeric dataset (id: ex1)
  • Exercise 2: Contingency table, chi-square, and Cramér’s V (id: ex2)
  • Exercise 3: Segment to resolve a paradox (id: ex3)
Completion checklist
  • I plotted before computing statistics.
  • I reported both a test/statistic and an effect size.
  • I checked at least one potential confounder or interaction.
  • I wrote one clear takeaway per relationship.

Common mistakes and how to self-check

  • Relying only on Pearson r when the relationship is curved. Self-check: Add a smoother to the scatterplot and compare with Spearman.
  • Reporting p-values without effect sizes. Self-check: Add Cohen’s d or Cramér’s V.
  • Ignoring segments. Self-check: Re-run analysis by key segments (region, device, plan).
  • Assuming causation from correlation. Self-check: Consider time order, potential confounders, or experimental evidence.
  • Not checking time dependence. Self-check: Inspect ACF or add lags for time series.

Practical projects

  • E-commerce funnel: Analyze relationships between traffic source, device, and checkout completion. Deliver 3 charts + 3 insights.
  • SaaS retention: Relate feature usage counts to churn by plan and region. Include an interaction finding.
  • Marketing mix snapshot: Correlate weekly spend per channel with leads, check lag-1 effects, and discuss non-linearity.

Learning path

  • Before this: Data cleaning, data types, basic visualization.
  • This lesson: Spot, visualize, and quantify feature relationships.
  • Next: Feature selection, multicollinearity checks, simple predictive baselines.

Who this is for

  • Aspiring and junior Data Analysts who want confident EDA skills.
  • Professionals switching from BI/reporting to analysis.

Prerequisites

  • Comfort with basic stats (mean, variance) and charts.
  • Ability to use a spreadsheet or a scripting tool (any).

Next steps

  • Run the exercises on your own dataset and compare insights.
  • Take the Quick Test below to check understanding.
  • Move on to feature selection and modeling-ready datasets.

Mini challenge

You receive a dataset with columns: visits, purchases, device, region, and week.

  • Task: Find one relationship that flips sign after segmenting.
  • Deliverable: 2 charts (overall vs. segmented) and a 2-line explanation.
Hint

Check purchases vs. visits overall, then segment by device and region. Look for different slopes.

Practice Exercises

3 exercises to complete

Instructions

Use the dataset below to compute Pearson r and Spearman ρ between ad_spend (k$) and site_visits (k). Plot a quick scatter (optional).

Data (10 rows)
ad_spend:    1, 2, 3, 4, 5, 6, 7, 8, 9, 10
site_visits: 100, 210, 290, 410, 520, 610, 720, 790, 910, 980
  • 1) Compute Pearson correlation.
  • 2) Compute Spearman correlation.
  • 3) In one sentence, interpret the relationship.
Expected Output
High positive relationship: Pearson r ≈ 0.999 and Spearman ρ = 1.00 (site_visits rises with every increase in ad_spend). Interpretation: more spend, more visits.
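
One way to check your answer, sketched in plain Python on the exercise data. The helpers are illustrative; `simple_ranks` assumes there are no tied values, which holds for this dataset.

```python
import math

ad_spend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
site_visits = [100, 210, 290, 410, 520, 610, 720, 790, 910, 980]

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks 1..n; valid only when all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

r = pearson(ad_spend, site_visits)
rho = pearson(simple_ranks(ad_spend), simple_ranks(site_visits))
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In a spreadsheet, the Pearson value corresponds to the built-in CORREL function; Spearman can be obtained by ranking both columns first and correlating the ranks.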

Feature Relationships — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.
