How to Learn Feature Relationships for Exploratory Analysis as a Data Analyst

Why this matters

Feature relationships tell you which variables move together, which groups differ, and where hidden patterns or confounders exist. As a Data Analyst, you will:

Prioritize drivers of KPIs (conversion, churn, cost).
Detect non-linear and segmented patterns before modeling.
Explain your findings with evidence, not guesses.

Real tasks you might face

Which marketing channel shows the strongest lift in purchases after controlling for region?
Do premium users spend more because of the plan or because they use the app more?
Are device type and purchase independent, or is there a relationship worth acting on?

Concept explained simply

Feature relationships describe how two variables relate:

Numeric–numeric: Do values increase/decrease together? (e.g., ad spend vs. visits)
Categorical–numeric: Do group means differ? (e.g., plan type vs. revenue)
Categorical–categorical: Are categories independent? (e.g., device vs. purchase)
Time-based: Does the past predict the next value? (autocorrelation, seasonality)

Mental model: Ask the right question

Change together? Use scatterplot + correlation (Pearson or Spearman).
Groups differ? Use boxplots + mean differences (t-test/ANOVA), effect sizes.
Categories tied? Use contingency table + chi-square, Cramér’s V.
Time matters? Check trends, seasonality, and lag plots.
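The four questions above can be kept as a small lookup. This is only a sketch: the type labels and the `suggest` helper are illustrative names invented here, not part of any library.

```python
# Lookup of the "ask the right question" rules from this lesson.
# Keys are (type of X, type of Y); the labels are illustrative only.
METHODS = {
    ("numeric", "numeric"): ("scatterplot", "Pearson or Spearman correlation"),
    ("categorical", "numeric"): ("boxplot", "t-test or ANOVA with an effect size"),
    ("categorical", "categorical"): ("contingency table", "chi-square with Cramer's V"),
    ("time", "numeric"): ("line plot and lag plot", "autocorrelation"),
}

def suggest(x_type, y_type):
    """Return (plot, statistic) for a pair of variable types, in either order."""
    key = (x_type, y_type)
    if key not in METHODS:
        key = (y_type, x_type)
    return METHODS[key]  # raises KeyError for pairs the lesson does not cover

plot, statistic = suggest("numeric", "categorical")
print(plot, "+", statistic)
```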

Core methods and when to use them

Numeric–numeric

Scatterplot with optional smoother (LOESS) to catch non-linearity.
Pearson correlation (measures linear association). Spearman correlation (rank-based; captures any monotonic relationship and is less sensitive to outliers).
Look for clusters and outliers; consider transformations (log, square root) when skewed.
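To see how the two coefficients differ, here is a minimal pure-Python sketch. The ad spend and visits numbers are made up for illustration, and the helper names are assumptions; in practice a statistics library such as SciPy provides equivalent functions.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson r: co-variation of x and y divided by the product of their spreads."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rho is Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up data: visits rise with ad spend but flatten out.
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
visits = [400, 550, 650, 720, 770, 800, 820, 830, 840, 845]

r = pearson(ad_spend, visits)
rho = spearman(ad_spend, visits)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

Because the made-up series always increases, Spearman reports a perfect monotonic relationship while Pearson is pulled down by the curvature.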

Categorical–numeric

Box/violin plots; group means with confidence intervals.
Two groups: t-test and Cohen’s d. Multiple groups: ANOVA plus post-hoc comparisons. In both cases, report practical differences, not just p-values.
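As a rough illustration of the group-comparison statistics, the sketch below computes Cohen's d (pooled standard deviation version) and the one-way ANOVA F statistic by hand. The spend values and function names are hypothetical; p-values are left out because they need a distribution function from a statistics library.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up monthly spend (USD) for three plans.
basic = [15, 18, 20, 17, 20]
standard = [25, 27, 30, 26, 27]
premium = [40, 45, 38, 44, 43]

gap = mean(premium) - mean(basic)   # practical difference in USD
d = cohens_d(premium, basic)        # effect size in standard-deviation units
f_stat = anova_f(basic, standard, premium)
print(f"gap = {gap} USD, d = {d:.1f}, F = {f_stat:.1f}")
```

Reporting `gap` alongside `d` keeps the result readable for stakeholders: one number is in business units, the other is comparable across metrics.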

Categorical–categorical

Contingency table; chi-square test of independence.
Effect size: Cramér’s V for strength (0–1).
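The chi-square statistic and Cramér's V can be computed directly from a table of counts. The sketch below uses a made-up 2×2 table and hypothetical function names; the p-value shortcut shown only applies when the table has 1 degree of freedom (2×2). Library routines may differ slightly, for example by applying a continuity correction on 2×2 tables.

```python
from math import erfc, sqrt

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V: chi-square rescaled to the 0-1 range."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

# Made-up counts. Rows: mobile, desktop. Columns: purchased, did not purchase.
table = [[60, 140],
         [30, 170]]

chi2 = chi_square(table)
v = cramers_v(table)
# With 1 degree of freedom, chi-square is a squared standard normal,
# so the upper-tail p-value reduces to erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, V = {v:.2f}")
```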

Time-aware checks

Line plot over time for trend/seasonality.
Autocorrelation (ACF) and lag plots to detect carryover effects.
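A lag-k autocorrelation is just the correlation of a series with a shifted copy of itself. This is a minimal sketch of the usual sample ACF estimator on a made-up daily series with a weekly pattern; the function name is an assumption.

```python
from statistics import mean

def autocorr(series, lag):
    """Sample autocorrelation at the given lag (standard ACF estimator)."""
    m = mean(series)
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t - lag] - m)
              for t in range(lag, len(series)))
    return num / denom

# Made-up daily values: the same 7-day pattern repeated for 4 weeks.
daily = [10, 12, 11, 13, 12, 20, 22] * 4

acf_7 = autocorr(daily, 7)  # same weekday, one week apart
acf_3 = autocorr(daily, 3)  # weekdays compared with weekend days
print(f"lag 7: {acf_7:.2f}, lag 3: {acf_3:.2f}")
```

A spike at lag 7 that is much larger than the neighbouring lags is the typical signature of weekly seasonality.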

Hidden structure: confounding, segmentation, interaction

Confounding: A third variable affects both X and Y (Simpson’s paradox risk). Check segmented plots and partial correlations.
Interaction: The effect of X on Y differs by a grouping variable. Compare slopes or group differences across segments.
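One way to check for confounding is a first-order partial correlation, which asks how strongly X and Y move together once a third variable Z is held fixed. The sketch below uses the standard formula on made-up data in which Z drives both X and Y; the variable and function names are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up data: x and y are both "z plus a small wiggle".
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5]
y = [1.5, 2.5, 2.5, 3.5, 5.5, 6.5, 6.5, 7.5]

raw = pearson(x, y)
controlled = partial_corr(x, y, z)
print(f"raw r = {raw:.2f}, partial r given z = {controlled:.2f}")
```

Here the raw correlation is strong, but it nearly vanishes once Z is controlled for, which suggests that the X–Y link is mostly carried by Z.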

Worked examples

Example 1 — Numeric–numeric with possible non-linearity

Scenario: Study hours vs. exam score for 40 students. The plot shows a rising pattern that flattens beyond ~6 hours (diminishing returns).

Pearson r ≈ 0.75 (understates the curved relationship).
Spearman ρ ≈ 0.88 (captures monotonic increase).
Action: Consider a log or spline term; don’t assume a straight line.
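To illustrate the log-term suggestion, the sketch below compares Pearson r on raw hours with Pearson r on log(hours). The ten data points are made up and are not the 40-student dataset from the scenario, so the coefficients will not match the figures quoted above.

```python
from math import log, sqrt
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up data with diminishing returns.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
score = [41, 53, 63, 67, 73, 75, 80, 81, 85, 86]

r_raw = pearson(hours, score)                    # straight line in hours
r_log = pearson([log(h) for h in hours], score)  # straight line in log(hours)
print(f"r on hours = {r_raw:.3f}, r on log(hours) = {r_log:.3f}")
```

If the correlation rises noticeably after the transformation, a model that is linear in log(hours) describes the data better than one that is linear in hours.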

Example 2 — Categorical–numeric: Plan vs. Monthly Spend

Plans: Basic, Standard, Premium. Mean spend: Basic 18, Standard 27, Premium 42 (USD/month).

ANOVA: Large between-group variance vs within-group variance → significant differences.
Effect sizes: Premium vs. Basic shows a large Cohen’s d; Standard vs. Premium is moderate to large.
Action: Report practical gap (e.g., +24 USD for Premium vs. Basic), not just p-values.

Example 3 — Categorical–categorical: Device vs. Purchase

Contingency table suggests mobile users purchase more often than desktop users.

Chi-square test rejects independence.
Cramér’s V ≈ 0.22 → small-to-moderate association.
Action: Optimize mobile checkout; still investigate confounders (age, traffic source).

Example 4 — Simpson’s paradox and interaction

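Overall, the promo appears to reduce sales. Segmenting shows that it helps in Region A and hurts in Region B, so the pooled result has the opposite sign to Region A's effect. An effect that changes direction across regions is an interaction; when pooling reverses a pattern seen inside the segments, it is Simpson’s paradox. Segment before concluding.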
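The promo scenario can be reproduced with a handful of made-up rows. This sketch compares the pooled promo lift with the lift inside each region; the records and the `promo_lift` helper are hypothetical.

```python
from statistics import mean

# Made-up records: (region, promo_on, sales).
records = [
    ("A", False, 100), ("A", False, 110), ("A", True, 120), ("A", True, 130),
    ("B", False, 80), ("B", False, 90), ("B", True, 50), ("B", True, 60),
]

def promo_lift(rows):
    """Mean sales with the promo minus mean sales without it."""
    with_promo = [sales for _, promo, sales in rows if promo]
    without_promo = [sales for _, promo, sales in rows if not promo]
    return mean(with_promo) - mean(without_promo)

overall = promo_lift(records)
by_region = {
    region: promo_lift([r for r in records if r[0] == region])
    for region in sorted({r[0] for r in records})
}
print("overall lift:", overall)
print("lift by region:", by_region)
```

The pooled lift is slightly negative, while the segmented view shows a clear gain in Region A and a clear loss in Region B, which is the finding worth reporting.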

How to analyze feature relationships (quick steps)

Define the question: trend, difference, independence, or time effect?
Plot first: scatter/box/contingency heatmap/line-over-time.
Quantify: correlation, mean differences, chi-square, effect sizes.
Probe pitfalls: outliers, non-linearity, confounders, interactions, time dependence.
Summarize with a 1–2 line insight + simple chart.

Self-check before you report

Did I visualize and quantify?
Did I check for non-linearity/outliers?
Did I segment by key groups (e.g., region, device)?
Did I distinguish “significant” from “useful” (effect size)?

Exercises

Do these hands-on tasks. You can use a spreadsheet or any analytics tool.

Exercise 1: Correlation on a mini numeric–numeric dataset
Exercise 2: Contingency table, chi-square, and Cramér’s V
Exercise 3: Segment to resolve a paradox

Completion checklist

I plotted before computing statistics.
I reported both a test/statistic and an effect size.
I checked at least one potential confounder or interaction.
I wrote one clear takeaway per relationship.

Common mistakes and how to self-check

Relying only on Pearson r when the relationship is curved. Self-check: Add a smoother to the scatterplot and compare with Spearman.
Reporting p-values without effect sizes. Self-check: Add Cohen’s d or Cramér’s V.
Ignoring segments. Self-check: Re-run analysis by key segments (region, device, plan).
Assuming causation from correlation. Self-check: Consider time order, potential confounders, or experimental evidence.
Not checking time dependence. Self-check: Inspect ACF or add lags for time series.

Practical projects

E-commerce funnel: Analyze relationships between traffic source, device, and checkout completion. Deliver 3 charts + 3 insights.
SaaS retention: Relate feature usage counts to churn by plan and region. Include an interaction finding.
Marketing mix snapshot: Correlate weekly spend per channel with leads, check lag-1 effects, and discuss non-linearity.

Learning path

Before this: Data cleaning, data types, basic visualization.
This lesson: Spot, visualize, and quantify feature relationships.
Next: Feature selection, multicollinearity checks, simple predictive baselines.

Who this is for

Aspiring and junior Data Analysts who want confident EDA skills.
Professionals switching from BI/reporting to analysis.

Prerequisites

Comfort with basic stats (mean, variance) and charts.
Ability to use a spreadsheet or any scripting tool.

Next steps

Run the exercises on your own dataset and compare insights.
Take the Quick Test below to check understanding.
Move on to feature selection and modeling-ready datasets.

Mini challenge

You receive a dataset with columns: visits, purchases, device, region, and week.

Task: Find one relationship that flips sign after segmenting.
Deliverable: 2 charts (overall vs. segmented) and a 2-line explanation.

Hint

Check purchases vs. visits overall, then segment by device and region. Look for different slopes.
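One possible way to follow the hint is to compare least-squares slopes overall and per segment. The rows below are made up (they are not the challenge dataset) and only segment by device; the same approach applies to region.

```python
from statistics import mean

def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Made-up rows: (device, visits, purchases).
rows = [
    ("desktop", 10, 8), ("desktop", 20, 9), ("desktop", 30, 10),
    ("mobile", 40, 2), ("mobile", 50, 3), ("mobile", 60, 4),
]

overall_slope = slope([(v, p) for _, v, p in rows])
segment_slopes = {
    device: slope([(v, p) for d, v, p in rows if d == device])
    for device in sorted({d for d, _, _ in rows})
}
print(f"overall: {overall_slope:.3f}")
print("by device:", {d: round(s, 3) for d, s in segment_slopes.items()})
```

In this toy data, purchases rise with visits inside each device group, yet the pooled slope is negative because mobile has more visits and fewer purchases overall.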
