Why this matters
Feature relationships tell you which variables move together, which groups differ, and where hidden patterns or confounders exist. As a Data Analyst, you will:
- Prioritize drivers of KPIs (conversion, churn, cost).
- Detect non-linear and segmented patterns before modeling.
- Explain your findings with evidence, not guesses.
Real tasks you might face
- Which marketing channel shows the strongest lift in purchases after controlling for region?
- Do premium users spend more because of the plan or because they use the app more?
- Are device type and purchase independent, or is there a relationship worth acting on?
Concept explained simply
Feature relationships describe how two variables relate:
- Numeric–numeric: Do values increase/decrease together? (e.g., ad spend vs. visits)
- Categorical–numeric: Do group means differ? (e.g., plan type vs. revenue)
- Categorical–categorical: Are categories independent? (e.g., device vs. purchase)
- Time-based: Does the past predict the next value? (autocorrelation, seasonality)
Mental model: Ask the right question
- Change together? Use scatterplot + correlation (Pearson or Spearman).
- Groups differ? Use boxplots + mean differences (t-test/ANOVA), effect sizes.
- Categories tied? Use contingency table + chi-square, Cramér’s V.
- Time matters? Check trends, seasonality, and lag plots.
Core methods and when to use them
Numeric–numeric
- Scatterplot with optional smoother (LOESS) to catch non-linearity.
- Pearson correlation (linear). Spearman correlation (monotonic, robust to outliers/rank-based).
- Look for clusters and outliers; consider transformations (log, square root) when skewed.
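A minimal sketch of the Pearson vs. Spearman comparison, written in plain Python so it runs without extra packages. The data is invented (y doubles with each step of x), and the helper names `pearson`, `ranks`, and `spearman` are illustrative only. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would likely be the more convenient choice.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: monotonic but strongly curved (y doubles at each step)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 ** v for v in x]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

When Spearman is clearly higher than Pearson, as it is on this made-up data, the relationship is probably monotonic but not linear, which is a cue to try a transformation or a smoother.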
Categorical–numeric
- Box/violin plots; group means with confidence intervals.
- Two groups: t-test and Cohen’s d. Multiple groups: ANOVA plus post-hoc comparisons; report practical differences, not just p-values.
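To show what the effect size and the ANOVA statistic measure, here is a small sketch in plain Python. The spend samples are invented for illustration, and `cohens_d` and `anova_f` are hypothetical helper names. The sketch computes the statistics only, not p-values; for those, `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` are the usual tools.

```python
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical monthly spend samples (USD) for three plans
basic = [10, 14, 18, 22, 26]
standard = [19, 23, 27, 31, 35]
premium = [34, 38, 42, 46, 50]

d = cohens_d(premium, basic)
f_stat = anova_f(basic, standard, premium)
gap = mean(premium) - mean(basic)
print(f"Premium vs. Basic: gap = {gap} USD, Cohen's d = {d:.2f}, ANOVA F = {f_stat:.2f}")
```

Reporting the raw gap in USD next to d keeps the practical difference visible alongside the standardized one.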
Categorical–categorical
- Contingency table; chi-square test of independence.
- Effect size: Cramér’s V for strength (0–1).
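The chi-square statistic and Cramér’s V can both be computed from a contingency table with a few lines of plain Python. The counts below are made up, and `chi_square` and `cramers_v` are illustrative helper names. This sketch applies no continuity correction, so it can differ slightly from `scipy.stats.chi2_contingency`, which applies Yates’ correction to 2x2 tables by default.

```python
def chi_square(table):
    """Pearson chi-square statistic and total count for a table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramer's V: chi-square rescaled to the 0-1 range."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return (chi2 / (n * k)) ** 0.5

# Hypothetical counts: rows = device (mobile, desktop),
# columns = (purchased, did not purchase)
table = [[60, 140], [30, 170]]

chi2, n = chi_square(table)
v = cramers_v(table)
print(f"chi-square = {chi2:.2f} on n = {n}, Cramer's V = {v:.2f}")
```

For a 2x2 table (1 degree of freedom), a chi-square above 3.84 rejects independence at the 5% level; V then tells you how strong the association actually is.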
Time-aware checks
- Line plot over time for trend/seasonality.
- Autocorrelation (ACF) and lag plots to detect carryover effects.
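A short sketch of the autocorrelation check, assuming an invented daily series that repeats the same weekly pattern for four weeks. The function `autocorr` is a hypothetical helper using the standard sample ACF estimator; libraries such as statsmodels offer ready-made ACF functions and plots.

```python
from statistics import mean

def autocorr(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    m = mean(series)
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t - lag] - m)
              for t in range(lag, len(series)))
    return num / denom

# Hypothetical daily values with a weekend peak, repeated for 4 weeks
week = [10, 12, 11, 13, 15, 25, 24]
series = week * 4

acf = {lag: autocorr(series, lag) for lag in range(1, 8)}
strongest_lag = max(acf, key=acf.get)
for lag, value in acf.items():
    print(f"lag {lag}: {value:+.2f}")
```

A peak at lag 7 in daily data is the usual signature of weekly seasonality, while a high lag-1 value points to carryover from one period to the next.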
Hidden structure: confounding, segmentation, interaction
- Confounding: A third variable affects both X and Y (Simpson’s paradox risk). Check segmented plots and partial correlations.
- Interaction: The effect of X on Y differs by a grouping variable. Compare slopes or group differences across segments.
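As a rough illustration of the partial-correlation check, the sketch below uses the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The data is invented so that a third variable, here called `usage`, drives both `x` and `y`; all names are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

# Hypothetical data: usage drives both x and y; the added noise terms
# are unrelated to each other
usage = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [1, -1, 1, -1, 1, -1, 1, -1]
noise_y = [1, 1, -1, -1, 1, 1, -1, -1]
x = [u + e for u, e in zip(usage, noise_x)]
y = [u + e for u, e in zip(usage, noise_y)]

raw = pearson(x, y)
partial = partial_corr(x, y, usage)
print(f"raw r = {raw:.2f}, partial r given usage = {partial:.2f}")
```

If a strong raw correlation shrinks toward zero once the third variable is controlled for, the original association was probably driven by that variable rather than by a direct link between X and Y.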
Worked examples
Example 1: Numeric–numeric with possible non-linearity
Scenario: Study hours vs. exam score for 40 students. Plot shows rising pattern that flattens beyond ~6 hours (diminishing returns).
- Pearson r ≈ 0.75 (understates the curved relationship).
- Spearman ρ ≈ 0.88 (captures monotonic increase).
- Action: Consider a log or spline term; don’t assume a straight line.
Example 2: Categorical–numeric (Plan vs. Monthly Spend)
Plans: Basic, Standard, Premium. Mean spend: Basic 18, Standard 27, Premium 42 (USD/month).
- ANOVA: Large between-group variance vs. within-group variance → significant differences.
- Effect sizes: Premium vs. Basic shows large Cohen’s d; Standard vs. Premium moderate-large.
- Action: Report practical gap (e.g., +24 USD for Premium vs. Basic), not just p-values.
Example 3: Categorical–categorical (Device vs. Purchase)
Contingency table suggests mobile users purchase more often than desktop users.
- Chi-square test rejects independence.
- Cramér’s V ≈ 0.22 → small-to-moderate association.
- Action: Optimize mobile checkout; still investigate confounders (age, traffic source).
Example 4: Simpson’s paradox and interaction
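Overall, the promo seems to reduce sales, but within Region A it helps and within Region B it hurts. The pooled number hides an interaction (the effect differs by region), and when promo exposure is uneven across regions, the pooled sign can contradict what you see inside a segment, which is the Simpson’s paradox risk. Segment before concluding.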
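The segment-first check can be sketched with a handful of made-up store records. Everything here is hypothetical: the records, the region labels, and the helper `promo_effect`. The data is constructed so that the promo ran mostly in the low-sales region, which is what distorts the pooled comparison.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical store-level records: (region, promo ran?, weekly sales)
records = [
    ("A", False, 98), ("A", False, 102), ("A", False, 100),
    ("A", False, 100), ("A", True, 110),
    ("B", False, 50), ("B", True, 44), ("B", True, 46),
    ("B", True, 45), ("B", True, 45),
]

def promo_effect(rows):
    """Mean sales with promo minus mean sales without promo."""
    with_promo = [sales for _, promo, sales in rows if promo]
    without_promo = [sales for _, promo, sales in rows if not promo]
    return mean(with_promo) - mean(without_promo)

# Pooled comparison ignores region entirely
overall = promo_effect(records)

# Segmented comparison: same calculation within each region
by_region = defaultdict(list)
for row in records:
    by_region[row[0]].append(row)
segmented = {region: promo_effect(rows) for region, rows in by_region.items()}

print(f"overall effect: {overall:+}")
for region, effect in segmented.items():
    print(f"region {region}: {effect:+}")
```

In this toy data the pooled gap is far larger than either within-region effect because most promo stores sit in the region with lower baseline sales. Comparing the overall number against each segment is a quick way to spot that kind of distortion.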
How to analyze feature relationships (quick steps)
- Define the question: trend, difference, independence, or time effect?
- Plot first: scatter/box/contingency heatmap/line-over-time.
- Quantify: correlation, mean differences, chi-square, effect sizes.
- Probe pitfalls: outliers, non-linearity, confounders, interactions, time dependence.
- Summarize with a 1–2 line insight + simple chart.
Self-check before you report
- Did I visualize and quantify?
- Did I check for non-linearity/outliers?
- Did I segment by key groups (e.g., region, device)?
- Did I distinguish “significant” from “useful” (effect size)?
Exercises
Do these hands-on tasks. You can use a spreadsheet or any analytics tool.
- Exercise 1: Correlation on a mini numeric–numeric dataset (id: ex1)
- Exercise 2: Contingency table, chi-square, and Cramér’s V (id: ex2)
- Exercise 3: Segment to resolve a paradox (id: ex3)
Completion checklist
- I plotted before computing statistics.
- I reported both a test/statistic and an effect size.
- I checked at least one potential confounder or interaction.
- I wrote one clear takeaway per relationship.
Common mistakes and how to self-check
- Relying only on Pearson r when the relationship is curved. Self-check: Add a smoother to the scatterplot and compare with Spearman.
- Reporting p-values without effect sizes. Self-check: Add Cohen’s d or Cramér’s V.
- Ignoring segments. Self-check: Re-run analysis by key segments (region, device, plan).
- Assuming causation from correlation. Self-check: Consider time order, potential confounders, or experimental evidence.
- Not checking time dependence. Self-check: Inspect ACF or add lags for time series.
Practical projects
- E-commerce funnel: Analyze relationships between traffic source, device, and checkout completion. Deliver 3 charts + 3 insights.
- SaaS retention: Relate feature usage counts to churn by plan and region. Include an interaction finding.
- Marketing mix snapshot: Correlate weekly spend per channel with leads, check lag-1 effects, and discuss non-linearity.
Learning path
- Before this: Data cleaning, data types, basic visualization.
- This lesson: Spot, visualize, and quantify feature relationships.
- Next: Feature selection, multicollinearity checks, simple predictive baselines.
Who this is for
- Aspiring and junior Data Analysts who want confident EDA skills.
- Professionals switching from BI/reporting to analysis.
Prerequisites
- Comfort with basic stats (mean, variance) and charts.
- Ability to use a spreadsheet or a scripting tool (any).
Next steps
- Run the exercises on your own dataset and compare insights.
- Take the Quick Test below to check understanding.
- Move on to feature selection and modeling-ready datasets.
Mini challenge
You receive a dataset with columns: visits, purchases, device, region, and week.
- Task: Find one relationship that flips sign after segmenting.
- Deliverable: 2 charts (overall vs. segmented) and a 2-line explanation.
Hint
Check purchases vs. visits overall, then segment by device and region. Look for different slopes.