Who this is for
This subskill is for Data Scientists who need to explain models clearly to product, risk, marketing, or engineering stakeholders. If you build models and must justify which inputs matter (and how to act on them), this is for you.
Prerequisites
- Basic understanding of supervised ML (classification/regression).
- Familiarity with model training and evaluation (accuracy/AUC/MAE).
- Comfort with plotting simple charts (bar chart, scatter/violin).
Why this matters
- Prioritization: Product teams invest in features that drive performance.
- Debugging: Spot data leakage or spurious drivers (e.g., an ID feature dominating).
- Fairness & compliance: Reveal sensitive or proxy attributes.
- Model iteration: Identify features to engineer, select, or drop.
- Trust-building: Clear visuals help non-technical stakeholders understand and act.
Concept explained simply
Feature importance tells you how much each input contributes to model predictions. Think of your model as a recipe: importance measures which ingredients change the taste the most.
Mental model
- Global view: On average, across all rows, which features move predictions the most?
- Local view: For one specific row, which features pushed the prediction up or down?
- Comparison rule: Only compare importances measured on the same model, dataset, and method.
When to use which method
- Coefficients (linear/logistic): Fast, direction-aware, but only comparable if features are on the same scale or standardized.
- Tree gain/MDI (Random Forest/GBDT): Fast; can be biased toward high-cardinality features; not model-agnostic.
- Permutation importance (model-agnostic): Simple and broadly applicable; can underestimate importance when features are correlated; repeat with CV and report the spread.
- Drop-column importance: Very interpretable but slow (retrain per feature).
- SHAP: Local and global explanations; great visuals; compute cost can be higher.
Types of importance and good choices
- Model-specific: Coefficients, tree gain/MDI.
- Model-agnostic: Permutation importance, drop-column, SHAP.
- Global vs local: Bar charts summarize global importance. Beeswarm plots show the global distribution of local effects; waterfall plots explain a single row.
- Correlation caution: If features are correlated, permutation may split importance between them. Consider grouping features or conditional permutation.
How to visualize importance effectively
- Pick the right method: Use permutation/SHAP for generality; coefficients for linear models; gain for GBDT as a quick cut.
- Sort and limit: Show top 10–15 features; long tails go in appendix.
- Use horizontal bars: Easier label reading; sort descending.
- Add uncertainty: Show error bars from cross-validation or multiple permutations.
- Handle sign: For linear models/SHAP, use color for positive/negative impact.
- Group features: Aggregate engineered variants (e.g., avg_7d, avg_30d → "Usage Avg").
- Be explicit: Title and captions should state method, data slice, metric, and date.
Design checklist
- Chart: Horizontal bar chart for global; beeswarm or violin for SHAP distribution; boxplots for permutation repeats.
- Axis: Normalize importance to max=1 or to percentage; label units (e.g., "accuracy drop").
- Caption: "Permutation importance on validation (AUC drop), 10 repeats, 5-fold CV".
- Color: Diverging palette when sign matters; single color when showing magnitude only.
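The sort/limit/normalize steps above can be sketched in a few lines of Python before any plotting library is involved. The feature names and scores below are invented for illustration:

```python
# Sketch: prepare importance scores for a horizontal bar chart.
# Feature names and scores are made-up illustration values.
scores = {
    "DiscountRate": 2.8, "CompetitorPrice": 2.1, "Season": 0.6,
    "StoreID": 0.2, "Region": 0.1,
}

TOP_N = 3

# Sort descending and keep the top N; the long tail goes to an appendix.
top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:TOP_N]

# Normalize to max = 1 so the axis reads as a relative scale.
max_score = top[0][1]
normalized = [(name, round(s / max_score, 2)) for name, s in top]

print(normalized)
# [('DiscountRate', 1.0), ('CompetitorPrice', 0.75), ('Season', 0.21)]
```

The resulting list is ready to feed into any horizontal-bar routine, with the axis labeled "importance (relative to max)".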
Worked examples
Example 1 — Logistic regression (standardized coefficients)
Scenario: Churn model (binary). You trained a logistic regression.
- Coefficients (β):
- Age: −0.03
- Income: −0.00002
- ClicksLastWeek: +0.12
- EmailOpened (binary): −0.45
- Feature standard deviations (σ):
- Age: 12
- Income: 15000
- ClicksLastWeek: 4
- EmailOpened: 0.49
Compute comparable magnitudes with |β|×σ:
- ClicksLastWeek: 0.12×4 = 0.48 (positive)
- Age: 0.03×12 = 0.36 (negative)
- Income: 0.00002×15000 = 0.30 (negative)
- EmailOpened: 0.45×0.49 ≈ 0.22 (negative)
Visualization: Horizontal bars sorted by 0.48, 0.36, 0.30, 0.22. Use a diverging color: blue (+), red (−). Caption: "Standardized coefficient magnitude (|β|×σ)."
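The |β|×σ calculation above can be reproduced in a short script; the coefficients and standard deviations are the ones from this example:

```python
# Sketch: standardized coefficient magnitudes |beta| * sigma for Example 1.
coefs = {"Age": -0.03, "Income": -0.00002, "ClicksLastWeek": 0.12, "EmailOpened": -0.45}
sigmas = {"Age": 12, "Income": 15000, "ClicksLastWeek": 4, "EmailOpened": 0.49}

# Use the magnitude for ranking; keep the sign separately for the chart's color.
ranked = sorted(
    ((name, round(abs(b) * sigmas[name], 2), "+" if b > 0 else "-")
     for name, b in coefs.items()),
    key=lambda t: t[1], reverse=True,
)
for name, magnitude, sign in ranked:
    print(f"{name}: {magnitude} ({sign})")
```

This reproduces the ordering 0.48, 0.36, 0.30, 0.22 and carries the sign for the diverging color scheme.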
Example 2 — Random Forest permutation importance
Scenario: Pricing elasticity regression; metric = MAE. You compute permutation importance with 5-fold CV and 10 repeats.
- Baseline MAE (CV mean): 8.1
- After shuffling each feature (MAE, mean ± std):
- DiscountRate: 10.9 ± 0.5 → Δ = +2.8
- CompetitorPrice: 10.2 ± 0.4 → Δ = +2.1
- Season: 8.7 ± 0.2 → Δ = +0.6
- StoreID: 8.3 ± 0.1 → Δ = +0.2
Visualization: Bars of Δ (increase in MAE), with error bars from the std across repeats/folds. The small Δ for StoreID suggests it contributes little and might be safe to drop.
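The permutation procedure itself is easy to implement by hand. A minimal stdlib sketch on synthetic data, assuming a toy model that learned the known relationship y = 3·x1 (x2 is pure noise):

```python
import random
import statistics

# Sketch: permutation importance by hand on synthetic data.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [3 * a for a in x1]

def predict(a, b):
    # Stand-in for a trained model that learned y = 3*x1 and ignores x2.
    return 3 * a

def mae(col1, col2):
    return statistics.mean(abs(predict(a, b) - t) for a, b, t in zip(col1, col2, y))

baseline = mae(x1, x2)  # 0.0 here, since the toy model is exact

def permutation_delta(col_index, repeats=10):
    deltas = []
    for _ in range(repeats):
        cols = [list(x1), list(x2)]
        random.shuffle(cols[col_index])  # break the feature-target link
        deltas.append(mae(cols[0], cols[1]) - baseline)
    return statistics.mean(deltas)

d1 = permutation_delta(0)  # large: x1 drives predictions
d2 = permutation_delta(1)  # ~0: x2 is ignored by the model
print(f"delta MAE x1: {d1:.2f}, x2: {d2:.2f}")
```

Shuffling x1 raises MAE substantially while shuffling x2 changes nothing, mirroring the DiscountRate-vs-StoreID contrast above. In practice you would use a library routine (e.g., scikit-learn's `permutation_importance`) rather than hand-rolling this loop.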
Correlation check
If DiscountRate and Coupon are correlated, their importances may split. Consider grouping or conditional permutation; or test drop-column for the pair.
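One way to permute a correlated pair as a group is to apply the same shuffled row order to both columns, so their joint link to the target breaks as one unit while their internal correlation stays intact. A minimal sketch; the feature names mirror this example but the data is synthetic:

```python
import random

# Sketch: grouped permutation for correlated features.
# Shuffle both columns with ONE shared row order so row alignment
# *within* the group is preserved, but alignment to the target is not.
random.seed(1)
discount_rate = [0.10, 0.20, 0.15, 0.05, 0.30, 0.25]
coupon        = [1,    1,    1,    0,    1,    1]  # correlated with discount

order = list(range(len(discount_rate)))
random.shuffle(order)  # one shared permutation for the whole group

discount_perm = [discount_rate[i] for i in order]
coupon_perm = [coupon[i] for i in order]

print(list(zip(discount_perm, coupon_perm)))
```

Scoring the model on the grouped-permuted columns then gives a single importance for the pair, avoiding the split-importance artifact.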
Example 3 — XGBoost with SHAP
Scenario: Credit risk classification. You compute SHAP values for validation data.
- Global view: SHAP beeswarm shows distribution of per-feature SHAP values across rows.
- Observations:
- DebtToIncome: wide spread (largest absolute SHAP) → most influential globally.
- CreditHistoryLength: medium spread but asymmetric → often reduces risk (negative SHAP).
- RecentInquiries: moderate; high values increase risk (positive SHAP).
Visualization: SHAP beeswarm (y: features sorted by mean |SHAP|; x: SHAP value; color by feature value). Add a caption explaining the sign: positive SHAP increases predicted risk.
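The beeswarm's feature ordering comes from aggregating per-row SHAP values by mean |SHAP|. A stdlib sketch of that aggregation, using a small hand-made matrix of SHAP values (invented for illustration, not from a real model):

```python
# Sketch: rank features by mean |SHAP|, as a beeswarm plot does.
# Rows are validation examples; values are invented for illustration.
shap_values = {
    "DebtToIncome":        [0.9, -1.1, 0.7, -0.8],
    "CreditHistoryLength": [-0.5, -0.6, -0.4, 0.1],
    "RecentInquiries":     [0.3, 0.2, 0.4, 0.1],
}

mean_abs = {
    name: sum(abs(v) for v in vals) / len(vals)
    for name, vals in shap_values.items()
}
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)
print(ranking)
# ['DebtToIncome', 'CreditHistoryLength', 'RecentInquiries']
```

In practice the `shap` library computes the values and draws the plot; this sketch only shows why DebtToIncome lands on top: its per-row effects have the largest average magnitude, even though they vary in sign.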
Exercises
Do these to solidify your skills. A simple notepad or spreadsheet is enough.
- Exercise 1: Compute permutation importance by hand (classification).
- Exercise 2: Standardize coefficients and create a signed bar chart (logistic regression).
Checklist before you move on
- You can explain the difference between model-specific and model-agnostic importance.
- You can choose a visualization (bar/beeswarm/boxplot) appropriate to the method.
- You can add uncertainty (error bars) and clear captions.
- You can spot correlation pitfalls and explain them.
Common mistakes and self-check
- Mixing methods: Comparing SHAP values to permutation deltas as if they are the same units. Self-check: Is your axis label explicit?
- Ignoring uncertainty: Showing one run of permutation only. Self-check: Do you have error bars or repeated runs?
- Scale blindness: Comparing raw coefficients with unscaled features. Self-check: Did you standardize or report standardized coefficients?
- Correlation trap: Declaring Feature A unimportant when A and B are highly correlated. Self-check: Try grouping or conditional permutation.
- Overcrowded charts: 50+ features in one figure. Self-check: Did you limit to top-N and provide appendix?
- Ambiguous sign: Using absolute values when stakeholders need direction. Self-check: Should you use diverging colors or a separate chart for direction?
Practical projects
- Model report card: Create a one-page report for a production model including top-10 global importances, uncertainty bars, and a short action note per feature.
- Correlation-aware analysis: For a model with correlated inputs, produce two versions of permutation importance (standard vs conditional or grouped) and compare conclusions.
- Fairness spotlight: Visualize importances with and without sensitive attributes (or proxies) and write a 3-bullet risk note for stakeholders.
Mini challenge
You have a GBDT fraud model with 60 engineered features. You ran 5× repeated permutation importance (AUC drop). Choose a visualization plan and justify it in 3 sentences: which chart, top-N selection, uncertainty display, and how you’ll explain correlated device/browser features.
One possible approach
Use a horizontal bar chart of top 15 features, sorted by mean AUC drop with 95% CI error bars across repeats. Group related device/browser features as families. Add a footnote: "Permutation importance (validation), 5 repeats × 5 folds; correlated features may share importance."
Learning path
- Before: Data preprocessing → Feature engineering → Model training & evaluation.
- Now: Feature importance visualization (this lesson).
- Next: Partial dependence/ICE, SHAP deeper dive, and communicating model risk.
Next steps
- Apply an importance method to your current model and produce a stakeholder-ready chart with a clear caption.
- Run a correlation audit and re-check importances using grouping or an alternative method.
- Create a two-slide summary: "Top drivers" and "Actionable levers" with recommended experiments.