
Linear Models: Regression and Classification

Learn linear models for regression and classification for free, with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

Linear models are the workhorse of applied machine learning. As a Data Scientist, you will use them to quickly establish baselines, interpret feature effects, and ship reliable predictions when data is limited or stakeholders need transparency.

  • Forecasting: sales, demand, conversion rate (linear regression).
  • Binary outcomes: churn, default, click/no-click (logistic regression).
  • Operational analytics: fast models with clear coefficient-based explanations.

Who this is for

  • Aspiring and junior Data Scientists who need solid, explainable baselines.
  • Analysts moving into predictive modeling.
  • Engineers who want practical ML foundations.

Prerequisites

  • Comfort with basic algebra and averages/variances.
  • Python or R familiarity helps, but examples are language-agnostic.
  • Know train/validation/test splits and why they matter.

Concept explained simply

Linear regression predicts a continuous value by adding up weighted features plus an intercept. Example: price = 60 + 20×size.

Logistic regression predicts the probability of a class via a sigmoid of a linear score: probability = 1 / (1 + exp(−(w·x + b))). If probability ≥ threshold (often 0.5), predict class 1.

Regularization helps generalization by shrinking coefficients:

  • L2 (Ridge): shrinks coefficients toward zero smoothly; useful when there are many correlated features.
  • L1 (Lasso): can drive some coefficients exactly to zero; useful for feature selection.
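The two penalties can be seen as terms added to the training loss. Here is a minimal NumPy sketch; the `alpha` strength and the toy numbers are illustrative, not from the text:

```python
import numpy as np

def penalized_mse(y_true, y_pred, w, alpha=1.0, penalty="l2"):
    """MSE plus an L1 (Lasso) or L2 (Ridge) penalty on the weights w."""
    mse = np.mean((y_true - y_pred) ** 2)
    if penalty == "l2":
        return mse + alpha * np.sum(w ** 2)   # smooth shrinkage toward zero
    return mse + alpha * np.sum(np.abs(w))    # can drive weights exactly to zero

w = np.array([3.0, -4.0])
y_true = np.array([1.0, 2.0])
y_pred = np.array([1.5, 1.5])
print(penalized_mse(y_true, y_pred, w, penalty="l2"))  # 25.25 (MSE 0.25 + penalty 25)
print(penalized_mse(y_true, y_pred, w, penalty="l1"))  # 7.25  (MSE 0.25 + penalty 7)
```

Because the L1 term keeps a constant pull on every nonzero weight, the optimizer can afford to zero some weights out entirely, which is why Lasso doubles as feature selection.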

Mental model

  • Regression: fit a straight line (or plane) that best passes through the cloud of points.
  • Classification: fit a straight boundary that separates classes; the sigmoid maps distance from the boundary to probability.
  • Regularization: a gentle elastic band (L2) or a sharp tug (L1) that prevents wild coefficients.

Worked examples

Example 1 — Univariate linear regression by hand

Dataset (x in hundreds of sqft, y in $k):

  • (6, 180), (8, 220), (10, 260), (14, 340)

Means: x̄ = 9.5, ȳ = 250. Slope w1 = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² = 700/35 = 20. Intercept b = ȳ − w1×x̄ = 250 − 20×9.5 = 60.

Model: y = 60 + 20×x. Prediction for x = 12 → ŷ = 60 + 240 = 300.
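The hand calculation above can be verified in a few lines of NumPy:

```python
import numpy as np

# The four (size, price) points from the worked example
x = np.array([6.0, 8.0, 10.0, 14.0])
y = np.array([180.0, 220.0, 260.0, 340.0])

# Least-squares slope and intercept via deviations from the means
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w1 * x.mean()
print(w1, b)        # 20.0 60.0
print(b + w1 * 12)  # prediction for x = 12 -> 300.0
```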

Example 2 — Logistic regression probability

Model: logit(p) = −4 + 0.1×usage_hours + 0.6×tickets.

For usage=20, tickets=3: logit = −4 + 2 + 1.8 = −0.2 → p = 1/(1+exp(0.2)) ≈ 0.45.

At threshold 0.5, predict class 0; with threshold 0.4, predict class 1.
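The same numbers, checked in plain Python, also show how the predicted class flips with the threshold:

```python
import math

def predict_proba(usage_hours, tickets):
    """Logistic model from the example: logit(p) = -4 + 0.1*usage + 0.6*tickets."""
    z = -4 + 0.1 * usage_hours + 0.6 * tickets
    return 1 / (1 + math.exp(-z))

p = predict_proba(20, 3)
print(round(p, 2))   # 0.45
print(int(p >= 0.5)) # 0 -> class 0 at threshold 0.5
print(int(p >= 0.4)) # 1 -> class 1 at threshold 0.4
```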

Example 3 — Ridge vs Lasso with correlated features

Two correlated features (x1, x2). Unregularized model shows unstable coefficients: w = [50, −48].

  • Ridge (L2) with a moderate penalty yields smaller, more stable coefficients, e.g., [10, 8].
  • Lasso (L1) may set one to zero, e.g., [0, 17], effectively selecting features.

Both reduce variance, but Lasso also performs feature selection.
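The coefficient values quoted above are illustrative; the shrinkage mechanics can be demonstrated on synthetic correlated features with the closed-form ridge solution (this sketch omits the intercept and uses made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # nearly duplicate feature
X = np.column_stack([x1, x2])
y = 2 * x1 + rng.normal(scale=0.1, size=n)

def ridge(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

w_ols = ridge(X, y, alpha=0.0)   # unregularized: unstable split between x1 and x2
w_l2 = ridge(X, y, alpha=10.0)   # ridge: smaller, more balanced weights
print(np.sum(w_ols**2) > np.sum(w_l2**2))  # True: ridge shrinks the weights
```

With near-duplicate columns, only the sum of the two weights is well determined, so OLS can split it wildly; the L2 penalty prefers the smallest weights that achieve the same fit, pulling them back together.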

How to fit and evaluate (step-by-step)

  1. Define target and features. Remove obvious leakage (future or post-outcome features).
  2. Split data: train/validation/test (e.g., 60/20/20). For time series, split chronologically.
  3. Scale numeric features (standardize) especially if using regularization.
  4. Fit baseline model without regularization. Record metrics.
  5. Tune regularization (alpha/λ) via cross-validation. Compare validation metrics.
  6. Check residual plots (regression) or calibration/PR curves (classification).
  7. Refit on train+validation with best hyperparameters, then evaluate on test once.
  8. Package model with the same preprocessing steps for deployment.
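Steps 2–5 and 7 can be sketched with scikit-learn, assuming it is available; the synthetic data, alpha grid, and split sizes are illustrative:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=500)

# Step 2: hold out a test set once; tune on the remaining data via CV
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Steps 3-5: scaling and the tuned model live in one pipeline,
# so the exact same preprocessing ships with the model (step 8)
model = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])),
])
model.fit(X_trainval, y_trainval)

# Step 7: a single final evaluation on the untouched test set
preds = model.predict(X_test)
test_mse = np.mean((preds - y_test) ** 2)
print(model.named_steps["ridge"].alpha_, round(test_mse, 3))
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, which is the usual failure mode when scaling is done before the split.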

Common mistakes and self-check

  • Not scaling features before L1/L2 → unfair penalties by scale.
  • Data leakage (using target or future info) → overly optimistic metrics.
  • Using accuracy on imbalanced data → prefer Precision/Recall, PR AUC, or F1.
  • Interpreting raw coefficients without noting feature scale or encoding.
  • Ignoring multicollinearity → unstable, high-variance coefficients.
  • Forgetting the intercept or mishandling dummy variables (dummy trap).

Self-check
  • Can you explain what one unit increase in a standardized feature means for the outcome?
  • Have you checked residual heteroscedasticity (variance vs. fitted values)?
  • Are your positive/negative classes reasonably calibrated?
  • Do coefficients change drastically across folds? Consider more regularization or feature pruning.

Practical projects

  • Regression: Predict weekly sales using price, promotions, and seasonality features. Evaluate with MAE and MAPE.
  • Classification: Predict churn using activity metrics. Track PR AUC and calibration; choose an operating threshold for business cost.
  • Regularization: High-dimensional (e.g., text-encoded) features → compare OLS, Ridge, and Lasso; plot coefficient paths as regularization increases.
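For the regression project, MAE and MAPE are a few lines of NumPy (the toy sales numbers below are ours):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the target's own units."""
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined when y_true contains zeros."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 440.0])
print(mae(y_true, y_pred))           # 20.0
print(round(mape(y_true, y_pred), 2))  # 8.33
```

MAE weights every unit of error equally; MAPE weights errors relative to the true value, so it penalizes the same absolute miss more on small targets.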

Exercises

Do these before the quick test. They mirror the graded exercises below.

  1. ex1 — Compute predictions and MSE (linear regression)

    Model: y = 5 + 2×x1 − 3×x2

    Rows:

    • r1: x1=4, x2=1, y=12
    • r2: x1=0, x2=2, y=−1
    • r3: x1=1.5, x2=−1, y=12

    Task: Compute predictions for each row and the MSE across all three rows.

  2. ex2 — Probability and threshold (logistic regression)

    Model: logit(p) = −2 + 0.8×score − 1.2×is_premium + 0.02×age_scaled

    Sample: score=3, is_premium=0, age_scaled=20

    Task: Compute p and the prediction at thresholds 0.7 and 0.6.

  Checklist:
  • Did you show intermediate steps?
  • Did you round probabilities sensibly (2–3 decimals)?
  • Did you verify that the classification changes when the threshold changes?

Learning path

  • Before: Data cleaning, feature encoding, train/validation/test splitting.
  • Now: Linear and logistic regression with L1/L2, evaluation, and interpretation.
  • Next: Nonlinear models (trees, ensembles), calibration, and model monitoring.

Mini challenge

You have 2,000 features with many near-duplicates and a small dataset. Build a robust baseline that generalizes well and surfaces key drivers.

  • Which regularization will you start with and why?
  • How will you tune the penalty and evaluate stability across folds?
  • What metric(s) will you report if the positive class is rare?

Suggested approach
  • Start with standardized features and Lasso to reduce dimensionality; compare to Ridge.
  • Use cross-validation to choose λ; track coefficient stability and PR AUC for rare positives.
  • Report calibration and a decision threshold aligned with business cost.

Next steps

  • Try polynomial features and interaction terms; compare performance and interpretability.
  • Plot learning curves to diagnose bias vs. variance and adjust regularization accordingly.
  • Document model assumptions, metrics, and chosen threshold for stakeholders.

Ready to test yourself?

Take the quick test below.

Practice Exercises

2 exercises to complete

Instructions

Model: y = 5 + 2×x1 − 3×x2

Rows:

  • r1: x1=4, x2=1, y=12
  • r2: x1=0, x2=2, y=−1
  • r3: x1=1.5, x2=−1, y=12

Task: Compute predictions for each row and the MSE across all three rows. Show your steps.

Expected Output
Predictions: [10, -1, 11]; MSE ≈ 1.67
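The expected output can be reproduced in plain Python to check your hand calculation:

```python
# Model from the exercise: y = 5 + 2*x1 - 3*x2
rows = [(4, 1, 12), (0, 2, -1), (1.5, -1, 12)]

preds = [5 + 2 * x1 - 3 * x2 for x1, x2, _ in rows]
mse = sum((y - p) ** 2 for (_, _, y), p in zip(rows, preds)) / len(rows)
print(preds)          # [10, -1, 11.0]
print(round(mse, 2))  # 1.67
```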

Linear Models: Regression and Classification — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

