
Regression Basics

Learn Regression Basics for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Who this is for

  • Beginner to intermediate Data Scientists who need a practical start with linear regression.
  • Analysts and engineers who want to predict a numeric value (price, time, demand) using one or more features.
  • Anyone preparing for interviews that test regression intuition and basics.

Prerequisites

  • Comfort with averages, variance, and correlation.
  • Basic plotting skills (scatterplots).
  • Ability to work with small datasets in a spreadsheet or notebook.

Why this matters

Data Scientists use regression to:

  • Forecast sales or demand from predictors like price or ad spend.
  • Estimate the impact of a feature (e.g., experience on salary).
  • Create baseline models that are fast to build and easy to explain to stakeholders.
  • Check relationships before deploying complex models.

Concept explained simply

Linear regression fits a straight line that best predicts a numeric target y from a predictor x. In simple linear regression there is one x.

Mental model

Imagine placing a ruler through a cloud of points (x, y). You tilt and shift it until the total squared vertical distances between points and the line (residuals) are as small as possible. That is the "best-fit" line.
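
If you want to see this in code, NumPy's `polyfit` does the tilting and shifting for you (a minimal sketch; assumes NumPy is installed, and the data points here are made up):

```python
import numpy as np

# A small cloud of (x, y) points, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# A degree-1 polynomial fit is simple linear regression (least squares).
# polyfit returns coefficients from highest degree down: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
print(f"y-hat = {intercept:.2f} + {slope:.2f} * x")  # y-hat = 2.20 + 0.60 * x
```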

Key terms
  • Model: ŷ = b0 + b1 x
  • Intercept (b0): predicted y when x = 0
  • Slope (b1): average change in y when x increases by 1
  • Residual: e = y − ŷ
  • Loss: typically mean squared error (MSE); RMSE = sqrt(MSE); MAE = mean absolute error
  • R²: proportion of variance in y explained by the model (between 0 and 1)

Assumptions to keep in mind
  • Linearity: the relationship is roughly straight.
  • Independence: observations do not depend on one another (e.g., no autocorrelated time series).
  • Homoscedasticity: residual spread is roughly constant across x.
  • Normality of residuals: helpful for inference/intervals; prediction can still work reasonably without perfect normality.

Quick formulas (simple linear regression)

Cheat sheet
  • Slope b1 = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)²
  • Intercept b0 = ȳ − b1 x̄
  • Residual for i: eᵢ = yᵢ − (b0 + b1 xᵢ)
  • MSE = (1/n) Σ eᵢ²; RMSE = sqrt(MSE); MAE = (1/n) Σ |eᵢ|
  • R² = 1 − (Σ eᵢ² / Σ (yᵢ − ȳ)²)
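
The cheat sheet translates directly into a few lines of plain Python (a sketch with no libraries; the function names are ours, not a standard API):

```python
def fit_simple_linear(x, y):
    """Return (b0, b1) from the least-squares formulas above."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar  # intercept: b0 = y_bar - b1 * x_bar
    return b0, b1

def metrics(x, y, b0, b1):
    """Return (MSE, RMSE, MAE, R^2) for the fitted line."""
    n = len(y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / n
    rmse = mse ** 0.5
    mae = sum(abs(e) for e in residuals) / n
    y_bar = sum(y) / n
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sum(e ** 2 for e in residuals) / ss_tot
    return mse, rmse, mae, r2

# Perfectly linear data (y = 1 + 2x), so the fit recovers it exactly
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)  # 1.0 2.0
```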

Worked examples

Example 1: Fit a line and make a prediction

Data: x = [1, 2, 3, 4], y = [2, 3, 5, 4]

  • x̄ = 2.5, ȳ = 3.5
  • Σ(x−x̄)(y−ȳ) = 4.0; Σ(x−x̄)² = 5.0
  • b1 = 4/5 = 0.8; b0 = 3.5 − 0.8×2.5 = 1.5
  • Model: ŷ = 1.5 + 0.8x
  • Predict x=5: ŷ = 1.5 + 0.8×5 = 5.5
  • Residual at x=3: y−ŷ = 5 − (1.5 + 0.8×3) = 1.1
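
You can double-check every number in this example with a few lines of plain Python:

```python
x = [1, 2, 3, 4]
y = [2, 3, 5, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n                            # 2.5, 3.5
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # 4.0
sxx = sum((xi - x_bar) ** 2 for xi in x)                         # 5.0
b1 = sxy / sxx                                                   # 0.8
b0 = y_bar - b1 * x_bar                                          # 1.5
print(b0 + b1 * 5)                 # prediction at x=5 -> 5.5
print(y[2] - (b0 + b1 * x[2]))     # residual at x=3 -> approx 1.1
```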
Example 2: Interpret coefficients and R²

Suppose Salary_k$ = 30 + 3.2 × Experience_years, with R² = 0.64.

  • Intercept 30: predicted salary is 30k when experience is 0 (may be outside data range; interpret cautiously).
  • Slope 3.2: each extra year adds about 3.2k on average.
  • R² = 0.64: 64% of salary variance is explained by experience in this simple model.

Note: Real salaries depend on many factors and vary by country and company; treat these figures as rough illustrations, not benchmarks.
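
Turning a fitted equation like this into a prediction function is a one-liner (a sketch; the coefficients are the illustrative ones from the example, not real salary data):

```python
def predict_salary_k(experience_years):
    # Illustrative coefficients from the example: intercept 30, slope 3.2
    return 30 + 3.2 * experience_years

print(predict_salary_k(5))  # 30 + 3.2*5 = 46.0
```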

Example 3: Compute RMSE

Observed y: [10, 8, 9, 11]; Predicted ŷ: [9, 9, 8, 12]

  • Residuals e: [1, −1, 1, −1]
  • e²: [1, 1, 1, 1] → MSE = (1+1+1+1)/4 = 1
  • RMSE = sqrt(1) = 1
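
The same computation as a short plain-Python sketch:

```python
y_true = [10, 8, 9, 11]
y_pred = [9, 9, 8, 12]

residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]  # [1, -1, 1, -1]
mse = sum(e ** 2 for e in residuals) / len(residuals)    # 1.0
rmse = mse ** 0.5                                        # 1.0
mae = sum(abs(e) for e in residuals) / len(residuals)    # 1.0
print(rmse, mae)  # 1.0 1.0
```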

Try it yourself: Exercises

Do these on paper, a spreadsheet, or a notebook. Match your answers with the solutions.

Exercise 1 (ex1): Compute slope and intercept

Data: x = [1, 2, 4, 5], y = [1, 3, 3, 4]. Find b1 and b0. Predict y when x = 3.

Exercise 2 (ex2): Interpret and predict

Given ŷ = 12 − 0.8x, compute the prediction at x = 5. Explain the slope in one sentence.

Exercise 3 (ex3): Residuals and RMSE

Observed y = [5, 7, 6, 8, 7]; Predicted ŷ = [5.5, 6.5, 6.0, 7.5, 7.0]. Compute residuals and RMSE.

Self-check checklist
  • I centered x and y around their means when computing the slope.
  • I used b0 = ȳ − b1 x̄ correctly.
  • I defined residuals as y − ŷ (not the other way around).
  • I squared residuals before averaging for MSE/RMSE.
  • My interpretations avoid claiming causation from correlation.

Common mistakes and how to self-check

  • Forgetting the intercept: Do not force the line through (0,0) unless justified. Self-check: Does x=0 exist in your data range and imply y≈0?
  • Confusing residual with error direction: Residual is y − ŷ. Self-check: Recompute for one point.
  • Using R² alone to judge models: R² can be high for the wrong reasons. Self-check: Inspect residual plots for patterns or funnel shapes.
  • Extrapolating far beyond data: Linear trend may not hold. Self-check: Mark min/max x and keep predictions within range for business decisions unless justified.
  • Assuming causation: Regression shows association. Self-check: Consider missing variables or confounders.

Practical projects

  • Price vs. size: Fit a simple model predicting house price from square footage. Report slope, RMSE, and two limitations.
  • Time-to-complete vs. experience: Predict task completion time from months of experience. Compare RMSE on train vs. holdout split.
  • Marketing spend vs. weekly sales: Fit the model for last 10 weeks and forecast next week. Include a residual plot and comment on linearity.

Learning path

  1. Start: Simple linear regression (this lesson) → compute slope/intercept by hand on small data.
  2. Next: Multiple linear regression (handling several predictors).
  3. Then: Regularization (Ridge/Lasso), interaction terms, and polynomial features.
  4. Finally: Model validation (cross-validation), robust metrics, and deployment basics.

Mini challenge

You have weekly data: AdSpend_k$ = [1, 2, 3, 4, 5], Sales_k = [8, 9, 11, 12, 14].

  • Fit ŷ = b0 + b1 x by hand (or spreadsheet).
  • Predict Sales when AdSpend = 2.5k.
  • In one sentence, explain the slope to a non-technical stakeholder.

Hint

Compute x̄, ȳ, then b1 = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and b0 = ȳ − b1 x̄.

Next steps

  • Repeat these steps on a dataset you know (work or open data).
  • Create a simple residual plot and comment on linearity and variance.
  • Move on to multiple linear regression and check how coefficients change.
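
For the residual-plot step, here is a minimal sketch (assumes matplotlib is installed; the data points and the filename `residuals.png` are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Made-up, roughly linear data for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit y-hat = b0 + b1*x with the least-squares formulas
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Plot residuals vs x; look for patterns (curvature) or funnels (changing spread)
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("x")
plt.ylabel("residual (y - y-hat)")
plt.title("Residual plot")
plt.savefig("residuals.png")
```

A healthy plot shows residuals scattered evenly around zero with no visible shape.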

Quick Test

There is a short test below. Everyone can take it for free. If you sign in, your progress will be saved automatically.

Practice Exercises

3 exercises to complete

Instructions

Data: x = [1, 2, 4, 5], y = [1, 3, 3, 4].

  • Calculate x̄ and ȳ.
  • Compute b1 = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)².
  • Compute b0 = ȳ − b1 x̄.
  • Predict y when x = 3.
Expected Output
b1 ≈ 0.6, b0 ≈ 0.95; prediction at x=3 is ≈ 2.75

Regression Basics — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

