
Forecast Validation And Backtesting Basics

Learn Forecast Validation And Backtesting Basics for free with explanations, exercises, and a quick test (for Marketing Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

Marketing forecasts drive budgets, campaigns, staffing, and inventory. Validating your forecast with backtesting helps you:

  • Estimate realistic campaign outcomes before spending.
  • Set safer targets for leads, signups, or revenue by channel.
  • Stress-test models against seasonality, promos, and holidays.
  • Choose models that beat simple baselines, not just look good on one period.

Who this is for

Marketing Analysts, Growth Analysts, and anyone turning historical marketing data (traffic, leads, conversions, revenue) into forward-looking plans.

Prerequisites

  • Basic time-series concepts (trend/seasonality).
  • Comfort with spreadsheets or analytics tools.
  • Ability to compute simple metrics (MAE, MAPE).

Learning path

  1. Define your forecast question and horizon.
  2. Pick a baseline and evaluation metric(s).
  3. Set up rolling backtests (forward-chaining splits).
  4. Compare model vs baseline across folds.
  5. Refine, re-test, and monitor after deployment.

Concept explained simply

Validation checks if your forecast would have worked on the past. Backtesting simulates making forecasts from earlier dates (forecast origins) and compares them with what actually happened later.

Mental model

Imagine walking forward on a path. At each step, you can only see what is behind you. You guess what the next step looks like, then you take the step and see if you were right. Repeat. That is rolling-origin backtesting.

Time-series splits vs random splits

Do this: Train on earlier dates, validate on later dates (rolling/forward-chaining splits). Do not do this: Random shuffling that mixes future and past, because it leaks future information into training.
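A minimal sketch of forward-chaining splits, assuming observations are ordered oldest to newest; the function name and defaults are illustrative, and libraries such as scikit-learn offer a similar TimeSeriesSplit utility.

```python
def forward_chaining_splits(n_obs, n_folds=5, horizon=7):
    """Yield (train_indices, valid_indices) pairs where every validation
    index comes strictly after every training index (expanding window)."""
    for k in range(n_folds):
        # Earliest fold first; the final fold's validation block ends at the last observation.
        valid_end = n_obs - (n_folds - 1 - k) * horizon
        valid_start = valid_end - horizon
        if valid_start <= 0:
            continue  # not enough history for this fold
        yield list(range(valid_start)), list(range(valid_start, valid_end))

# Example: 52 weekly observations, 4 folds, 1-week-ahead validation blocks
for train_idx, valid_idx in forward_chaining_splits(52, n_folds=4, horizon=1):
    print(f"train rows 0-{train_idx[-1]}, validate rows {valid_idx}")
```

Note that the training window only ever grows into the past; no validation row ever appears before a training row.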

Common metrics (quick guide)
  • MAE: Average absolute error. Good when zeros exist and units matter.
  • RMSE: Penalizes large errors more. Useful if big misses are costly.
  • MAPE/sMAPE: Percent error. Easy to interpret but avoid when actuals can be zero or near-zero.
  • MASE: Scale-free vs seasonal-naive. Great for comparing across series.

Tip: Always compare against a simple baseline. If you cannot beat a naive forecast, don't ship.
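A minimal sketch of these metrics using NumPy; the argument names are placeholders, and `seasonal_period` would be 7 for weekly seasonality in daily data or 12 for yearly seasonality in monthly data.

```python
import numpy as np

def mae(actuals, forecasts):
    return np.mean(np.abs(np.asarray(actuals, float) - np.asarray(forecasts, float)))

def rmse(actuals, forecasts):
    return np.sqrt(np.mean((np.asarray(actuals, float) - np.asarray(forecasts, float)) ** 2))

def mape(actuals, forecasts):
    a, f = np.asarray(actuals, float), np.asarray(forecasts, float)
    return np.mean(np.abs(a - f) / np.abs(a)) * 100  # undefined if any actual is zero

def smape(actuals, forecasts):
    a, f = np.asarray(actuals, float), np.asarray(forecasts, float)
    return np.mean(2 * np.abs(a - f) / (np.abs(a) + np.abs(f))) * 100

def mase(actuals, forecasts, train_actuals, seasonal_period=1):
    """Out-of-sample MAE scaled by the in-sample MAE of the (seasonal) naive forecast."""
    t = np.asarray(train_actuals, float)
    naive_mae = np.mean(np.abs(t[seasonal_period:] - t[:-seasonal_period]))
    return mae(actuals, forecasts) / naive_mae
```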

How many folds do I need?

Prefer 5–10 folds if feasible. Use at least 3. Ensure folds cover typical seasonality (e.g., include holidays) and your target horizon.

Step-by-step: Validating a marketing forecast

  1. Define the decision: What will change if the forecast is accurate? Fix the horizon (e.g., 1, 7, or 30 days ahead).
  2. Choose baselines: Last value (naive), moving average, or seasonal naive (value from same period last year/week).
  3. Select metrics: One primary (e.g., MAE) + one secondary (e.g., sMAPE). Match to business cost of errors.
  4. Set rolling windows: Pick expanding or sliding window. Define number of folds, horizon, and step size.
  5. Avoid leakage: Only use features known at forecast time. Lag, aggregate, or drop future-only data.
  6. Run folds: For each forecast origin, fit the model on the training window, forecast the next horizon, and record errors (see the sketch after this list).
  7. Review stability: Look for consistent improvements across folds, not just one lucky period.
  8. Decide and document: Keep metrics vs baseline, rationale, and guardrails.
  9. Refit and monitor: Refit on full history to deploy; track live errors and compare to baseline.
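A minimal sketch of steps 4–7 under simplifying assumptions: the series is a plain list of weekly values, the "model" is a hypothetical fit_and_forecast function you would replace with your own, and the baseline is last-value naive.

```python
import numpy as np

def naive_forecast(history, horizon):
    # Last-value baseline: repeat the most recent observation.
    return [history[-1]] * horizon

def fit_and_forecast(history, horizon):
    # Placeholder "model": a 4-period moving average, standing in for your real model.
    return [float(np.mean(history[-4:]))] * horizon

def rolling_backtest(series, n_folds=4, horizon=1):
    model_errors, baseline_errors = [], []
    for k in range(n_folds):
        origin = len(series) - (n_folds - k) * horizon   # forecast origin for this fold
        train = series[:origin]
        actual = series[origin:origin + horizon]
        model_errors.append(np.mean(np.abs(np.subtract(actual, fit_and_forecast(train, horizon)))))
        baseline_errors.append(np.mean(np.abs(np.subtract(actual, naive_forecast(train, horizon)))))
    return np.mean(model_errors), np.mean(baseline_errors)

sessions = [9400, 9700, 9900, 10000, 10000, 10500, 9800, 10200]
model_mae, baseline_mae = rolling_backtest(sessions, n_folds=4, horizon=1)
print(f"Model MAE: {model_mae:.0f}  Baseline MAE: {baseline_mae:.0f}")
```

Each fold refits on history up to the origin only, so the comparison mirrors what you would have known at the time.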

Worked examples

Example 1: Weekly sessions (1-week horizon)

Goal: Forecast next week's sessions. Baseline: Last value (naive). Metric: MAPE.

  • Weeks 1–4 actuals: 9400, 9700, 9900, 10000
  • Weeks 5–8 actuals: 10000, 10500, 9800, 10200
  • Model forecasts for weeks 5–8: 9500, 10300, 10100, 10000

Model MAPE:

  • W5: |10000 − 9500| / 10000 = 5.0%
  • W6: |10500 − 10300| / 10500 ≈ 1.9%
  • W7: |9800 − 10100| / 9800 ≈ 3.1%
  • W8: |10200 − 10000| / 10200 ≈ 2.0%
  • Avg ≈ 3.0%

Naive baseline predictions (use last observed):

  • W5: 10000 → 0.0%
  • W6: 10000 → ≈ 4.8%
  • W7: 10500 → ≈ 7.1%
  • W8: 9800 → ≈ 3.9%
  • Avg ≈ 4.0%

Conclusion: Model (≈3.0%) beats baseline (≈4.0%).
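If you want to reproduce these numbers, a quick check (the one-line mape helper here is just an inline version of the metric above):

```python
actuals  = [10000, 10500, 9800, 10200]
model    = [9500, 10300, 10100, 10000]
baseline = [10000, 10000, 10500, 9800]   # last observed value before each week

mape = lambda a, f: sum(abs(x - y) / x for x, y in zip(a, f)) / len(a) * 100
print(f"Model MAPE:    {mape(actuals, model):.1f}%")     # ~3.0%
print(f"Baseline MAPE: {mape(actuals, baseline):.1f}%")  # ~4.0%
```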

Example 2: Daily conversions (1-day horizon)

Goal: Next-day conversions during paid experiments. Metric: MAE (avoids divide-by-zero). Baseline: 7-day moving average.

  • Fold errors (Model MAE): 2, 2, 2 → Avg 2
  • Fold errors (Baseline MAE): 3, 4, 2 → Avg 3

Conclusion: Model consistently reduces absolute error by ~1 conversion per day.
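For reference, a minimal sketch of the 7-day moving-average baseline used here; the daily counts are purely illustrative.

```python
import numpy as np

def moving_average_forecast(history, window=7):
    """Next-day baseline: average of the last `window` observed days."""
    return float(np.mean(history[-window:]))

conversions = [12, 15, 9, 11, 14, 13, 10, 16]            # illustrative daily counts
baseline = moving_average_forecast(conversions[:-1])      # forecast for the final day
print(baseline, abs(conversions[-1] - baseline))          # baseline and its absolute error
```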

Example 3: Monthly leads with yearly seasonality (1-month horizon)

Goal: Predict next month leads. Baseline: Seasonal naive (same month last year). Metric: MASE.

  • In-sample seasonal naive MAE (denominator): 25 leads
  • Model out-of-sample MAE across folds: 21 leads
  • MASE = 21 / 25 = 0.84

Conclusion: MASE < 1 means the model beats seasonal naive.
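A minimal sketch of the MASE arithmetic in this example; the helper shows how the denominator would be computed from the training history (argument names are placeholders), and the final ratio plugs in the two MAE values given above.

```python
import numpy as np

def seasonal_naive_mae(history, seasonal_period=12):
    """In-sample MAE of the seasonal-naive forecast (same month last year)."""
    h = np.asarray(history, float)
    return np.mean(np.abs(h[seasonal_period:] - h[:-seasonal_period]))

model_out_of_sample_mae = 21.0     # model MAE across backtest folds
seasonal_naive_in_sample = 25.0    # denominator, computed as above on the training history
print(model_out_of_sample_mae / seasonal_naive_in_sample)  # 0.84 -> beats seasonal naive
```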

Common mistakes and self-check

  • Random splits: Using standard cross-validation. Self-check: Are all validation dates after training dates?
  • Leakage: Features peek into the future (e.g., final monthly totals). Self-check: Could I know this feature at forecast time? (See the lag sketch after this list.)
  • No baseline: Hard to judge value. Self-check: Do I beat last value or seasonal naive?
  • Wrong metric: MAPE with zeros, or only RMSE when large misses aren't costlier. Self-check: Does the metric reflect business cost?
  • Too few folds: One lucky window. Self-check: Do results hold across multiple seasons and promotions?
  • Not refitting per fold: Overestimates performance. Self-check: Do I refit before each fold prediction?
  • Mixing horizons: Comparing 1-day and 14-day errors. Self-check: Is the horizon fixed during evaluation?
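A minimal sketch of the leakage self-check with pandas, assuming a daily DataFrame with a target column and one feature (the column names and values here are hypothetical): shift each feature by the horizon so only values observable at the forecast origin remain.

```python
import pandas as pd

# Hypothetical daily frame: target plus a feature that is only final after the fact.
df = pd.DataFrame(
    {"conversions": [10, 12, 9, 14, 11], "spend": [100, 120, 90, 140, 110]},
    index=pd.date_range("2025-01-01", periods=5, freq="D"),
)

# For a 1-day-ahead forecast made at day t, only spend up to day t is known,
# so lag the feature by the horizon before training.
horizon = 1
df["spend_lagged"] = df["spend"].shift(horizon)

# Train only on rows where the lagged feature exists; never feed df["spend"] directly.
train = df.dropna(subset=["spend_lagged"])
print(train[["conversions", "spend_lagged"]])
```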

Practical projects

  • Channel lead forecast: Backtest 1-month-ahead leads by channel using seasonal naive baseline and MASE.
  • Promo uplift forecast: Forecast incremental revenue during promos; evaluate MAE and RMSE; include a non-promo baseline.
  • Traffic forecast for staffing: 1-week-ahead sessions; compare last-value vs moving-average vs model; report mean and worst-fold errors.

Exercises

Exercise 1 β€” Compute model vs baseline MAPE

Use the weekly sessions data from Example 1. Calculate MAPE for the model and for the last-value baseline, then state which wins and by how much.

  • Actuals (weeks 5–8): 10000, 10500, 9800, 10200
  • Model forecasts: 9500, 10300, 10100, 10000
  • Baseline forecasts (last value): 10000, 10000, 10500, 9800
Hint

MAPE = mean(|Aβˆ’F|/A). Compute per week, then average.

Expected outcome

Model MAPE ≈ 3.0%; Baseline MAPE ≈ 4.0%; Model wins by ~1.0 percentage point.

Exercise 2 β€” Design a rolling backtest plan

Scenario: You have 180 days of daily conversions and need a 14-day-ahead forecast for planning. Draft a plan: window type (expanding or sliding), number of folds, step size, baseline(s), metric(s), and leakage checks.

Hint
  • Prefer 5–8 folds that cover weekends and seasonality.
  • Pick MAE if zeros exist; add sMAPE for scale-free comparison.
Expected outcome

Example: Expanding window; 6 folds; each fold predicts next 14 days; step size 14 days; baselines: last value and 7-day moving average; metrics: MAE (primary), sMAPE (secondary); leakage checks: only features known at the origin, lag all aggregates.

  Checklist before you finalize:
    • Fixed horizon and step size
    • At least 3–5 folds
    • Baseline(s) included
    • Right metric(s) for business cost
    • Leakage prevented
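One hedged way to turn the example plan above into code: generate the six expanding-window folds for 180 days of history with a 14-day horizon and 14-day step (fold boundaries only, no model yet).

```python
n_days, horizon, step, n_folds = 180, 14, 14, 6

for k in range(n_folds):
    valid_end = n_days - (n_folds - 1 - k) * step        # the last fold ends on day 180
    valid_start = valid_end - horizon
    print(f"Fold {k + 1}: train days 1-{valid_start}, validate days {valid_start + 1}-{valid_end}")
```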

Mini challenge

Your team wants a 4-week-ahead forecast of qualified leads to set KPI targets. You have 3 years of monthly data with strong seasonality and occasional promos. Design a quick backtest: choose baseline(s), primary metric, window strategy, and how you will communicate uncertainty to stakeholders.

One possible approach
  • Baseline: Seasonal naive (same month last year)
  • Metric: MASE (primary), sMAPE (secondary)
  • Window: Expanding; at least 12 folds to span seasons
  • Uncertainty: Report median error, 80/95% empirical error bands from folds; highlight worst-case fold during promo months

Next steps

  • Add exogenous features available at forecast time (promo flags, holidays).
  • Estimate prediction intervals using empirical errors from backtests (see the sketch after this list).
  • Monitor live: compare production errors vs backtest; trigger alerts if drift exceeds thresholds.
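A minimal sketch of empirical prediction intervals, assuming you kept the signed fold errors (actual minus forecast) from the backtest; the error values and point forecast here are purely illustrative.

```python
import numpy as np

# Signed errors (actual - forecast) collected across backtest folds; illustrative values.
fold_errors = np.array([-120, 80, 45, -60, 150, -30, 95, -75])

point_forecast = 10200                                 # next period's point forecast
lower, upper = np.percentile(fold_errors, [10, 90])    # 80% empirical error band

print(f"80% interval: {point_forecast + lower:.0f} to {point_forecast + upper:.0f}")
```

With more folds the empirical quantiles become more reliable; with very few folds, report the worst-fold error instead of a band.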

Ready for a quick test?

Take the quick test to check your understanding. Note: The quick test is available to everyone; only logged-in users get saved progress.


Forecast Validation And Backtesting Basics: Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
