Why this matters
Marketing forecasts drive budgets, campaigns, staffing, and inventory. Validating your forecast with backtesting helps you:
- Estimate realistic campaign outcomes before spending.
- Set safer targets for leads, signups, or revenue by channel.
- Stress-test models against seasonality, promos, and holidays.
- Choose models that beat simple baselines, not just look good on one period.
Who this is for
Marketing Analysts, Growth Analysts, and anyone turning historical marketing data (traffic, leads, conversions, revenue) into forward-looking plans.
Prerequisites
- Basic time-series concepts (trend/seasonality).
- Comfort with spreadsheets or analytics tools.
- Ability to compute simple metrics (MAE, MAPE).
Learning path
- Define your forecast question and horizon.
- Pick a baseline and evaluation metric(s).
- Set up rolling backtests (forward-chaining splits).
- Compare model vs baseline across folds.
- Refine, re-test, and monitor after deployment.
Concept explained simply
Validation checks if your forecast would have worked on the past. Backtesting simulates making forecasts from earlier dates (forecast origins) and compares them with what actually happened later.
Mental model
Imagine walking forward on a path. At each step, you can only see what is behind you. You guess what the next step looks like, then you take the step and see if you were right. Repeat. That is rolling-origin backtesting.
Time-series splits vs random splits
Do this: Train on earlier dates, validate on later dates (rolling/forward-chaining). Do not do this: Random shuffling that mixes future and past, because it leaks information.
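To see the difference in code, here is a minimal sketch of forward-chaining splits; the weekly values, training-window size, and horizon are illustrative assumptions, not data from this guide.

```python
# Minimal sketch of forward-chaining (rolling-origin) splits.
# The weekly values, training-window size, and horizon are illustrative.
series = [9400, 9700, 9900, 10000, 10000, 10500, 9800, 10200, 9900, 10100, 10400, 10300]

horizon = 1      # forecast one step ahead
min_train = 8    # smallest training window

for origin in range(min_train, len(series) - horizon + 1):
    train = series[:origin]                  # only the past is visible
    test = series[origin:origin + horizon]   # the "future" held out for validation
    print(f"origin={origin}: train on first {len(train)} points, validate on {test}")

# A random shuffle would instead mix later points into the training set,
# letting the model see the future it is asked to predict.
```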
Common metrics (quick guide)
- MAE: Average absolute error. Good when zeros exist and units matter.
- RMSE: Penalizes large errors more. Useful if big misses are costly.
- MAPE/sMAPE: Percent error. Easy to interpret but avoid when actuals can be zero or near-zero.
- MASE: Scale-free vs seasonal-naive. Great for comparing across series.
Tip: Always compare against a simple baseline. If you cannot beat a naive forecast, don't ship. (A minimal code sketch of these metrics follows.)
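To make these definitions concrete, here is a sketch of the metrics in plain Python. It assumes equal-length lists of actuals and forecasts; the sMAPE form shown is one common variant among several.

```python
import math

# Minimal sketches of the metrics above, in plain Python.
# Assumes `actual` and `forecast` are equal-length lists of numbers.

def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    # Undefined when any actual is zero; prefer MAE in that case.
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    # One common symmetric variant.
    return 100 * sum(2 * abs(a - f) / (abs(a) + abs(f))
                     for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, train, season=1):
    # Scale by the in-sample MAE of a (seasonal) naive forecast on training data.
    naive_mae = sum(abs(train[i] - train[i - season])
                    for i in range(season, len(train))) / (len(train) - season)
    return mae(actual, forecast) / naive_mae
```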
How many folds do I need?
Prefer 5-10 folds if feasible. Use at least 3. Ensure folds cover typical seasonality (e.g., include holidays) and your target horizon.
Step-by-step: Validating a marketing forecast
- Define the decision and horizon: What will change if the forecast is accurate? How far ahead must you predict (e.g., 1, 7, or 30 days)?
- Choose baselines: Last value (naive), moving average, or seasonal naive (value from same period last year/week).
- Select metrics: One primary (e.g., MAE) + one secondary (e.g., sMAPE). Match to business cost of errors.
- Set rolling windows: Pick expanding or sliding window. Define number of folds, horizon, and step size.
- Avoid leakage: Only use features known at forecast time. Lag, aggregate, or drop future-only data.
- Run folds: For each forecast origin, fit the model on the training window, forecast the next horizon, and record errors (see the code sketch after this list).
- Review stability: Look for consistent improvements across folds, not just one lucky period.
- Decide and document: Keep metrics vs baseline, rationale, and guardrails.
- Refit and monitor: Refit on full history to deploy; track live errors and compare to baseline.
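The fold loop from the steps above can be sketched in a few lines. The 3-period moving-average "model", the naive baseline, and the series values are illustrative stand-ins; swap in your own fit/forecast logic.

```python
# Rolling-origin backtest sketch: expanding window, 1-step horizon.

def moving_average_forecast(history, window=3):
    return sum(history[-window:]) / window

def naive_forecast(history):
    return history[-1]

series = [9400, 9700, 9900, 10000, 10000, 10500, 9800, 10200]  # weekly sessions (illustrative)
min_train, horizon = 4, 1

model_errors, baseline_errors = [], []
for origin in range(min_train, len(series) - horizon + 1):
    train = series[:origin]
    actual = series[origin]   # the 1-step-ahead actual
    model_errors.append(abs(actual - moving_average_forecast(train)))
    baseline_errors.append(abs(actual - naive_forecast(train)))

print("Model MAE:   ", round(sum(model_errors) / len(model_errors), 1))
print("Baseline MAE:", round(sum(baseline_errors) / len(baseline_errors), 1))
```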
Worked examples
Example 1: Weekly sessions (1-week horizon)
Goal: Forecast next week's sessions. Baseline: Last value (naive). Metric: MAPE.
- Weeks 1-4 actuals: 9400, 9700, 9900, 10000
- Weeks 5-8 actuals: 10000, 10500, 9800, 10200
- Model forecasts for weeks 5-8: 9500, 10300, 10100, 10000
Model MAPE:
- W5: |10000 - 9500| / 10000 = 5.0%
- W6: |10500 - 10300| / 10500 ≈ 1.9%
- W7: |9800 - 10100| / 9800 ≈ 3.1%
- W8: |10200 - 10000| / 10200 ≈ 2.0%
- Avg ≈ 3.0%
Naive baseline predictions (use last observed):
- W5: forecast 10000, error 0.0%
- W6: forecast 10000, error ≈ 4.8%
- W7: forecast 10500, error ≈ 7.1%
- W8: forecast 9800, error ≈ 3.9%
- Avg ≈ 4.0%
Conclusion: Model (≈3.0%) beats baseline (≈4.0%).
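The arithmetic above can be reproduced in a few lines (values copied from the example):

```python
# Reproducing Example 1: model vs last-value baseline MAPE for weeks 5-8.
actuals  = [10000, 10500, 9800, 10200]
model    = [9500, 10300, 10100, 10000]
baseline = [10000, 10000, 10500, 9800]   # last observed value before each week

def mape(actual, forecast):
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

print(f"Model MAPE:    {mape(actuals, model):.1f}%")     # ~3.0%
print(f"Baseline MAPE: {mape(actuals, baseline):.1f}%")  # ~4.0%
```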
Example 2: Daily conversions (1-day horizon)
Goal: Next-day conversions during paid experiments. Metric: MAE (avoids divide-by-zero). Baseline: 7-day moving average.
- Fold errors (Model MAE): 2, 2, 2 → Avg 2
- Fold errors (Baseline MAE): 3, 4, 2 → Avg 3
Conclusion: Model consistently reduces absolute error by ~1 conversion per day.
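A sketch of the 7-day moving-average baseline used here, on made-up daily conversion counts (which include zeros, the reason MAE is preferred over MAPE):

```python
# Example 2's baseline: forecast tomorrow's conversions as the mean of the last 7 days.
daily = [3, 0, 5, 2, 4, 0, 6, 3, 2, 5, 1, 0, 4, 3]  # illustrative daily conversions

window, errors = 7, []
for t in range(window, len(daily)):
    forecast = sum(daily[t - window:t]) / window
    errors.append(abs(daily[t] - forecast))

print(f"7-day moving-average baseline MAE: {sum(errors) / len(errors):.2f}")
```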
Example 3: Monthly leads with yearly seasonality (1-month horizon)
Goal: Predict next month leads. Baseline: Seasonal naive (same month last year). Metric: MASE.
- In-sample seasonal naive MAE (denominator): 25 leads
- Model out-of-sample MAE across folds: 21 leads
- MASE = 21 / 25 = 0.84
Conclusion: MASE < 1 means the model beats seasonal naive.
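The part people most often get wrong with MASE is the denominator: the in-sample MAE of the seasonal naive forecast on the training data. Here is a sketch on made-up monthly lead counts, chosen so the denominator works out to the example's 25:

```python
# Sketch of the MASE denominator in Example 3: in-sample MAE of a seasonal
# naive forecast (same month last year, season = 12). Data is illustrative.
train = [200, 210, 250, 240, 230, 260, 270, 255, 245, 280, 300, 320,   # year 1
         225, 230, 280, 265, 250, 290, 295, 275, 275, 305, 320, 350]   # year 2

season = 12
seasonal_errors = [abs(train[i] - train[i - season]) for i in range(season, len(train))]
denominator = sum(seasonal_errors) / len(seasonal_errors)   # 25.0 leads

model_oos_mae = 21  # model MAE across backtest folds, from the example
print(f"MASE = {model_oos_mae / denominator:.2f}")          # 0.84
```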
Common mistakes and self-check
- Random splits: Using standard cross-validation. Self-check: Are all validation dates after training dates?
- Leakage: Features peek into the future (e.g., final monthly totals). Self-check: Could I know this feature at the forecast time?
- No baseline: Hard to judge value. Self-check: Do I beat last value or seasonal naive?
- Wrong metric: MAPE with zeros, or only RMSE when large misses aren't costlier. Self-check: Does the metric reflect business cost?
- Too few folds: One lucky window. Self-check: Do results hold across multiple seasons and promotions?
- Not refitting per fold: Overestimates performance. Self-check: Do I refit before each fold prediction?
- Mixing horizons: Comparing 1-day and 14-day errors. Self-check: Is the horizon fixed during evaluation?
Practical projects
- Channel lead forecast: Backtest 1-month-ahead leads by channel using seasonal naive baseline and MASE.
- Promo uplift forecast: Forecast incremental revenue during promos; evaluate MAE and RMSE; include a non-promo baseline.
- Traffic forecast for staffing: 1-week-ahead sessions; compare last-value vs moving-average vs model; report mean and worst-fold errors.
Exercises
Exercise 1: Compute model vs baseline MAPE
Use the weekly sessions data from Example 1. Calculate MAPE for the model and for the last-value baseline, then state which wins and by how much.
- Actuals (weeks 5-8): 10000, 10500, 9800, 10200
- Model forecasts: 9500, 10300, 10100, 10000
- Baseline forecasts (last value): 10000, 10000, 10500, 9800
Hint
MAPE = mean(|A - F| / A). Compute per week, then average.
Expected outcome
Model MAPE β 3.0%; Baseline MAPE β 4.0%; Model wins by ~1.0 percentage point.
Exercise 2: Design a rolling backtest plan
Scenario: You have 180 days of daily conversions and need a 14-day-ahead forecast for planning. Draft a plan: window type (expanding or sliding), number of folds, step size, baseline(s), metric(s), and leakage checks.
Hint
- Prefer 5-8 folds that cover weekends and seasonality.
- Pick MAE if zeros exist; add sMAPE for scale-free comparison.
Expected outcome
Example: Expanding window; 6 folds; each fold predicts next 14 days; step size 14 days; baselines: last value and 7-day moving average; metrics: MAE (primary), sMAPE (secondary); leakage checks: only features known at the origin, lag all aggregates.
Checklist before you finalize:
- Fixed horizon and step size
- At least 3-5 folds
- Baseline(s) included
- Right metric(s) for business cost
- Leakage prevented
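The fold boundaries implied by the example plan above can be laid out in a few lines; the day counts follow that plan (180 days, 6 folds, 14-day horizon, 14-day step), and the exact origins are one reasonable choice rather than the only one.

```python
# Fold boundaries for the example plan: 180 days of history, expanding window,
# 6 folds, each forecasting the next 14 days, stepping forward 14 days.
n_days, horizon, step, n_folds = 180, 14, 14, 6

# Last fold ends at day 180, so the first forecast origin is 180 - 6 * 14 = 96.
first_origin = n_days - n_folds * step
for fold in range(n_folds):
    origin = first_origin + fold * step
    print(f"Fold {fold + 1}: train on days 1-{origin}, "
          f"validate on days {origin + 1}-{origin + horizon}")
```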
Mini challenge
Your team wants a 4-week-ahead forecast of qualified leads to set KPI targets. You have 3 years of monthly data with strong seasonality and occasional promos. Design a quick backtest: choose baseline(s), primary metric, window strategy, and how you will communicate uncertainty to stakeholders.
One possible approach
- Baseline: Seasonal naive (same month last year)
- Metric: MASE (primary), sMAPE (secondary)
- Window: Expanding; at least 12 folds to span seasons
- Uncertainty: Report median error, 80/95% empirical error bands from folds; highlight worst-case fold during promo months
Next steps
- Add exogenous features available at forecast time (promo flags, holidays).
- Estimate prediction intervals using empirical errors from backtests (a sketch follows this list).
- Monitor live: compare production errors vs backtest; trigger alerts if drift exceeds thresholds.
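One way to turn backtest errors into rough prediction intervals is to take empirical percentiles of the per-fold errors and add them to a new point forecast; the error values and point forecast below are illustrative.

```python
import statistics

# Empirical prediction intervals from backtest errors (actual - forecast per fold).
fold_errors = [-120, 80, -40, 150, -60, 30, 90, -110, 20, -70]

deciles = statistics.quantiles(fold_errors, n=10)  # 10th, 20th, ..., 90th percentiles
lower, upper = deciles[0], deciles[-1]             # 10th and 90th -> ~80% band

point_forecast = 10200
print(f"80% empirical band: {point_forecast + lower:.0f} to {point_forecast + upper:.0f}")
```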
Ready for a quick test?
Take the quick test to check your understanding.