
Time Series Basics

Learn Time Series Basics for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

As a Data Scientist, you will often forecast demand, detect anomalies, or evaluate experiments that unfold over time. Time series basics help you avoid leakage, choose the right split and baselines, and communicate uncertainty clearly.

  • Forecasting: sales, signups, traffic, demand, loads.
  • Monitoring: anomaly detection in metrics and sensors.
  • Causality/Impact: before/after analysis with seasonality and trends.
Real tasks you might face
  • Build a weekly forecast for active users with holidays and product launches.
  • Set a baseline alert rule that respects seasonality (e.g., weekdays vs weekends).
  • Estimate uplift of a campaign without mistaking trend for effect.

Concept explained simply

A time series is just a sequence of observations collected over time where order matters. Many series can be understood as:

value = level + trend + seasonality + noise

Mental model: Layers of signal

Imagine a moving walkway (trend) with a gentle up/down wave (seasonality) and small bumps (noise). Modeling separates these layers so you can predict the next steps more reliably.
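The additive model above can be made concrete with a small simulation. This is a minimal sketch with illustrative numbers (level 100, a yearly sine wave, Gaussian noise), not data from the lesson:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 48  # four years of monthly data

level = 100.0
trend = 0.5 * np.arange(n)                                 # the moving walkway
seasonality = 10 * np.sin(2 * np.pi * np.arange(n) / 12)   # the gentle yearly wave
noise = rng.normal(0, 2, n)                                # the small bumps

# value = level + trend + seasonality + noise
series = level + trend + seasonality + noise
print(series[:5].round(1))
```

Plotting each component separately is a quick way to build intuition for what decomposition methods try to recover.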

Key ideas you will use

  • Stationarity: stable mean/variance/covariance over time. Often achieved via differencing or transformations (e.g., log).
  • Autocorrelation (ACF) and partial autocorrelation (PACF): tools to see how current values relate to past lags.
  • Seasonality: repeated patterns (7 days, 12 months, etc.).
  • Temporal splits: never shuffle; use time-aware train/validation/test.
  • Baselines: naive, seasonal naive, drift. Always benchmark before fancy models.
  • Metrics: MAE, RMSE, MAPE/sMAPE. Choose based on your context and scale.
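To make autocorrelation concrete, here is a minimal sketch of a sample ACF at a single lag, applied to a synthetic series with period-4 seasonality (the series and function are illustrative, not from a specific library):

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# A series with period-4 seasonality shows a spike at lag 4.
x = np.tile([0.0, 5.0, 0.0, -5.0], 20)
print(round(acf(x, 1), 2), round(acf(x, 4), 2))  # → 0.0 0.95
```

The strong spike at the seasonal lag (and near-zero elsewhere) is exactly the pattern you look for when diagnosing seasonality from an ACF plot.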

Worked examples

Example 1: Spotting seasonality

Monthly values (units): 20, 25, 22, 28, 23, 29, 26, 32, 27, 33, 30, 36, 31, 37, 34, 40

  • Visual read: values generally increase (trend).
  • The larger peaks repeat about every 4 months (seasonality period ≈ 4).
  • Variance grows slightly with level; consider a log transform if needed.
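A centred 3-point moving average (the same smoothing used in Exercise 1 below) makes the trend in this series easier to see. A minimal sketch using the lesson's data:

```python
import numpy as np

values = [20, 25, 22, 28, 23, 29, 26, 32, 27, 33, 30, 36, 31, 37, 34, 40]

# A 3-point moving average smooths the short up/down cycle and reveals the trend.
ma3 = np.convolve(values, np.ones(3) / 3, mode="valid")
print(ma3.round(1))
```

The smoothed values rise steadily from about 22 to 37, confirming the upward trend.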

Example 2: Correct split

You have data from Jan to Dec. You want to forecast Oct–Dec.

  • Do: train on Jan–Sep; validate on Oct–Nov; test on Dec (or use rolling/walk-forward).
  • Don’t: randomly split months (leaks future information and overestimates performance).
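The correct split is just positional slicing, never shuffling. A minimal sketch of the Jan–Dec example:

```python
# Time-aware split: slice by position in chronological order, never shuffle.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

train = months[:9]     # Jan–Sep
val   = months[9:11]   # Oct–Nov
test  = months[11:]    # Dec

print(train[-1], val, test)  # → Sep ['Oct', 'Nov'] ['Dec']
```

Every training point precedes every validation point, and every validation point precedes the test point, so no future information leaks backward.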

Example 3: Baselines first

Target is weekly. Start with:

  • Naive: forecast(t) = value(t-1).
  • Seasonal naive (period=7 for daily, 52 for weekly): forecast(t) = value(t-period).
  • Drift: line from first to last point in train, extrapolated.

If your model cannot beat seasonal naive, it likely needs better features or handling of seasonality.
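The three baselines above fit in a few lines each. A minimal sketch (the toy training series here is illustrative):

```python
def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon

def seasonal_naive(train, horizon, period):
    """Repeat the last full seasonal cycle."""
    return [train[-period + (h % period)] for h in range(horizon)]

def drift(train, horizon):
    """Extend the straight line from first to last training point."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]

train = [10, 12, 11, 13, 12, 14, 13, 15]
print(naive(train, 3))               # → [15, 15, 15]
print(seasonal_naive(train, 3, 2))   # → [13, 15, 13]
print(drift(train, 3))
```

Cheap as they are, these set the bar: a model that cannot beat them on held-out data is not yet adding value.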

How to do it (quick workflow)

1) Visual check

  • Plot the series; look for trend, seasonality, level shifts, outliers.
  • If seasonal amplitude grows with level, try a log transform.

2) Make it stationary (if needed)

  • First difference: x_t - x_{t-1} removes linear trend.
  • Seasonal difference: x_t - x_{t-s} removes seasonal component (s = period).
  • Check ACF/PACF after differencing; aim for quick decay in ACF.
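Both differences are one-liners. Applied to the lesson's monthly series, the seasonal difference at the suspected period of 4 turns out far more stable than the first difference:

```python
import numpy as np

x = np.array([20, 25, 22, 28, 23, 29, 26, 32, 27, 33, 30, 36, 31, 37, 34, 40], float)

first_diff = np.diff(x)         # x_t - x_{t-1}: removes a linear trend
seasonal_diff = x[4:] - x[:-4]  # x_t - x_{t-4}: removes period-4 seasonality

print(first_diff.std().round(2), seasonal_diff.std().round(2))
```

Here the seasonal difference is nearly constant (mostly 4s), a sign that trend plus period-4 seasonality explains most of the series.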

3) Split without leakage

  • Time-based split only. Keep order.
  • Use walk-forward validation for robust evaluation: expand the training window step by step.
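Walk-forward validation with an expanding window can be sketched as an index generator (the function name and parameters here are illustrative, not from a specific library):

```python
def walk_forward_splits(n, initial, horizon):
    """Expanding-window walk-forward: yield (train_indices, test_indices) pairs."""
    end = initial
    while end + horizon <= n:
        yield range(0, end), range(end, end + horizon)
        end += horizon

for train_idx, test_idx in walk_forward_splits(n=10, initial=6, horizon=2):
    print(len(train_idx), list(test_idx))
# → 6 [6, 7]
# → 8 [8, 9]
```

Each fold trains only on the past and tests on the immediate future, so the evaluation mimics how the model would actually be used.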

4) Establish baselines

  • Compute naive and seasonal naive. Log their MAE/RMSE/MAPE.
  • These set a minimum to beat with more advanced models.

5) Evaluate with the right metric

  • MAE: scale-dependent, robust to outliers.
  • RMSE: penalizes large errors more.
  • MAPE/sMAPE: scale-independent; avoid MAPE when any actuals are zero (division by zero).
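All three metrics are short enough to write by hand. A minimal sketch, using a toy example that includes a zero actual to show why sMAPE is the safer percentage metric:

```python
import math

def mae(y, yhat):
    """Mean absolute error (scale-dependent)."""
    return sum(abs(a - f) for a, f in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y))

def smape(y, yhat):
    """Symmetric MAPE in percent; defined even when some actuals are zero."""
    return 100 * sum(2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(y, yhat)) / len(y)

y, yhat = [100, 0, 50], [90, 5, 60]
print(round(mae(y, yhat), 2), round(rmse(y, yhat), 2), round(smape(y, yhat), 2))
```

Plain MAPE would divide by the zero actual here and blow up; sMAPE stays finite because its denominator uses both actual and forecast.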

Who this is for

  • Aspiring and junior Data Scientists who need reliable forecasting/evaluation practices.
  • Analysts/Engineers adding time-aware analysis to their toolkit.

Prerequisites

  • Descriptive stats (mean, variance) and basic regression intuition
  • Ability to keep data in correct time order and parse timestamps
  • Comfort with a tool of choice (Python, R, or spreadsheets)

Learning path

  1. Time Series Basics (this page): decomposition, stationarity, splits, baselines, metrics.
  2. Classical models: ARIMA/Seasonal ARIMA, ETS, Prophet-like approaches.
  3. Feature engineering: calendar features, lags, rolling stats, holidays.
  4. ML forecasting: tree-based regressors with lagged features.
  5. Deep learning: RNN/LSTM/Temporal CNN/Transformer for long horizons.

Exercises

These mirror the graded exercises below. You can complete them here first.

Exercise 1: Decompose and diagnose

Monthly series: 20, 25, 22, 28, 23, 29, 26, 32, 27, 33, 30, 36, 31, 37, 34, 40

  1. Smooth with a simple 3-point moving average to reveal trend.
  2. Identify plausible seasonality period.
  3. Apply first difference (x_t - x_{t-1}); describe if stationarity improves.
  4. (Optional) Apply seasonal difference with period you chose; describe result.
Hint

Peaks at similar intervals suggest the period. Differencing should reduce slow-moving trends.

Exercise 2: Baseline forecast and metrics

Series: 120, 121, 123, 125, 126, 128, 129, 130, 128, 131, 129

  1. Use the first 8 values as training and the last 3 as test.
  2. Naive forecast each test point with the last training value.
  3. Compute MAE and MAPE on the test set.
Hint

MAE = mean(|y - y_hat|). MAPE = mean(|(y - y_hat)/y|) × 100. Keep the chronological order.
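Once you have worked the numbers by hand, a short script can check them. A minimal sketch for Exercise 2 using the formulas from the hint:

```python
series = [120, 121, 123, 125, 126, 128, 129, 130, 128, 131, 129]
train, test = series[:8], series[8:]

# Naive: forecast every test point with the last training value (130).
forecast = [train[-1]] * len(test)

mae = sum(abs(a - f) for a, f in zip(test, forecast)) / len(test)
mape = 100 * sum(abs(a - f) / a for a, f in zip(test, forecast)) / len(test)
print(round(mae, 2), round(mape, 2))  # → 1.33 1.03
```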

Common mistakes and self-check

  • Shuffling time series before splitting. Self-check: did any training point occur after a validation point? If yes, fix split.
  • Comparing to no baseline. Self-check: report naive and seasonal naive metrics alongside your model.
  • Ignoring seasonality. Self-check: inspect ACF; strong spikes at seasonal lag suggest seasonal terms or differencing.
  • Using MAPE with zeros. Self-check: if any actuals are 0, choose sMAPE, MAE, or RMSE.
  • Overfitting to recent spikes. Self-check: walk-forward validation stability across folds.
Quick self-audit checklist
  • Time-based split only
  • Baselines computed and logged
  • Appropriate metric given zero/nonzero targets
  • Differencing/transform considered if non-stationary
  • Walk-forward validation tried for robustness

Practical projects

  • Retail weekly demand: build naive, seasonal naive, and one improved model. Compare MAE/MAPE across 6 months.
  • Website daily traffic with holidays: add calendar/holiday features, evaluate with walk-forward.
  • IoT sensor monitoring: compute rolling mean/STD and flag anomalies relative to seasonal baseline.

Mini challenge

You must forecast daily active users (DAU) for the next 14 days. The series has strong weekly seasonality and occasional product launches.

  • Choose a split strategy to validate your approach.
  • Select two baselines and one improved method.
  • Pick metrics and justify them.
  • Write 3 bullets explaining uncertainty sources.
Suggested approach (peek)

Use walk-forward with expanding window; baselines: naive and seasonal naive (7-day). Improved: add weekday and launch indicators with lagged features. Metrics: MAE and sMAPE (no zeros). Uncertainty sources: launch timing/impact, concept drift, holiday shifts.

Next steps

  • Study ARIMA/Seasonal ARIMA to formalize differencing + AR/MA terms.
  • Learn ETS (error-trend-seasonal) for level/trend/seasonality decomposition.
  • Practice feature engineering: lags, rolling windows, holiday calendars.

Quick test


Practice Exercises

2 exercises to complete

Instructions

Monthly series: 20, 25, 22, 28, 23, 29, 26, 32, 27, 33, 30, 36, 31, 37, 34, 40

  1. Compute a 3-point moving average to visualize trend.
  2. Identify the most plausible seasonality period.
  3. Apply first difference (x_t - x_{t-1}) and state if stationarity improves.
  4. (Optional) Apply seasonal difference at your chosen period and describe the result.
Expected Output
Trend: steadily increasing. Seasonality: approximately period 4. First differencing reduces trend but residual seasonality remains; seasonal differencing at lag 4 yields a near-stationary series.

Time Series Basics — Quick Test

Test your knowledge with 9 questions. Pass with 70% or higher.

