
Time-Based Features: Lags and Rolling Windows

Learn time-based features (lags and rolling windows) for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

Time-based features turn raw timestamps into predictive signal. As a Data Scientist, you will forecast demand, detect anomalies, predict churn, and plan capacity. Lags capture how the past influences the future; rolling windows summarize recent behavior. Getting these right avoids leakage, boosts model accuracy, and makes features interpretable for stakeholders.

  • Real tasks: sales forecasting, energy load prediction, fraud spikes, user retention trends, SLA breach risk.
  • Impact: more stable models, better early warning signals, and actionable operational insights.

Who this is for

  • Data Scientists and Analysts building predictive or monitoring models on time-stamped data.
  • MLOps/Analytics Engineers implementing feature pipelines.

Prerequisites

  • Comfort with pandas or SQL window functions.
  • Understanding of train/validation/test splits.
  • Basic stats: mean, std, median, percent change.

Concept explained simply

- Lag: the value from k steps ago (e.g., yesterday's sales).
- Rolling window: a moving slice of the past (e.g., last 7 days) summarized by functions (mean, sum, std, min/max).
- Expanding window: from the beginning up to now (cumulative stats).
- Horizon: how far ahead you predict (e.g., t+7 days). Choose lags/rolls that strictly use data before the forecast timestamp.
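
A minimal pandas sketch of these four ideas on a toy daily series (column names and values are illustrative):

import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=10, freq="D"),
    "y": [10, 12, 11, 13, 15, 14, 16, 20, 18, 19],
})
df["lag_1"] = df["y"].shift(1)                              # lag: value from 1 step ago
df["roll3_mean"] = df["y"].shift(1).rolling(3).mean()       # rolling: last 3 days, excluding today
df["expanding_mean"] = df["y"].shift(1).expanding().mean()  # expanding: everything before today
df["target_h7"] = df["y"].shift(-7)                         # horizon: the label 7 steps ahead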

Mental model

Imagine a conveyor belt of time. At each moment t, you are allowed to look behind you (lags, rolling windows) but never ahead. Your features are little gauges summarizing what happened just behind you.

Leakage rules
  • Never compute features using future rows relative to the prediction timestamp.
  • When doing cross-validation for time series, fit scalers/encoders only on training folds.
  • For grouped data (e.g., per store/user), compute lags/rolls within each group.
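
As a concrete sketch of the second rule, here is time-ordered cross-validation with scikit-learn's TimeSeriesSplit; X and y are placeholder arrays assumed to be already sorted by time:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

X = np.random.rand(200, 3)   # placeholder features, rows assumed in time order
y = np.random.rand(200)      # placeholder target

for train_idx, valid_idx in TimeSeriesSplit(n_splits=5).split(X):
    scaler = StandardScaler().fit(X[train_idx])   # fit the scaler on past data only
    X_train = scaler.transform(X[train_idx])
    X_valid = scaler.transform(X[valid_idx])      # transform, never re-fit, on the future fold
    # ...train on (X_train, y[train_idx]) and evaluate on (X_valid, y[valid_idx])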

Worked examples

Example 1: Daily sales forecasting

Goal: Predict sales for day t using past values.

# Example with pandas
# df columns: date, sales (daily)
import pandas as pd

df['date'] = pd.to_datetime(df['date'])   # parse dates so sorting and splitting by date behave correctly
df = df.sort_values('date')
df['lag_1'] = df['sales'].shift(1)         # yesterday's sales
df['lag_7'] = df['sales'].shift(7)         # same weekday last week
# Right-aligned (uses previous 7 days, excludes today)
df['roll7_mean'] = df['sales'].shift(1).rolling(window=7).mean()
df['roll7_std']  = df['sales'].shift(1).rolling(window=7).std()

# Split by date to avoid leakage
train = df[df['date'] < '2023-07-01']
valid = df[(df['date'] >= '2023-07-01') & (df['date'] < '2023-08-01')]

Why shift before rolling? It ensures today's features never peek at today's target.
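
A tiny illustration of the difference on a toy series:

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.rolling(3).mean())           # window ends at the current row -> includes today's value
print(s.shift(1).rolling(3).mean())  # window ends at the previous row -> safe as a feature at t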

Example 2: Customer churn risk

Goal: Predict if a user will churn in the next 30 days using behavioral features.

  • days_since_last_event: difference between t and last event timestamp.
  • events_last_7d: rolling 7-day count of actions per user.
  • rolling_28d_amount_mean: average spend in last 28 days.

# df: user_id, ts (datetime), event_count, amount
# Ensure per-user time order
df = df.sort_values(['user_id', 'ts'])
# Grouped lag: the previous event count for the same user
df['lag_events_1'] = df.groupby('user_id')['event_count'].shift(1)
# 7-day time-based rolling count per user; closed='left' excludes the current row
roll7 = (df.set_index('ts')
           .groupby('user_id')['event_count']
           .rolling('7D', closed='left').sum()
           .reset_index(level=0, drop=True))
df['events_last_7d'] = roll7.values   # positional alignment works because df is sorted by user_id, ts

With irregular timestamps, consider resampling to daily per user, then apply rolling windows on a regular grid.
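
One way to do that, sketched as a continuation of the frame above (the daily grid and the min_periods=1 choice are assumptions):

# Put irregular events on a regular daily grid per user
daily = (df.set_index('ts')
           .groupby('user_id')['event_count']
           .resample('D').sum()          # one row per user per calendar day (0 on quiet days)
           .rename('events')
           .reset_index())
# On the regular grid, an integer window is a calendar window again
daily['events_last_7d'] = (daily.groupby('user_id')['events']
                                .transform(lambda s: s.shift(1).rolling(7, min_periods=1).sum()))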

Example 3: Anomaly detection with rolling z-score

Goal: Flag unusual spikes in metrics.

# df: timestamp, metric
m = df['metric'].shift(1)
mean = m.rolling(window=24).mean()
std  = m.rolling(window=24).std()
df['zscore_24'] = (df['metric'] - mean) / std
# A high |z| suggests an anomaly

Choosing window sizes
  • Match seasonality (a 7-day window for weekly patterns in daily data; a 24-hour window for daily patterns in hourly data).
  • Use multiple windows (short + long) to capture short-term trends and the longer-run baseline (see the sketch after this list).
  • Validate via time-based CV; do not trust in-sample gains.
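
For example, building on the daily sales frame from Example 1, a short and a long window can be paired (the 7/28 lengths and the ratio feature are illustrative choices):

# Short window tracks the recent trend; long window tracks the baseline
for w in (7, 28):
    df[f'roll{w}_mean'] = df['sales'].shift(1).rolling(w).mean()
# Ratio > 1 means the last week is running above the 4-week baseline
df['momentum_7_vs_28'] = df['roll7_mean'] / df['roll28_mean']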

Step-by-step workflow

  1. Sort data by time (and by entity for panel data).
  2. Define prediction horizon (e.g., t+1 day). All features must be strictly before t.
  3. Create lags (1, 7, 14, season length) per entity.
  4. Create rolling features (mean/sum/std/min/max) with right-aligned windows.
  5. Handle missing values at the start of each series (NaNs from lags/rolls) via careful imputation, or let the model learn the "cold-start" case.
  6. Split using time-based folds; fit preprocessing only on training folds.
  7. Monitor feature drift over time.
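
Steps 1, 3, and 4 can be collected into one small helper. This is a minimal sketch, assuming a pandas frame with one entity column, one timestamp column, and one value column (the function name and defaults are illustrative):

import pandas as pd

def add_time_features(df: pd.DataFrame, group_col: str, time_col: str, value_col: str,
                      lags=(1, 7, 14), windows=(7, 28)) -> pd.DataFrame:
    """Add leakage-safe lag and rolling-mean features per entity."""
    df = df.sort_values([group_col, time_col]).copy()     # step 1: time order within each entity
    g = df.groupby(group_col)[value_col]
    for k in lags:                                         # step 3: lags per entity
        df[f'lag_{k}'] = g.shift(k)
    for w in windows:                                      # step 4: right-aligned windows that
        df[f'roll{w}_mean'] = g.transform(                 # end at t-1 (shift before rolling)
            lambda s, w=w: s.shift(1).rolling(w).mean())
    return df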

Feature checklist

  • Data sorted by time within each group
  • Lags/rolls use shift before rolling
  • Grouped by entity (store/user) where applicable
  • Windows match natural seasonality
  • No future leakage in any transform
  • Validated with time-series CV

Exercises

Do these to solidify the concepts.

Exercise 1 — Daily sales lags and rolling

You have daily sales:

date        sales
2023-01-01  10
2023-01-02  12
2023-01-03  11
2023-01-04  13
2023-01-05  15
2023-01-06  14
2023-01-07  16
2023-01-08  20
2023-01-09  18

Create features: lag_1, lag_7, roll3_mean (right-aligned, exclude today). Show the resulting table for the last three dates.

Solution
For 2023-01-07:
- lag_1 = 14
- lag_7 = NaN (no 7-day history)
- roll3_mean = mean(2023-01-04..06) = (13+15+14)/3 = 14.0

For 2023-01-08:
- lag_1 = 16
- lag_7 = 10 (sales on 2023-01-01, 7 days earlier)
- roll3_mean = mean(2023-01-05..07) = (15+14+16)/3 = 15.0

For 2023-01-09:
- lag_1 = 20
- lag_7 = 12 (sales on 2023-01-02, 7 days earlier)
- roll3_mean = mean(2023-01-06..08) = (14+16+20)/3 = 16.67
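
One way to check these numbers, using the same shift-before-rolling pattern as Example 1 (the frame below is just the table above):

import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=9, freq="D"),
    "sales": [10, 12, 11, 13, 15, 14, 16, 20, 18],
})
df["lag_1"] = df["sales"].shift(1)
df["lag_7"] = df["sales"].shift(7)
df["roll3_mean"] = df["sales"].shift(1).rolling(3).mean()
print(df.tail(3))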

Exercise 2 — Per-user features

Events for user U1 (timestamps):

2023-05-01 1 event
2023-05-02 2 events
2023-05-05 1 event
2023-05-10 3 events

Compute for each event row: days_since_last_event and events_last_7d (count of events in the previous 7 days, excluding the current row). Show values for the last two rows.

Solution
Row 3 (2023-05-05):
- days_since_last_event = 3 days (from 2023-05-02)
- events_last_7d = events on 2023-05-01 and 2023-05-02 = 1 + 2 = 3

Row 4 (2023-05-10):
- days_since_last_event = 5 days (from 2023-05-05)
- events_last_7d = events between 2023-05-03..2023-05-09: only 2023-05-05 (1) => 1
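
A quick pandas check for this single user; the closed='left' option on a time-based window excludes the current row, matching the "previous 7 days" definition (a sketch for one user, not a full per-user pipeline):

import pandas as pd

ev = pd.DataFrame({
    "ts": pd.to_datetime(["2023-05-01", "2023-05-02", "2023-05-05", "2023-05-10"]),
    "event_count": [1, 2, 1, 3],
})
ev["days_since_last_event"] = (ev["ts"] - ev["ts"].shift(1)).dt.days
# Time-based window over the previous 7 days, excluding the current row
ev["events_last_7d"] = (ev.set_index("ts")["event_count"]
                          .rolling("7D", closed="left").sum()
                          .values)
print(ev.tail(2))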

Common mistakes and self-checks

Common mistakes
  • Leakage from improper rolling: computing rolling without shift includes the target day.
  • Mixing entities: forgetting groupby leads to cross-entity leakage.
  • Misaligned windows: left vs right alignment confusion.
  • Imputing with future knowledge: forward/back-filling across the split boundary.
  • Resampling pitfalls: aggregating using full-range stats before splitting.

Self-check
  • When you predict at time t, can any feature use data from t or later? Answer: No.
  • Do features change if you shuffle rows? If yes, your method depends on order and is risky.
  • Did you validate with time-based folds and freeze preprocessing to the training fold only?

Practical projects

  • Retail demand forecast: Build lag_1, lag_7, roll7_mean/std, holiday flags, and evaluate with time-series CV.
  • Energy load modeling: Hourly data with lag_1, lag_24, roll24/168 means; add temperature rolling stats.
  • Support ticket volume: Weekly forecast with expanding mean and recent-2-week momentum features.

Learning path

  1. Time-based features (this lesson).
  2. Time-aware cross-validation and backtesting.
  3. Stationarity, differencing, and seasonal decomposition.
  4. Feature selection and importance over time.
  5. Deployment: reproducible feature pipelines with scheduled recomputation.

Next steps

  • Apply lags/rolling to one of your datasets.
  • Run a simple model baseline; add features incrementally and track uplift.
  • Take the quick test to verify understanding.

Mini challenge

You must forecast daily volume 28 days ahead. Propose 6 features (mix of lags and rolling windows) that avoid leakage and capture weekly seasonality. Write one sentence explaining each feature's intent and why it respects the t+28 horizon.
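
One possible starting point, sketched in pandas under the assumption of a frame with date and volume columns (the six features below are illustrative, not an answer key):

df = df.sort_values('date')
df['target_t28']  = df['volume'].shift(-28)                    # label: volume 28 days ahead of t
df['lag_1']       = df['volume'].shift(1)                      # yesterday's level
df['lag_7']       = df['volume'].shift(7)                      # same weekday last week
df['lag_14']      = df['volume'].shift(14)                     # same weekday two weeks ago
df['lag_28']      = df['volume'].shift(28)                     # same weekday one full horizon ago
df['roll7_mean']  = df['volume'].shift(1).rolling(7).mean()    # recent weekly baseline
df['roll28_mean'] = df['volume'].shift(1).rolling(28).mean()   # longer-run baseline
# Every feature uses data up to t-1, which is already known when the t+28 target is
# forecast; the 7/14/28-step lags line up with weekly seasonality.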

Practice Exercises

2 exercises to complete

Instructions

Given daily sales for 9 days, create lag_1, lag_7, and roll3_mean (right-aligned; exclude the current day by shifting before rolling). Provide the final three rows with these features filled.

Expected Output

For 2023-01-07: lag_1=14, lag_7=NaN, roll3_mean=14.0; For 2023-01-08: lag_1=16, lag_7=10, roll3_mean=15.0; For 2023-01-09: lag_1=20, lag_7=12, roll3_mean≈16.67

Time-Based Features: Lags and Rolling Windows — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
