Note: Quick Test and exercises are available to everyone; only logged-in users get saved progress.
Why this matters
Data Scientists frequently need time-based features for forecasting, anomaly detection, growth metrics, and user behavior modeling. Window functions let you compute rolling averages, running totals, time since last event, and percent changes efficiently in SQL—close to the raw data—before modeling.
- Forecasting: build rolling means and lags as model inputs.
- Product analytics: sessionize events and compute time-between-actions.
- Finance/RevOps: running revenue, week-over-week or month-over-month change.
- IoT/Monitoring: rolling minima/maxima and gaps to detect anomalies.
Who this is for
- Data Scientists and Analysts preparing time features for models or dashboards.
- Anyone comfortable with basic SQL who needs reliable time-based aggregations.
Prerequisites
- Basic SQL SELECT, WHERE, GROUP BY, ORDER BY.
- Comfort with timestamps/dates and simple aggregations (SUM, AVG, COUNT).
- Understanding that window functions do not reduce rows like GROUP BY does.
Concept explained simply
A window function computes a value for each row, looking at other rows defined by a window (partition and order). For time features, we usually:
- PARTITION BY an entity (user_id, product_id, store_id).
- ORDER BY event time (event_time, order_date).
- Pick a frame (e.g., last 6 rows plus current) to compute rolling stats.
Mental model
Imagine each row carries a small magnifying glass. The partition decides which rows are in the same group. The order aligns them by time. The frame says how far back/forward the magnifying glass can look. The function (AVG, SUM, LAG) inspects only the visible rows to compute the feature for the current row.
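To make this concrete, here is a minimal sketch, assuming a hypothetical events(user_id, event_time, value) table. Note that every input row stays in the output; nothing collapses the way it would with GROUP BY.
-- Hypothetical table: events(user_id, event_time, value)
SELECT
  user_id,
  event_time,
  value,
  AVG(value) OVER (
    PARTITION BY user_id                      -- which rows share the magnifying glass
    ORDER BY event_time                       -- align them by time
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- how far back it may look
  ) AS value_3row_avg                         -- the function computes one value per row
FROM events;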
Key patterns you will use
- LAG/LEAD: grab a previous/next value to compute differences or gaps (a LEAD sketch follows this list; LAG appears in Example 3).
- Running totals: SUM(...) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
- Rolling windows: AVG(...) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN k PRECEDING AND CURRENT ROW).
- Time bucketing: aggregate per day/week/month, then apply window logic at that grain.
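LAG is shown in Example 3 below; LEAD is its forward-looking twin. A minimal sketch, assuming a hypothetical events(user_id, event_time) table and an engine where timestamp subtraction yields an interval, computes the time until the next event:
-- Hypothetical table: events(user_id, event_time)
SELECT
  user_id,
  event_time,
  LEAD(event_time) OVER (
    PARTITION BY user_id
    ORDER BY event_time
  ) - event_time AS time_to_next_event  -- NULL on each user's last event
FROM events
ORDER BY user_id, event_time;
Because LEAD looks forward, treat a column like time_to_next_event as a label or target rather than a model feature, or you will leak future information.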
Worked examples
Example 1 — Running daily revenue per store
First aggregate to daily grain, then compute a running total.
-- Given table orders(order_id, store_id, order_time, revenue)
-- 1) Aggregate to daily
WITH daily AS (
  SELECT
    store_id,
    CAST(order_time AS DATE) AS order_date,
    SUM(revenue) AS daily_revenue
  FROM orders
  GROUP BY store_id, CAST(order_time AS DATE)
)
-- 2) Compute the running total over the daily grain
SELECT
  store_id,
  order_date,
  daily_revenue,
  SUM(daily_revenue) OVER (
    PARTITION BY store_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_revenue
FROM daily
ORDER BY store_id, order_date;
Use this for cumulative KPIs and as a growth-trend feature for models.
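The same pattern extends to the week-over-week change mentioned under "Why this matters": aggregate to a weekly grain, then compare each week to the previous one with LAG. A sketch, assuming a PostgreSQL-style DATE_TRUNC (week truncation syntax varies by engine):
-- Weekly revenue per store, then week-over-week change.
-- Assumes PostgreSQL-style DATE_TRUNC; other engines spell week truncation differently.
WITH weekly AS (
  SELECT
    store_id,
    DATE_TRUNC('week', order_time) AS week_start,
    SUM(revenue) AS weekly_revenue
  FROM orders
  GROUP BY store_id, DATE_TRUNC('week', order_time)
)
SELECT
  store_id,
  week_start,
  weekly_revenue,
  (weekly_revenue - LAG(weekly_revenue) OVER (PARTITION BY store_id ORDER BY week_start))
    / NULLIF(LAG(weekly_revenue) OVER (PARTITION BY store_id ORDER BY week_start), 0) AS wow_change
FROM weekly
ORDER BY store_id, week_start;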
Example 2 — 7-day rolling average of daily orders
On daily data, a row-based frame is simple and robust.
-- Given table daily_orders(store_id, order_date, orders_count)
SELECT
  store_id,
  order_date,
  orders_count,
  AVG(orders_count) OVER (
    PARTITION BY store_id
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS orders_7d_avg
FROM daily_orders
ORDER BY store_id, order_date;
This uses up to the last 7 rows (days), including the current day.
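One caveat: a row-based frame counts rows, so if some days are missing, the seven rows can span more than seven calendar days. If your engine supports interval bounds in RANGE frames (PostgreSQL 11+ does; support and syntax vary elsewhere), a calendar-based sketch looks like this:
-- 7-calendar-day average, robust to missing days (requires RANGE with interval bounds).
SELECT
  store_id,
  order_date,
  orders_count,
  AVG(orders_count) OVER (
    PARTITION BY store_id
    ORDER BY order_date
    RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
  ) AS orders_7d_avg
FROM daily_orders
ORDER BY store_id, order_date;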
Example 3 — Percent change and time since last event
Compare current value to the previous value and measure gaps.
-- Given table prices(product_id, price_time, price)
SELECT
  product_id,
  price_time,
  price,
  LAG(price) OVER (
    PARTITION BY product_id
    ORDER BY price_time
  ) AS prev_price,
  -- The LAG expression is repeated below because most engines do not let you
  -- reference the prev_price alias within the same SELECT list.
  CASE
    WHEN LAG(price) OVER (PARTITION BY product_id ORDER BY price_time) IS NULL
      OR LAG(price) OVER (PARTITION BY product_id ORDER BY price_time) = 0
    THEN NULL
    ELSE (price - LAG(price) OVER (PARTITION BY product_id ORDER BY price_time))
         / NULLIF(LAG(price) OVER (PARTITION BY product_id ORDER BY price_time), 0)
  END AS pct_change,
  (price_time - LAG(price_time) OVER (
    PARTITION BY product_id
    ORDER BY price_time
  )) AS time_since_last
FROM prices
ORDER BY product_id, price_time;
pct_change is safe against NULL and zero previous prices. time_since_last is an interval; convert it to seconds or minutes as your SQL engine requires (a sketch follows).
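For example, a sketch assuming PostgreSQL, where EXTRACT(EPOCH FROM ...) turns an interval into seconds (other engines use functions such as DATEDIFF or TIMESTAMP_DIFF instead):
-- PostgreSQL-style conversion of the gap to seconds; adjust the function for your engine.
SELECT
  product_id,
  price_time,
  price,
  EXTRACT(EPOCH FROM price_time - LAG(price_time) OVER (
    PARTITION BY product_id
    ORDER BY price_time
  )) AS seconds_since_last
FROM prices
ORDER BY product_id, price_time;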
Practice exercises
Do these in your SQL environment. Keep results deterministic by aggregating to daily grain first when needed.
Exercise 1 — 7-day rolling average of daily orders per store
Table: daily_orders(store_id, order_date DATE, orders_count INT)
- Return store_id, order_date, orders_count.
- Add a column orders_7d_avg: average of orders_count over the last 7 days (including current day) within the same store_id.
- Order by store_id, order_date.
Hints
- Use PARTITION BY store_id and ORDER BY order_date.
- Frame: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW.
Expected output
Columns: store_id, order_date, orders_count, orders_7d_avg. For the first few days per store, the average uses fewer than 7 rows.
Exercise 2 — Percent change and gap since last price
Table: prices(product_id, price_time TIMESTAMP, price NUMERIC)
- Compute prev_price using LAG(price) by product_id ordered by price_time.
- Compute pct_change = (price - prev_price) / NULLIF(prev_price, 0).
- Compute time_since_last = price_time - LAG(price_time) by the same window.
Hints
- Use LAG twice: once for price, once for price_time.
- Protect against division by zero with NULLIF.
Expected output
Columns: product_id, price_time, price, prev_price, pct_change, time_since_last. The first row per product_id has NULL for prev_price, pct_change, and time_since_last.
Checklist before you move on
- You used PARTITION BY correctly for each entity.
- You ordered by a proper time column.
- You chose a frame that does not peek into the future.
- Edge rows (first N) behave as expected with partial windows.
Common mistakes and self-check
- Forgetting ORDER BY in the window: results become non-deterministic. Self-check: remove ORDER BY and see if results change—if yes, fix the window.
- Using RANGE for row-count windows on daily data: duplicates per day can distort. Prefer ROWS for exact N-row windows; use RANGE only when you truly need a time interval frame and your warehouse supports it.
- Leaking future data: frames like ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING look ahead. For modeling features, use PRECEDING-only frames (a sketch follows this list).
- Missing PARTITION BY entity: windows run across all entities and mix features. Always partition by user/product/store as needed.
- Not aggregating to the right grain first: compute rolling averages on daily data, not raw events, if the feature is daily.
- Division by zero in percent change: use NULLIF(prev, 0).
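To make the leakage point concrete, here is a side-by-side sketch on the daily_orders table; only the frame differs between the two columns:
SELECT
  store_id,
  order_date,
  -- Leaks future data: averages today plus the next 6 days. Fine for reporting, not for model features.
  AVG(orders_count) OVER (
    PARTITION BY store_id
    ORDER BY order_date
    ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
  ) AS leaky_forward_avg,
  -- Safe for modeling: only today and the previous 6 days are visible.
  AVG(orders_count) OVER (
    PARTITION BY store_id
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS backward_avg
FROM daily_orders
ORDER BY store_id, order_date;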
Quick self-checks
- Do counts per entity remain the same before vs. after adding window columns? (They should.)
- Is the first row per partition producing NULLs where expected (LAG)?
- Does the rolling mean for a constant series equal the constant?
- Do running totals never decrease?
Practical projects to reinforce learning
- Build forecasting features: daily revenue rolling 7/14/28-day means, running totals, and lags for each store.
- User activity model: compute time since last session, gap-based session length (sketched after this list), and rolling 7-day active-days per user.
- Price monitoring: pct_change, rolling volatility (STDDEV over a window), and time to recovery after a drop.
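For the gap-based sessions in the second project, a common recipe is to flag a new session whenever the gap from the previous event exceeds a cutoff, then turn the flags into session ids with a running sum. A sketch, assuming a hypothetical events(user_id, event_time) table, a 30-minute cutoff, and PostgreSQL-style interval syntax:
-- Flag a session boundary when the gap since the previous event exceeds 30 minutes,
-- then convert the boundary flags into session ids with a running sum.
WITH flagged AS (
  SELECT
    user_id,
    event_time,
    CASE
      WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
        OR event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
           > INTERVAL '30 minutes'
      THEN 1 ELSE 0
    END AS new_session_flag
  FROM events
)
SELECT
  user_id,
  event_time,
  SUM(new_session_flag) OVER (
    PARTITION BY user_id
    ORDER BY event_time
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS session_id
FROM flagged
ORDER BY user_id, event_time;
Session length then falls out as MAX(event_time) - MIN(event_time) grouped by user_id and session_id.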
Learning path
- Before this: basic aggregations and date/time functions; grouping by day/week/month.
- This lesson: window frames (ROWS vs RANGE), LAG/LEAD, rolling averages, running totals, percent change, time gaps.
- After this: feature stores or ETL patterns for time features, windowed joins, and advanced time bucketing or sessionization at scale.
Next steps
- Convert features into reusable views (daily grain) to keep modeling datasets fresh.
- Add validation queries that check monotonicity and null rates for new features (a starting sketch follows).
- Benchmark frame sizes (7/14/28) to see which improves model performance.
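As a starting point for those validation queries, a sketch assuming the features above were materialized into a hypothetical table daily_features(store_id, order_date, running_revenue, revenue_pct_change):
-- 1) Running totals should never decrease within a store (barring negative revenue such as refunds).
SELECT COUNT(*) AS monotonicity_violations
FROM (
  SELECT
    running_revenue,
    LAG(running_revenue) OVER (PARTITION BY store_id ORDER BY order_date) AS prev_running_revenue
  FROM daily_features
) t
WHERE running_revenue < prev_running_revenue;

-- 2) Null rate of a lag-based feature: should be low and explainable (expected NULL on each store's first day).
SELECT AVG(CASE WHEN revenue_pct_change IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate
FROM daily_features;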
Mini challenge
Given events(user_id, event_time, value), produce for each row: (a) value minus previous value, (b) 3-event rolling average of value, and (c) time_since_last. Ensure no future leakage. Keep per-user partitions.
Ready for the Quick Test?
When you feel confident, take the Quick Test below. You can retake it any time. Only logged-in users get saved progress.