
Datetime And Time Series Handling

Learn Datetime And Time Series Handling for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

As a Data Scientist, time is part of the data. You will clean messy timestamps, align logs from different systems, resample sensor data, compute rolling metrics, and handle time zones and daylight saving time. Strong datetime and time series handling in pandas and NumPy lets you build reliable features for forecasting, anomaly detection, cohort analysis, and product analytics.

  • Product analytics: Convert event timestamps to user sessions, daily active users (DAU), and retention.
  • Finance: Resample ticks to OHLC candles, compute moving averages and volatility.
  • IoT/sensors: Align irregular measurements, fill gaps, and create rolling features.
  • Operations: Aggregate logs by minute/hour, detect spikes, and time-align multiple sources.

Progress note

The quick test is available to everyone. If you are logged in, your progress is saved automatically.

Concept explained simply

Datetime handling turns raw strings like "2024-01-05 08:00" into typed timestamps. Time series handling means working with a timeline-aware index so operations respect time order and calendar rules.
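
For example, parsing a single string yields a typed pd.Timestamp with calendar-aware attributes:

import pandas as pd

ts = pd.to_datetime('2024-01-05 08:00')
print(repr(ts))          # Timestamp('2024-01-05 08:00:00')
print(ts.year, ts.hour)  # typed access: 2024 8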

Mental model

  • Think timeline: Your index is a ruler of time. If it’s a DatetimeIndex, pandas understands calendar frequencies.
  • Resample vs groupby: resample makes new time bins (e.g., daily); groupby with a time grouper aggregates over existing timestamps (see the sketch after this list).
  • Windows: rolling('7D') uses a time span; rolling(7) uses a fixed number of rows. Choose based on how your data is spaced.
  • Time zones: tz_localize attaches a time zone to naive timestamps; tz_convert converts between zones.
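
A minimal sketch on synthetic hourly data, showing that resample and a time-based Grouper produce the same daily bins:

import pandas as pd
import numpy as np

idx = pd.date_range('2024-01-01', periods=48, freq='H')
df = pd.DataFrame({'value': np.arange(48.0)}, index=idx)

daily_resample = df.resample('D').mean()
daily_grouper = df.groupby(pd.Grouper(freq='D')).mean()
print(daily_resample.equals(daily_grouper))  # True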

Core tools and quick cheatsheet

  • Parsing: pd.to_datetime(s, format=..., errors='coerce')
  • Indexing: df = df.set_index('timestamp').sort_index()
  • Resampling: df.resample('D').mean(), asfreq('D') (no aggregation)
  • Filling gaps: asfreq + ffill()/bfill(), or interpolate()
  • Rolling: rolling('7D').mean(), rolling(7).sum(), ewm(alpha=...)
  • Time zones: dt.tz_localize('UTC'), tz_convert('US/Eastern')
  • Grouping by time: df.groupby(pd.Grouper(freq='M'))
  • Rounding: dt.floor('H'), dt.ceil('D'), dt.round('15T') (quick demo below)
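
A quick look at the rounding helpers on a tz-naive series (values chosen purely for illustration):

import pandas as pd

s = pd.Series(pd.to_datetime(['2024-01-05 08:47:30', '2024-01-05 23:10:00']))
print(s.dt.floor('H'))    # 08:00:00 and 23:00:00
print(s.dt.ceil('D'))     # next midnight: 2024-01-06 for both
print(s.dt.round('15T'))  # 08:45:00 and 23:15:00
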
Frequency aliases (handy subset)
  • T = minute (e.g., '15T')
  • H = hour
  • D = day
  • B = business day
  • W = week (e.g., 'W-MON')
  • M = month end, MS = month start
  • Q = quarter end, A = year end
Note: pandas 2.2+ renames several aliases ('min' for T, 'h' for H, 'ME' for M, 'QE' for Q, 'YE' for A); the classic forms above still work in older versions but may emit deprecation warnings. The snippet below shows a few aliases in action.
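
import pandas as pd

print(pd.date_range('2024-01-01', periods=3, freq='W-MON'))  # weekly, anchored on Mondays
print(pd.date_range('2024-01-01', periods=3, freq='MS'))     # month starts
print(pd.date_range('2024-01-01', periods=3, freq='B'))      # business days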

Worked examples

Example 1 — Parse, clean, index, fill

import pandas as pd

df = pd.DataFrame({
    'date': ['2024/01/01', '2024-01-02', '03-01-2024', '2024-01-05 08:00', 'bad'],
    'value': [10, None, 12, 15, 9]
})
# 1) Parse mixed formats element by element (format='mixed' needs pandas >= 2.0)
s = pd.to_datetime(df['date'], format='mixed', dayfirst=True, errors='coerce')
# 2) Keep parsed rows only
clean = df[s.notna()].copy()
clean['timestamp'] = s[s.notna()]
# 3) Set index, sort, localize to UTC (UTC has no DST, so no extra arguments needed)
clean = clean.set_index('timestamp').sort_index()
clean = clean.tz_localize('UTC')
# 4) Normalize timestamps to midnight so the 08:00 reading lands on the daily grid,
#    then create the daily frame and forward-fill
clean.index = clean.index.normalize()
out = clean[['value']].asfreq('D').ffill()
print(out)
What to notice
  • errors='coerce' turns unparseable strings into NaT so they can be filtered out safely.
  • format='mixed' parses each element on its own; dayfirst=True disambiguates 03-01-2024 -> 2024-01-03.
  • normalize() snaps timestamps to midnight; asfreq('D') then creates daily rows and ffill copies the last known value.

Example 2 — Resample minute ticks to daily OHLC

import pandas as pd

data = {
    'ts': pd.to_datetime([
        '2024-01-02 09:30', '2024-01-02 09:31', '2024-01-02 16:00',
        '2024-01-03 09:30', '2024-01-03 16:00'
    ]),
    'price': [100, 101, 98, 102, 104]
}
df = pd.DataFrame(data).set_index('ts').sort_index()
# Daily candles
ohlc = df['price'].resample('D').agg(['first','max','min','last']).dropna(how='all')
print(ohlc)
What to notice
  • resample('D') aligns by calendar day.
  • dropna(how='all') removes days with no trades; pandas' built-in .ohlc() shortcut is shown below.
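
pandas also ships a built-in Resampler.ohlc() for this exact pattern; a standalone variant of the example above:

import pandas as pd

prices = pd.Series(
    [100, 101, 98, 102, 104],
    index=pd.to_datetime(['2024-01-02 09:30', '2024-01-02 09:31', '2024-01-02 16:00',
                          '2024-01-03 09:30', '2024-01-03 16:00'])
)
# .ohlc() names the columns open/high/low/close for you
print(prices.resample('D').ohlc().dropna(how='all'))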

Example 3 — Time-based vs row-count rolling windows

import pandas as pd

ts = pd.to_datetime([
    '2024-01-01 00:00', '2024-01-01 12:00', '2024-01-01 23:00', '2024-01-02 00:00'
])
val = [0, 12, 23, 24]
df = pd.DataFrame({'value': val}, index=ts)

# closed='both' keeps the point exactly 24 hours back inside the window;
# the default closed='right' would drop it (see the standalone check below)
m_time = df.rolling('24H', closed='both').mean().iloc[-1, 0]
m_rows = df.rolling(3).mean().iloc[-1, 0]
print(round(m_time, 2), round(m_rows, 2))  # 14.75 vs 19.67
What to notice
  • rolling('24H', closed='both') includes every point from the last 24 hours of wall time, boundary included.
  • rolling(3) includes the last 3 rows only; results differ when spacing is irregular.
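
For contrast, the same data with the default closed='right': the point exactly 24 hours back falls outside the window, and here the result happens to match the row-count window.

import pandas as pd

ts = pd.to_datetime(['2024-01-01 00:00', '2024-01-01 12:00',
                     '2024-01-01 23:00', '2024-01-02 00:00'])
df = pd.DataFrame({'value': [0, 12, 23, 24]}, index=ts)
# Default window is (t - 24h, t], so the 2024-01-01 00:00 point is excluded
print(round(df.rolling('24H').mean().iloc[-1, 0], 2))  # 19.67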

Example 4 — Time zones: localize vs convert

import pandas as pd

df = pd.DataFrame({
    'local_ts': ['2024-03-10 01:30', '2024-03-10 03:30'],  # US DST change date
    'value': [1, 2]
})
# Parse as naive, then localize to US/Eastern (attach tz)
df['local_ts'] = pd.to_datetime(df['local_ts'])
df = df.set_index('local_ts').tz_localize('US/Eastern', nonexistent='shift_forward')
# Convert to UTC
utc_df = df.tz_convert('UTC')
print(utc_df.index.tz)
What to notice
  • tz_localize attaches a time zone interpretation to naive timestamps.
  • nonexistent='shift_forward' handles DST spring-forward gap safely.
  • tz_convert changes the reference zone without changing the moment in time.

How to do it step-by-step

  1. Parse and validate: Use pd.to_datetime(..., errors='coerce'). Drop or handle NaT.
  2. Set the index: set_index on the timestamp column and sort_index.
  3. Choose frequency: resample for aggregation; asfreq for structure without aggregation.
  4. Fill gaps: ffill/bfill for step signals; interpolate for continuous measurements.
  5. Create features: rolling windows (time-based if irregular), ewm for exponentially weighted stats, dt.hour, dt.dayofweek, etc.
  6. Handle time zones: tz_localize first, then tz_convert if needed. (A sketch combining all six steps follows.)
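
Here is that sketch, on hypothetical sensor events (column names and values are made up for illustration):

import pandas as pd

raw = pd.DataFrame({
    'ts': ['2024-01-01 00:15', '2024-01-01 01:45', 'oops', '2024-01-01 05:00'],
    'load': [0.2, 0.5, 0.4, 0.9],
})

parsed = pd.to_datetime(raw['ts'], errors='coerce')                   # 1) parse; bad -> NaT
df = raw.loc[parsed.notna()].set_index(parsed.dropna()).sort_index()  # 2) index + sort
hourly = df[['load']].resample('H').mean()                            # 3) choose a frequency
hourly['load'] = hourly['load'].interpolate()                         # 4) fill gaps (continuous signal)
hourly['load_3h'] = hourly['load'].rolling('3H').mean()               # 5) rolling feature
hourly['load_ewm'] = hourly['load'].ewm(alpha=0.3).mean()             #    plus an exponentially weighted one
hourly = hourly.tz_localize('UTC')                                    # 6) attach a time zone
print(hourly)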

Exercises

These exercises mirror the graded tasks at the bottom of this page. Try them directly in a Python notebook or REPL.

Exercise ex1 — Clean mixed dates and create a daily series

Goal: Parse mixed date formats, remove bad rows, index by UTC time, and create a daily forward-filled series.

See the Practice Exercises section at the bottom of this page for full instructions and expected output.
Exercise ex2 — Build daily OHLC from minute prices

Goal: Resample minute-level prices into daily first, max, min, last.

Exercise ex3 — Compare rolling('24H') vs rolling(3)

Goal: Show different results when timestamps are irregular.

Checklist
  • [ ] I parsed dates with errors='coerce' and inspected NaT.
  • [ ] My index is a sorted DatetimeIndex with a known time zone.
  • [ ] I used resample for aggregation and asfreq for structural gaps.
  • [ ] I tested both time-based and row-count rolling windows.
  • [ ] I verified results around DST boundaries.

Common mistakes and self-check

  • Using resample when you need asfreq: If you see unexpected means or sums, you probably aggregated unintentionally. Self-check: Compare asfreq('D') results first.
  • Forgetting to sort the index: Rolling and resample assume time order. Self-check: df.index.is_monotonic_increasing should be True.
  • Mixing naive and tz-aware timestamps: Operations will error or silently misalign. Self-check: df.index.tz is consistent across frames.
  • Row-count windows on irregular data: Can bias results. Self-check: Compare with a time-based window and explain differences.
  • Not handling DST: Gaps or duplicates appear. Self-check: Use nonexistent='shift_forward' or an appropriate policy and inspect around DST dates. A small helper bundling the index self-checks follows.
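
A tiny assumed helper (the name sanity_check is made up, not a pandas API) that bundles the index self-checks above:

import pandas as pd

def sanity_check(df: pd.DataFrame) -> None:
    # Hypothetical helper: raises if the index is not ready for time series work
    assert isinstance(df.index, pd.DatetimeIndex), 'index is not a DatetimeIndex'
    assert df.index.is_monotonic_increasing, 'index is not sorted'
    assert df.index.tz is not None, 'index is tz-naive'
    assert not df.index.duplicated().any(), 'index has duplicate timestamps'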

Practical projects

  • Web analytics daily metrics: Ingest event timestamps, compute DAU/MAU, retention by cohort month, and a 7-day rolling average of sessions per user.
  • IoT sensor health: Align multiple sensor streams to 1-minute frequency, interpolate, compute rolling 30-minute mean and volatility, and flag anomalies.
  • Market data mini-pipeline: From minute trades, build daily OHLCV, 20D rolling volatility, and a timezone-aware report with business-day frequency.

Who this is for

  • Data Scientists and ML engineers working with logs, events, or financial/operational time series.
  • Analysts who need trustworthy time aggregations and rolling metrics.

Prerequisites

  • Comfort with Python basics (functions, lists, dictionaries).
  • Core pandas (Series/DataFrame, indexing, groupby) and basic NumPy arrays.

Learning path

  1. Parsing and indexing timestamps.
  2. Resampling vs asfreq and gap filling.
  3. Rolling/expanding/ewm features.
  4. Time zones and DST-safe pipelines.
  5. Period- and business calendar-based aggregations.

Next steps

  • Apply to a real dataset (events or sensors) and build a daily and weekly report.
  • Automate checks: missing days, overlapping times, timezone consistency.
  • Proceed to seasonality and feature engineering for forecasting.

Mini challenge

Given 15-minute sensor data with gaps and a mix of UTC and US/Eastern timestamps, produce a single UTC series at 15-minute frequency with:

  • Forward-filled categorical states and linearly interpolated numeric readings.
  • A 24-hour rolling mean and standard deviation.
  • Flags for business days (frequency alias 'B') and weekend segments.

Tip: Standardize time zones first, then set the index, then resample/asfreq and fill. A sketch of the time zone step follows.
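
A sketch of that first step, assuming you already know which rows are UTC and which are US/Eastern (how you know depends on your data source):

import pandas as pd

utc_part = pd.to_datetime(['2024-03-10 06:00']).tz_localize('UTC')
east_part = (pd.to_datetime(['2024-03-10 01:30'])
             .tz_localize('US/Eastern', nonexistent='shift_forward')
             .tz_convert('UTC'))
combined = utc_part.union(east_part)
print(combined)  # both moments on one UTC timeline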

Take the quick test

Answer a few questions to check your understanding. Everyone can take it; logged-in users get saved progress.

Practice Exercises

3 exercises to complete

Instructions

Use pandas to:

  • Create the DataFrame: date = ['2024/01/01','2024-01-02','03-01-2024','2024-01-05 08:00','bad'], value = [10, None, 12, 15, 9].
  • Parse dates with pd.to_datetime(..., format='mixed', errors='coerce', dayfirst=True).
  • Drop rows where parsing failed.
  • Set the timestamp as index, sort, and localize to UTC.
  • Normalize timestamps to midnight (index.normalize()) so the 08:00 reading lands on the daily grid.
  • Create a daily series with asfreq('D') and forward-fill values.
Expected Output
A daily UTC-indexed series from 2024-01-01 to 2024-01-05 with values: [10.0, 10.0, 12.0, 12.0, 15.0].

Datetime And Time Series Handling — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

