Who this is for
Data Analysts who work with time-based data like orders, sensors, web logs, or marketing campaigns and need clean, comparable time series.
Prerequisites
- Basic Python (lists, functions)
- pandas basics (DataFrame, Series, indexing)
- Comfort with reading CSVs into pandas
Learning path
- Parse and standardize datetimes
- Localize and convert timezones
- Resample to consistent frequencies (downsample/upsample)
- Aggregate, fill gaps, and handle DST edge cases
- Validate results and avoid common pitfalls
Why this matters
Real analyst tasks depend on accurate time handling:
- Align sales from different countries into a single daily timeline
- Turn irregular sensor pings into reliable 15-minute averages
- Compare week-over-week metrics correctly across daylight saving changes
- Prepare time-based features for forecasting/regression
Concept explained simply
Think of each timestamp as a note that may or may not say which clock it used. If it lacks a timezone, it is naive. You first state which clock it used (tz_localize), then translate it to a common clock like UTC (tz_convert), so all notes align.
Resampling is like putting timestamps into same-sized boxes (minutes, hours, days) and summarizing each box with count, sum, mean, etc. Downsampling reduces detail (e.g., minute to daily). Upsampling creates new boxes and fills missing values carefully.
Mental model
- Parse: turn text into real datetime objects.
- Localize: attach the original timezone (don’t shift the wall-clock time).
- Convert: translate to another timezone (the clock time shifts, the moment stays the same).
- Resample: group by equal time buckets and aggregate.
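A quick illustration of the localize/convert distinction on a single timestamp (expected outputs shown as comments; 1 March 2023 falls before the US DST switch, so US/Eastern is UTC-5):
import pandas as pd

ts = pd.Timestamp('2023-03-01 09:00')      # naive: no timezone attached
eastern = ts.tz_localize('US/Eastern')     # same wall clock, now tagged as Eastern
utc = eastern.tz_convert('UTC')            # same instant, shown on the UTC clock

print(eastern)  # 2023-03-01 09:00:00-05:00
print(utc)      # 2023-03-01 14:00:00+00:00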
Core recipes (copy-friendly)
# 1) Parsing
import pandas as pd
# Parse a column to datetime
df['ts'] = pd.to_datetime(df['ts'], errors='coerce')  # unparseable values become NaT instead of raising
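# If the text format is known, passing it explicitly is faster and avoids
# silent misparsing (the format string below matches 'YYYY-MM-DD HH:MM'
# text and is illustrative, not required):
# df['ts'] = pd.to_datetime(df['ts'], format='%Y-%m-%d %H:%M', errors='coerce')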
# 2) Localize then convert
# Naive -> aware in US/Eastern (no clock shift yet)
df = df.set_index('ts').sort_index()
df = df.tz_localize('US/Eastern', nonexistent='shift_forward', ambiguous='infer')
# Convert to UTC for global alignment
df = df.tz_convert('UTC')
# 3) Resample
hourly = df.resample('h').sum(numeric_only=True)  # lowercase 'h'; uppercase 'H' is deprecated in recent pandas
weekly = df.resample('W-MON').agg({'amount':'sum', 'orders':'count'})
# 4) Upsample and fill
every_15m = df.resample('15min').mean(numeric_only=True).ffill(limit=2)  # '15min' replaces the deprecated '15T'
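A conservation check is worth keeping next to these recipes (a minimal sketch; it assumes the 'amount' column and hourly frame from above, and that no rows were dropped): a sum-based downsample must preserve the raw total.
import math
# Downsampling with sum must conserve totals over the same span
assert math.isclose(df['amount'].sum(), hourly['amount'].sum()), 'totals diverged after resample'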
Worked examples
Example 1 — Localize US/Eastern, convert to UTC, resample hourly
data = {
'ts': ['2023-03-01 09:10','2023-03-01 09:45','2023-03-01 10:05','2023-03-01 10:50'],
'amount': [30, 20, 50, 10]
}
df = pd.DataFrame(data)
df['ts'] = pd.to_datetime(df['ts'])
df = df.set_index('ts').tz_localize('US/Eastern').tz_convert('UTC')
res = df.resample('h').sum()
print(res)
Why: standardize time to UTC so hours line up across regions.
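For reference, US/Eastern is UTC-5 on this date, so the four rows land at 14:10, 14:45, 15:05, and 15:50 UTC, and the hourly sums are:
                           amount
ts
2023-03-01 14:00:00+00:00      50
2023-03-01 15:00:00+00:00      60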
Example 2 — Daily sales to monthly with multiple aggregations
# daily sales and order counts
daily = pd.DataFrame({
'date': pd.date_range('2023-01-01', periods=10, freq='D'),
'amount': [10,12,9,11,20,18,16,14,15,21],
'orders': [1,2,1,1,3,2,2,1,2,3]
}).set_index('date').tz_localize('UTC')
monthly = daily.resample('MS').agg({
'amount':['sum','mean'],
'orders':'sum'
})
print(monthly)
Why: monthly rollups for reporting and budgeting.
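For reference, all ten days fall in January, so the rollup has a single row labeled at the month start: amounts sum to 146 (mean 14.6) and orders sum to 18.
                          amount       orders
                             sum  mean    sum
date
2023-01-01 00:00:00+00:00    146  14.6     18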
Example 3 — Irregular sensors to 15-minute averages
sensor = pd.DataFrame({
'ts': ['2023-05-15 09:00','2023-05-15 09:07','2023-05-15 09:28','2023-05-15 10:02'],
'value': [20,22,18,24]
})
sensor['ts'] = pd.to_datetime(sensor['ts'])
sensor = sensor.set_index('ts').tz_localize('Asia/Kolkata')
m15 = sensor.resample('15min').mean().ffill(limit=1)
print(m15)
Why: consistent intervals enable charts, anomaly detection, and feature engineering.
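The 15-minute buckets work out as follows; note how ffill(limit=1) fills the empty 09:30 bucket from 09:15 but leaves 09:45 as NaN rather than fabricating a second value:
                           value
ts
2023-05-15 09:00:00+05:30   21.0
2023-05-15 09:15:00+05:30   18.0
2023-05-15 09:30:00+05:30   18.0
2023-05-15 09:45:00+05:30    NaN
2023-05-15 10:00:00+05:30   24.0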
Tip: Label and closed
- label controls which edge names the bucket: 'left' labels buckets by their start, 'right' by their end.
- closed controls which edge belongs to the bucket: 'left' includes the left edge, 'right' the right edge.
- The defaults depend on frequency: most frequencies use label='left' and closed='left', but end-anchored frequencies such as 'M', 'Q', 'A', and 'W' default to 'right' for both. Be explicit and consistent across teams.
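A small demonstration of the difference (a minimal sketch with made-up half-hour data; expected results shown as comments):
import pandas as pd

idx = pd.date_range('2023-01-01 00:00', periods=4, freq='30min', tz='UTC')
s = pd.Series([1, 2, 3, 4], index=idx)  # 00:00, 00:30, 01:00, 01:30

# Defaults for 'h' (closed='left', label='left'):
# 00:00 -> 1 + 2 = 3, 01:00 -> 3 + 4 = 7
print(s.resample('h').sum())

# closed='right', label='right' shifts both edges:
# 00:00 -> 1, 01:00 -> 2 + 3 = 5, 02:00 -> 4
print(s.resample('h', closed='right', label='right').sum())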
Exercises (do these in your notebook)
- ex1: Localize sales to US/Eastern, convert to UTC, resample to hourly sums.
- ex2: Localize sensor data to Asia/Kolkata and resample to 15-minute means with limited forward fill.
Checklist before you move on
- Parsed text timestamps without errors (or handled them)
- Localized to original timezone before converting
- Used resample for equal time buckets (not groupby on strings)
- Picked correct aggregation and fill strategy
- Documented label/closed choices
Common mistakes and self-checks
- Localizing twice: If data already carries tz info, do not tz_localize again; use tz_convert instead (a safe pattern is sketched after this list).
- Skipping localization: Converting a naive timestamp directly changes the moment incorrectly. Always localize first.
- Resampling without sorting index: Ensure a datetime index and sorted order before resampling.
- Using groupby on date strings: Loses timezone awareness and alignment. Prefer resample.
- Filling too aggressively: ffill/bfill can fabricate data. Set a limit and justify it.
- Ignoring DST edges: Use ambiguous='infer' or a clear policy; document nonexistent handling like 'shift_forward'.
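A minimal awareness check, assuming a DatetimeIndex and that the data's original zone is US/Eastern as in the recipes above:
# tz is None for a naive index; branch so we never localize twice
if df.index.tz is None:
    df = df.tz_localize('US/Eastern')  # attach the original zone first
df = df.tz_convert('UTC')              # safe now: the index is aware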
Self-check
- Are all timestamps timezone-aware before cross-region comparisons?
- Do resampled totals match raw sums over the same period?
- Did you record fill rules and their impact on metrics?
Practical projects
- Global ecommerce dashboard: Localize by store time, convert to UTC, create hourly and weekly KPIs.
- IoT stability report: 15-minute averages with gaps flagged; export daily uptime percent.
- Marketing pacing: Resample ad impressions to daily, compare week-over-week with DST-safe windows.
Mini challenge
Given minute-level web sessions across two timezones, produce a UTC weekly time series of total sessions and a week-over-week percent change. State how you handled missing minutes and DST edges in one sentence.
Next steps
- Add rolling windows (rolling mean, rolling sum) on top of resampled data; a starter sketch follows this list
- Create calendar-aware features (week number, is_month_end)
- Package your time-prep steps into reusable functions
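A starter for the first item (a sketch; it reuses the hourly frame and 'amount' column from the recipes above):
# 24-hour moving average over the hourly resample; min_periods avoids
# a run of leading NaNs while the window warms up
hourly['amount_ma24'] = hourly['amount'].rolling(window=24, min_periods=1).mean()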