Who this is for
This lesson is for data engineers who build or maintain streaming pipelines, real-time dashboards, or event-driven systems. If you work with clickstreams, sensors, logs, or payments, understanding event time vs processing time will save you from silent data errors.
Prerequisites
- Basic understanding of streams vs batches
- Familiarity with windows (tumbling, sliding, session) at a high level
- Comfort reading simple timelines and counts
Why this matters
- Accurate analytics: Business wants “What happened at 12:00–12:05?” not “When did our system see it?”.
- Late/out-of-order data: Mobile networks, retries, and clock skew are normal. Your pipeline must handle them.
- Stable backfills: Reprocessing yesterday should produce the same results as live processing.
Real tasks you will face
- Counting unique users per 5-minute window with late events
- Detecting fraud sessions even if transactions arrive out of order
- Running reliable hourly aggregates that match historical replays
Concept explained simply
Event time
The time an event actually happened at the source (e.g., device timestamp). Use it when your metric is about reality.
Processing time
The time your system saw the event. Use it when your metric is about the pipeline itself (throughput, lag) or when you need fast, approximate results.
Mental model
Think of events as letters with a date printed on them (event time). Your post office sorts them the day they arrive (processing time). If you group by the printed date, you need rules for late letters. Those rules are watermarks and allowed lateness.
Core terms (keep these handy)
- Watermark: The system’s guess of “we’ve probably seen all events up to time T.” Typically T = max(event_time_seen) − delay.
- Allowed lateness: How long after a window’s end you still accept late updates for that window.
- Triggers: When to emit results (on-time, early, late updates).
- Event-time windows: Group by when things happened. Robust to out-of-order events.
- Processing-time windows: Group by when data arrives. Simple and low-latency but order-sensitive.
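The two rules underlying these terms fit in a few lines. A minimal sketch in Python (the function names `watermark` and `window_start` are illustrative, not a specific framework's API):

```python
from typing import Iterable

def watermark(event_times_seen: Iterable[float], delay: float) -> float:
    """Heuristic watermark: max event time seen so far minus a fixed delay."""
    return max(event_times_seen) - delay

def window_start(event_time: float, size: float) -> float:
    """Assign an event to the start of its tumbling event-time window."""
    return event_time - (event_time % size)

# Illustrative numbers: 60s tumbling windows, 30s watermark delay.
seen = [10.0, 50.0, 80.0]              # event times in seconds
assert window_start(80.0, 60.0) == 60.0
assert watermark(seen, 30.0) == 50.0   # "probably complete" up to t=50
```

Real engines compute the watermark per partition and take the minimum across partitions, but the subtraction above is the core idea.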
Worked examples
Example 1: Clicks arriving late
1-minute tumbling event-time windows. Watermark = max(event_time) − 30s. Allowed lateness = 20s.
Events (ET = event time, AT = arrival time):
- e1: ET 12:00:10, AT 12:00:11 -> window 12:00:00–12:00:59
- e2: ET 12:00:50, AT 12:01:05 -> arrives late but lands in the same window
- e3: ET 12:01:20, AT 12:01:21 -> window 12:01:00–12:01:59
- e4: ET 12:01:40, AT 12:01:41 -> advances watermark to 12:01:10
- e5: ET 12:00:55, AT 12:01:45 -> late update for the 12:00 window (accepted)
- e6: ET 12:02:10, AT 12:02:11 -> watermark to 12:01:40 (finalizes the 12:00 window)
- e7: ET 12:00:20, AT 12:02:12 -> too late (dropped)

Emissions for the 12:00 window:
- On-time firing once the watermark passes 12:00:59 (after e4): count = 2 (e1, e2)
- Late update (e5): count = 3
- Final (after e6): count = 3; e7 is dropped
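The whole example can be replayed with a toy windowing loop. This is a sketch, not a production engine; times are seconds after 12:00:00, and e5's arrival is placed after e4's so the late update follows the on-time firing, matching the emission list:

```python
# Replay Example 1: 1-minute tumbling windows, watermark = max(ET) - 30s,
# allowed lateness = 20s. Times are seconds after 12:00:00.
events = [  # (name, event_time, arrival_time)
    ("e1", 10, 11), ("e2", 50, 65), ("e3", 80, 81), ("e4", 100, 101),
    ("e5", 55, 105), ("e6", 130, 131), ("e7", 20, 132),
]
WINDOW, DELAY, LATENESS = 60, 30, 20

counts, emissions = {}, []
fired, closed = set(), set()   # windows that fired on time / are finalized
max_et = float("-inf")

for name, et, at in sorted(events, key=lambda e: e[2]):  # arrival order
    w = et - et % WINDOW                 # tumbling-window start
    if w in closed:
        emissions.append((name, w, "dropped"))
        continue
    counts[w] = counts.get(w, 0) + 1
    if w in fired:                       # window already fired -> late update
        emissions.append((name, w, f"late update: {counts[w]}"))
    max_et = max(max_et, et)
    wm = max_et - DELAY                  # watermark = max(ET) - delay
    for win in list(counts):
        if win not in fired and wm > win + WINDOW - 1:
            fired.add(win)
            emissions.append(("on-time", win, counts[win]))
        if wm > win + WINDOW - 1 + LATENESS:
            closed.add(win)

# For the 12:00 window (w = 0): on-time count 2, late update to 3, e7 dropped.
```

Stepping through `emissions` reproduces exactly the three firings described above, which is a useful sanity check when you tune DELAY and LATENESS.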
Example 2: Sensor clocks skewed
Ten sensors report temperatures with up to 2 minutes skew. If you use processing-time windows, values cluster by arrival bursts and misrepresent reality. Use event-time windows with a watermark delay a bit larger than skew (e.g., 2m30s) so readings land in their correct time buckets, with late updates allowed briefly.
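The distortion is easy to see in code. A sketch with made-up readings (skewed event times, one arrival burst):

```python
# Bucket the same skewed readings by event time vs arrival time.
WINDOW = 60
readings = [(5, 125), (50, 130), (70, 131)]  # (event_time, arrival_time), ~2min skew

by_event, by_arrival = {}, {}
for et, at in readings:
    ew = et - et % WINDOW
    aw = at - at % WINDOW
    by_event[ew] = by_event.get(ew, 0) + 1
    by_arrival[aw] = by_arrival.get(aw, 0) + 1

# by_event:   {0: 2, 60: 1} -- readings land in their correct time buckets
# by_arrival: {120: 3}      -- one arrival burst, misrepresenting reality
```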
Example 3: Sessionization
Fraud detection uses session windows with 30s gap by event time. Late transactions should still attach to the correct session if they arrive within allowed lateness. Processing-time sessions would split a single real-world session into many fragments.
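A minimal sketch of event-time sessionization with gap-based merging (the `sessionize` helper is hypothetical; real engines keep per-key session state and merge incrementally in much the same way):

```python
GAP = 30  # inactivity gap in seconds

def sessionize(event_times, gap=GAP):
    """Merge events into sessions; order-insensitive, so late arrivals
    still attach to (or bridge) the correct session."""
    sessions = []  # list of [start, end], kept merged and sorted
    for t in event_times:
        merged = [t, t]
        keep = []
        for s in sessions:
            # Merge if the new span is within `gap` of an existing session.
            if merged[0] - gap <= s[1] and s[0] - gap <= merged[1]:
                merged = [min(merged[0], s[0]), max(merged[1], s[1])]
            else:
                keep.append(s)
        keep.append(merged)
        sessions = sorted(keep)
    return sessions

# A late event at t=70 bridges two fragments into one real session:
assert sessionize([40, 100, 70]) == [[40, 100]]
```

With processing-time sessions, the same late event would start a fresh fragment instead of repairing the session it belongs to.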
Step-by-step: choosing the right time domain
- Define the question — Are you measuring reality (use event time) or system behavior (use processing time)?
- Estimate disorder — What’s the typical max delay/clock skew? Start with p95–p99 delay as watermark delay.
- Pick windows — Tumbling/sliding for periodic metrics; sessions for user flows.
- Set watermark and allowed lateness — Watermark delay slightly above expected skew; allowed lateness small but non-zero for corrections.
- Decide triggers — On-time mandatory; add early firings for fast guesses; allow late firings for corrections.
- Plan idempotency — Make outputs upsertable (keys + versions) so late updates don’t double-count.
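The idempotency step can be sketched as an upsert table keyed by window and key, with a firing version so late corrections replace rather than add (the `UpsertSink` class is illustrative; in practice this is a keyed table plus merge/upsert logic in your sink):

```python
class UpsertSink:
    """Toy sink: late firings overwrite earlier results instead of double-counting."""
    def __init__(self):
        self.table = {}  # (window_start, key) -> (version, count)

    def upsert(self, window_start, key, count, version):
        cur = self.table.get((window_start, key))
        if cur is None or version > cur[0]:   # older/duplicate firings are no-ops
            self.table[(window_start, key)] = (version, count)

sink = UpsertSink()
sink.upsert(0, "DE", 2, version=1)   # on-time firing
sink.upsert(0, "DE", 3, version=2)   # late correction replaces, not adds
sink.upsert(0, "DE", 2, version=1)   # retry of the old firing is ignored
assert sink.table[(0, "DE")] == (2, 3)
```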
Quick checklist before you ship
- Time zone normalized (UTC)?
- Event timestamp extracted and validated?
- Watermark delay justified by data?
- Allowed lateness documented with consumers?
- Outputs are upsert-friendly?
Exercises (do these now)
Exercise 1: Compute event-time window results with lateness
Use the event stream and settings from Example 1 (1-minute tumbling windows, watermark = max(event_time) − 30s, allowed lateness = 20s).
What to produce
State the counts emitted for the 12:00:00–12:00:59 window at each firing and which events are dropped.
Exercise 2: Choose time domains per pipeline step
What to produce
For each step of a pipeline you maintain, pick event time or processing time and justify briefly.
- Self-check: Did you explicitly state watermark delay and allowed lateness?
- Self-check: Would backfill produce identical results?
- Self-check: Is your consumer OK with late updates?
Common mistakes and how to self-check
- Using processing-time windows for business KPIs — Self-check: If the network paused for 5 minutes, would your KPI spike or dip incorrectly? If yes, switch to event time.
- Watermark too aggressive — Self-check: Compare your late-drop rate against the p99 arrival delay (arrival time − event time). If many events dropped would have fallen within p99, increase the watermark delay.
- No allowed lateness — Self-check: Do you ever receive retries? If yes, allow a small lateness window.
- Non-idempotent sinks — Self-check: Can you upsert late corrections without duplicates? If not, redesign keys/merge logic.
- Ignoring timezones — Self-check: Are timestamps normalized to UTC before windowing?
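For the "watermark too aggressive" self-check, one hypothetical way to derive a data-justified delay is to take a high percentile of observed arrival delays (nearest-rank percentile shown here; a real pipeline would compute this over a sliding sample):

```python
def percentile(values, p):
    """Nearest-rank percentile over an unsorted list (0 <= p <= 100)."""
    s = sorted(values)
    idx = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[idx]

# Arrival delay = arrival_time - event_time, from sampled (ET, AT) pairs.
samples = [(10, 11), (50, 65), (80, 81), (100, 101), (55, 105)]
delays = [at - et for et, at in samples]      # [1, 15, 1, 1, 50]
suggested_watermark_delay = percentile(delays, 99)
```

Setting the watermark delay near p99 keeps the late-drop rate around 1% by construction; pair it with a small allowed lateness for the tail.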
Practical projects
- Real-time signups: Build a 5-minute event-time aggregate of signups per country with a 2-minute watermark and 1-minute allowed lateness. Emit on-time and late updates to a table keyed by (window_start,country).
- Sessionized clicks: Session window by event time with 30s gap. Write each session’s duration and click count. Verify with synthetic out-of-order data.
- Lag dashboard: Processing-time metric: records processed per minute and end-to-end latency percentiles. This validates your watermark choice empirically.
Mini tasks to extend
- Add an early trigger every 10s for quick approximations, then reconcile with late updates.
- Measure and log the fraction of events arriving after on-time firing but within allowed lateness.
- Run a 24h backfill and compare outputs to live: counts and distinct keys must match.
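The backfill task above rests on a property worth proving to yourself: grouping by event time is insensitive to arrival order, so a replay can match live output (given the same lateness policy). A minimal demonstration:

```python
import random

WINDOW = 60
events = [(10, 1), (50, 1), (80, 1), (100, 1), (55, 1)]  # (event_time, count)

def aggregate(stream):
    """Per-window counts keyed by event time only."""
    out = {}
    for et, n in stream:
        w = et - et % WINDOW
        out[w] = out.get(w, 0) + n
    return out

live = aggregate(events)          # arrival order
shuffled = events[:]
random.shuffle(shuffled)
backfill = aggregate(shuffled)    # replayed in a different order
assert live == backfill           # identical final results
```

Processing-time aggregates have no such guarantee, which is why backfills of processing-time KPIs rarely reconcile with the live numbers.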
Mini challenge
You ingest mobile app events. 20% arrive up to 90s late; rare outliers up to 3 minutes late. Stakeholders want 1-minute accurate counts, visible quickly but correct eventually.
- Pick time domain for the metric
- Propose watermark delay
- Set allowed lateness
- Describe trigger strategy
Suggested answer
- Time domain: Event time (we care about when users acted).
- Watermark: 2 minutes (above 90s typical, below 3m outliers to limit latency).
- Allowed lateness: 1 minute to capture many outliers; accept tiny drop rate beyond that.
- Triggers: Early every 15s for fast UI; on-time at watermark; late updates until watermark passes window_end + 1m; final then.
Learning path
- Now: Event time vs processing time fundamentals (this lesson)
- Next: Watermarks and triggers in depth
- Then: Window types (tumbling, sliding, session) with trade-offs
- Finally: Exactly-once, idempotent sinks, and reprocessing
Next steps
- Write down your current pipeline’s watermark, allowed lateness, and trigger rules.
- Run a 1-hour experiment measuring late arrival distribution; adjust watermark.
- Take the Quick Test below to lock in concepts.