Why this matters
MLOps Engineers make features reliable, reproducible, and fast. Good features are the backbone of training pipelines and batch scoring. Your daily tasks include building point-in-time correct aggregates, handling late data, preventing leakage, and materializing features for both training and inference with identical logic.
- Own feature definitions as code and version them.
- Backfill features for historical training windows safely.
- Detect and fix training–serving skew.
- Monitor freshness, drift, and null spikes.
Mini task: Spot the risk
You compute "7-day purchases" using the latest table snapshot for both training and inference. Risk: data leakage in training because historical snapshots differed. Fix: compute strictly up to the training timestamp.
Concept explained simply
A feature is a computed column that summarizes raw data into a machine-usable signal. In batch pipelines, features are reproducible transformations with a time boundary.
Mental model: a kitchen with clear recipes. Each feature has a recipe card (contract):
- Name and business meaning
- Entities/keys (e.g., user_id, card_id)
- Time semantics (event_time, windows, timezone)
- Definition (SQL/pseudocode), null handling, units
- Validation checks and owner
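For instance, the Example 1 feature below could carry a recipe card like this (a YAML-like sketch; the field names are illustrative, not a specific feature-store schema):
Spec (YAML-like, illustrative):
feature: user_7d_orders_count
meaning: orders placed by the user in the trailing 7 days
entities: [user_id]
event_time: order_ts (UTC)
window: (reference_ts - 7d, reference_ts]
definition: see the SQL in Example 1
null_handling: 0 when no orders fall in the window
units: count
checks: [value >= 0, null_rate == 0, freshness <= 1h]
owner: growth-features team
version: 1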
Core steps for feature generation
1) Choose the primary keys and the event_time used for point-in-time correctness.
2) Deduplicate, standardize types and timezones, and fill or flag missing fields.
3) Join on keys with time constraints; never look ahead of the reference time.
4) Compute aggregates and transforms: rolling windows, counts, sums, ratios, one-hot encodings/embeddings, text and numeric features.
5) Enforce event_time <= reference_time for every computation, whether the reference is a training point or the batch scoring time (sketched in Python after this list).
6) Apply stable imputation and scaling; store parameters derived only from training data.
7) Run schema and value checks; version the definition and keep a changelog.
8) Write features to a registry/feature store with keys and timestamps; backfill as needed.
9) Track freshness, null rates, distribution drift, and compute latency.
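The point-in-time core of steps 1–5 as a minimal pandas sketch (in-memory frames with made-up values; the SQL examples below do the same thing at warehouse scale):
Python (sketch):
import pandas as pd

orders = pd.DataFrame({
    "user_id": [1, 1, 2],
    "order_ts": pd.to_datetime(["2024-01-03", "2024-01-09", "2024-01-05"]),
})
points = pd.DataFrame({
    "user_id": [1, 2],
    "reference_ts": pd.to_datetime(["2024-01-10", "2024-01-10"]),
})

joined = points.merge(orders, on="user_id", how="left")
# Point-in-time boundary: only events in (reference_ts - 7d, reference_ts].
in_window = (
    (joined["order_ts"] > joined["reference_ts"] - pd.Timedelta(days=7))
    & (joined["order_ts"] <= joined["reference_ts"])
)
features = (
    joined[in_window]
    .groupby(["user_id", "reference_ts"])
    .size()
    .reindex(pd.MultiIndex.from_frame(points), fill_value=0)  # 0 for no orders
    .rename("user_7d_orders_count")
    .reset_index()
)
print(features)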
Worked examples
Example 1: E-commerce — user_7d_orders_count (point-in-time)
Tables:
orders(user_id, order_id, amount, order_ts)
training_points(user_id, reference_ts)
Goal:
For each training point, count orders in (reference_ts - 7d, reference_ts].
SQL (conceptual):
SELECT t.user_id,
t.reference_ts,
COUNT(o.order_id) AS user_7d_orders_count
FROM training_points t
LEFT JOIN orders o
ON o.user_id = t.user_id
AND o.order_ts > t.reference_ts - INTERVAL '7 day'
AND o.order_ts <= t.reference_ts
GROUP BY t.user_id, t.reference_ts;
Notes: No future orders included; handles users with zero orders via LEFT JOIN.
Example 2: Fraud — card_24h_amount_sum (hourly materialization)
Tables:
transactions(card_id, amount, event_ts)
Approach:
1) Bucket by hour: bucket_ts = date_trunc('hour', event_ts)
2) For each card_id and hour, compute rolling 24h sum at the end of the hour.
SQL (conceptual):
WITH hourly AS (
SELECT card_id,
date_trunc('hour', event_ts) AS hour_ts,
SUM(amount) AS amt_hour
FROM transactions
GROUP BY 1,2
)
SELECT h1.card_id,
h1.hour_ts AS reference_ts,
(
SELECT SUM(h2.amt_hour)
FROM hourly h2
WHERE h2.card_id = h1.card_id
AND h2.hour_ts > h1.hour_ts - INTERVAL '24 hours'
AND h2.hour_ts <= h1.hour_ts
) AS card_24h_amount_sum
FROM hourly h1;
Notes: Materialize per hour to a feature table keyed by (card_id, reference_ts).
Example 3: Text — title features (length, ratio, simple embedding)
Input:
items(item_id, title_text, created_ts)
Outputs (per item_id, created_ts):
- title_len_chars
- title_alpha_ratio = letters / total_chars
- title_embed_16 (placeholder: 16-dim avg of character code buckets)
Python (conceptual):
import re
import unicodedata

def simple_avg_bucket_embedding(t, dim=16):
    # Placeholder: average occupancy of character-code buckets.
    vec = [0.0] * dim
    for ch in t:
        vec[ord(ch) % dim] += 1.0
    return [v / max(len(t), 1) for v in vec]

# items and emit come from the surrounding pipeline.
for row in items:
    t = unicodedata.normalize("NFKC", row.title_text)  # deterministic normalization
    n = len(t)
    alpha = len(re.findall(r"[A-Za-z]", t))
    emit(item_id=row.item_id, ts=row.created_ts,
         title_len_chars=n,
         title_alpha_ratio=alpha / n if n else 0.0,
         title_embed_16=simple_avg_bucket_embedding(t, dim=16))
Notes: Keep normalization deterministic and store the exact pre-processing rules alongside the feature version.
Design decisions that matter
- Granularity: per-event, per-entity-per-time bucket (hour/day), or snapshot. Finer granularity increases storage but reduces leakage risk.
- Window size: shorter windows capture recency, longer windows improve stability. Consider multiple windows (7d, 30d).
- Imputation: constant vs model-based. Keep it stable between training and inference (see the sketch after this list).
- Materialization cadence: balance latency (freshness) vs cost.
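For the imputation point above, a minimal sketch of "fit on the training window only, persist, reuse" (the artifact name and stats are illustrative):
Python (sketch):
import json
import numpy as np

# Fit imputation/scaling parameters on the TRAINING window only, then
# persist them so backfills and batch inference reuse the exact values.
train_values = np.array([10.0, 12.0, np.nan, 15.0])
params = {
    "impute_value": float(np.nanmedian(train_values)),
    "mean": float(np.nanmean(train_values)),
    "std": float(np.nanstd(train_values)),
    "version": 1,  # bump when the definition or training window changes
}
with open("scaler_user_spend_v1.json", "w") as f:  # hypothetical artifact
    json.dump(params, f)

def transform(x, p):
    # Identical code path for training backfills and inference.
    x = p["impute_value"] if x is None or np.isnan(x) else x
    return (x - p["mean"]) / p["std"]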
Time alignment and leakage
Rule: Every feature must only use data whose event_time <= reference_time of the row being scored/trained.
- Late-arriving data: either reprocess with watermarking or accept eventual consistency.
- Backfills: always recompute using historical snapshots or event logs, not today’s corrected state.
- Labels: compute labels with a lookahead window that starts strictly after reference_time.
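A small pandas sketch of the label rule: the lookahead window opens strictly after reference_ts (made-up data; ordered_within_7d is an illustrative label name):
Python (sketch):
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 2],
    "order_ts": pd.to_datetime(["2024-01-12", "2024-01-02"]),
})
points = pd.DataFrame({
    "user_id": [1, 2],
    "reference_ts": pd.to_datetime(["2024-01-10", "2024-01-10"]),
})

j = points.merge(events, on="user_id", how="left")
in_lookahead = (
    (j["order_ts"] > j["reference_ts"])                           # strictly after
    & (j["order_ts"] <= j["reference_ts"] + pd.Timedelta(days=7)) # bounded horizon
)
labels = (
    in_lookahead.groupby([j["user_id"], j["reference_ts"]]).any()
    .astype(int).rename("ordered_within_7d").reset_index()
)
print(labels)  # user 1 -> 1 (orders on Jan 12), user 2 -> 0 (Jan 2 is before)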
Self-check: Is this definition safe?
A feature defined as “orders in the last 7 days relative to today()” is unsafe for training: it must reference the per-row reference_time, not the wall clock.
Feature quality checks
- Schema: types, ranges, allowed null rates.
- Statistical: distribution drift vs baseline, min/max sanity, cardinality caps.
- Freshness: max(ts_now - latest_materialized_ts) per feature.
- Skew: training vs inference distribution differences.
- Join coverage: rate of missing joins by entity and time.
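A sketch of the schema, value, and freshness checks in plain pandas (the column names and thresholds are illustrative, not a validation framework):
Python (sketch):
import pandas as pd

def validate(df: pd.DataFrame, now: pd.Timestamp) -> list[str]:
    # Returns the names of failed checks for one feature table.
    failures = []
    if df["user_7d_orders_count"].isna().mean() > 0.01:   # allowed null rate
        failures.append("null_rate_exceeded")
    if (df["user_7d_orders_count"] < 0).any():            # value range sanity
        failures.append("negative_count")
    if now - df["reference_ts"].max() > pd.Timedelta(hours=2):  # freshness
        failures.append("stale_feature")
    return failures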
Checklist — run before shipping
- Point-in-time filter verified
- Null handling documented and tested
- Windows and timezones clearly stated
- Backfill reproducible
- Unit tests with fixed fixtures (see the example after this list)
- Monitoring alerts configured
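The unit-test item could look like this; compute_user_7d_orders_count is a hypothetical stand-in for your own implementation of Example 1:
Python (sketch):
import pandas as pd

def test_user_7d_orders_count_excludes_future_orders():
    # Fixed fixture: one order inside the window, one after the reference point.
    orders = pd.DataFrame({
        "user_id": [1, 1],
        "order_ts": pd.to_datetime(["2024-01-09", "2024-01-11"]),
    })
    points = pd.DataFrame({
        "user_id": [1],
        "reference_ts": pd.to_datetime(["2024-01-10"]),
    })
    out = compute_user_7d_orders_count(orders, points)  # function under test
    assert out.loc[0, "user_7d_orders_count"] == 1      # future order excluded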
Implementation patterns (batch + serving)
- Single source-of-truth transformation: share the same logic between training backfills and batch inference.
- Windowed aggregates: precompute per entity/time-bucket for speed, then roll up at scoring.
- Idempotent writes: deterministic keys (entity, reference_ts, version) allow safe retries.
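The idempotent-write pattern in miniature, using sqlite3 only to demonstrate the keying (a warehouse would use MERGE or partition overwrite instead):
Python (sketch):
import sqlite3

# Deterministic key (entity, reference_ts, version): re-running a job
# overwrites the same row instead of duplicating it.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE features (
        card_id TEXT,
        reference_ts TEXT,
        version INTEGER,
        card_24h_amount_sum REAL,
        PRIMARY KEY (card_id, reference_ts, version)
    )
""")
row = ("c42", "2024-01-10T02:00:00Z", 1, 137.5)
for _ in range(2):  # simulate a retry: the second write replaces the first
    con.execute("INSERT OR REPLACE INTO features VALUES (?, ?, ?, ?)", row)
con.commit()
print(con.execute("SELECT COUNT(*) FROM features").fetchone())  # (1,)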
Monitoring features in production
- Freshness SLOs (e.g., 95% of hourly aggregates ready by T+20m).
- Null/zero spikes detection and alerting.
- Distribution drift: compare daily histograms to a moving baseline (see the PSI sketch after this list).
- Latency and cost per job run; detect regressions after definition changes.
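One common way to score the drift check above is the Population Stability Index; a numpy sketch (the usual 0.1/0.25 alert thresholds are rules of thumb, not standards):
Python (sketch):
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Population Stability Index between a baseline and today's values.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # floor tiny proportions to avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
today = rng.normal(0.3, 1.0, 10_000)  # shifted distribution
print(psi(baseline, today))           # above ~0.1 usually warrants a look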
Exercises
Do these to lock in the concepts.
- Exercise 1 — Point-in-time SQL feature
Design a query for a 14-day rolling unique_items_count per user at a given reference_ts. See the full instructions in the Exercises section below.
- Exercise 2 — Feature spec + validation
Write a feature definition (YAML-like) and create validation rules for nulls, ranges, and freshness.
Exercise 1 details
Match the instructions in the Exercises panel below (ID: ex1). Implement the SQL and verify the expected output.
Exercise 2 details
Match the instructions in the Exercises panel below (ID: ex2). Produce a spec and example validation checks.
Common mistakes and how to self-check
- Using now() in training features. Fix: reference per-row timestamp only.
- Joining on latest snapshot without time filter. Fix: add event_time <= reference_time in join condition.
- Forgetting timezone normalization. Fix: convert all timestamps to UTC and document it.
- Leaky imputation (using global stats from full history). Fix: compute stats on training window only and version them.
- Unbounded cardinality features (e.g., IDs one-hot). Fix: hash, bucket, or top-K with OOV bucket.
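For the last fix, a sketch of both bounded-cardinality options; TOP_K would come from the training window, and md5 keeps buckets stable across processes, unlike Python's salted built-in hash():
Python (sketch):
import hashlib

TOP_K = {"US", "GB", "DE"}  # learned from the training window only

def country_bucket(country: str) -> str:
    # Top-K with an out-of-vocabulary bucket: cardinality bounded by design.
    return country if country in TOP_K else "OOV"

def hash_bucket(value: str, num_buckets: int = 64) -> int:
    # Stable hash so training and serving agree on the bucket for a value.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

print(country_bucket("FR"))      # OOV
print(hash_bucket("user_12345")) # deterministic bucket in [0, 64)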
Self-check prompt
Can I reproduce the same feature value for a given (entity, reference_ts) today and next month? If not, identify which step is nondeterministic.
Practical projects
- Retail cohort features: 7/30/90-day spend and visit counts per user with a backfilled training set.
- Fraud hourly rollups: 1h/6h/24h card activity features with lateness watermark and monitoring.
- Content features: title/body text lengths, ratios, and simple embeddings materialized daily.
Who this is for
- MLOps Engineers implementing training and batch inference pipelines.
- Data Engineers collaborating on feature stores and ETL.
- ML Engineers needing reproducible feature definitions.
Prerequisites
- Comfort with SQL windowing and joins
- Basic Python or similar for transforms
- Familiarity with batch scheduling and data partitioning
Learning path
- Point-in-time correctness basics
- Windowed aggregation patterns
- Imputation and scaling consistency
- Backfilling and versioning
- Validation and monitoring
Next steps
- Complete the two exercises and run the Quick Test below.
- Add monitoring checks to a feature you already built.
- Plan a backfill and a safe rollout of a new feature version.
Mini challenge
You must compute user_30d_avg_order_value for batch scoring at 02:00 UTC daily. Yesterday, late orders arrived at 03:00. What do you do?
- Answer hint: materialize at 02:00 behind a 2-hour lateness watermark (the scheduled run does not see data that lands after it, such as the 03:00 arrivals), then run a correction backfill for the affected window once the late orders land, and version the output so consumers can tell the runs apart. A minimal sketch follows.
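A minimal sketch of that two-pass pattern, with made-up timestamps and recompute_features standing in for the actual 30-day aggregation job:
Python (sketch):
from datetime import datetime, timedelta, timezone

WATERMARK = timedelta(hours=2)  # lateness tolerated before a correction is needed

def recompute_features(start, end, version):
    # Hypothetical stub for the real aggregation job.
    print(f"recomputing window [{start}, {end}] as version {version}")

def run(run_ts, version):
    cutoff = run_ts - WATERMARK
    # The scheduled pass only trusts events with event_ts <= cutoff.
    recompute_features(cutoff - timedelta(days=30), cutoff, version)

run_ts = datetime(2024, 1, 10, 2, 0, tzinfo=timezone.utc)
run(run_ts, version=1)
# Late orders land at 03:00: rerun the same window under a new version;
# idempotent keyed writes make the correction a safe overwrite.
run(run_ts, version=2)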