
Feature Freshness And SLAs

Learn Feature Freshness And SLAs for free with explanations, exercises, and a quick test (for Machine Learning Engineers).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

In production ML, features lose value as they age. Feature stores help you compute and serve features, but you must define how fresh those features must be and guarantee it via SLAs (service level agreements). Poor freshness can tank model performance, cause bad user experiences, and break real-time decisions.

  • Fraud detection: velocity features must reflect the last seconds of activity.
  • Recommendations: inventory and user activity features must update within minutes.
  • Forecasting: daily aggregates need consistent end-of-day arrival times.
  • A/B tests: comparable freshness across variants avoids biased evaluations.

Concept explained simply

Feature freshness is how up-to-date a feature value is at the time you read it. A simple way to think about it:

Freshness at read time = now_at_read - event_timestamp_used_to_compute_feature.

SLA (Service Level Agreement) is the target promise for freshness and availability. You usually track it through SLOs (objectives) and SLIs (indicators/metrics). Example: P99 freshness ≤ 5s for a transaction velocity feature during business hours.
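
To make this concrete, here is a minimal Python sketch of freshness at read time checked against an illustrative target. The timestamps, the 5s threshold, and the function name are assumptions for the example, not part of any particular feature store API.

    from datetime import datetime, timezone
    from typing import Optional

    # Freshness at read time = read_time - event_timestamp_used_to_compute_feature.
    def freshness_age_seconds(event_time: datetime, read_time: Optional[datetime] = None) -> float:
        """Age of a feature value at the moment it is read."""
        read_time = read_time or datetime.now(timezone.utc)
        return (read_time - event_time).total_seconds()

    FRESHNESS_TARGET_S = 5.0  # illustrative target, e.g. "P99 freshness <= 5s"

    age = freshness_age_seconds(
        event_time=datetime(2026, 1, 1, 12, 0, 2, tzinfo=timezone.utc),
        read_time=datetime(2026, 1, 1, 12, 0, 5, tzinfo=timezone.utc),
    )
    print(age, age <= FRESHNESS_TARGET_S)  # 3.0 True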

Mental model: the freshness pipeline

Imagine a conveyor belt with three delays:

  • Compute delay: time to aggregate or transform events.
  • Transport delay: time to write into the feature store/serve online.
  • Read delay: caching/serving time until the model reads it.

Your freshness budget is how much total delay you can tolerate. If your decision must be made within 300 ms, model scoring takes 120 ms, and the network takes 80 ms, you have ~100 ms left for feature availability or a fallback strategy.
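
A tiny sketch of that budget arithmetic, using the numbers above (the function name is illustrative):

    # Freshness budget: what is left for feature readiness after scoring and network latency.
    def remaining_freshness_budget(total_ms: float, scoring_ms: float, network_ms: float) -> float:
        return total_ms - scoring_ms - network_ms

    print(remaining_freshness_budget(total_ms=300, scoring_ms=120, network_ms=80))  # 100 ms left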

Key terms and practical definitions

  • Event time: when something actually happened (e.g., transaction time). Use this for accuracy.
  • Ingestion time: when data arrived in your system. Can be later than event time.
  • Freshness window: maximum allowed age for feature values at read time.
  • TTL (time-to-live): how long a feature value is served before it is considered stale and purged or replaced.
  • Point-in-time correctness: training features must be built using only data available at that historical moment (no leakage).
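
Point-in-time correctness is easiest to see as a join. Below is a minimal sketch using pandas.merge_asof: for each training label, it attaches the latest feature value whose event_time is at or before the label's timestamp, so no future data leaks in. Column names and values are illustrative.

    import pandas as pd

    labels = pd.DataFrame({
        "card_id": ["A", "A"],
        "label_time": pd.to_datetime(["2026-01-01 12:00:10", "2026-01-01 12:05:00"]),
        "is_fraud": [0, 1],
    }).sort_values("label_time")

    features = pd.DataFrame({
        "card_id": ["A", "A", "A"],
        "event_time": pd.to_datetime(["2026-01-01 11:59:58", "2026-01-01 12:00:09", "2026-01-01 12:04:00"]),
        "tx_count_last_60s": [1, 3, 7],
    }).sort_values("event_time")

    # Backward as-of join = "latest value known at or before label_time" per card.
    train = pd.merge_asof(
        labels, features,
        left_on="label_time", right_on="event_time",
        by="card_id", direction="backward",
    )
    print(train[["card_id", "label_time", "tx_count_last_60s", "is_fraud"]])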

Worked examples

Example 1: Real-time fraud detection

Feature: number_of_transactions_last_60s per card.

  • Requirement: decisions in 300 ms after swipe.
  • Freshness target: at read, feature reflects all events up to 2 seconds ago.
  • SLA: 99% of reads see data no older than 2s; 99.9% no older than 5s. Availability ≥ 99.9%.
  • Design: streaming aggregation with watermarks; online store TTL 2 minutes; fallback to last known value if freshness > 5s.

Why it works: streaming keeps latency low; TTL prevents stale buildup; fallback preserves continuity during spikes.
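
A rough in-memory sketch of the 60-second counter is below; a real deployment would use a stream processor plus an online store, and the class and method names are only illustrative.

    from collections import defaultdict, deque
    from datetime import datetime, timedelta, timezone

    class SlidingCounter:
        """Per-card count of events in the last `window_s` seconds."""
        def __init__(self, window_s: int = 60):
            self.window = timedelta(seconds=window_s)
            self.events = defaultdict(deque)  # card_id -> deque of event timestamps

        def add(self, card_id: str, event_time: datetime) -> None:
            self.events[card_id].append(event_time)

        def count(self, card_id: str, now: datetime) -> int:
            q = self.events[card_id]
            while q and now - q[0] > self.window:  # evict events outside the window
                q.popleft()
            return len(q)

    counter = SlidingCounter()
    now = datetime.now(timezone.utc)
    counter.add("card-42", now - timedelta(seconds=70))  # outside the 60s window
    counter.add("card-42", now - timedelta(seconds=5))
    print(counter.count("card-42", now))  # 1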

Example 2: Daily churn model

Feature: sessions_last_7_days, updated once per hour.

  • Requirement: batch scoring nightly at 01:00.
  • Freshness target: by 00:30, all prior day events included.
  • SLA: By 00:30, P99 of features reflect data up to 23:59:59 of the previous day; backfill late events by 04:00.
  • Design: hourly micro-batches + late-arrival backfill; offline store partitions by date; training uses point-in-time semantics.

Outcome: Consistent training-serving behavior without strict sub-second needs.
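
A minimal pandas sketch of the late-arrival backfill: find events that were ingested after the daily partition was first built, then rebuild only the affected date partitions. Column names, the 00:30 cut-off, and treating each event as one session are assumptions for illustration.

    import pandas as pd

    events = pd.DataFrame({
        "user_id": ["u1", "u1", "u2"],
        "event_time": pd.to_datetime(["2026-01-01 23:50", "2026-01-01 23:59", "2026-01-01 22:00"]),
        "ingestion_time": pd.to_datetime(["2026-01-01 23:51", "2026-01-02 03:10", "2026-01-01 22:01"]),
    })

    partition_built_at = pd.Timestamp("2026-01-02 00:30")
    late = events[events["ingestion_time"] > partition_built_at]
    affected_dates = late["event_time"].dt.date.unique()

    # Rebuild only the affected daily partitions from the full event set.
    rebuilt = (
        events[events["event_time"].dt.date.isin(affected_dates)]
        .assign(date=lambda df: df["event_time"].dt.date)
        .groupby(["date", "user_id"], as_index=False)
        .size()
        .rename(columns={"size": "sessions"})
    )
    print(rebuilt)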

Example 3: Recommendations CTR features

Feature: rolling_ctr_15m.

  • Requirement: homepage loads in 200 ms.
  • Freshness target: feature includes clicks/impressions up to 5 minutes ago at P95.
  • SLA: P95 freshness ≤ 5m, P99 ≤ 10m; availability 99.95%.
  • Design: incremental window updates via stream; online store with per-key TTL 30m; serve cached features if newer than 10m; otherwise degrade to category-level CTR.

Trade-off: tighter freshness raises infra cost; fallback keeps UX stable during spikes.
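
The degrade step can be a small serving-time guard. The sketch below is one way to do it; the thresholds and helper names are assumptions, not a library API.

    from datetime import datetime, timedelta, timezone
    from typing import Optional, Tuple

    MAX_ITEM_CTR_AGE = timedelta(minutes=10)

    def choose_ctr(item_ctr: Optional[float], item_ctr_updated_at: Optional[datetime],
                   category_ctr: float, now: Optional[datetime] = None) -> Tuple[float, str]:
        """Return the per-item rolling CTR if fresh enough, else the category-level CTR."""
        now = now or datetime.now(timezone.utc)
        if (item_ctr is not None and item_ctr_updated_at is not None
                and now - item_ctr_updated_at <= MAX_ITEM_CTR_AGE):
            return item_ctr, "rolling_ctr_15m"
        return category_ctr, "category_ctr_1h"  # degraded but safe; log the swap

    value, source = choose_ctr(0.042, datetime.now(timezone.utc) - timedelta(minutes=25), 0.031)
    print(value, source)  # falls back to the category-level CTR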

Designing SLAs in 5 steps

Step 1: Define decision need

What decision uses this feature and how quickly must it react?

Step 2: Set a freshness budget

Split your total latency among model scoring, networking, and feature readiness.

Step 3: Choose SLIs

Percentile freshness at read (P50/P95/P99), error rate, availability, and staleness rate (reads exceeding window).

Step 4: Write SLOs

Example: P99 freshness ≤ 5s over rolling 30d; staleness rate < 0.5%.

Step 5: Plan fallbacks and alerts

Define TTLs, backfills, fallback features, and paging thresholds.
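
One way to capture the outcome of the five steps is a small reviewable spec. The sketch below is plain Python with illustrative field names; it is not a feature store or monitoring schema.

    # Hypothetical SLA spec for the fraud velocity feature from Example 1.
    velocity_feature_sla = {
        "feature": "tx_count_last_60s",
        "decision": "approve or decline a card transaction within 300 ms",
        "freshness_budget_ms": 100,
        "slis": ["freshness_age_p95", "freshness_age_p99", "staleness_rate", "availability"],
        "slos": {
            "freshness_age_p99_s": 5.0,    # P99 freshness <= 5s over a rolling 30d window
            "staleness_rate_max": 0.005,   # < 0.5% of reads older than the freshness window
            "availability_min": 0.999,
        },
        "ttl_minutes": 2,
        "fallback": "serve last known value when freshness > 5s",
        "alerting": {"page_if": "staleness_rate > 0.5% for 10 consecutive minutes"},
    }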

Measuring freshness

Track the timestamp used to compute each feature value (event_time or watermark_time). On read, compute freshness_age = now() - feature_timestamp. Emit this as a metric with percentiles.
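
For example, with numpy (the sample ages and the 2s window are made up for illustration):

    import numpy as np

    # Per-read freshness ages in seconds, collected from serving logs or metrics.
    freshness_ages_s = np.array([0.4, 0.7, 1.1, 0.9, 3.8, 0.6, 0.5, 7.2, 0.8, 1.0])

    p50, p95, p99 = np.percentile(freshness_ages_s, [50, 95, 99])
    staleness_rate = float((freshness_ages_s > 2.0).mean())  # share of reads older than the 2s window

    print(f"P50={p50:.2f}s P95={p95:.2f}s P99={p99:.2f}s staleness_rate={staleness_rate:.1%}")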

Implementation tips (tech-agnostic)
  • Include both event_time and produced_at in feature values for diagnostics.
  • Use watermarks for late data; update features idempotently.
  • Store last_update_time per key and expose it in online reads for optional client-side guards.
  • Backfill jobs should not break point-in-time correctness for training.
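
A small sketch of the third tip: return last_update_time with each online read so the caller can apply an optional freshness guard. The dict-backed store and the 5s threshold are stand-ins for a real online store and your own SLO.

    from datetime import datetime, timedelta, timezone
    from typing import Any, Optional, Tuple

    MAX_AGE = timedelta(seconds=5)

    def read_with_guard(store: dict, key: str, now: Optional[datetime] = None) -> Tuple[Optional[Any], bool]:
        """Return (value, is_fresh); the caller decides what to do with a stale value."""
        now = now or datetime.now(timezone.utc)
        record = store.get(key)
        if record is None:
            return None, False
        value, last_update_time = record
        return value, now - last_update_time <= MAX_AGE

    store = {"card-42": (3, datetime.now(timezone.utc) - timedelta(seconds=2))}
    print(read_with_guard(store, "card-42"))  # (3, True)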

Common mistakes and self-check

  • Mistake: Measuring freshness from ingestion_time only. Self-check: verify you use event_time for business correctness.
  • Mistake: Using averages instead of percentiles. Self-check: ensure P95/P99 are tracked; spikes hide in averages.
  • Mistake: No fallback when freshness breached. Self-check: define and test a degraded but safe feature set.
  • Mistake: TTL too long. Self-check: simulate incident; confirm stale data is not served indefinitely.
  • Mistake: Training-serving skew from late arrivals. Self-check: enforce point-in-time joins for training and re-materialize training sets after backfills.

Practical projects

  • Build a streaming counter feature with per-key freshness metrics and a dashboard showing P50/P95/P99.
  • Create a batch aggregate (daily revenue by merchant) with a backfill job and a freshness SLA (delivery by 01:00 at P99).
  • Implement a fallback: when rolling_ctr_15m is older than 10m, serve category_ctr_1h and log the swap.

Exercises

Do the two exercises below to practice setting and validating freshness SLAs. Use the checklist as you work.

  • State the decision need and latency budget.
  • Write SLIs (what you measure).
  • Set SLO targets (percentiles and thresholds).
  • Define TTL, backfill, and fallback.
  • Describe your alerting and on-call boundaries.

Mini challenge

You manage a price_sensitivity_score feature updated from clickstream and purchases. Pages must load in 250 ms. Streaming updates add ~150 ms at P95; model scoring is 70 ms P95; network is 30 ms P95.

  • What freshness window can you afford at P95?
  • Draft a one-line SLO for P99 freshness.
  • Suggest a fallback if the freshness window is breached.

Suggested direction (spoiler)

Budget ~250 - 70 - 30 = 150 ms for data readiness at P95. Example SLO: P99 freshness ≤ 500 ms over 30d. Fallback: serve the previous score if it was updated < 10m ago, else a cohort average.

Who this is for

  • Machine Learning Engineers owning online/offline features.
  • Data/Platform Engineers supporting feature pipelines.
  • Applied Scientists who need reliable real-time signals.

Prerequisites

  • Basic ML pipeline knowledge (training vs serving).
  • Comfort with streaming or batch data processing concepts.
  • Understanding of percentiles and latency metrics.

Learning path

  1. Understand feature lifecycle (ingest → compute → store → serve).
  2. Define SLIs/SLOs for freshness and availability.
  3. Implement measurement and dashboards.
  4. Add TTL, backfill, and fallback mechanisms.
  5. Run incident drills and tune thresholds.

Next steps

  • Instrument your current top-3 features with freshness timestamps.
  • Propose SLAs with P95/P99 targets and review with stakeholders.
  • Add a simple fallback and test it in staging with injected delays.


Practice Exercises

2 exercises to complete

Instructions

You operate an online velocity feature: tx_count_last_60s. You log per-read values:

  • read_time (now): 12:00:05.000, 12:00:06.000, 12:00:07.000, 12:00:10.000
  • feature_event_time used for each read: 12:00:04.100, 12:00:04.400, 12:00:06.200, 12:00:04.900

SLO: P95 freshness ≤ 2s; P99 ≤ 5s over the last 10 minutes.

  1. Compute freshness_age for each read.
  2. Estimate P95 and P99 from this tiny sample (explain your approach and limitations).
  3. State whether this sample suggests compliance or risk.
  4. List two remediation steps if risk is detected.

Expected Output
A short table of ages per read, estimated P95/P99, compliance statement, and two remediation ideas (e.g., increase consumer parallelism, reduce aggregation window).
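
If you want to check your work programmatically, one possible sketch is below. The time format and the nearest-rank percentile method are assumptions, and with only four reads any percentile estimate is rough, which is worth stating in your answer.

    import math
    from datetime import datetime

    def ages_seconds(pairs):
        """pairs: list of (read_time, feature_event_time) strings like '12:00:05.000'."""
        fmt = "%H:%M:%S.%f"
        return [(datetime.strptime(read, fmt) - datetime.strptime(event, fmt)).total_seconds()
                for read, event in pairs]

    def nearest_rank_percentile(values, p):
        ordered = sorted(values)
        rank = max(1, math.ceil(p / 100 * len(ordered)))  # nearest-rank method
        return ordered[rank - 1]

    # Example usage with the first logged read from the instructions above:
    # ages = ages_seconds([("12:00:05.000", "12:00:04.100"), ...])
    # print(ages, nearest_rank_percentile(ages, 95), nearest_rank_percentile(ages, 99))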

Feature Freshness And SLAs — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
