
Preprocessing At Inference

Learn Preprocessing At Inference for free with explanations, exercises, and a quick test (for MLOps Engineers).

Published: January 4, 2026 | Updated: January 4, 2026

Why this matters

In production, your model only works if incoming data is transformed exactly as the training data was. Inconsistent preprocessing causes silent accuracy drops, outages, and costly debugging. As an MLOps Engineer, you will:

  • Package training-time transforms with the model so inference is deterministic.
  • Validate and coerce raw inputs to a stable schema (data contracts).
  • Handle missing, out-of-range, and unseen values safely (e.g., OOV buckets).
  • Optimize for latency and throughput under real traffic.
  • Version and monitor preprocessing so changes don’t break models.
Real-world failure example

Training used mean/std from January data; inference accidentally used February stats. Predictions drifted for three days before anyone noticed. Root cause: an unversioned preprocessing artifact and no input monitoring.

Concept explained simply

Inference-time preprocessing is the function that turns messy, real-world input into the exact features your model expects. Think of it as a contract: same inputs → same outputs, every time.

Mental model

Raw input → Validate → Clean/Transform → Feature tensor → Model. The transformation function must be:

  • Deterministic: no randomness or time-dependent differences without explicit control.
  • Versioned: tied to a specific model version and artifacts (vocabularies, scalers).
  • Stateless per request: no hidden accumulators that change outputs over time.
  • Idempotent: running it twice yields the same result (a minimal sketch follows this list).
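
A minimal sketch of these properties in Python, with a toy numeric transform (names and values are illustrative):

    # Frozen training artifacts: explicit inputs, never recomputed at inference.
    ARTIFACTS = {"mean": 10.0, "std": 2.0}

    def preprocess(raw: dict, artifacts: dict = ARTIFACTS) -> dict:
        # Deterministic and stateless: no randomness, clocks, or hidden state.
        value = raw.get("value")
        value = artifacts["mean"] if value is None else float(value)
        return {"value_z": (value - artifacts["mean"]) / artifacts["std"]}

    # Idempotent: the same request always yields the same features.
    assert preprocess({"value": 14}) == preprocess({"value": 14})
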
Common building blocks
  • Numeric: missing value imputation, clipping, scaling (min-max/standard), log transforms.
  • Categorical: fixed vocabularies, OOV bucket, consistent encoders.
  • Text: normalization, tokenization, truncation/padding, special tokens.
  • Images: decode, resize/crop, color conversion, normalization.
  • Time series: resampling, windowing, forward/back fill.

Core patterns you’ll use

  • Bundle preprocessor with model: Assemble a single artifact so inputs are transformed identically across environments.
  • Validate requests: Enforce schema with types, ranges, and required fields; reject or coerce early.
  • Store artifacts: Keep scalers, vocabularies, tokenizers, and image parameters under version control (persisting and loading is sketched after this list).
  • Fail safe for unknowns: Map unseen categories to OOV, clip extreme numeric values, and sanitize text.
  • Optimize latency: Vectorize transforms, pre-load artifacts, avoid heavy CPU steps per request.
  • Monitor inputs: Track feature distributions and detect drift relative to training baselines.
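
One way to persist and pre-load such artifacts, sketched with only the standard library (file name and layout are assumptions):

    import json

    # Training side: write everything the transform needs, tagged with a version.
    artifacts = {
        "version": "prep-v3",
        "color_vocab": {"red": 0, "blue": 1, "green": 2, "__OOV__": 3},
        "value_mean": 10.0,
        "value_std": 2.0,
    }
    with open("preprocess_artifacts_prep-v3.json", "w") as f:
        json.dump(artifacts, f)

    # Serving side: load once at startup (warm) and log the version per request.
    with open("preprocess_artifacts_prep-v3.json") as f:
        ART = json.load(f)
    print(ART["version"])  # "prep-v3"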

Worked examples

Example 1 — Tabular (categorical + numeric)

Training decisions:

  • color vocabulary: {"red":0, "blue":1, "green":2, "__OOV__":3}
  • value_z = (value − mean)/std with mean=10, std=2; missing value → use mean.

Inference for {"color":"purple","value":14}:

  • color_idx = OOV → 3
  • value_z = (14−10)/2 = 2.0
  • Final features: {"color_idx":3, "value_z":2.0}
Why this is right

We never recompute vocab or statistics at inference. Unknown categories map to OOV; numeric uses fixed mean/std from training.
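
Example 1 as code, a sketch that uses only the frozen artifacts above:

    VOCAB = {"red": 0, "blue": 1, "green": 2, "__OOV__": 3}
    MEAN, STD = 10.0, 2.0

    def transform(record: dict) -> dict:
        # Unseen categories fall into the OOV bucket; nothing is recomputed.
        color_idx = VOCAB.get(record.get("color"), VOCAB["__OOV__"])
        value = record.get("value")
        value = MEAN if value is None else float(value)  # fixed missing-value policy
        return {"color_idx": color_idx, "value_z": (value - MEAN) / STD}

    print(transform({"color": "purple", "value": 14}))
    # {'color_idx': 3, 'value_z': 2.0}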

Example 2 — Text classification (Transformer)

Training decisions:

  • Use a specific tokenizer version with lowercase disabled.
  • Max length=128, truncation from the right, pad to 128.

Inference for text: "Great!!! FREE offer":

  • Do not lowercase (to match training).
  • Apply the exact tokenizer version with the same special tokens.
  • Truncate/pad to length 128.

Output: token_ids length 128, attention_mask length 128.

Why this is right

If you change casing or tokenizer version at inference, token IDs shift and the model degrades even if the text looks similar.
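
A sketch with the Hugging Face transformers library; the checkpoint name is illustrative (any cased model matches the "lowercase disabled" decision), and pinning the exact version is your own artifact-versioning job:

    from transformers import AutoTokenizer

    # Load the exact tokenizer the model was trained with; a silent upstream
    # update would shift token IDs even though the text looks unchanged.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    encoded = tokenizer(
        "Great!!! FREE offer",
        truncation=True,          # truncates from the right by default
        padding="max_length",
        max_length=128,
    )
    assert len(encoded["input_ids"]) == 128
    assert len(encoded["attention_mask"]) == 128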

Example 3 — Image classification

Training decisions:

  • RGB images only, resize shortest side to 256, center-crop 224×224.
  • Normalize channels using mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225].

Inference pipeline:

  1. Decode → convert to RGB.
  2. Resize shortest side to 256.
  3. Center-crop 224×224.
  4. Scale to [0,1], then normalize per channel.

Output: tensor with shape [3, 224, 224], float32.

Why this is right

Exact resize/crop and normalization must match training; even small differences can shift predictions.
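
A sketch with torchvision, assuming PIL input; the file name is hypothetical:

    from PIL import Image
    from torchvision import transforms

    # The exact training-time pipeline, frozen as one composed transform.
    preprocess = transforms.Compose([
        transforms.Resize(256),        # shortest side -> 256
        transforms.CenterCrop(224),
        transforms.ToTensor(),         # float32 in [0, 1], shape [C, H, W]
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("example.jpg").convert("RGB")  # hypothetical file
    tensor = preprocess(img)                        # shape [3, 224, 224]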

Implementation blueprint

  1. Define a strict request schema (types, ranges, required fields, defaults); a validation sketch follows this list.
  2. Persist training artifacts (vocabularies, scalers, tokenizer, image params) with version tags.
  3. Implement a pure function preprocess(raw) → features using only persisted artifacts.
  4. Bundle preprocess + model in one container or package; pin library versions.
  5. Add input validation and clear error messages (400 for bad requests).
  6. Measure latency for decode, preprocess, and model separately; set budgets.
  7. Log per-request metadata: schema version, preprocessing version, model version.
  8. Monitor feature distributions and OOV rates; alert on sudden shifts.
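
A sketch of steps 1 and 5 with pydantic; the field names, bounds, and defaults are assumptions:

    from typing import Optional
    from pydantic import BaseModel, Field, ValidationError

    class PredictRequest(BaseModel):
        color: str = "__OOV__"                       # optional field, safe default
        value: Optional[float] = Field(default=None, ge=-1e6, le=1e6)

    try:
        PredictRequest(color="blue", value="not-a-number")
    except ValidationError as e:
        # In a web service this becomes an HTTP 400 with e.errors() as the detail.
        print(e.errors())
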
Security and PII

Never log raw PII. Redact or hash identifiers before logging. Validate image/text size limits to prevent abuse. Enforce strict timeouts.

Data contracts and schemas

Keep a human-readable schema that defines the following (an example contract appears after this list):

  • Field names, types, units, and allowed ranges.
  • Defaults for missing fields and how you coerce/clip.
  • Accepted categorical values and the OOV strategy.
  • Encoding details (e.g., z-score statistics, normalization means/stds).
  • Version numbers; new fields must be backward compatible.
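
A hypothetical contract for the tabular example, kept in version control next to the artifacts (all names, units, and bounds are illustrative):

    CONTRACT = {
        "schema_version": "1.1.0",
        "fields": {
            "color": {
                "type": "str",
                "allowed": ["red", "blue", "green"],
                "on_unseen": "__OOV__",        # OOV strategy
                "default": "__OOV__",          # missing field -> OOV bucket
            },
            "value": {
                "type": "float",
                "unit": "unitless",            # document real units here
                "range": [-1e6, 1e6],          # values outside are clipped
                "on_missing": "train_mean",    # impute with training mean
                "encoding": {"z_score": {"mean": 10.0, "std": 2.0}},
            },
        },
    }

Keeping the contract as data makes it diffable in code review and easy to check against at service startup.
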
Backward compatibility tip

When adding a new field, make it optional with a safe default so old clients continue working.

Latency budgets

Work backward from your p95 SLA. Example: SLA 120 ms, network overhead 20 ms, model 50 ms → preprocessing and validation must fit within 50 ms. A per-stage timing sketch follows the tips below.

  • Warm-load artifacts in memory.
  • Vectorize operations; avoid Python loops where possible.
  • Use efficient image codecs and batch tokenization when appropriate.
  • Cache derived values if safe and beneficial.
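
One way to measure per-stage latency, sketched with toy stages standing in for real decode/preprocess/model calls:

    import time

    def timed(stage: str, fn, *args):
        # Time each stage separately so a regression is attributable to one step.
        t0 = time.perf_counter()
        out = fn(*args)
        elapsed_ms = (time.perf_counter() - t0) * 1000
        print(f"{stage}: {elapsed_ms:.2f} ms")  # in production, emit as a metric
        return out

    features = timed("preprocess",
                     lambda raw: {"value_z": (raw["value"] - 10.0) / 2.0},
                     {"value": 14})
    prediction = timed("model", lambda f: 0.91, features)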

Monitoring input health

  • Track per-feature distributions over time vs. training baseline.
  • Watch OOV rates, missing value rates, and clipping frequencies.
  • Correlate changes with retrains or upstream data changes.
Simple drift checks to start
  • Weekly histogram comparison for top features.
  • Alert if OOV rate jumps by >X% or missing rate doubles (a starter sketch follows this list).
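
A starter sketch with numpy; the baseline data, bin count, and L1 histogram distance are all illustrative choices:

    import numpy as np

    TRAIN_VOCAB = {"red", "blue", "green"}
    # Toy training baseline for one numeric feature.
    TRAIN_HIST, BIN_EDGES = np.histogram([9.5, 10.2, 10.8, 11.1], bins=4)

    def oov_rate(colors):
        return sum(c not in TRAIN_VOCAB for c in colors) / len(colors)

    def hist_shift(values):
        # Crude drift signal: L1 distance between normalized histograms.
        live, _ = np.histogram(values, bins=BIN_EDGES)
        p = TRAIN_HIST / TRAIN_HIST.sum()
        q = live / max(live.sum(), 1)
        return float(np.abs(p - q).sum())

    print(oov_rate(["red", "teal", "blue"]))  # ~0.33 -> alert if over threshold
    print(hist_shift([12.0, 12.5, 13.0]))     # 1.0: all values outside training bins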

Exercises

These mirror the interactive tasks below. Solve here, then compare with the official solution provided in the exercise details.

  1. Exercise 1: Stable categorical encoding at inference
    You have a fixed color vocabulary and numeric z-score stats. Transform two incoming records. See the Exercises panel for inputs and expected outputs.
  • Checklist for completion:
    • Used only the provided artifacts (no recomputation).
    • Handled unseen categories with OOV.
    • Applied missing value defaults.
    • Produced the exact expected outputs.

Common mistakes

  • Recomputing statistics at inference (causes training/serving skew and silent drift).
  • Changing tokenizer or image normalization versions without bumping artifacts.
  • No OOV handling → runtime errors or wrong encodings.
  • Using local time or non-deterministic operations (e.g., current date in features).
  • Silent dependency upgrades changing preprocessing behavior.
  • Heavy per-request transforms that blow the latency budget.
Self-check
  • Can you reproduce a prediction locally from a production payload?
  • Are preprocess and model versions logged together?
  • Do you have alerting on OOV/missing spikes?

Practical projects

  • Build a small REST service that validates, preprocesses, and serves a tabular model with OOV handling and z-score scaling. Log preprocess and model versions per request.
  • Create an image pipeline artifact (resize, crop, mean/std). Package it with a CNN and measure per-stage latency.
  • Add input monitoring: compute OOV rate and basic histograms daily; raise an alert on threshold breach.

Mini challenge

Your upstream adds a new categorical value "teal" that wasn’t in training. You cannot retrain today. Update your preprocessing so: (1) the service stays online, (2) metrics quantify the impact, and (3) you can ship a retrain later. Write a short plan and list the exact changes in artifacts, code, and monitoring.

Who this is for

MLOps Engineers, ML Engineers, and Backend Engineers integrating ML models into production services.

Prerequisites

  • Basic Python and packaging familiarity.
  • Understanding of your model’s training transforms.
  • Comfort with JSON and request validation concepts.

Learning path

  1. Schema and validation basics → define the request contract.
  2. Artifact versioning → persist scalers/vocab/tokenizer.
  3. Implement preprocess → deterministic, stateless function.
  4. Bundle and deploy → container/package with pinned deps.
  5. Measure and monitor → latency budgets and input drift.

Next steps

  • Complete the exercise below and compare with the solution.
  • Take the quick test to check your understanding. Note: the test is available to everyone; only logged-in users get saved progress.
  • Apply concepts to a small service in your environment.

Practice Exercises

1 exercise to complete

Instructions

You must transform raw inputs using only the provided training artifacts. Do not recompute statistics or vocabularies.

  1. Artifacts (from training):
    • color vocabulary: {"red":0, "blue":1, "green":2, "__OOV__":3}
    • value_z = (value − mean)/std with mean=10, std=2; if value is missing, use mean.
  2. Transform these two records:
    • A: {"color":"purple", "value":14}
    • B: {"color":"blue", "value":null}
  3. Produce feature outputs as JSON objects with keys color_idx and value_z.
Expected Output
{"A":{"color_idx":3,"value_z":2.0},"B":{"color_idx":1,"value_z":0.0}}

Preprocessing At Inference — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

