Input Validation And Sanitization

Published: January 21, 2026 | Updated: January 21, 2026

Why this matters

As an API Engineer, you guard the gateway to data and services. Many real incidents start with unsafe input: injection, oversized payloads that exhaust memory, or fields that slip past weak checks. Solid input validation and context-aware sanitization stop these issues early, reduce incident risk, and keep your services predictable.

  • Real tasks you’ll do: define request schemas, implement validators, enforce size/type/format, ensure safe encoding for logs, HTML, SQL, and file paths.
  • Impact: fewer vulnerabilities, fewer crashes, clearer errors for clients, and simpler downstream code.

Concept explained simply

Validation = decide if input is acceptable. If not, reject with a clear error.
Sanitization = transform data so it’s safe for a specific context (e.g., HTML-encode before rendering).
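
To make the distinction concrete, here is a minimal Python sketch. The username rule and the function names are assumptions chosen for illustration, not part of any particular framework: the validator only answers yes or no, while the sanitizer transforms text for one specific sink (an HTML body).

```python
import html
import re

# Hypothetical rule for illustration: 3-20 chars of a-z, 0-9, underscore.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,20}")


def validate_username(value: object) -> bool:
    """Validation: accept or reject the input; never modify it."""
    return isinstance(value, str) and USERNAME_RE.fullmatch(value) is not None


def render_comment(text: str) -> str:
    """Sanitization: encode text for one specific sink (an HTML body)."""
    return "<p>" + html.escape(text, quote=True) + "</p>"


print(validate_username("alice_01"))       # True
print(validate_username("alice<script>"))  # False
print(render_comment("<b>hi</b>"))         # <p>&lt;b&gt;hi&lt;/b&gt;</p>
```

Note that the comment text is not rejected for containing markup; it is stored as-is and encoded only at the point where it is rendered.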

Key ideas, short and sweet
  • Allowlist over blocklist: define exactly what is allowed (types, ranges, formats).
  • Normalize before validate: trim, Unicode normalize (e.g., NFKC), and canonicalize file paths.
  • Context matters: encode for the target sink (HTML, SQL, shell, logs). Prepared statements beat escaping for SQL.
  • Fail closed: when in doubt, reject.
  • Small, fast checks first: size limits before deep parsing to avoid DoS.

Mental model

Think of your API like airport security:

  • Check the ticket (schema/type/format).
  • Weigh the baggage (body and field size limits).
  • Screen for restricted items (allowlist values).
  • Route to the right gate (sanitize per context: HTML, SQL, logs).

Validation strategy checklist

  • Define a schema per endpoint (required/optional, types, min/max, enum).
  • Apply body size limit and per-field length caps.
  • Normalize input (trim, Unicode NFKC) before validation.
  • Use allowlists for enums and constrained strings.
  • Parameterize database queries; never build SQL with string concatenation.
  • Encode per sink: HTML-escape for UI, JSON-stringify for logs, safe filename rules for storage.
  • Return consistent 4xx errors with clear, non-sensitive messages.
  • Log validation failures safely (truncated, encoded), without leaking secrets.

Worked examples

Example 1 — Numeric range and enum allowlist
// POST /orders { quantity, currency }
// Rules:
// quantity: integer, 1..1000
// currency: enum from ISO-like allowlist ["USD","EUR","JPY"]

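function validateOrder(input) {
  // Reject non-object bodies (null, arrays, primitives) before reading fields.
  if (input === null || typeof input !== "object" || Array.isArray(input)) {
    return { ok: false, error: "body must be a JSON object" };
  }
  if (!Number.isInteger(input.quantity) || input.quantity < 1 || input.quantity > 1000) {
    return { ok: false, error: "quantity must be an integer between 1 and 1000" };
  }
  const allowed = new Set(["USD", "EUR", "JPY"]);
  // Check the type explicitly: String(["USD"]) coerces to "USD" and would pass.
  if (typeof input.currency !== "string" || !allowed.has(input.currency)) {
    return { ok: false, error: "currency must be one of USD, EUR, JPY" };
  }
  return { ok: true };
}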
Example 2 — JSON schema (FastAPI/Pydantic)
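# POST /users { email, password, age? }
# EmailStr requires the optional email-validator package.
from pydantic import BaseModel, EmailStr, Field

class UserIn(BaseModel):
    email: EmailStr
    # Field constraints work as a plain annotation; constr() is discouraged in Pydantic v2.
    password: str = Field(min_length=8, max_length=128)
    age: int | None = Field(default=None, ge=13, le=120)  # "int | None" needs Python 3.10+

# When UserIn is the request body model, FastAPI enforces types/lengths and returns 422 with details.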
Example 3 — SQL safety + HTML display safety
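// Vulnerable (don't do this):
//   "SELECT * FROM products WHERE name LIKE '%" + q + "%'"
//   and then res.send("<div>" + q + "</div>")

// Safer approach (Node + parameterized SQL + HTML-encoding).
// Assumes an Express-style res, a pg-style db.query, and an escapeHtml helper.
// Check the type first: calling string methods on a non-string would throw.
if (typeof input.q !== 'string') return res.status(400).json({ error: 'validation_failed' })
const q = input.q.normalize('NFKC').trim() // Unicode NFKC + trim
if (q.length > 100) return res.status(400).json({ error: 'validation_failed' })

// Escape LIKE wildcards (\ % _) so the user's text matches literally.
const pattern = '%' + q.replace(/[\\%_]/g, '\\$&') + '%'
const rows = await db.query(
  'SELECT * FROM products WHERE name ILIKE $1',
  [pattern]
)
res.send('<div>Results for: ' + escapeHtml(q) + '</div>')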

Practical patterns you can reuse

1) Normalize then validate
  • Trim leading/trailing whitespace.
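  • Convert Unicode to NFKC to fold compatibility forms (e.g., full-width letters, ligatures) into their standard equivalents. NFKC does not merge cross-script look-alikes such as Latin "a" and Cyrillic "а".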
  • Lowercase where appropriate (e.g., emails), but preserve case when it carries meaning (passwords, product codes).
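
The steps above can be sketched with Python's standard library. The function names and the 100-character cap are assumptions for illustration; the point is the order: type check, normalize, then validate the normalized value.

```python
import unicodedata


def normalize_text(value: str) -> str:
    """Apply Unicode NFKC, then trim surrounding whitespace."""
    return unicodedata.normalize("NFKC", value).strip()


def normalize_email(value: str) -> str:
    """Normalize, then lowercase (assumes this API treats emails case-insensitively)."""
    return normalize_text(value).lower()


def validate_name(raw: object, max_len: int = 100) -> tuple[bool, str]:
    """Return (True, normalized_name) or (False, error_message)."""
    if not isinstance(raw, str):
        return False, "name must be a string"
    name = normalize_text(raw)
    # Length is checked AFTER normalization, so padding cannot sneak past the cap.
    if not 1 <= len(name) <= max_len:
        return False, f"name must be 1..{max_len} characters"
    return True, name


print(normalize_text("  Ｊｏｈｎ  "))            # John
print(normalize_email(" Alice@Example.COM "))  # alice@example.com
print(validate_name("   "))                    # (False, 'name must be 1..100 characters')
```

Passwords are deliberately absent here: whether to normalize them is a policy decision, and they should never be lowercased or trimmed silently.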
2) Enforce limits early
  • Global body size limits at gateway/server (e.g., 1–5 MB for JSON APIs).
  • Per-field length caps (e.g., name ≤ 100 chars, comment ≤ 2000).
  • Limit array lengths and object nesting to prevent parser blowups.
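
A rough sketch of these limits in Python follows. The caps and names (`MAX_BODY_BYTES`, `MAX_DEPTH`, `MAX_ITEMS`, `LimitError`) are assumed values for illustration. In a real service the byte cap would normally be enforced by the gateway or server before the body is fully read; here the shape check runs after parsing, so the byte cap is what bounds the parser's work.

```python
import json

MAX_BODY_BYTES = 1_000_000  # assumed cap; tune per API
MAX_DEPTH = 10              # assumed cap on nested arrays/objects
MAX_ITEMS = 1_000           # assumed cap on items per array or object


class LimitError(ValueError):
    """Raised when a request exceeds a size or shape limit."""


def check_shape(node: object, depth: int = 0) -> None:
    """Walk parsed JSON and enforce nesting and item-count caps."""
    if not isinstance(node, (list, dict)):
        return
    depth += 1
    if depth > MAX_DEPTH:
        raise LimitError("nesting too deep")
    if len(node) > MAX_ITEMS:
        raise LimitError("too many items")
    children = node if isinstance(node, list) else node.values()
    for child in children:
        check_shape(child, depth)


def parse_body(raw: bytes) -> object:
    """Cheap size check first, then parse, then shape checks."""
    if len(raw) > MAX_BODY_BYTES:
        raise LimitError("body too large")
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        # ValueError covers invalid JSON and invalid UTF-8.
        raise LimitError("malformed JSON") from exc
    check_shape(data)
    return data
```

A handler could map "body too large" to 413 and the other cases to 400, matching the status codes used elsewhere in this lesson.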
3) Validate structure with schemas
  • Use OpenAPI/JSON Schema, Pydantic, Joi, class-validator, Bean Validation — anything that enforces types and constraints.
  • Keep schemas next to handlers. Version them with your API.
4) Sanitize per sink
  • SQL: parameterized queries (placeholders). Do not escape manually.
  • HTML: encode special chars (& < > " '). Avoid injecting raw user input into HTML.
  • Logs: JSON-encode and truncate; never log secrets.
  • File paths: use generated IDs, fixed directories, allowlist extensions; never trust user-provided paths.
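
One way to sketch the four sinks with Python's standard library is shown below. The `products` table, the helper names, and the extension allowlist are assumptions for illustration; `sqlite3` stands in for whatever database driver you use, since the placeholder principle is the same.

```python
import html
import json
import os
import sqlite3
import uuid

ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}  # assumed allowlist


def find_products(conn: sqlite3.Connection, name: str) -> list:
    """SQL sink: the driver binds name as data; it is never spliced into the SQL text."""
    return conn.execute(
        "SELECT name FROM products WHERE name = ?", (name,)
    ).fetchall()


def html_encode(text: str) -> str:
    """HTML sink: encode & < > and both quote characters."""
    return html.escape(text, quote=True)


def log_safe(value: str, max_len: int = 200) -> str:
    """Log sink: truncate, then JSON-encode so newlines cannot forge log lines."""
    return json.dumps(value[:max_len])


def storage_name(client_filename: str) -> str:
    """File sink: keep only an allowlisted extension; the name itself is generated."""
    ext = os.path.splitext(client_filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("file type not allowed")
    return uuid.uuid4().hex + ext
```

The extension check only constrains the stored name. It says nothing about the file's actual content, which would need separate inspection.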
5) Consistent error responses
// Example 400 structure
{
  "error": "validation_failed",
  "details": [
    { "field": "email", "message": "invalid format" },
    { "field": "age", "message": "must be between 13 and 120" }
  ]
}
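
A small sketch of how a handler might produce that structure: collect every field error rather than stopping at the first, then wrap the list in one fixed envelope. The function names are hypothetical and the email check is deliberately crude, standing in for a real format validator.

```python
def validate_user(payload: object) -> list[dict]:
    """Return a list of {field, message} dicts; an empty list means valid."""
    if not isinstance(payload, dict):
        return [{"field": "(body)", "message": "must be a JSON object"}]
    errors = []
    email = payload.get("email")
    # Crude placeholder check; a real service would use a proper email validator.
    if not isinstance(email, str) or "@" not in email or len(email) > 254:
        errors.append({"field": "email", "message": "invalid format"})
    age = payload.get("age")
    if age is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(age, int) and not isinstance(age, bool)
        if not is_int or not 13 <= age <= 120:
            errors.append({"field": "age", "message": "must be between 13 and 120"})
    return errors


def error_response(errors: list[dict]) -> tuple[int, dict]:
    """Wrap field errors in the same envelope for every endpoint."""
    return 400, {"error": "validation_failed", "details": errors}


status, body = error_response(validate_user({"email": "nope", "age": 5}))
print(status, body)
```

Messages name the field and the rule but never echo the rejected value, which keeps secrets and attacker-controlled text out of responses.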

Exercises you can try

These mirror the Practice Exercises section below. Try them here first before comparing against a reference solution.

Exercise 1 — Design rules for POST /users
  1. Fields: email (string), password (string), age (optional int), role (enum: user/admin/support).
  2. Write precise validation rules: types, lengths, formats, ranges, allowlists.
  3. Specify normalization steps and error messages.
Exercise 2 — Fix a vulnerable search endpoint
  1. Given query param q used for DB search and echoed into HTML.
  2. Rewrite to: normalize + length-cap q, parameterize SQL, HTML-encode echo.
  3. Return safe 400 on bad input.

Common mistakes and self-check

  • Mistake: Relying on client-side checks only. Fix: enforce server-side validation always.
  • Mistake: Blocklists (e.g., forbidding certain substrings). Fix: use allowlists (type/format/range/enum).
  • Mistake: Validating before normalization. Fix: normalize (trim/Unicode) then validate.
  • Mistake: Escaping SQL manually. Fix: parameterized queries.
  • Mistake: Overly permissive lengths. Fix: realistic caps on each field.
  • Mistake: Error messages that leak internals (e.g., stack traces, raw SQL errors, or echoed input values). Fix: state which field failed and why, but never include internals or secrets.
  • Mistake: Logging raw payloads. Fix: truncate and JSON-encode; avoid sensitive fields.
Self-check mini list
  • Can every field be rejected for type/format/length violations?
  • Do oversized requests get blocked early?
  • Does each sink (SQL/HTML/logs/files) apply the right protection?
  • Are error messages consistent and safe?
  • Do tests cover boundary values and malicious payloads?
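
For the last item, a table-driven test is often enough to start. The sketch below reuses the quantity rule from Example 1 (integer, 1..1000) as a Python function; the function name and the case list are illustrative assumptions.

```python
def valid_quantity(value: object) -> bool:
    """Integer in 1..1000; bool is excluded because it subclasses int."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and 1 <= value <= 1000
    )


# (input, expected) pairs: boundaries first, then wrong types and hostile strings.
CASES = [
    (0, False), (1, True), (1000, True), (1001, False),
    (-1, False), (1.5, False), ("5", False), (None, False), (True, False),
    ("1; DROP TABLE orders", False), ([1], False),
]


def failing_cases() -> list:
    """Return the inputs whose result differs from the expectation."""
    return [value for value, expected in CASES if valid_quantity(value) is not expected]


print(failing_cases())  # []
```

Each boundary is tested on both sides (0 and 1, 1000 and 1001), which is where off-by-one mistakes in range checks tend to hide.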

Practical projects

  • Build a validation middleware that reads OpenAPI/JSON Schema and enforces body/query/header rules with uniform errors.
  • Add global request size limits and per-field caps; add tests for 413 Payload Too Large and 400 errors.
  • Create a sanitizer module: htmlEncode, safeFilename, logSafeString; use it at sinks.
  • Write fuzz tests that try overlong strings, invalid Unicode, SQL meta chars, and nested arrays.

Who this is for

  • API Engineers and Backend Developers building or maintaining HTTP/JSON services.
  • Developers integrating third-party webhooks or user-generated content.
  • Teams hardening existing endpoints against common vulnerabilities.

Prerequisites

  • Basic HTTP and JSON knowledge.
  • Familiarity with at least one backend framework (e.g., Node/Express, Python/FastAPI, Java/Spring).
  • Basic database usage and prepared statements.

Learning path

  1. This subskill: validation vs sanitization fundamentals, schemas, and per-sink safety.
  2. Next: Authentication & Authorization basics (tokens, scopes).
  3. Then: Rate limiting & abuse prevention.
  4. Later: Secrets management, input canonicalization at scale, and observability for validation failures.

Next steps

  • Pick two critical endpoints; write or tighten their schemas and limits.
  • Add parameterized queries everywhere; forbid raw SQL concatenation in code review.
  • Introduce a shared sanitizer utility and require its use at HTML/log/file sinks.
  • Automate: unit tests for boundary values and malformed inputs.

Mini challenge

Harden a POST /comments endpoint that accepts { text, attachment? }:

  • text: string, trim + NFKC, length 1..2000, HTML-encode on display.
  • attachment: optional file id; allowlist extensions (e.g., .png/.jpg/.pdf) and cap size at 5 MB.
  • Reject if either rule fails; return consistent 400 details.

Practice Exercises

Instructions

Define strict validation and normalization rules for:

POST /users
{
  "email": string,
  "password": string,
  "age": optional integer,
  "role": string (user|admin|support)
}
  1. Write exact type, length, range, and format rules for each field.
  2. List normalization steps (trim, Unicode normalization, case handling).
  3. Write example error responses for 3 invalid cases.
Expected Output
A clear list of rules (types, ranges, enums), normalization steps, and a sample 400 error body for invalid inputs.
