Input Sanitization

Learn input sanitization for free with explanations, exercises, and a quick test (for backend engineers).

Published: January 20, 2026 | Updated: January 20, 2026

Who this is for

  • Backend engineers who accept input from APIs, forms, webhooks, queues, or files.
  • Developers integrating databases, templates, and shell tools.
  • Anyone fixing security bugs like SQL injection, XSS, command injection, or path traversal.

Prerequisites

  • Basic experience with a backend language (e.g., JavaScript/Node, Python, Go, Java).
  • Know how to write database queries and return JSON/HTML responses.
  • Familiar with HTTP requests and typical user inputs (query/body/headers/files).

Why this matters

Many security incidents start with untrusted input. As a backend engineer, you will:

  • Process search strings, IDs, and filters to build database queries.
  • Render user-supplied content (comments, names) into HTML emails/pages.
  • Accept filenames, paths, or commands for background jobs and tools.
  • Store logs and analytics safely without breaking dashboards or alerts.

Input sanitization, done correctly, helps prevent injection, data corruption, and service outages.

Concept explained simply

Input sanitization is part of a broader flow:

  • Normalize: convert to a standard form (trim, lowercase if needed, decode percent-encodings, resolve paths).
  • Validate: apply strict allowlists for type, length, format, and range.
  • Reject: if it does not meet rules, stop early with a clear error.
  • Parameterize: never concatenate untrusted data into queries/commands; bind it as data.
  • Encode on output: when displaying in a context (HTML, JSON, shell), encode for that context.

Key idea: prefer validation + safe APIs over trying to "clean" everything. Sanitization is a last mile for specific contexts (e.g., HTML-encoding) — not a universal magic filter.
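The first three steps (normalize, validate, reject) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the username rule, the InvalidInput exception, and the function names are assumptions made up for this example.

```python
import re
import unicodedata

# Hypothetical rule for this sketch: 3-32 chars, lowercase letters, digits, '_' or '-'.
USERNAME_RE = re.compile(r"[a-z0-9_-]{3,32}")


class InvalidInput(ValueError):
    """Raised when input fails validation; the message does not echo the input."""


def normalize_username(raw: str) -> str:
    # Normalize: fold compatibility forms (e.g., full-width letters), trim, lowercase.
    return unicodedata.normalize("NFKC", raw).strip().lower()


def validate_username(raw: str) -> str:
    name = normalize_username(raw)
    # Validate against an allowlist; fullmatch requires the whole string to match.
    if not USERNAME_RE.fullmatch(name):
        # Reject early with a clear, non-revealing error.
        raise InvalidInput("username must be 3-32 characters: a-z, 0-9, '_' or '-'")
    return name
```

Validation here only decides whether the value is acceptable. It does not replace parameterized queries or output encoding later in the flow.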

Mental model

Think of data flowing through a filter pipeline:

  1. Raw input arrives (could be hostile).
  2. Normalize so there is only one representation for the same thing.
  3. Validate strictly; if invalid, reject.
  4. Use safe APIs (parameterized queries, shell arg arrays).
  5. Right before output, encode for the specific sink (HTML, JSON). SQL parameters need no manual encoding because the driver binds them as data.

Never "sanitize once" and reuse everywhere; encoding depends on where the data goes.
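To illustrate why one "sanitized" copy cannot serve every sink, the sketch below encodes the same string for four contexts using only the Python standard library. The sample value is made up for the example.

```python
import html
import json
import shlex
from urllib.parse import quote

value = "Tom & <b>Jerry</b>'s \"show\""

# Each sink has its own encoding; none of these outputs is safe in another context.
as_html = html.escape(value, quote=True)  # HTML text and quoted attribute values
as_json = json.dumps(value)               # a JSON string literal
as_url = quote(value, safe="")            # a single URL path segment or query value
as_shell = shlex.quote(value)             # only if building a shell string is unavoidable
```

Note that json.dumps does not escape < or >, so its output is not safe to drop into an HTML page without further handling. The stored value stays raw; each encoded form is produced at the moment of output.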

Secure input workflow (apply this each time)

  1. Identify the sink: database, HTML, file path, shell, log, or JSON.
  2. Normalize input (trim, lowercase where appropriate, canonicalize paths/URLs).
  3. Validate with allowlists (type, length, regex, enum, range).
  4. Reject on failure with a helpful, non-revealing error.
  5. Use safe APIs: prepared statements, ORM bindings, subprocess argument arrays, safe path join.
  6. Encode/escape at output boundary (HTML entities, attribute encoding, JSON encoding).

Worked examples

1) SQL injection: use parameters (not string concatenation)

Unsafe:

// Node.js (unsafe)
const username = req.query.username; // untrusted
const sql = "SELECT id FROM users WHERE username = '" + username + "'";
const rows = await db.query(sql); // attacker: alice' OR '1'='1

Safe:

// Node.js with parameters (pg-like interface)
const username = req.query.username;
const rows = await db.query(
  "SELECT id FROM users WHERE username = $1",
  [username]
);

Wildcard search safely:

// Build the wildcard as data, not SQL text
const term = req.query.term || '';
// Escape the escape character (\) as well as the LIKE wildcards (% and _)
const pattern = '%' + term.replace(/[\\%_]/g, '\\$&') + '%';
const rows = await db.query("SELECT * FROM users WHERE username ILIKE $1 ESCAPE '\\'", [pattern]);
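The same wildcard technique can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the table contents and the like_pattern and search helpers are invented for the example, and SQLite's LIKE stands in for ILIKE (which is PostgreSQL-specific).

```python
import sqlite3


def like_pattern(term: str) -> str:
    # Escape the escape character first, then the LIKE wildcards.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("alice_1",), ("alicex1",), ("bob",)],
)


def search(term: str) -> list:
    # The pattern is bound as a parameter; only the ESCAPE clause is SQL text.
    rows = conn.execute(
        "SELECT username FROM users WHERE username LIKE ? ESCAPE '\\' "
        "ORDER BY username",
        (like_pattern(term),),
    )
    return [row[0] for row in rows]
```

With this escaping, a search for e_1 matches only alice_1 (the underscore is literal), and a search for % matches nothing instead of every row.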

2) XSS: validate length, store raw, encode on HTML output

Do not strip everything. Validate reasonable length; then HTML-encode when rendering.

// Example HTML encoding function
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Rendering a comment safely
const comment = getCommentFromDB(); // stored as submitted
res.send('<p>' + escapeHtml(comment.text) + '</p>');

If you allow limited HTML (e.g., <b>,<i>), use a strict allowlist sanitizer and still encode attributes/unsafe parts.

3) Command injection: never build shell strings

Unsafe:

# Python (unsafe)
import os
user = request.args.get('user')
os.system(f"id {user}")  # attacker supplies: ; $(rm -rf /)

Safe:

# Python (safe)
import re
import subprocess

user = request.args.get('user', '')
# Validate: only letters, digits, dash, underscore; enforce length
if not re.fullmatch(r"[A-Za-z0-9_-]{1,32}", user):
    raise ValueError("Invalid user")
# Pass an argument list; shell=False is the default, so no shell parses the input.
# "--" ends option parsing so a value starting with "-" is not read as a flag.
subprocess.run(["id", "--", user], check=True)

Common mistakes and how to self-check

  • Blacklists: Removing a few characters (like ' or <) is bypassable. Self-check: can I encode payloads to slip through (URL-encode, UTF-7, double-encode)?
  • Sanitize once, use everywhere: Encoding depends on context. Self-check: do I HTML-encode only at the HTML sink, not when storing?
  • Manual SQL escaping: Error-prone. Self-check: does every query use prepared statements with bound parameters?
  • shell=True or backticks: Dangerous. Self-check: do all command invocations pass an argument array with shell disabled?
  • Missing normalization: Path traversal via mixed encodings. Self-check: do I resolve and compare canonical paths before allowing access?
  • Over-trusting libraries: Even with a sanitizer, you still need validation and output encoding. Self-check: are validation rules documented and tested?
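For the path traversal self-check, resolving and comparing canonical paths might look like the sketch below. BASE_DIR and resolve_under_base are hypothetical names for this example; the storage root is an assumption, not a required layout.

```python
from pathlib import Path

BASE_DIR = Path("/srv/app/uploads")  # hypothetical storage root


def resolve_under_base(user_path: str, base: Path = BASE_DIR) -> Path:
    base_resolved = base.resolve()
    # Joining with an absolute user path discards the base, so the containment
    # check below must run on the resolved result, not on the raw string.
    candidate = (base_resolved / user_path).resolve()
    try:
        candidate.relative_to(base_resolved)
    except ValueError:
        raise ValueError("path escapes the storage root") from None
    return candidate
```

Because resolve() collapses ".." segments and follows existing symlinks before the comparison, inputs such as ../../etc/passwd or /etc/passwd are rejected, while a/../b.txt resolves to a file inside the root. Percent-decoding, if needed, should happen before this step.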

Exercises you can do now

These mirror the practice exercises below. Detailed solutions are in the exercise panels.

  1. Exercise 1 — Fix a SQL query: Replace string concatenation with a parameterized query so that an injection like alice' OR '1'='1 does not return all users.
  2. Exercise 2 — Encode HTML safely: Write a function that encodes & < > " ' so a comment like Hello <b>world</b> <script>alert(1)</script> renders safely as text.

Checklist before you submit

  • I used parameters/bound variables in database queries.
  • I avoided shell command strings; I passed an argument array.
  • I encoded only at the output boundary (HTML).
  • I validated input length/type using allowlists.

Practical projects

  • Secure Contact Form API: Validate fields, rate-limit, store safely, and render confirmation emails with HTML encoding.
  • File Upload Service: Enforce content-type, size limits, canonicalize filename/path, store outside web root, and serve via safe mapping.
  • Audit Log Pipeline: Accept JSON events, validate schema, escape control characters for logs, and prevent log injection.
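For the audit log project, one way to keep a user-controlled value from forging extra log lines is to serialize it as a JSON string before logging. The log_safe helper and its length limit are assumptions for this sketch.

```python
import json


def log_safe(value: str, max_len: int = 200) -> str:
    # Truncate first so a huge value cannot flood the log.
    truncated = value[:max_len]
    # json.dumps escapes quotes, backslashes, and control characters below 0x20
    # (including CR and LF), so the result always stays on one line.
    return json.dumps(truncated, ensure_ascii=True)
```

A call such as logger.info("login failed user=%s", log_safe(username)) then records an embedded newline as the two characters \n rather than starting a new log entry.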

Learning path

  1. Master validation: types, lengths, regex allowlists, enums.
  2. Apply parameterized queries across all data access.
  3. Encode correctly for each sink: HTML, URL, JSON, CSV, shell.
  4. Harden file and path handling (canonicalization and safe joins).
  5. Add automated tests for dangerous inputs and encodings.

Next steps

  • Refactor one real endpoint to use strict validation + parameters.
  • Add unit tests with malicious payloads for your top 3 endpoints.
  • Run the quick test below.

Mini challenge

You accept a redirect query parameter and redirect users after login. How do you prevent open redirects and XSS? Describe your validation rules, normalization, and where you encode. Then implement in your stack.

Quick test

Ready when you are. Your score is calculated immediately.

Practice Exercises

2 exercises to complete

Instructions

You have a users table with rows for alice and bob. The current code uses string concatenation:

// Pseudocode (unsafe)
input = request.query.username
sql = "SELECT id, username FROM users WHERE username = '" + input + "'"
rows = db.query(sql)

Rewrite using a parameterized query. Keep exact match semantics. Assume your driver supports placeholders like $1 or ?.

Test with two inputs:

  • alice
  • alice' OR '1'='1

Expected Output

For input "alice": returns exactly 1 row (alice). For input "alice' OR '1'='1": returns 0 rows (no injection).
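To check an answer against the expected output, a small harness like the one below may help. It is a sketch that assumes an in-memory SQLite database and the ? placeholder style; the find_user name is made up for the example, and your driver's API may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])


def find_user(username: str) -> list:
    # The input is bound as a parameter, so quotes in it are data, not SQL syntax.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    )
    return cur.fetchall()
```

Calling find_user with alice returns one row, and calling it with alice' OR '1'='1 returns an empty list, because the whole string is compared as a literal username.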

Input Sanitization — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.
