Error Handling And Logging

Learn error handling and logging for backend engineers, free, with explanations, exercises, and a quick test.

Published: January 20, 2026 | Updated: January 20, 2026

Why this matters

Backends run in messy, unpredictable environments. Requests time out, services fail, data is malformed, and users still expect reliability. Good error handling and logging turn chaos into clear signals, reduce downtime, and speed up incident resolution.

  • Map failures to predictable responses (e.g., 400 vs 500) so clients behave correctly.
  • Record structured logs to diagnose issues quickly.
  • Use timeouts, retries, and idempotency to make systems resilient.

Who this is for

  • Junior to mid-level backend engineers who ship APIs or background jobs.
  • Developers moving from scripts to production services.

Prerequisites

  • Basic programming in one backend language (e.g., Python, Node.js, Go, Java).
  • Familiarity with HTTP and JSON.
  • Comfort running a local server and reading logs.

Learning path

  1. Error taxonomy: Distinguish expected vs unexpected errors; map to HTTP status codes.
  2. Logging fundamentals: Levels, structured JSON logs, correlation IDs, redaction.
  3. Resilience patterns: Timeouts, retries with exponential backoff + jitter, idempotency keys, circuit breakers.
  4. Operationalize: Sampling, retention, alerts, and on-call friendly messages.

Concept explained simply

An error is a deviation from the happy path. Some are expected (invalid input), others are unexpected (database down). Error handling decides what to do next: fail fast, retry safely, or degrade gracefully. Logging is how you capture what happened in a machine- and human-friendly way.

Mental model

Think of your service as an airport:

  • Validation = security checkpoint (catch bad inputs early).
  • Routing decisions = air traffic control (send errors to the right place, keep IDs to track planes).
  • Logs = flight recorder (structured facts for post-incident analysis).
  • Resilience = weather procedures (timeouts, retries, cancellations, alternate routes).

Core principles

  • Error types:
    • Client/expected: validation errors, not found. Return 4xx; log at INFO or WARN with reason.
    • Server/unexpected: timeouts, null pointer, dependency down. Return 5xx; log at ERROR.
  • HTTP mapping (examples): 400 bad params; 401 unauthenticated; 403 forbidden (authenticated but not permitted); 404 not found; 409 conflict; 422 semantic validation; 429 rate limited; 500 internal; 502/503/504 dependency issues.
  • Logging levels: DEBUG (dev detail), INFO (state change), WARN (recoverable oddity), ERROR (user impact), FATAL (process cannot continue).
  • Structured logs: Emit JSON with stable keys: timestamp, level, message, service, correlation_id, error.code, error.stack (if any), http.status, user.id (if safe).
  • Correlation IDs: Generate a request ID at the edge; pass to downstream calls; include in every log line.
  • PII redaction: Never log secrets, passwords, tokens, or full credit card numbers. Mask them with *** or replace them with hashes.
  • Timeouts: Bound every network call. A call without a timeout can hang indefinitely and tie up threads, connections, and memory until the service stalls.
  • Retries with backoff + jitter: Retry only idempotent operations. Use exponential backoff (e.g., 100ms, 200ms, 400ms) plus random jitter to avoid thundering herds.
  • Idempotency: Ensure repeated requests produce the same result (e.g., idempotency keys for payments).
  • Circuit breaker: If a dependency keeps failing, stop calling it briefly to let it recover and protect your threads.
  • Observability quick wins: Add counters for error classes, latency histograms, and top error codes.
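The circuit breaker principle above has no worked example later in this topic, so here is a minimal sketch of one. The class name, the thresholds, and the injectable clock are all assumptions made for illustration; it is single-threaded and far simpler than a production library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical minimal breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Still cooling down: fail fast without touching the dependency.
                raise CircuitOpenError("dependency temporarily disabled")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers would typically catch CircuitOpenError and serve a cached or degraded response instead of waiting on a dependency that is known to be failing.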

Worked examples

Example 1 — API validation with clean 4xx responses

Goal: Reject bad email quickly, log the reason, and return a clear 400.

// Pseudocode
function POST /signup(request):
  correlation_id = request.headers["X-Request-ID"] or generate()
  log(INFO, {
    "message": "signup_received",
    "correlation_id": correlation_id
  })

  if not isValidEmail(request.body.email):
    log(WARN, {
      "message": "validation_failed",
      "field": "email",
      "code": "invalid_email",
      "correlation_id": correlation_id
    })
    return 400, {"error":"invalid_email","message":"Email format is invalid","correlation_id":correlation_id}

  // continue happy path...

Client gets a helpful 400; logs are structured and correlated.
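For readers who want something runnable, the pseudocode above might translate to Python's standard logging module roughly as follows. The JsonFormatter class, the "signup-api" service name, the "fields" extra key, and the deliberately naive email check are assumptions for illustration, not part of any framework.

```python
import json
import logging
import sys
import uuid
from contextvars import ContextVar
from datetime import datetime, timezone

# Holds the current request's ID so every log line can include it.
correlation_id_var = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that emits one JSON object per log line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "service": "signup-api",  # assumed service name
            "correlation_id": correlation_id_var.get(),
        }
        # Extra structured fields passed via logger.x(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(stream=sys.stderr):
    logger = logging.getLogger("signup")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_signup(headers, body, logger):
    # Reuse the caller's request ID if present, otherwise generate one.
    correlation_id_var.set(headers.get("X-Request-ID") or str(uuid.uuid4()))
    logger.info("signup_received")
    if "@" not in body.get("email", ""):  # naive check, for illustration only
        logger.warning("validation_failed", extra={"fields": {
            "field": "email", "error.code": "invalid_email", "http.status": 400}})
        return 400, {"error": "invalid_email",
                     "message": "Email format is invalid",
                     "correlation_id": correlation_id_var.get()}
    return 201, {"correlation_id": correlation_id_var.get()}
```

Storing the ID in a ContextVar means helper functions deeper in the call stack can log without having the ID passed to them explicitly.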

Example 2 — Safe retries with exponential backoff + jitter

Goal: Call inventory service with retries; ensure idempotency.

function reserveInventory(orderId, items):
  idempotencyKey = hash(orderId)  // must be deterministic so every retry sends the same key
  maxAttempts = 3
  for attempt in 1..maxAttempts:
    try:
      return callInventoryAPI({items, idempotencyKey}, timeout=750ms)
    except TimeoutError as e:
      if attempt == maxAttempts:
        break  // do not sleep after the final attempt
      wait = jitter(expBackoff(base=100ms, factor=2, attempt))  // 100ms, 200ms ± jitter
      log(WARN, {
        "message": "inventory_retry",
        "attempt": attempt,
        "wait_ms": wait,
        "error": {"type":"timeout"}
      })
      sleep(wait)
  log(ERROR, {"message":"inventory_failed","order_id":orderId})
  throw DependencyError("inventory")

Idempotency key prevents duplicate reservations across retries.
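As a rough Python sketch of the backoff arithmetic in this example: the helper names backoff_delay and call_with_retries, the default values, and the injectable sleep function are assumptions chosen for illustration.

```python
import random
import time


def backoff_delay(attempt, base_s=0.1, factor=2.0, jitter=0.2, rng=random):
    """Delay to wait after failed attempt number `attempt` (1-based).

    Grows as base_s * factor**(attempt - 1), then is scaled by a random
    factor in [1 - jitter, 1 + jitter] so clients do not retry in lockstep.
    """
    delay = base_s * factor ** (attempt - 1)
    return delay * rng.uniform(1 - jitter, 1 + jitter)


def call_with_retries(fn, max_attempts=3, retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fn(), retrying only on the listed exception types."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff_delay(attempt))
```

Passing sleep as a parameter keeps unit tests fast, because a test can record the requested delays instead of actually waiting.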

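Example 3 — Background job with dead-letter queue and redacted logs

Goal: Retry transient failures, move permanent failures to a dead-letter queue, and keep full email addresses out of the logs.

function processEmailJob(job):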
  try:
    sendEmail(to=job.email, body=job.body, timeout=2s)
    log(INFO, {"message":"email_sent","to_domain":domain(job.email)})
  except PermanentError as e:
    log(ERROR, {"message":"email_permanent_fail","code":e.code,"to":"***@" + domain(job.email)})
    moveToDeadLetter(job)
  except TransientError as e:
    retryLater(job, delay=expBackoff(...))
    log(WARN, {"message":"email_transient_fail","retry_scheduled":true})

We never log full emails; we keep a dead-letter queue for manual inspection.
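One way to enforce this is to pass every log entry through a small redaction step before it is written. The sketch below is an assumption-laden illustration: the key names in SENSITIVE_KEYS, the simple email pattern, and the redact function are hypothetical and handle only flat dictionaries.

```python
import re

# Simplified email pattern for illustration; it captures the domain part.
EMAIL_RE = re.compile(r"[^@\s]+@([^@\s]+)")

# Assumed key names whose values must never reach the logs.
SENSITIVE_KEYS = {"password", "token", "authorization"}


def mask_email(value):
    """Replace the local part of any email in the string with ***."""
    return EMAIL_RE.sub(r"***@\1", value)


def redact(entry):
    """Return a copy of a flat log entry that is safe to write."""
    clean = {}
    for key, value in entry.items():
        if key.lower() in SENSITIVE_KEYS:
            continue  # drop secrets entirely rather than masking them
        if isinstance(value, str):
            value = mask_email(value)
        clean[key] = value
    return clean
```

Applying the email mask to every string value, not just a key named "email", also catches addresses that leak into free-text messages.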

Before you ship — checklist

  • Every external call has a timeout set.
  • Retries only on idempotent paths and with backoff + jitter.
  • All responses map to correct HTTP codes and machine-readable error bodies.
  • Logs are structured JSON and include correlation_id.
  • PII and secrets are redacted or excluded.
  • Top 5 error classes are counted and visible.
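To make the "correct HTTP codes and machine-readable error bodies" item concrete, one possible approach is a small exception hierarchy plus a single translation function. The class names, error codes, and messages below are hypothetical, chosen for this sketch only.

```python
class AppError(Exception):
    """Base class for errors the service knows how to report."""
    status = 500
    code = "internal_error"
    default_message = "Internal server error"

    def __init__(self, message=None):
        super().__init__(message or self.default_message)


class ValidationError(AppError):
    status, code, default_message = 400, "validation_failed", "Request is invalid"


class ConflictError(AppError):
    status, code, default_message = 409, "conflict", "Resource already exists"


class DependencyTimeout(AppError):
    status, code, default_message = 503, "dependency_timeout", "Dependency timed out"


def to_response(exc, correlation_id):
    """Translate any exception into (status, machine-readable body)."""
    if isinstance(exc, AppError):
        status, code, message = exc.status, exc.code, str(exc)
    else:
        # Unknown exceptions become a generic 500 so internals never leak.
        status, code, message = 500, "internal_error", "Internal server error"
    return status, {"error": code, "message": message, "correlation_id": correlation_id}
```

Keeping the translation in one place means a new error class automatically gets a consistent body, and anything unanticipated still degrades to a safe 500.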

Exercises

Exercise ex1 — Design an HTTP error strategy for a signup endpoint

Implement a POST /signup handler that:

  • Validates email and password (min length 8).
  • Returns 400 with a machine-readable error code for validation issues.
  • Returns 409 if the email already exists.
  • Returns 503 if the user database times out.
  • Logs structured JSON with correlation_id, error.code, and http.status.

Input/Output hints

  • Input JSON example: {"email":"bad","password":"short"}
  • Output on validation fail: 400 + {"error":"invalid_email"|"weak_password", "correlation_id": "..."}
  • Self-check: Do your logs show the same correlation_id across all lines for a single request?

Exercise ex2 — Implement retries with exponential backoff and idempotency

Wrap a call to POST /reserve of an external service:

  • 3 attempts max; base backoff 100ms, factor 2, with ±20% jitter.
  • Per-attempt timeout 700ms.
  • Include Idempotency-Key header derived from order_id.
  • Log each attempt with level WARN; final failure at ERROR with summary.

Tip

Retry on timeouts and 5xx only; never on 4xx.

Exercise ex3 — Redact sensitive data in logs

Given raw log messages, produce a redacted JSON log where:

  • Emails become ***@domain.
  • Credit cards become **** **** **** 1234 (only last 4 shown).
  • Authorization headers are removed.

Sample input

{
  "message":"charge_attempt",
  "email":"alex@example.com",
  "card":"4242424242424242",
  "authorization":"Bearer abcdef"
}

Common mistakes (and how to self-check)

  • Logging errors without context: Add correlation_id, error.code, http.status. Self-check: Can you trace a single user request end-to-end using one ID?
  • Over-retrying: Retrying non-idempotent actions causes duplicates. Self-check: If you replay the request 3 times, is the outcome unchanged?
  • Missing timeouts: Hanging threads pile up. Self-check: Search code for external calls; confirm timeout parameter set everywhere.
  • Logging secrets/PII: Compliance and security risk. Self-check: Scan for keys named password, token, authorization, card.
  • Ambiguous HTTP codes: 500 for validation errors confuses clients. Self-check: For each error code, write a one-liner client action.
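The over-retrying self-check (replay the request three times and confirm the outcome is unchanged) can be illustrated on the server side with an idempotency-key store. This is a minimal in-memory sketch; IdempotentHandler and its dictionary store are hypothetical, and a real service would need a durable, concurrency-safe store.

```python
class IdempotentHandler:
    """Runs an action at most once per idempotency key (in-memory sketch)."""

    def __init__(self, action):
        self.action = action
        self.results = {}  # idempotency key -> stored result

    def handle(self, idempotency_key, payload):
        if idempotency_key in self.results:
            # Replay: return the stored result without repeating the side effect.
            return self.results[idempotency_key]
        result = self.action(payload)
        self.results[idempotency_key] = result
        return result
```

For example, wrapping a charge function in IdempotentHandler and calling handle three times with the same key should charge once and return the same result each time.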

Practical projects

  • Resilient User Service: Build a user signup/login API with structured logging, validation, rate limiting (429), and dependency timeouts.
  • Job Worker with DLQ: A background processor that retries transient failures and moves permanent failures to a dead-letter queue.
  • Log Enricher: A small library that injects correlation IDs and redacts PII, with unit tests.

Mini challenge

You receive an alert: error rate spiked from 0.5% to 4% after a deploy. You have 5 minutes.

  • Which 3 log fields do you filter on first, and why?
  • What single change would you roll back or feature-flag if logs show 503s from a dependency?
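  • How do you prevent a retry storm while mitigating user impact?

Suggested direction

  • Filter on http.status to see which codes are spiking, error.code to find the top offenders, and correlation_id to trace a few failing requests end to end.
  • Enable circuit breaker + reduce concurrency to the failing dependency; serve cached or degraded responses where possible.
  • Cap retry attempts and keep backoff + jitter in place so retries do not amplify the outage.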

Next steps

  • Complete the exercises below to practice error mapping, retries, and redaction.
  • Take the quick test to confirm understanding. Aim for 80%+.
  • Integrate a correlation ID middleware and a logging formatter in your current project.

Practice Exercises

Instructions

  1. Create a POST /signup handler that validates email and password (min 8 chars). Return:
    • 400 for invalid email or weak password with {"error":"invalid_email"|"weak_password","message":"...","correlation_id":"..."}
    • 409 if the email already exists (error=conflict_email)
    • 503 when DB times out (error=db_timeout)
  2. Log one line at request start (INFO), one line per error (WARN/ERROR), and one line at response (INFO). Include correlation_id and http.status.

Expected Output

HTTP/1.1 400 Bad Request with body {"error":"invalid_email","message":"Email format is invalid","correlation_id":"REQ-123"}. Logs include structured entries with the same correlation_id and http.status=400.

Error Handling And Logging — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.
