Why this matters
Logs are your fastest feedback loop in production. Structured logging turns messy text into consistent, queryable data so you can answer questions like: Which requests are slow? Which tenant is affected? How many retries happened before a failure? With structure, your log platform can filter, aggregate, and alert reliably.
- Debugging: Reconstruct a failing request path with trace_id and span_id.
- Incident response: Slice errors by service, endpoint, tenant, and deployment version.
- Compliance and safety: Avoid leaking secrets by defining and enforcing allowed fields.
- Cost control: Sample high-volume info logs without losing critical errors.
Concept explained simply
Structured logging means each log entry is a small record (often JSON) with consistent keys and types. Instead of a free-form sentence, you keep a short message and put facts into fields.
Mental model: Think of each log line as a table row. Columns are fields like timestamp, level, service, event, user_id, http.status_code. Because the columns are consistent, you can filter and aggregate fast.
Example: unstructured vs structured
Unstructured:
"User 42 checkout failed: payment declined code=51 order=abc123"
Structured (JSON):
{"timestamp":"2026-01-20T10:20:30Z","level":"error","service":"checkout","event":"payment_declined","user_id":42,"order_id":"abc123","payment.code":51,"message":"Checkout failed"}
Core principles
- Single-line JSON per event. Avoid multi-line logs; serialize stacks as a single field.
- Strong timestamps. Use UTC ISO-8601 (e.g., 2026-01-20T10:20:30Z).
- Stable field names. Prefer namespaced keys like http.method, error.message, db.rows_affected.
- Correlation. Include trace_id and span_id (or at least correlation_id/request_id).
- Message templates. A short, stable message; details go in fields.
- Privacy first. Never log secrets or raw PII. Redact or hash if needed.
- Right level, right volume. DEBUG for local, INFO for state changes, WARN for unusual but non-fatal, ERROR for failures, FATAL for crash/exit.
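A minimal sketch of these principles using Python's standard logging module (the service name and the "fields" attribute plumbing are illustrative choices, not a specific library's API):
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    # Emits each record as single-line JSON with a UTC ISO-8601 timestamp.
    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "service": "checkout",  # illustrative service name
        }
        entry.update(getattr(record, "fields", {}))  # structured fields attached via extra=
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Payment declined", extra={"fields": {"event": "payment_declined", "order.id": "abc123"}})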
Field naming guide
- Required: timestamp, level, message, service, env
- Correlation: trace_id, span_id, parent_span_id, request_id
- HTTP: http.method, http.route, http.path, http.status_code, http.client_ip
- User and tenancy: user.id, tenant.id (avoid names/emails), session.id
- Errors: error.type, error.message, error.stack
- Performance: duration_ms, bytes_in, bytes_out
- Deployment: version, host, region
- Sampling: sample_rate (if using probabilistic sampling)
Naming tips
- Use lowercase with dots for namespaces.
- Use consistent types: status_code is always a number; duration_ms is always a number.
- Avoid ambiguous names like id; prefer user.id, order.id, request.id.
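Writing the schema down as data makes these rules checkable. A sketch (the field set is drawn from the guide above; extend it for your own domain):
# Expected type per field; anything not listed here is a schema violation.
LOG_SCHEMA = {
    "timestamp": str, "level": str, "message": str, "service": str, "env": str,
    "trace_id": str, "span_id": str, "request_id": str,
    "http.method": str, "http.route": str, "http.status_code": int,
    "user.id": int, "order.id": str, "tenant.id": str,
    "duration_ms": int, "sample_rate": float,
    "error.type": str, "error.message": str, "error.stack": str,
}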
Log levels and message templates
- DEBUG: Developer diagnostics; may include verbose fields. Usually off in prod.
- INFO: Business or state changes (user_signed_in, order_created). Can be high-volume; be selective.
- WARN: Unexpected but handled: retry_scheduled, cache_miss_spike.
- ERROR: Operation failed: payment_declined, db_write_failed.
- FATAL: Process cannot continue; it is about to exit or has crashed.
Template pattern: message + stable fields.
{"level":"error","message":"Payment declined","event":"payment_declined","payment.code":51,"order.id":"abc123"}
Privacy and security
- Never log: passwords, access tokens, private keys, full credit card numbers, raw personal data.
- Redact or hash when needed: user.email_hash, card.last4.
- Truncate long fields; store up to a safe length (e.g., 256 chars).
- Mark redactions: redacted:true or field_name:"[REDACTED]" for clarity.
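A centralized redaction helper keeps these rules in one place and testable. A sketch (the blocklist and the email handling are illustrative):
import hashlib

BLOCKED_FIELDS = {"password", "token", "access_token", "secret", "authorization"}

def redact(fields: dict) -> dict:
    # Replace secret-bearing fields, hash direct identifiers, truncate long values.
    safe = {}
    for key, value in fields.items():
        if key in BLOCKED_FIELDS:
            safe[key] = "[REDACTED]"
        elif key == "user.email":
            safe["user.email_hash"] = hashlib.sha256(value.encode()).hexdigest()
        elif isinstance(value, str) and len(value) > 256:
            safe[key] = value[:256]  # truncate to a safe length
        else:
            safe[key] = value
    return safe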
Safe error example (pretty-printed here for readability; emit it as a single line)
{
"level":"error",
"message":"OAuth exchange failed",
"event":"oauth_exchange_failed",
"error.type":"InvalidGrant",
"error.message":"invalid_grant",
"oauth.provider":"example",
"user.id":null,
"token":"[REDACTED]"
}
Sampling and volume control
- Do NOT sample ERROR or FATAL.
- Sample high-volume INFO (e.g., keep 10%): set sample_rate and compute a stable hash per request to avoid bias; see the sketch after the example below.
- Throttle repetitive WARNs to prevent noise.
{"level":"info","message":"Request completed","sample_rate":0.1,"duration_ms":12}
Worked examples
Example 1 — Convert a noisy line to JSON (Python)
# From:
# "User 42 checkout failed: payment declined code=51 order=abc123"
import json

record = {
"timestamp": "2026-01-20T10:20:30Z",
"level": "error",
"service": "checkout",
"event": "payment_declined",
"user.id": 42,
"order.id": "abc123",
"payment.code": 51,
"message": "Checkout failed"
}
print(json.dumps(record))
Example 2 — Go HTTP handler with correlation
// Runnable sketch using Go's standard structured logger (log/slog, Go 1.21+).
// getOrCreateTraceID is a stand-in for your trace-propagation helper.
var logger = slog.New(slog.NewJSONHandler(os.Stdout, nil)) // create once, not per request

func Handler(w http.ResponseWriter, r *http.Request) {
	traceID := getOrCreateTraceID(r)
	start := time.Now()
	// ... do work
	logger.Info("request_completed",
		slog.String("service", "catalog"),
		slog.String("trace_id", traceID),
		slog.String("http.method", r.Method),
		slog.String("http.route", "/items/{id}"),
		slog.Int("http.status_code", 200),
		slog.Int64("duration_ms", time.Since(start).Milliseconds()),
	)
}
Example 3 — Node.js with pino-style JSON
const start = Date.now()
// ... handler work
console.log(JSON.stringify({
timestamp: new Date().toISOString(),
level: "info",
service: "orders",
event: "order_created",
"order.id": "o_789",
duration_ms: Date.now() - start,
message: "Order created"
}))
Example 4 — Serialize stack safely
try {
// ... failing code
} catch (e) {
console.log(JSON.stringify({
timestamp: new Date().toISOString(),
level: "error",
service: "billing",
event: "charge_failed",
error: {
type: e.name,
message: e.message,
stack: String(e.stack) // newlines get escaped by JSON.stringify, so the log stays single-line
},
message: "Charge failed"
}))
}
How to validate your logs
- Every log line parses as JSON.
- All timestamps are UTC ISO-8601.
- Required fields are present: level, message, service, timestamp.
- No secrets present; redactions applied.
- Field types are stable over time.
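A sketch of a validator that runs these checks over log lines (the required fields and type expectations mirror the lists above; the secrets check is a simple blocklist):
import json

REQUIRED = {"timestamp", "level", "message", "service"}
TYPES = {"http.status_code": int, "duration_ms": int, "user.id": int, "sample_rate": float}
SECRET_KEYS = {"password", "token", "secret"}

def validate_line(line: str) -> list:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    problems = [f"missing required field: {f}" for f in REQUIRED - record.keys()]
    if not str(record.get("timestamp", "")).endswith("Z"):
        problems.append("timestamp is not UTC ISO-8601")  # simplified check
    for field, expected in TYPES.items():
        if field in record and not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    for key in SECRET_KEYS & record.keys():
        if record[key] != "[REDACTED]":
            problems.append(f"possible secret in {key}")
    return problems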
Self-check mini task
Take five recent production lines. For each, check: parse success, required fields, no secrets, type correctness, has correlation id for request-scoped events.
Exercises
These exercises mirror the items in the Exercises panel below. Do them here first, then submit in the panel.
Exercise 1 — Design a JSON log schema and convert raw lines
- Step 1: Propose a minimal schema for a web API (fields and types).
- Step 2: Convert the three raw lines into JSON using your schema.
- Step 3: Add correlation fields and fix any PII issues.
Raw lines
[WARN] 10:20:30 order abc123 delayed by 230ms user=42
[INFO] user 42 checkout started path=/checkout
[ERROR] payment declined code=51 order=abc123 email=jane@example.com
- Checklist:
- Uses UTC timestamps.
- Has service, level, message, event.
- No plain emails or secrets.
- Consistent numeric types.
Exercise 2 — Instrument an HTTP handler with structured logs
- Step 1: On request start, log event=request_started with trace_id.
- Step 2: On success, log event=request_completed with duration_ms and http.status_code.
- Step 3: On error, log level=error with error.type and error.message, include trace_id.
- Step 4: Ensure each log is single-line JSON.
- Checklist:
- Stable field names.
- No multi-line stack traces.
- ERROR not sampled.
Common mistakes
- Logging sentences instead of fields. Fix: Keep message short; move facts to fields.
- Inconsistent naming. Fix: Decide on a schema and lint against it.
- Missing correlation. Fix: Generate and propagate trace_id for every request.
- Leaking PII/secrets. Fix: Centralize redaction and add tests.
- Multi-line logs. Fix: Serialize stacks into a single string; avoid newline characters.
- Level misuse. Fix: Reserve ERROR for failures; keep INFO for meaningful state changes.
Self-check
- Pick a random log. Can you answer who, what, where, when in 10 seconds?
- Query last hour: top 5 error types. Does it work?
- Search by one trace_id: do all related logs appear?
Practical projects
- Add structured logging to a small service (one endpoint). Include start, success, and error events with trace_id.
- Create a log schema JSON and a simple validator that rejects unknown fields or wrong types during CI.
- Implement sampling for INFO logs with a stable hash of request_id and record sample_rate.
Learning path
- Define your log schema and naming rules.
- Instrument request lifecycle (start, db call, external call, finish).
- Add correlation (trace_id/span_id).
- Harden privacy and redaction.
- Introduce sampling and retention policies.
- Automate validation in CI.
Who this is for
- Backend engineers adding or improving observability.
- Developers migrating from print-style logs.
- SREs who need consistent, queryable logs.
Prerequisites
- Basic knowledge of at least one backend language.
- Familiarity with HTTP APIs and error handling.
Next steps
- Add metrics for key events (e.g., error counts, latency).
- Introduce distributed tracing to connect spans across services.
- Create runbooks triggered by specific error types.
Mini challenge
Pick one endpoint. Add three logs: request_started, db_query_completed, request_completed. Include trace_id, duration_ms for the DB query, and http.status_code on completion. Verify you can filter all three by the same trace_id.