Why this matters
Logs are your fastest feedback loop in production. Structured logging turns messy text into consistent, queryable data so you can answer questions like: Which requests are slow? Which tenant is affected? How many retries happened before a failure? With structure, your log platform can filter, aggregate, and alert reliably.
- Debugging: Reconstruct a failing request path with trace_id and span_id.
- Incident response: Slice errors by service, endpoint, tenant, and deployment version.
- Compliance and safety: Avoid leaking secrets by defining and enforcing allowed fields.
- Cost control: Sample high-volume info logs without losing critical errors.
Concept explained simply
Structured logging means each log entry is a small record (often JSON) with consistent keys and types. Instead of a free-form sentence, you keep a short message and put facts into fields.
Mental model: Think of each log line as a table row. Columns are fields like timestamp, level, service, event, user_id, http.status_code. Because the columns are consistent, you can filter and aggregate fast.
Example: unstructured vs structured
Unstructured:
"User 42 checkout failed: payment declined code=51 order=abc123"
Structured (JSON):
{"timestamp":"2026-01-20T10:20:30Z","level":"error","service":"checkout","event":"payment_declined","user_id":42,"order_id":"abc123","payment.code":51,"message":"Checkout failed"}
Core principles
- Single-line JSON per event. Avoid multi-line logs; serialize stacks as a single field.
- Strong timestamps. Use UTC ISO-8601 (e.g., 2026-01-20T10:20:30Z).
- Stable field names. Prefer namespaced keys like http.method, error.message, db.rows_affected.
- Correlation. Include trace_id and span_id (or at least correlation_id/request_id).
- Message templates. A short, stable message; details go in fields.
- Privacy first. Never log secrets or raw PII. Redact or hash if needed.
- Right level, right volume. DEBUG for local, INFO for state changes, WARN for unusual but non-fatal, ERROR for failures, FATAL for crash/exit.
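A minimal sketch of these principles using Python's standard logging module (the service name and the "fields" attribute plumbing are illustrative choices, not a specific library's API):
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    # Emits each record as single-line JSON with a UTC ISO-8601 timestamp.
    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "service": "checkout",  # illustrative service name
        }
        entry.update(getattr(record, "fields", {}))  # structured fields attached via extra=
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Payment declined", extra={"fields": {"event": "payment_declined", "order.id": "abc123"}})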
Field naming guide
- Required: timestamp, level, message, service, env
- Correlation: trace_id, span_id, parent_span_id, request_id
- HTTP: http.method, http.route, http.path, http.status_code, http.client_ip
- User and tenancy: user.id, tenant.id (avoid names/emails), session.id
- Errors: error.type, error.message, error.stack
- Performance: duration_ms, bytes_in, bytes_out
- Deployment: version, host, region
- Sampling: sample_rate (if using probabilistic sampling)
Naming tips
- Use lowercase with dots for namespaces.
- Use consistent types: status_code is always a number; duration_ms is always a number.
- Avoid ambiguous names like id; prefer user.id, order.id, request.id.
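Writing the schema down as data makes these rules checkable. A sketch (the field set is drawn from the guide above; extend it for your own domain):
# Expected type per field; anything not listed here is a schema violation.
LOG_SCHEMA = {
    "timestamp": str, "level": str, "message": str, "service": str, "env": str,
    "trace_id": str, "span_id": str, "request_id": str,
    "http.method": str, "http.route": str, "http.status_code": int,
    "user.id": int, "order.id": str, "tenant.id": str,
    "duration_ms": int, "sample_rate": float,
    "error.type": str, "error.message": str, "error.stack": str,
}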
Log levels and message templates
- DEBUG: Developer diagnostics; may include verbose fields. Usually off in prod.
- INFO: Business or state changes (user_signed_in, order_created). Can be high-volume; be selective.
- WARN: Unexpected but handled: retry_scheduled, cache_miss_spike.
- ERROR: Operation failed: payment_declined, db_write_failed.
- FATAL: Process cannot continue; it is about to exit or has crashed.
Template pattern: message + stable fields.
{"level":"error","message":"Payment declined","event":"payment_declined","payment.code":51,"order.id":"abc123"}
Privacy and security
- Never log: passwords, access tokens, private keys, full credit card numbers, raw personal data.
- Redact or hash when needed: user.email_hash, card.last4.
- Truncate long fields; store up to a safe length (e.g., 256 chars).
- Mark redactions: redacted:true or field_name:"[REDACTED]" for clarity.
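A centralized redaction helper keeps these rules in one place and testable. A sketch (the blocklist and the email handling are illustrative):
import hashlib

BLOCKED_FIELDS = {"password", "token", "access_token", "secret", "authorization"}

def redact(fields: dict) -> dict:
    # Replace secret-bearing fields, hash direct identifiers, truncate long values.
    safe = {}
    for key, value in fields.items():
        if key in BLOCKED_FIELDS:
            safe[key] = "[REDACTED]"
        elif key == "user.email":
            safe["user.email_hash"] = hashlib.sha256(value.encode()).hexdigest()
        elif isinstance(value, str) and len(value) > 256:
            safe[key] = value[:256]  # truncate to a safe length
        else:
            safe[key] = value
    return safe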
Safe error example (pretty-printed here for readability; emit it as a single line)
{
"level":"error",
"message":"OAuth exchange failed",
"event":"oauth_exchange_failed",
"error.type":"InvalidGrant",
"error.message":"invalid_grant",
"oauth.provider":"example",
"user.id":null,
"token":"[REDACTED]"
}
Sampling and volume control
- Do NOT sample ERROR or FATAL.
- Sample high-volume INFO (e.g., keep 10%): set sample_rate and compute a stable hash per request to avoid bias; see the sketch after the example below.
- Throttle repetitive WARNs to prevent noise.
{"level":"info","message":"Request completed","sample_rate":0.1,"duration_ms":12}
Worked examples
Example 1 — Convert a noisy line to JSON (Python)
# From:
# "User 42 checkout failed: payment declined code=51 order=abc123"
import json

record = {
"timestamp": "2026-01-20T10:20:30Z",
"level": "error",
"service": "checkout",
"event": "payment_declined",
"user.id": 42,
"order.id": "abc123",
"payment.code": 51,
"message": "Checkout failed"
}
print(json.dumps(record))
Example 2 — Go HTTP handler with correlation
// Runnable sketch using Go's standard structured logger (log/slog, Go 1.21+).
// getOrCreateTraceID is a stand-in for your trace-propagation helper.
var logger = slog.New(slog.NewJSONHandler(os.Stdout, nil)) // create once, not per request

func Handler(w http.ResponseWriter, r *http.Request) {
	traceID := getOrCreateTraceID(r)
	start := time.Now()
	// ... do work
	logger.Info("request_completed",
		slog.String("service", "catalog"),
		slog.String("trace_id", traceID),
		slog.String("http.method", r.Method),
		slog.String("http.route", "/items/{id}"),
		slog.Int("http.status_code", 200),
		slog.Int64("duration_ms", time.Since(start).Milliseconds()),
	)
}
Example 3 — Node.js with pino-style JSON
const start = Date.now()
// ... handler work
console.log(JSON.stringify({
timestamp: new Date().toISOString(),
level: "info",
service: "orders",
event: "order_created",
"order.id": "o_789",
duration_ms: Date.now() - start,
message: "Order created"
}))
Example 4 — Serialize stack safely
try {
// ... failing code
} catch (e) {
console.log(JSON.stringify({
timestamp: new Date().toISOString(),
level: "error",
service: "billing",
event: "charge_failed",
error: {
type: e.name,
message: e.message,
stack: String(e.stack) // newlines get escaped by JSON.stringify, so the log stays single-line
},
message: "Charge failed"
}))
}
How to validate your logs
- Every log line parses as JSON.
- All timestamps are UTC ISO-8601.
- Required fields are present: level, message, service, timestamp.
- No secrets present; redactions applied.
- Field types are stable over time.
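A sketch of a validator that runs these checks over log lines (the required fields and type expectations mirror the lists above; the secrets check is a simple blocklist):
import json

REQUIRED = {"timestamp", "level", "message", "service"}
TYPES = {"http.status_code": int, "duration_ms": int, "user.id": int, "sample_rate": float}
SECRET_KEYS = {"password", "token", "secret"}

def validate_line(line: str) -> list:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    problems = [f"missing required field: {f}" for f in REQUIRED - record.keys()]
    if not str(record.get("timestamp", "")).endswith("Z"):
        problems.append("timestamp is not UTC ISO-8601")  # simplified check
    for field, expected in TYPES.items():
        if field in record and not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    for key in SECRET_KEYS & record.keys():
        if record[key] != "[REDACTED]":
            problems.append(f"possible secret in {key}")
    return problems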
Self-check mini task
Take five recent production lines. For each, check: parse success, required fields, no secrets, type correctness, has correlation id for request-scoped events.
Exercises
These exercises mirror the items in the Exercises panel below. Do them here first, then submit in the panel.
Exercise 1 — Design a JSON log schema and convert raw lines
- Step 1: Propose a minimal schema for a web API (fields and types).
- Step 2: Convert the three raw lines into JSON using your schema.
- Step 3: Add correlation fields and fix any PII issues.
Raw lines
[WARN] 10:20:30 order abc123 delayed by 230ms user=42
[INFO] user 42 checkout started path=/checkout
[ERROR] payment declined code=51 order=abc123 email=jane@example.com
- Checklist:
- Uses UTC timestamps.
- Has service, level, message, event.
- No plain emails or secrets.
- Consistent numeric types.
Exercise 2 — Instrument an HTTP handler with structured logs
- Step 1: On request start, log event=request_started with trace_id.
- Step 2: On success, log event=request_completed with duration_ms and http.status_code.
- Step 3: On error, log level=error with error.type and error.message, include trace_id.
- Step 4: Ensure each log is single-line JSON.
- Checklist:
- Stable field names.
- No multi-line stack traces.
- ERROR not sampled.
Common mistakes
- Logging sentences instead of fields. Fix: Keep message short; move facts to fields.
- Inconsistent naming. Fix: Decide on a schema and lint against it.
- Missing correlation. Fix: Generate and propagate trace_id for every request.
- Leaking PII/secrets. Fix: Centralize redaction and add tests.
- Multi-line logs. Fix: Serialize stacks into a single string; avoid newline characters.
- Level misuse. Fix: Reserve ERROR for failures; keep INFO for meaningful state changes.
Self-check
- Pick a random log. Can you answer who, what, where, when in 10 seconds?
- Query last hour: top 5 error types. Does it work?
- Search by one trace_id: do all related logs appear?
Practical projects
- Add structured logging to a small service (one endpoint). Include start, success, and error events with trace_id.
- Create a log schema JSON and a simple validator that rejects unknown fields or wrong types during CI.
- Implement sampling for INFO logs with a stable hash of request_id and record sample_rate.
Learning path
- Define your log schema and naming rules.
- Instrument request lifecycle (start, db call, external call, finish).
- Add correlation (trace_id/span_id).
- Harden privacy and redaction.
- Introduce sampling and retention policies.
- Automate validation in CI.
Who this is for
- Backend engineers adding or improving observability.
- Developers migrating from print-style logs.
- SREs who need consistent, queryable logs.
Prerequisites
- Basic knowledge of at least one backend language.
- Familiarity with HTTP APIs and error handling.
Next steps
- Add metrics for key events (e.g., error counts, latency).
- Introduce distributed tracing to connect spans across services.
- Create runbooks triggered by specific error types.
Mini challenge
Pick one endpoint. Add three logs: request_started, db_query_completed, request_completed. Include trace_id, duration_ms for the DB query, and http.status_code on completion. Verify you can filter all three by the same trace_id.