
Standard Observability Integration

Learn Standard Observability Integration for free with explanations, exercises, and a quick test (for Platform Engineers).

Published: January 23, 2026 | Updated: January 23, 2026

Why this matters

Platform Engineers enable teams to ship confidently. Standard observability integration makes every service easy to debug, measure, and operate. With shared conventions, developers spend less time wiring telemetry and more time building features.

  • Speed up incident response: traces, metrics, and logs correlate quickly across services.
  • Consistent dashboards and alerts: uniform naming and labels make org-wide views possible.
  • Lower onboarding friction: new services inherit defaults and best practices.
  • Better product decisions: reliable SLIs/SLOs from day one.

Concept explained simply

Standard observability integration means every service emits the same kinds of telemetry the same way. Default libraries, config, and naming conventions remove guesswork.

Mental model

Think of a “golden thread” running through your system:

  • Traces follow a request across services (trace_id).
  • Metrics quantify behavior (rates, errors, durations).
  • Logs describe events. Each log line references the same trace_id/span_id so everything lines up.

Standards ensure the thread is unbroken and easy to pull.

Standards to apply (ready-to-use)

  • Signals:
    • Traces: OpenTelemetry SDK, OTLP export, W3C trace context.
    • Metrics: Prometheus exposition format (/metrics) or OTLP -> collector -> Prometheus.
    • Logs: line-delimited JSON; include trace_id, span_id, request_id (sample line after this list)
  • Resource attributes (OpenTelemetry):
    • service.name, service.version, deployment.environment
  • Metric naming:
    • snake_case, include unit suffix (e.g., request_duration_seconds)
    • Use low-cardinality labels: method, status_code, and a route template (e.g., /users/:id, never the raw path)
  • Sampling:
    • Default head sampling 1–10% for traces in production; 100% in dev
    • Keep it configurable via env vars
  • Config via env vars (examples):
    • OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT
    • OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,service.version=1.2.3
    • LOG_FORMAT=json
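
For reference, one log line following these conventions could look like this (the trace/span IDs are the W3C example values; fields beyond trace_id, span_id, and request_id are illustrative, not mandated):

{"level":"info","time":"2026-01-23T12:00:00.000Z","service":"checkout-api","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7","request_id":"req-8c2a","msg":"order created"}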

Worked examples

Example 1: Service-level OpenTelemetry (Node.js)

Goal: emit traces and metrics; correlate logs.

// install: npm i @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
//   @opentelemetry/exporter-trace-otlp-http @opentelemetry/exporter-metrics-otlp-http \
//   @opentelemetry/sdk-metrics pino pino-http
import express from 'express';
import pino from 'pino';
import pinoHttp from 'pino-http';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

// Note: with ESM, imports are hoisted, so in practice this bootstrap lives in
// its own module loaded before the app (e.g., node --import ./telemetry.mjs);
// otherwise auto-instrumentation may not patch http/express in time.
const sdk = new NodeSDK({
  // With no explicit url, the OTLP/HTTP exporters read
  // OTEL_EXPORTER_OTLP_ENDPOINT and append /v1/traces or /v1/metrics.
  traceExporter: new OTLPTraceExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter()
  }),
  instrumentations: [getNodeAutoInstrumentations()]
});
sdk.start(); // synchronous in current SDK versions

const logger = pino({ level: 'info' });
const app = express();
app.use(pinoHttp({ logger }));

app.get('/health', (req, res) => {
  req.log.info({ event: 'health_check' }, 'ok');
  res.send('ok');
});

app.listen(3000, () => logger.info({ service: process.env.OTEL_SERVICE_NAME }, 'listening'));
What to check
  • Traces visible with service.name set
  • Logs in JSON include trace_id/span_id (pino-http alone does not add them; see the mixin sketch below)
  • Metrics exported or scraped via collector
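
Trace correlation does not happen by itself with pino-http. A minimal sketch of one way to wire it, using the public @opentelemetry/api to copy IDs from the active span into every log line:

import { trace } from '@opentelemetry/api';
import pino from 'pino';

// The mixin runs on every log call; if a span is active,
// its IDs are merged into the log line.
const logger = pino({
  level: 'info',
  mixin() {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const { traceId, spanId } = span.spanContext();
    return { trace_id: traceId, span_id: spanId };
  }
});

The @opentelemetry/instrumentation-pino package achieves the same effect automatically; the mixin is shown here to make the mechanism explicit.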

Example 2: Kubernetes scrape and logs

# Deployment pod annotations (annotation-based discovery; note that the
# Prometheus Operator ignores these by default, see the ServiceMonitor sketch below)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
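
With the Prometheus Operator specifically, scraping is declared through a ServiceMonitor rather than annotations. A minimal sketch, assuming a Service labeled app: checkout-api that exposes a port named http:

# ServiceMonitor (Prometheus Operator CRD)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: checkout-api
spec:
  selector:
    matchLabels:
      app: checkout-api
  endpoints:
    - port: http
      path: /metrics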

# Fluent Bit to ship JSON logs (conceptual snippet)
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # containerd-based clusters typically need the cri parser instead
    Parser            docker

[FILTER]
    Name              kubernetes
    Match             kube.*

[OUTPUT]
    Name              stdout
    Match             *
What to check
  • Application prints one JSON object per line (no raw multiline stack traces outside the JSON)
  • /metrics endpoint exposes HELP/TYPE and metric samples
  • Labels like method, status_code are low cardinality
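
The first two items are quick to verify from a shell (pod name and port are placeholders):

# every line should be a single JSON object that jq can parse
kubectl logs <pod> | head -n 3 | jq .

# HELP/TYPE metadata should appear in the exposition output
curl -s localhost:8080/metrics | grep -E '^# (HELP|TYPE)'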

Example 3: HTTP latency histogram (Prometheus)

// Node.js using prom-client
import client from 'prom-client';
const register = new client.Registry();
const httpDuration = new client.Histogram({
  name: 'http_server_request_duration_seconds',
  help: 'HTTP server request latency',
  labelNames: ['method', 'route', 'status_code'],
  // bucket boundaries in seconds, from 5 ms to 5 s
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2, 5]
});
register.registerMetric(httpDuration);

// In route handler
const end = httpDuration.startTimer({ method: req.method, route: '/users/:id' });
res.on('finish', () => end({ status_code: res.statusCode }));

// /metrics handler
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
Expected /metrics lines (abridged: real output emits one _bucket line per le boundary, plus le="+Inf")
# HELP http_server_request_duration_seconds HTTP server request latency
# TYPE http_server_request_duration_seconds histogram
http_server_request_duration_seconds_bucket{method="GET",route="/users/:id",status_code="200",le="0.1"} 42
http_server_request_duration_seconds_sum{method="GET",route="/users/:id",status_code="200"} 3.14
http_server_request_duration_seconds_count{method="GET",route="/users/:id",status_code="200"} 50
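
Once Prometheus scrapes these samples, the histogram supports the usual latency queries. For example, a p95-per-route dashboard panel could use a PromQL query like:

histogram_quantile(0.95,
  sum(rate(http_server_request_duration_seconds_bucket[5m])) by (le, route))

The 5m rate window is a common default; tune it to your scrape interval.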

How to implement (team playbook)

  1. Choose org-wide defaults
    • Signals: OTel traces/metrics, JSON logs
    • Protocols: OTLP, Prometheus
    • Naming and labels
  2. Ship language templates
    • SDK bootstrap, env vars, exporter config
    • Middleware for correlation IDs (sketch after this list)
  3. Kubernetes integration
    • Scrape annotations, /metrics
    • Log shipping DaemonSet
  4. Dashboards and alerts
    • Golden signals: latency, traffic, errors, saturation
    • SLO alerts (burn-rate)
  5. Rollout
    • Adopt on new services first
    • Backfill existing services incrementally
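
For the correlation middleware in step 2, a small Express sketch (the X-Request-Id header name and UUID format are conventions to choose, not requirements):

import express from 'express';
import { randomUUID } from 'node:crypto';

const app = express();

// Reuse an inbound X-Request-Id if present, otherwise mint one;
// echo it on the response and keep it on req for loggers downstream.
app.use((req, res, next) => {
  const id = req.headers['x-request-id'] || randomUUID();
  req.id = id;
  res.setHeader('X-Request-Id', id);
  next();
});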

Exercises (practice)

These mirror the tasks below. Do them in a sample service or a playground app.

Exercise 1 — Instrument a service with OpenTelemetry and OTLP exporter

Goal: traces + metrics with standard resource attributes, configurable via env vars. See the Practice Exercises section below for full details.

Exercise 2 — Structured JSON logging with trace correlation

Goal: every log line includes trace_id/span_id and request_id.

Exercise 3 — Prometheus HTTP latency histogram

Goal: expose http_server_request_duration_seconds with low-cardinality labels.

Self-check checklist

  • service.name, service.version, deployment.environment are set
  • Logs are JSON; include trace_id and span_id
  • /metrics has HELP/TYPE and expected histogram buckets
  • Prometheus can scrape without errors
  • Dashboards populate automatically for the new service

Common mistakes and how to self-check

  • Missing resource attributes
    • Fix: set OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES
  • High-cardinality labels (e.g., raw http.path values like /users/12345)
    • Fix: use route templates (e.g., /users/:id)
  • Unstructured logs or mixed formats
    • Fix: enforce JSON logger and disable non-JSON console prints
  • Sampling too aggressive
    • Fix: start with 5–10% in prod; verify trace coverage during incidents (env example after this list)
  • Forgot trace correlation in logs
    • Fix: inject trace_id/span_id from context on every log call
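
The sampling fix maps to two standard OpenTelemetry SDK environment variables:

# parent-based, 5% head sampling
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05
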
Quick self-diagnostics
  • Send a test request, then search for its trace_id in logs and traces (exemplars, if enabled, link metrics to the same trace)
  • Verify cardinalities in Prometheus for labels (count distinct values)
  • Check for HELP/TYPE in /metrics output

Who this is for

  • Platform Engineers building paved roads
  • Backend Engineers integrating telemetry into services
  • Tech Leads defining reliability standards

Prerequisites

  • Basic service development in one language (Node, Go, Java, Python, etc.)
  • Familiarity with Kubernetes or your deployment stack
  • Basic Prometheus and OpenTelemetry concepts

Learning path

  • Before: Logging fundamentals → Metrics basics → Tracing basics
  • Now: Standard Observability Integration (this lesson)
  • After: SLOs and alerting → Incident response playbooks → Observability platform ops

Practical projects

  • Golden Template: create a repo template that boots OTel, JSON logs, /metrics
  • Org Baseline Dashboards: publish Grafana dashboards that auto-populate by service.name
  • Trace Correlation Library: tiny middleware to inject request_id and map to trace_id

Mini challenge

Create a minimal new service from your template. In under 30 minutes, prove you can:

  • See a trace in your backend
  • Find the exact log line with the same trace_id
  • View the latency histogram for the endpoint you hit

Quick test

Take the quick test to confirm understanding. The test is available to everyone; sign in to save your progress.

Next steps

  • Roll the standard into CI checks (lint metric names, ensure JSON logs; see the sketch below)
  • Publish a short integration guide with copy-paste snippets
  • Schedule a brown-bag to walk teams through the template
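
The metric-name lint in the first bullet can start as a few lines of script. A deliberately naive illustration (the unit-suffix list is an org convention, not a Prometheus rule):

// Accept snake_case names that end in an approved unit suffix.
const UNIT_SUFFIXES = ['seconds', 'bytes', 'total', 'ratio'];

function isValidMetricName(name) {
  return /^[a-z][a-z0-9_]*$/.test(name)
    && UNIT_SUFFIXES.some((u) => name.endsWith(`_${u}`));
}

console.assert(isValidMetricName('http_server_request_duration_seconds'));
console.assert(!isValidMetricName('HttpRequestDuration'));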

Practice Exercises

3 exercises to complete

Instructions (Exercise 1)

Instrument any demo HTTP service with OpenTelemetry so it emits traces and metrics using org standards.

  • Set env vars: OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_RESOURCE_ATTRIBUTES (include deployment.environment and service.version).
  • Enable auto-instrumentations (HTTP server/client).
  • Export traces and metrics via OTLP (HTTP or gRPC).
  • Send a few requests and confirm telemetry appears in your backend/collector.
Tip: baseline env
export OTEL_SERVICE_NAME=checkout-api
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=dev,service.version=0.1.0
Expected Output
Traces visible with service.name=checkout-api; metrics exported; resource attributes include deployment.environment and service.version.

Standard Observability Integration — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

