
Standard Observability Integration

Learn Standard Observability Integration for free with explanations, exercises, and a quick test (for Platform Engineers).

Published: January 23, 2026 | Updated: January 23, 2026

Why this matters

Platform Engineers enable teams to ship confidently. Standard observability integration makes every service easy to debug, measure, and operate. With shared conventions, developers spend less time wiring telemetry and more time building features.

  • Speed up incident response: traces, metrics, and logs correlate quickly across services.
  • Consistent dashboards and alerts: uniform naming and labels make org-wide views possible.
  • Lower onboarding friction: new services inherit defaults and best practices.
  • Better product decisions: reliable SLIs/SLOs from day one.

Concept explained simply

Standard observability integration means every service emits the same kinds of telemetry the same way. Default libraries, config, and naming conventions remove guesswork.

Mental model

Think of a “golden thread” running through your system:

  • Traces follow a request across services (trace_id).
  • Metrics quantify behavior (rates, errors, durations).
  • Logs describe events. Each log line references the same trace_id/span_id so everything lines up.

Standards ensure the thread is unbroken and easy to pull.

Standards to apply (ready-to-use)

  • Signals:
    • Traces: OpenTelemetry SDK, OTLP export, W3C trace context.
    • Metrics: Prometheus exposition format (/metrics) or OTLP -> collector -> Prometheus.
    • Logs: line-delimited JSON; include trace_id, span_id, request_id (sample line after this list)
  • Resource attributes (OpenTelemetry):
    • service.name, service.version, deployment.environment
  • Metric naming:
    • snake_case, include unit suffix (e.g., request_duration_seconds)
    • Use low-cardinality labels: method, status_code, and a route template (e.g., /users/:id, never the raw path)
  • Sampling:
    • Default head sampling 1–10% for traces in production; 100% in dev
    • Keep it configurable via env vars
  • Config via env vars (examples):
    • OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT
    • OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,service.version=1.2.3
    • LOG_FORMAT=json
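
For reference, one log line following these conventions could look like this (the trace/span IDs are the W3C example values; fields beyond trace_id, span_id, and request_id are illustrative, not mandated):

{"level":"info","time":"2026-01-23T12:00:00.000Z","service":"checkout-api","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7","request_id":"req-8c2a","msg":"order created"}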

Worked examples

Example 1: Service-level OpenTelemetry (Node.js)

Goal: emit traces and metrics; correlate logs.

// install: npm i @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
//   @opentelemetry/exporter-trace-otlp-http @opentelemetry/exporter-metrics-otlp-http \
//   @opentelemetry/sdk-metrics pino pino-http
import express from 'express';
import pino from 'pino';
import pinoHttp from 'pino-http';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

// Note: with ESM, imports are hoisted, so in practice this bootstrap lives in
// its own module loaded before the app (e.g., node --import ./telemetry.mjs);
// otherwise auto-instrumentation may not patch http/express in time.
const sdk = new NodeSDK({
  // With no explicit url, the OTLP/HTTP exporters read
  // OTEL_EXPORTER_OTLP_ENDPOINT and append /v1/traces or /v1/metrics.
  traceExporter: new OTLPTraceExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter()
  }),
  instrumentations: [getNodeAutoInstrumentations()]
});
sdk.start(); // synchronous in current SDK versions

const logger = pino({ level: 'info' });
const app = express();
app.use(pinoHttp({ logger }));

app.get('/health', (req, res) => {
  req.log.info({ event: 'health_check' }, 'ok');
  res.send('ok');
});

app.listen(3000, () => logger.info({ service: process.env.OTEL_SERVICE_NAME }, 'listening'));
What to check
  • Traces visible with service.name set
  • Logs in JSON include trace_id/span_id (pino-http alone does not add them; see the mixin sketch below)
  • Metrics exported or scraped via collector
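
Trace correlation does not happen by itself with pino-http. A minimal sketch of one way to wire it, using the public @opentelemetry/api to copy IDs from the active span into every log line:

import { trace } from '@opentelemetry/api';
import pino from 'pino';

// The mixin runs on every log call; if a span is active,
// its IDs are merged into the log line.
const logger = pino({
  level: 'info',
  mixin() {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const { traceId, spanId } = span.spanContext();
    return { trace_id: traceId, span_id: spanId };
  }
});

The @opentelemetry/instrumentation-pino package achieves the same effect automatically; the mixin is shown here to make the mechanism explicit.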

Example 2: Kubernetes scrape and logs

# Deployment pod annotations (annotation-based discovery; note that the
# Prometheus Operator ignores these by default, see the ServiceMonitor sketch below)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
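
With the Prometheus Operator specifically, scraping is declared through a ServiceMonitor rather than annotations. A minimal sketch, assuming a Service labeled app: checkout-api that exposes a port named http:

# ServiceMonitor (Prometheus Operator CRD)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: checkout-api
spec:
  selector:
    matchLabels:
      app: checkout-api
  endpoints:
    - port: http
      path: /metrics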

# Fluent Bit to ship JSON logs (conceptual snippet)
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # containerd-based clusters typically need the cri parser instead
    Parser            docker

[FILTER]
    Name              kubernetes
    Match             kube.*

[OUTPUT]
    Name              stdout
    Match             *
What to check
  • Application prints one JSON object per line (no raw multiline stack traces outside the JSON)
  • /metrics endpoint exposes HELP/TYPE and metric samples
  • Labels like method, status_code are low cardinality
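
The first two items are quick to verify from a shell (pod name and port are placeholders):

# every line should be a single JSON object that jq can parse
kubectl logs <pod> | head -n 3 | jq .

# HELP/TYPE metadata should appear in the exposition output
curl -s localhost:8080/metrics | grep -E '^# (HELP|TYPE)'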

Example 3: HTTP latency histogram (Prometheus)

// Node.js using prom-client
import client from 'prom-client';
const register = new client.Registry();
const httpDuration = new client.Histogram({
  name: 'http_server_request_duration_seconds',
  help: 'HTTP server request latency',
  labelNames: ['method', 'route', 'status_code'],
  // bucket boundaries in seconds, from 5 ms to 5 s
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2, 5]
});
register.registerMetric(httpDuration);

// In route handler
const end = httpDuration.startTimer({ method: req.method, route: '/users/:id' });
res.on('finish', () => end({ status_code: res.statusCode }));

// /metrics handler
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
Expected /metrics lines (abridged: real output emits one _bucket line per le boundary, plus le="+Inf")
# HELP http_server_request_duration_seconds HTTP server request latency
# TYPE http_server_request_duration_seconds histogram
http_server_request_duration_seconds_bucket{method="GET",route="/users/:id",status_code="200",le="0.1"} 42
http_server_request_duration_seconds_sum{method="GET",route="/users/:id",status_code="200"} 3.14
http_server_request_duration_seconds_count{method="GET",route="/users/:id",status_code="200"} 50
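
Once Prometheus scrapes these samples, the histogram supports the usual latency queries. For example, a p95-per-route dashboard panel could use a PromQL query like:

histogram_quantile(0.95,
  sum(rate(http_server_request_duration_seconds_bucket[5m])) by (le, route))

The 5m rate window is a common default; tune it to your scrape interval.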

How to implement (team playbook)

  1. Choose org-wide defaults
    • Signals: OTel traces/metrics, JSON logs
    • Protocols: OTLP, Prometheus
    • Naming and labels
  2. Ship language templates
    • SDK bootstrap, env vars, exporter config
    • Middleware for correlation IDs (sketch after this list)
  3. Kubernetes integration
    • Scrape annotations, /metrics
    • Log shipping DaemonSet
  4. Dashboards and alerts
    • Golden signals: latency, traffic, errors, saturation
    • SLO alerts (burn-rate)
  5. Rollout
    • Adopt on new services first
    • Backfill existing services incrementally
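
For the correlation middleware in step 2, a small Express sketch (the X-Request-Id header name and UUID format are conventions to choose, not requirements):

import express from 'express';
import { randomUUID } from 'node:crypto';

const app = express();

// Reuse an inbound X-Request-Id if present, otherwise mint one;
// echo it on the response and keep it on req for loggers downstream.
app.use((req, res, next) => {
  const id = req.headers['x-request-id'] || randomUUID();
  req.id = id;
  res.setHeader('X-Request-Id', id);
  next();
});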

Exercises (practice)

These mirror the tasks below. Do them in a sample service or a playground app.

Exercise 1 — Instrument a service with OpenTelemetry and OTLP exporter

Goal: traces + metrics with standard resource attributes, configurable via env vars. See the Practice Exercises section below for full details.

Exercise 2 — Structured JSON logging with trace correlation

Goal: every log line includes trace_id/span_id and request_id.

Exercise 3 — Prometheus HTTP latency histogram

Goal: expose http_server_request_duration_seconds with low-cardinality labels.

Self-check checklist

  • service.name, service.version, deployment.environment are set
  • Logs are JSON; include trace_id and span_id
  • /metrics has HELP/TYPE and expected histogram buckets
  • Prometheus can scrape without errors
  • Dashboards populate automatically for the new service

Common mistakes and how to self-check

  • Missing resource attributes
    • Fix: set OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES
  • High-cardinality labels (e.g., raw http.path values like /users/12345)
    • Fix: use route templates (e.g., /users/:id)
  • Unstructured logs or mixed formats
    • Fix: enforce JSON logger and disable non-JSON console prints
  • Sampling too aggressive
    • Fix: start with 5–10% in prod; verify trace coverage during incidents (env example after this list)
  • Forgot trace correlation in logs
    • Fix: inject trace_id/span_id from context on every log call
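
The sampling fix maps to two standard OpenTelemetry SDK environment variables:

# parent-based, 5% head sampling
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05
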
Quick self-diagnostics
  • Send a test request, then search for its trace_id in logs and traces (exemplars, if enabled, link metrics to the same trace)
  • Verify cardinalities in Prometheus for labels (count distinct values)
  • Check for HELP/TYPE in /metrics output

Who this is for

  • Platform Engineers building paved roads
  • Backend Engineers integrating telemetry into services
  • Tech Leads defining reliability standards

Prerequisites

  • Basic service development in one language (Node, Go, Java, Python, etc.)
  • Familiarity with Kubernetes or your deployment stack
  • Basic Prometheus and OpenTelemetry concepts

Learning path

  • Before: Logging fundamentals → Metrics basics → Tracing basics
  • Now: Standard Observability Integration (this lesson)
  • After: SLOs and alerting → Incident response playbooks → Observability platform ops

Practical projects

  • Golden Template: create a repo template that boots OTel, JSON logs, /metrics
  • Org Baseline Dashboards: publish Grafana dashboards that auto-populate by service.name
  • Trace Correlation Library: tiny middleware to inject request_id and map to trace_id

Mini challenge

Create a minimal new service from your template. In under 30 minutes, prove you can:

  • See a trace in your backend
  • Find the exact log line with the same trace_id
  • View the latency histogram for the endpoint you hit

Quick test

Take the quick test to confirm understanding. The test is available to everyone; sign in to save your progress.

Next steps

  • Roll the standard into CI checks (lint metric names, ensure JSON logs; see the sketch below)
  • Publish a short integration guide with copy-paste snippets
  • Schedule a brown-bag to walk teams through the template
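
The metric-name lint in the first bullet can start as a few lines of script. A deliberately naive illustration (the unit-suffix list is an org convention, not a Prometheus rule):

// Accept snake_case names that end in an approved unit suffix.
const UNIT_SUFFIXES = ['seconds', 'bytes', 'total', 'ratio'];

function isValidMetricName(name) {
  return /^[a-z][a-z0-9_]*$/.test(name)
    && UNIT_SUFFIXES.some((u) => name.endsWith(`_${u}`));
}

console.assert(isValidMetricName('http_server_request_duration_seconds'));
console.assert(!isValidMetricName('HttpRequestDuration'));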

Practice Exercises

3 exercises to complete

Instructions (Exercise 1)

Instrument any demo HTTP service with OpenTelemetry so it emits traces and metrics using org standards.

  • Set env vars: OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_RESOURCE_ATTRIBUTES (include deployment.environment and service.version).
  • Enable auto-instrumentations (HTTP server/client).
  • Export traces and metrics via OTLP (HTTP or gRPC).
  • Send a few requests and confirm telemetry appears in your backend/collector.
Tip: baseline env
export OTEL_SERVICE_NAME=checkout-api
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=dev,service.version=0.1.0
Expected Output
Traces visible with service.name=checkout-api; metrics exported; resource attributes include deployment.environment and service.version.

Standard Observability Integration — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

