Why this matters
Platform Engineers often need to standardize how services emit telemetry so teams can debug reliably and operate at scale. Instrumentation libraries and SDKs are the fastest, safest way to capture traces, metrics, and logs consistently across languages and frameworks.
- Enable production-safe visibility with sampling and stable resource attributes.
- Help teams adopt automatic instrumentation quickly, then add manual spans where it counts.
- Standardize exporters and context propagation so requests are traceable across microservices.
Concept explained simply
Instrumentation libraries and SDKs are building blocks you add to code or runtimes to collect telemetry:
- Traces: the journey of a request; made of spans (units of work).
- Metrics: numeric time series (counters, gauges, histograms).
- Logs: structured events with context (e.g., trace_id, span_id).
Automatic instrumentation hooks into frameworks (HTTP, DB) to emit spans/metrics without touching your code. Manual instrumentation adds custom spans, attributes, and metrics around critical business logic.
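For example, here is a minimal Python sketch of the two approaches side by side. The package and instrumentor are real OpenTelemetry components, the span and attribute names are illustrative, and it assumes a tracer provider has been configured as in the worked examples below.
# Auto: hook the requests library so every outbound HTTP call emits a client span.
from opentelemetry.instrumentation.requests import RequestsInstrumentor
RequestsInstrumentor().instrument()
# Manual: wrap the business logic you care about in a custom span with attributes.
from opentelemetry import trace
tracer = trace.get_tracer('checkout')
with tracer.start_as_current_span('cart.price') as span:
    span.set_attribute('cart.items', 3)
    # ... pricing logic ...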
Mental model
Think of telemetry as a labeled package moving through a delivery network:
- Resource: the return address (service.name, version, env).
- Context propagation: the tracking number that follows the package (traceparent).
- SDK: the packing machine that shapes and labels the package.
- Exporter: the truck that sends it to your observability backend.
Core building blocks
- Resource attributes: service.name, service.version, deployment.env.
- Span: name, start/end time, attributes, status, events, links.
- Trace: a tree of spans sharing a trace_id.
- Context propagation: W3C Trace Context or B3; inject/extract into headers (see the sketch after this list).
- Metric types: counter (monotonic), updowncounter, gauge (observed), histogram (latency, sizes).
- Logs: structured, include trace_id/span_id for correlation.
- Exporters: OTLP, console, file; batch processors for performance.
- Sampling: head-based (decided when a span is created) or tail-based (decided after whole traces are collected); balance cost against detail.
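To make the context-propagation building block concrete, here is a minimal Python sketch using the W3C Trace Context propagator the SDK installs by default; incoming_headers is a placeholder for whatever header dict your framework exposes.
from opentelemetry import trace
from opentelemetry.propagate import inject, extract
# Caller: copy the current trace context into outgoing headers (adds 'traceparent').
headers = {}
inject(headers)
# ... send the HTTP request with these headers ...
# Callee: rebuild the context from the incoming headers and parent new spans under it.
ctx = extract(incoming_headers)  # incoming_headers: the request's header dict (placeholder)
tracer = trace.get_tracer('orders')
with tracer.start_as_current_span('orders.handle', context=ctx):
    pass  # spans created here share the caller's trace_id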
Tip: naming spans and metrics
- Span names: verb + resource (e.g., GET /orders, db.query SELECT).
- Attribute keys: stable and low-cardinality (http.route, db.system, user.tier).
- Metric names: nouns with units (requests.duration.ms, queue.depth).
Worked examples
Example 1: Python API with auto + manual tracing
# Install: pip install opentelemetry-sdk opentelemetry-api opentelemetry-instrumentation-requests  (ConsoleSpanExporter ships with opentelemetry-sdk)
import time
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace import Status, StatusCode
resource = Resource.create({
    'service.name': 'checkout-api',
    'service.version': '1.2.0',
    'deployment.environment': 'dev'
})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer('checkout-api')
# Manual span around a payment step
with tracer.start_as_current_span('payment.authorize') as span:
    span.set_attribute('payment.provider', 'demo')
    time.sleep(0.05)
    span.set_status(Status(StatusCode.OK))
    print('Authorized')
What you get: clear spans with attributes, correlated under one trace when called within a propagated context.
Example 2: Node.js worker metrics with histogram
// Install: npm i @opentelemetry/api @opentelemetry/sdk-metrics @opentelemetry/resources (add @opentelemetry/exporter-metrics-otlp-proto when you switch to OTLP)
const { MeterProvider, PeriodicExportingMetricReader, ConsoleMetricExporter } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const provider = new MeterProvider({
  resource: new Resource({ 'service.name': 'billing-worker', 'deployment.environment': 'dev' })
});
provider.addMetricReader(new PeriodicExportingMetricReader({ exporter: new ConsoleMetricExporter(), exportIntervalMillis: 2000 }));
const meter = provider.getMeter('billing');
const duration = meter.createHistogram('jobs.duration.ms', { description: 'Job execution time' });
function simulateJob() {
  const ms = Math.floor(Math.random() * 120) + 30;
  duration.record(ms, { job_type: 'invoice' });
}
setInterval(simulateJob, 300);
What you get: periodic histogram exports summarizing job durations, ready to alert on latency shifts.
Example 3: Java manual span with log correlation
// Gradle deps (conceptual): opentelemetry-sdk, opentelemetry-api, exporter-logging
// Imports (conceptual): io.opentelemetry.api.GlobalOpenTelemetry, io.opentelemetry.api.trace.Tracer,
// io.opentelemetry.api.trace.Span, io.opentelemetry.api.trace.StatusCode, io.opentelemetry.context.Scope
Tracer tracer = GlobalOpenTelemetry.getTracer("orders");
Span span = tracer.spanBuilder("orders.recalculate")
    .setAttribute("tenant", "gold")
    .startSpan();
try (Scope s = span.makeCurrent()) {
    // Include trace ids in logs
    String traceId = span.getSpanContext().getTraceId();
    String spanId = span.getSpanContext().getSpanId();
    System.out.println("trace_id=" + traceId + " span_id=" + spanId + " msg=Recalculation started");
    // work ...
    span.setStatus(StatusCode.OK);
} catch (Exception e) {
    span.recordException(e);
    span.setStatus(StatusCode.ERROR);
} finally {
    span.end();
}
What you get: logs that carry trace and span IDs, enabling click-through correlation in your backend.
Choosing libraries and SDKs
- Support for your language/runtime and popular frameworks in your org.
- Exporters you need (OTLP recommended), plus console/file for local use.
- Performance: batch processors, async I/O, metrics readers.
- Config via env vars for consistent deployments.
- Stability and semantic conventions that match your standards.
Safe defaults to start
- OTLP exporter over gRPC or HTTP.
- BatchSpanProcessor with modest queue size.
- Head sampling at 5–10% for web traffic; 100% in non-prod (see the sketch after this list).
- Resource attributes: service.name, service.version, deployment.environment.
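A minimal Python sketch of these defaults, using the SDK's built-in head samplers; the service name and values are illustrative, and the OTEL_* variables in the comments are the standard OpenTelemetry environment variables for the same settings.
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
# Sample ~10% of new traces, but always honor the upstream parent's decision.
provider = TracerProvider(
    resource=Resource.create({'service.name': 'my-service', 'service.version': '1.0.0', 'deployment.environment': 'prod'}),
    sampler=ParentBased(TraceIdRatioBased(0.1)),
)
# Equivalent configuration without code changes:
#   OTEL_SERVICE_NAME=my-service
#   OTEL_RESOURCE_ATTRIBUTES=service.version=1.0.0,deployment.environment=prod
#   OTEL_TRACES_SAMPLER=parentbased_traceidratio
#   OTEL_TRACES_SAMPLER_ARG=0.1
#   OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317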
Implementation steps
- Define resource attributes: decide on service.name, service.version, and deployment.environment, and keep them consistent across services.
- Enable auto-instrumentation: attach language agents or initialize framework instrumentations for HTTP, DB, and messaging.
- Add manual spans/metrics: wrap critical paths (checkout, payment, cache miss) with spans and attributes.
- Configure exporters: start with a console exporter locally, then switch to OTLP for shared backends.
- Set sampling: pick head sampling rates; reserve tail sampling for advanced backends.
- Propagate context: use W3C Trace Context across services; test with cross-service requests.
- Harden and ship: add timeouts/retries on exporters, and confirm a graceful shutdown flush (see the sketch after this list).
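A minimal Python sketch of the exporter and shutdown steps, assuming the opentelemetry-exporter-otlp package and a collector listening on the default OTLP endpoint.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
provider = TracerProvider()
# The timeout keeps a slow collector from stalling the background export thread.
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(timeout=10)))
trace.set_tracer_provider(provider)
# ... serve traffic ...
# On graceful shutdown, flush any queued spans before the process exits.
provider.shutdown()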
Exercises
Complete these hands-on tasks; a solution follows each exercise. Aim to run everything locally with console exporters.
- [ex1] Add basic tracing with an SDK and auto-instrumentation.
- [ex2] Emit a custom histogram metric with exemplars or attributes.
[ex1] Add basic tracing with an SDK and auto-instrumentation
Goal: Produce a trace for an HTTP request with a custom child span and attributes.
- Install Python packages: opentelemetry-sdk, opentelemetry-api, opentelemetry-instrumentation-requests (the console exporter ships with opentelemetry-sdk).
- Create a simple handler that calls an external URL or sleeps.
- Initialize a TracerProvider with resource attributes and a BatchSpanProcessor + ConsoleSpanExporter.
- Create a parent span named 'http.request' and a child span 'work.step'.
- Print the trace_id to stdout.
Expected output: the console shows at least two spans in one trace, with service.name=ex1-service and the child span carrying the attribute step='parse'.
Solution
# pip install opentelemetry-sdk opentelemetry-api opentelemetry-instrumentation-requests
import time
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
resource = Resource.create({
    'service.name': 'ex1-service',
    'deployment.environment': 'dev'
})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer('ex1')
with tracer.start_as_current_span('http.request') as parent:
    parent.set_attribute('http.route', '/demo')
    with tracer.start_as_current_span('work.step') as child:
        child.set_attribute('step', 'parse')
        time.sleep(0.02)
    tid = parent.get_span_context().trace_id
    print('trace_id=', format(tid, '032x'))
[ex2] Emit a custom histogram metric with attributes
Goal: Record request latency and export it to stdout.
- In Node.js, install @opentelemetry/sdk-metrics and @opentelemetry/resources (ConsoleMetricExporter ships with the metrics SDK).
- Create a MeterProvider with resource attributes.
- Create a histogram 'requests.duration.ms'.
- Record three values with attribute http.route='/checkout'.
- Verify console output shows histogram data with attributes.
Expected output: printed metric with name requests.duration.ms and attributes http.route=/checkout.
Solution
// npm i @opentelemetry/sdk-metrics @opentelemetry/resources
const { MeterProvider, PeriodicExportingMetricReader, ConsoleMetricExporter } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const provider = new MeterProvider({ resource: new Resource({ 'service.name': 'ex2-service', 'deployment.environment': 'dev' }) });
provider.addMetricReader(new PeriodicExportingMetricReader({ exporter: new ConsoleMetricExporter(), exportIntervalMillis: 1000 }));
const meter = provider.getMeter('ex2');
const h = meter.createHistogram('requests.duration.ms');
[45, 80, 120].forEach(v => h.record(v, { 'http.route': '/checkout' }));
setTimeout(() => provider.shutdown().then(() => process.exit(0)), 1500);
Common mistakes and how to self-check
- High-cardinality attributes (user_id, full URL query). Fix: keep only stable, low-cardinality keys.
- Sampling misconfiguration (0% or 100% by accident). Fix: print current sampling rate at startup; add a health route that reports it.
- Exporter blocking request threads. Fix: use batch/async exporters and timeouts.
- Uncorrelated logs. Fix: inject trace_id/span_id into log context from the current span (see the sketch after this list).
- Duplicate spans from overlapping auto + manual instrumentation. Fix: prefer manual around business logic and disable overlapping auto hooks if needed.
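A minimal Python sketch of the log-correlation fix, pulling IDs from the current span. The field names are a common convention rather than anything the SDK mandates, and the opentelemetry-instrumentation-logging package can inject similar fields automatically if you prefer.
import logging
from opentelemetry import trace
logger = logging.getLogger('checkout')
def log_with_trace(msg):
    ctx = trace.get_current_span().get_span_context()
    # Add the ids as extra fields; include %(trace_id)s and %(span_id)s in your log format to print them.
    logger.info(msg, extra={
        'trace_id': format(ctx.trace_id, '032x'),
        'span_id': format(ctx.span_id, '016x'),
    })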
Self-check checklist
- Every service reports service.name, service.version, deployment.environment.
- A cross-service call preserves the same trace_id end-to-end.
- Metrics export without pausing request handling.
- Logs include trace_id and span_id for sampled requests.
- No attribute values explode in cardinality over time.
Practical projects
- Monolith to microservices trace map: instrument two services and ensure a single trace flows across the HTTP boundary. Deliverable: screenshot or description of a trace tree with at least 5 spans.
- Latency SLO dashboard: emit requests.duration.ms histogram; compute p95 and alert when over threshold. Deliverable: metric output and alert condition YAML or description.
- Log correlation rollout: inject trace ids into app logs across 2 languages. Deliverable: sample logs that share the same trace_id as a trace export.
Mini challenge
You see many spans named 'GET /{id}' with very high cardinality in an attribute 'user_id'. How do you fix this without losing useful detail?
Hint/solution
- Keep span name at route template (GET /items/{id}), not full path.
- Remove user_id; replace with user.tier or auth.method (low cardinality).
- If needed, add user_id only as an event on error spans, not as an attribute on all spans (see the sketch below).
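A small Python sketch of that last point; fetch_item, item_id, and user_id are hypothetical stand-ins for your own code.
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
tracer = trace.get_tracer('items')
with tracer.start_as_current_span('GET /items/{id}') as span:
    span.set_attribute('user.tier', 'gold')  # low-cardinality attribute, safe on every span
    try:
        fetch_item(item_id)  # hypothetical business call
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR))
        # High-cardinality detail only on the failing span, recorded as an event.
        span.add_event('lookup.failed', {'user_id': user_id})
        raise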
Learning path
- Before: Observability fundamentals (traces, metrics, logs), W3C Trace Context basics.
- Now: Instrumentation libraries and SDKs (this lesson).
- Next: Collectors/agents, sampling strategies, semantic conventions, alerting and SLOs.
Who this is for
- Platform Engineers standardizing telemetry across services.
- Backend Engineers wiring visibility into APIs, jobs, and workers.
Prerequisites
- Comfort with one programming language (Python, Node.js, Go, or Java).
- Basic understanding of HTTP services and asynchronous jobs.
- Familiarity with environment variables and service configuration.
Next steps
- Instrument one service in dev using console exporters.
- Add manual spans around a critical path.
- Switch to OTLP exporter and verify traces flow through your collector/backend.
- Roll out a standard resource attribute policy across repositories.
Quick Test — Access
Take the quick test to check your understanding. Everyone can take the test; only logged-in users get saved progress.