
Logs Metrics Traces Setup

Learn Logs Metrics Traces Setup for free with explanations, exercises, and a quick test (for MLOps Engineers).

Published: January 4, 2026 | Updated: January 4, 2026

Why this matters

MLOps teams run real-time model APIs and scheduled batch pipelines. When things break or drift, you need to know what, where, and why. A solid setup for logs, metrics, and traces gives you fast detection, clear root-cause signals, and confidence to ship changes.

  • Real tasks you will face: investigate latency spikes, correlate model version with error rate, track feature retrieval time, prove SLOs to stakeholders, and debug a single user’s failing request.
  • Outcome of this lesson: you will be able to instrument an ML service and pipeline with structured logs, consistent metrics, and distributed traces.

Concept explained simply

Logs, metrics, and traces each answer a different question:

  • Logs: exact event details (what happened). Free-form text or structured JSON lines.
  • Metrics: numeric time series (how it trends). Low cost; they power dashboards and alerts.
  • Traces: request journey (where time is spent). Spans show call graph and timing.

Mental model

Imagine an airport:

  • Logs are incident reports for specific flights.
  • Metrics are the dashboard: flights per hour, delays, cancellations.
  • Traces are a passenger’s path through check-in, security, boarding.

Together, you get both the big picture and the exact steps when you need to zoom in.

Minimal stack blueprint

  • Emit JSON logs to stdout from your service. Collect with an agent (e.g., a log shipper or OpenTelemetry Collector).
  • Expose Prometheus/OpenMetrics at /metrics with key counters/gauges/histograms.
  • Instrument distributed traces with OpenTelemetry SDK. Export to a collector and a backend that supports traces.

Choose a practical baseline

  • Collector/agent: OpenTelemetry Collector for logs, metrics, and traces.
  • Metrics scraping: Prometheus-compatible scraper.
  • Visualization: any dashboarding tool compatible with your metrics and traces backend.

Worked examples

Example 1 — Logs (structured, JSON, contextual)

Goal: structured JSON logs for inference requests with key fields.

{"ts":"2025-06-01T12:34:56Z","level":"INFO","service":"recommender","route":"/predict","model_name":"reco_v2","model_version":"2.3.1","request_id":"9f2c...","latency_ms":42,"status":"ok","features_count":128}

Tips:

  • Include model_name, model_version, request_id, user_segment (coarse), status, latency_ms.
  • Avoid high-cardinality PII like user_id; prefer hashed or segmented values.
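
If you prefer the standard logging module over bare print calls, a minimal sketch of a JSON formatter might look like this (field names mirror the example line above; adapt them to your own schema):

# Sketch: JSON formatter for Python's standard logging module.
import json
import logging
import sys
import time

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "service": "recommender",
            "message": record.getMessage(),
        }
        # Merge structured fields passed via the `extra=` argument.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)   # JSON lines to stdout for the log agent
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("recommender")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Usage: attach request-scoped fields under a single "fields" extra key.
logger.info("prediction served", extra={"fields": {
    "route": "/predict", "model_name": "reco", "model_version": "2.3.1",
    "request_id": "9f2c...", "latency_ms": 42, "status": "ok",
}})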

Example 2 — Metrics (SLO-ready)

Expose these metrics at /metrics:

# HELP inference_requests_total Count of inference requests
# TYPE inference_requests_total counter
inference_requests_total{model_name="reco",model_version="2.3.1",status="ok"} 102934
inference_requests_total{model_name="reco",model_version="2.3.1",status="error"} 142

# HELP inference_latency_seconds Inference latency
# TYPE inference_latency_seconds histogram
inference_latency_seconds_bucket{model_name="reco",model_version="2.3.1",le="0.05"} 50231
inference_latency_seconds_bucket{model_name="reco",model_version="2.3.1",le="0.1"} 78900
inference_latency_seconds_bucket{model_name="reco",model_version="2.3.1",le="+Inf"} 104000
inference_latency_seconds_sum{model_name="reco",model_version="2.3.1"} 6100
inference_latency_seconds_count{model_name="reco",model_version="2.3.1"} 104000

# HELP feature_fetch_seconds Time to fetch features
# TYPE feature_fetch_seconds summary
feature_fetch_seconds_sum{source="redis"} 1248.0
feature_fetch_seconds_count{source="redis"} 104000

# HELP in_flight_requests Current in-flight requests
# TYPE in_flight_requests gauge
in_flight_requests 3

Labels to include: model_name, model_version, status. Keep labels low-cardinality.
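
One way to produce the feature_fetch_seconds series above from Python is the prometheus_client Summary; note that the Python client exports only the _sum and _count samples for summaries (use a Histogram if you need quantiles). A minimal sketch:

# Sketch: timing feature retrieval with prometheus_client.
from prometheus_client import Summary

FEATURE_FETCH = Summary(
    'feature_fetch_seconds', 'Time to fetch features', ['source']
)

def get_features(entity_id: str) -> dict:
    # Observes the elapsed time into feature_fetch_seconds{source="redis"}.
    with FEATURE_FETCH.labels('redis').time():
        # ... call the feature store here (Redis in this example) ...
        return {"features_count": 128}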

Example 3 — Traces (end-to-end)

Instrument spans across the path: HTTP request → feature store → model inference → post-processing.

Trace 7e1c... (root span: POST /predict, 120ms)
  ├─ Span: feature_store.get_features (Redis) 40ms
  ├─ Span: model.infer (ONNXRuntime) 55ms
  └─ Span: postprocess.serialize 10ms
Attributes: model_name=reco, model_version=2.3.1, request_id=9f2c..., user_segment=premium

Propagate context via headers such as traceparent so downstream services continue the trace.
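
Most OpenTelemetry HTTP instrumentations propagate context for you; if you build headers by hand, a sketch using the propagation API (the feature-store URL is illustrative):

# Sketch: inject the current trace context (traceparent header) into an
# outgoing HTTP request so the downstream service continues the same trace.
import httpx
from opentelemetry.propagate import inject

def call_feature_store(entity_id: str) -> dict:
    headers = {}
    inject(headers)  # adds "traceparent" (and "tracestate" if present)
    resp = httpx.get(f"http://feature-store:8080/features/{entity_id}", headers=headers)
    resp.raise_for_status()
    return resp.json()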

Step-by-step setup (works locally, on VMs, or in Kubernetes)

Step 1 — Emit structured logs

# Python example (FastAPI): structured JSON to stdout
import json, time
from fastapi import FastAPI, Request
app = FastAPI()

@app.get("/predict")
async def predict(request: Request):
    start = time.time()
    # ... run inference ...
    latency_ms = int((time.time() - start) * 1000)
    log = {
      "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
      "level": "INFO",
      "service": "recommender",
      "route": "/predict",
      "model_name": "reco",
      "model_version": "2.3.1",
      "request_id": request.headers.get("x-request-id", "-"),
      "latency_ms": latency_ms,
      "status": "ok"
    }
    print(json.dumps(log))
    return {"ok": True, "latency_ms": latency_ms}

Step 2 — Expose metrics

# Python Prometheus client (extends the FastAPI app from Step 1)
from fastapi import Request, Response
from prometheus_client import Counter, Histogram, Gauge, generate_latest, CONTENT_TYPE_LATEST

REQS = Counter('inference_requests_total', 'Count of inference requests', ['model_name', 'model_version', 'status'])
LAT = Histogram('inference_latency_seconds', 'Inference latency', ['model_name', 'model_version'], buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2, 5])
INFLIGHT = Gauge('in_flight_requests', 'Current in-flight requests')

@app.get("/metrics")
def metrics():
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

@app.get("/predict")
async def predict(request: Request):
    INFLIGHT.inc()
    try:
        with LAT.labels('reco', '2.3.1').time():
            # ... inference ...
            REQS.labels('reco', '2.3.1', 'ok').inc()
    finally:
        INFLIGHT.dec()
    return {"ok": True}

Step 3 — Add traces

# Python OpenTelemetry minimal setup
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Configure an exporter to your collector endpoint, e.g. OTLP over gRPC
# (package: opentelemetry-exporter-otlp):
# from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
trace.set_tracer_provider(provider)
# provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True)))
FastAPIInstrumentor.instrument_app(app)

@app.get("/predict")
async def predict(request: Request):
    tracer = trace.get_tracer("recommender")
    with tracer.start_as_current_span("model.infer") as span:
        span.set_attribute("model_name","reco")
        span.set_attribute("model_version","2.3.1")
        # ... inference ...
    return {"ok": True}

Step 4 — Collector config (single pipeline for all signals)

# OpenTelemetry Collector (conceptual snippet)
receivers:
  otlp:
    protocols:
      http:
      grpc:
  prometheus:
    config:
      scrape_configs:
        - job_name: ml-service
          static_configs:
            - targets: ['ml-service:8000']
  filelog:
    include: [/var/log/containers/*.log]

processors:
  batch: {}
  attributes:
    actions:
      - key: service.name
        value: recommender
        action: insert

exporters:
  # Replace with your chosen backends for logs/metrics/traces
  otlp:
    endpoint: your-trace-metrics-endpoint:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes]
      exporters: [otlp]
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [filelog]
      processors: [batch, attributes]
      exporters: [otlp]

Step 5 — Sampling strategy

  • Start with head sampling (e.g., 5–10%) for traces to control cost.
  • Add tail-based sampling rules to keep slow or error traces.
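
In the OpenTelemetry Python SDK, head sampling is set when the TracerProvider is created; tail-based rules usually live in the collector instead. A sketch of 10% parent-based head sampling:

# Sketch: sample ~10% of new traces at the SDK, but always honor an upstream
# sampling decision (ParentBased) so distributed traces stay consistent.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

sampler = ParentBased(root=TraceIdRatioBased(0.10))
provider = TracerProvider(sampler=sampler)
trace.set_tracer_provider(provider)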

Step 6 — SLOs and alerts

  • Availability: error rate ≤ 1% over 30d.
  • Latency: 95% of requests under 150 ms.

# Example alert rules (conceptual)
# Error-budget burn alert (fast window)
- alert: HighErrorRateFastBurn
  expr: sum by (model_name) (rate(inference_requests_total{status="error"}[5m]))
        / sum by (model_name) (rate(inference_requests_total[5m])) > 0.05
  for: 5m
  labels: {severity: critical}
  annotations: {summary: "High error rate (5m)"}

# Latency: P95 beyond target
- alert: LatencyP95High
  expr: histogram_quantile(0.95, sum by (le, model_name) (rate(inference_latency_seconds_bucket[5m]))) > 0.15
  for: 10m
  labels: {severity: warning}
  annotations: {summary: "P95 latency > 150ms (10m)"}

Step 7 — Dashboards that answer questions

  • Overview: requests, error %, P95 latency by model_version.
  • Deep dive: feature_fetch_seconds vs model.infer duration (traces).
  • Release view: compare last 2 model versions side-by-side.

Common mistakes and self-check

  • Mistake: unstructured logs. Fix: emit JSON with consistent keys.
  • Mistake: too many labels (high cardinality). Fix: limit to model_name, model_version, status; aggregate user attributes.
  • Mistake: only averages. Fix: use histograms and quantiles (P95/P99).
  • Mistake: missing trace propagation. Fix: ensure traceparent is forwarded across hops.
  • Mistake: alert fatigue. Fix: SLO-based, multi-window burn-rate alerts.

Self-check:

  • [ ] Can you search a request by request_id across logs and traces?
  • [ ] Can you show P95 latency and error rate per model_version?
  • [ ] Do slow or error requests always produce a trace sample?

Exercises

Do these hands-on tasks. They mirror the exercises below. You can validate locally using any HTTP client and your metrics endpoint.

  1. Exercise 1: Instrument a toy FastAPI model API with JSON logs, Prometheus metrics, and OpenTelemetry traces. Prove that you can search logs by request_id, view inference_requests_total, and see spans for model.infer.
  2. Exercise 2: Create two alert rules: (a) fast-burn error alert using 5m rates; (b) P95 latency alert based on histogram quantile. Test by injecting errors and artificial latency.

Completion checklist:

  • [ ] Logs show model_name and model_version for every request
  • [ ] Metrics endpoint exports counters and histograms
  • [ ] Traces span across feature fetch and model inference
  • [ ] Alerts fire on induced errors/latency and then resolve

Practical projects

  • Blue/Green model rollout: compare metrics and traces between v1 and v2, auto-rollback if P95 latency regresses by 20%.
  • Data pipeline observability: instrument a batch job with logs for row counts and metrics for job duration; trace sub-steps (extract, transform, load).
  • Cost-aware sampling: implement head sampling at 10% plus tail rules for errors and spans > 500 ms; verify coverage.

Who this is for

  • MLOps engineers, platform engineers, and ML engineers operating models in production.
  • Data engineers adding observability to ML pipelines.

Prerequisites

  • Comfort with a service framework (e.g., FastAPI/Flask) and Python.
  • Basic understanding of containers or Linux services.
  • Familiarity with Prometheus-style metrics and OpenTelemetry basics helps.

Learning path

  1. Start: structured logging and consistent fields.
  2. Add: Prometheus/OpenMetrics with counters, gauges, histograms.
  3. Introduce: tracing with OpenTelemetry and context propagation.
  4. Define: SLOs and alerts (burn-rate, latency quantiles).
  5. Harden: dashboards, sampling strategy, cost control.

Mini challenge

Ship a new model version and prove with metrics and traces that it improves P95 latency by 10% and does not increase the error rate. Provide a screenshot-equivalent write-up: the metric query used, the quantile chosen, and an example trace showing the change in model.infer span duration.

Next steps

  • Automate: add instrumentation to your service template so every new model starts observable by default.
  • Governance: document required fields for logs/metrics/traces across all ML services.
  • Reliability: add synthetic checks that call your API and validate end-to-end spans.
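
For the synthetic-check idea above, a minimal sketch (endpoint URL and latency budget are illustrative; verifying that the matching spans arrived requires querying your trace backend, which is backend-specific):

# Sketch: call the API on a schedule and fail loudly if it errors or exceeds
# the latency budget. Span verification is left to your trace backend's API.
import sys
import time
import httpx

API_URL = "http://ml-service:8000/predict"    # illustrative endpoint

def synthetic_check(latency_budget_ms: float = 150.0) -> int:
    start = time.time()
    resp = httpx.get(API_URL, headers={"x-request-id": "synthetic-check"}, timeout=5.0)
    elapsed_ms = (time.time() - start) * 1000
    if resp.status_code != 200:
        print(f"FAIL status={resp.status_code}")
        return 1
    if elapsed_ms > latency_budget_ms:
        print(f"FAIL latency {elapsed_ms:.0f}ms > {latency_budget_ms:.0f}ms")
        return 1
    print(f"OK {elapsed_ms:.0f}ms")
    return 0

if __name__ == "__main__":
    sys.exit(synthetic_check())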


Practice Exercises

2 exercises to complete

Instructions

Create a small FastAPI service with:

  • JSON logs for each request including model_name, model_version, request_id, status, latency_ms.
  • Prometheus metrics at /metrics with counter inference_requests_total{status}, histogram inference_latency_seconds, and gauge in_flight_requests.
  • OpenTelemetry traces with spans for feature fetching and model.infer.

Trigger 20 requests: 2 should return errors and 3 should be slowed by a 300 ms sleep.
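
A small driver sketch for generating that traffic. It assumes the toy service accepts a hypothetical simulate query parameter to inject failures and the 300 ms sleep; wire that up however you like:

# Sketch: fire 20 requests, asking the toy service to inject 2 errors and
# 3 slow responses. The "simulate" parameter is hypothetical; implement it in
# your toy service (e.g. raise on "error", sleep 0.3s on "slow").
import uuid
import httpx

BASE = "http://localhost:8000/predict"   # illustrative endpoint

modes = ["error"] * 2 + ["slow"] * 3 + ["ok"] * 15
for i, mode in enumerate(modes):
    headers = {"x-request-id": f"exercise-{i}-{uuid.uuid4().hex[:8]}"}
    try:
        resp = httpx.get(BASE, params={"simulate": mode}, headers=headers, timeout=5.0)
        print(i, mode, resp.status_code)
    except httpx.HTTPError as exc:
        print(i, mode, "request failed:", exc)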

Expected Output
  1. Logs: 20 lines, each JSON with the required fields present; 2 lines show status=error; 3 lines show latency_ms >= 300.
  2. Metrics: inference_requests_total counts match the ok/error split; the histogram shows increased counts in the slower buckets.
  3. Traces: each slow or error request has a retained trace with spans feature_store.get_features and model.infer.

Logs Metrics Traces Setup — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.
