Metrics: RPS, Errors, Latency

Learn RPS, error-rate, and latency metrics for free with explanations, exercises, and a quick test (for API Engineers).

Published: January 21, 2026 | Updated: January 21, 2026

Why this matters

Every production API must answer three questions at all times: How much traffic are we serving (RPS)? How many requests fail (errors)? How fast are responses (latency)? These metrics guide on-call decisions, capacity planning, and user experience.

  • On-call triage: Spot traffic spikes or latency regressions before users complain.
  • Release safety: Compare RPS, error rate, and p95 latency before/after a deploy.
  • SLI/SLO tracking: Quantify reliability and speed with percentiles and budgets.
  • Capacity planning: Ensure headroom for peak RPS without tail latency explosions.

Who this is for

  • API Engineers and backend developers responsible for uptime and performance.
  • SREs adding alerts and SLOs to services.
  • Team leads needing crisp, shared language for incident review.

Prerequisites

  • Basic HTTP knowledge (status codes, request/response).
  • Comfort with simple math (rates, percentages, percentiles).
  • Familiarity with metrics concepts like counters, gauges, and histograms.

Concept explained simply

Three signals: the RED method

  • Rate (RPS): How many requests per second your API handles.
  • Errors: What fraction of requests fail (usually 5xx).
  • Duration (Latency): How long requests take, captured by percentiles (p50, p90, p95, p99).

Mental model

Imagine your service as a highway:

  • RPS = number of cars entering per second.
  • Errors = cars that break down on the road (5xx across the fleet).
  • Latency = travel time. Averages hide traffic jams; percentiles expose tail delays.

Deep dive: Counters, gauges, histograms, summaries

  • Counters: ever-increasing numbers (e.g., requests_total). Use rate() over time windows.
  • Gauges: current values (e.g., in-flight requests).
  • Histograms: bucketed observations; enable server-side percentiles and aggregation.
  • Summaries: compute percentiles locally; cannot be aggregated across instances.
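
To make the distinctions concrete, here is a minimal, library-free Python sketch of these metric types. The class names and bucket bounds are illustrative only, not a real metrics API:

# Minimal, library-free sketch of the metric types above.
# Class names and bucket bounds are illustrative, not a real metrics API.

class Counter:
    """Ever-increasing total, e.g. requests_total."""
    def __init__(self):
        self.value = 0
    def inc(self, amount=1):
        self.value += amount          # never decreases; rates are derived later

class Gauge:
    """Current value that can go up or down, e.g. in-flight requests."""
    def __init__(self):
        self.value = 0
    def set(self, value):
        self.value = value

class Histogram:
    """Cumulative bucket counts, e.g. request duration in ms."""
    def __init__(self, bounds=(50, 100, 200, 400, float("inf"))):
        self.bounds = bounds
        self.counts = [0] * len(bounds)
    def observe(self, value_ms):
        # every bucket whose upper bound covers the observation is incremented
        for i, bound in enumerate(self.bounds):
            if value_ms <= bound:
                self.counts[i] += 1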

Key definitions

  • RPS (requests per second): RPS = increase_in_requests / seconds_in_window.
  • Error rate: Usually 5xx / total_requests over a window. Treat 4xx separately (client behavior).
  • Latency percentiles: p50 = typical, p95 = slow-but-common, p99 = tail. Alerting on p99 is noisy; prefer multi-minute windows and burn-rate logic.

When to use which percentile?

  • p50: Median experience; useful for regressions, not for paging.
  • p95: Good balance for most end-user latency SLOs.
  • p99: Critical for low-latency products; use for dashboards and well-tuned alerts.

How to measure

From counters

// Over a 5-minute window (300s)
RPS = (requests_total[t_now] - requests_total[t_5m_ago]) / 300
Error rate = (requests_5xx_delta / requests_total_delta)
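
The same arithmetic as a runnable Python sketch; the counter samples below are made-up values chosen to match the worked example later in this section:

# Sketch: RPS and error rate from two counter samples taken 300 s apart.
# Sample values are illustrative, not from a real system.

window_seconds = 300

total_now, total_5m_ago = 1_018_000, 1_000_000    # requests_total
errors_now, errors_5m_ago = 12_270, 12_000        # requests_5xx_total

total_delta = total_now - total_5m_ago            # 18,000 requests
error_delta = errors_now - errors_5m_ago          # 270 failures

rps = total_delta / window_seconds                # 60.0 req/s
error_rate = error_delta / total_delta            # 0.015 -> 1.5%

print(f"RPS = {rps:.1f} req/s, error rate = {error_rate:.1%}")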

From histograms (percentiles)

Find the bucket that contains the desired percentile rank, then linearly interpolate inside the bucket (assuming observations are spread evenly within it).

// Example buckets (cumulative counts)
<=50ms: 1500
<=100ms: 4300
<=200ms: 9200
<=400ms: 9800
<=+Inf: 10000
// p95 target rank = 0.95 * 10000 = 9500
// 200ms bucket has 9200, 400ms has 9800 → p95 is between 200 and 400ms
fraction = (9500-9200)/(9800-9200) = 300/600 = 0.5
p95 ≈ 200ms + 0.5*(400-200) = 300ms
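
The same interpolation as a small Python helper. The bucket bounds and counts mirror the example above; treat the function as a sketch, not any particular library's API:

# Sketch: estimate a percentile from cumulative histogram buckets
# using linear interpolation inside the matching bucket.

def percentile_from_buckets(bounds_ms, cumulative_counts, q):
    total = cumulative_counts[-1]
    target_rank = q * total
    prev_bound, prev_count = 0, 0
    for bound, count in zip(bounds_ms, cumulative_counts):
        if count >= target_rank:
            if bound == float("inf"):
                return prev_bound  # tail bucket: only ">= last bound" is known
            fraction = (target_rank - prev_count) / (count - prev_count)
            return prev_bound + fraction * (bound - prev_bound)
        prev_bound, prev_count = bound, count

bounds = [50, 100, 200, 400, float("inf")]
counts = [1500, 4300, 9200, 9800, 10000]
print(percentile_from_buckets(bounds, counts, 0.95))  # ≈ 300.0 ms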

Worked examples

Example 1: Compute RPS and error rate

In 5 minutes, requests_total increased by 18,000; http_5xx increased by 270.

  1. RPS = 18,000 / 300 = 60 req/s.
  2. Error rate = 270 / 18,000 = 0.015 = 1.5%.

Example 2: Estimate p95 latency from buckets

Using the histogram in the previous section, p95 ≈ 300 ms.

Example 3: Check an SLO against observed data

SLO: p95 latency ≤ 250 ms over 30 minutes. Observed p95 = 300 ms for two consecutive 5-minute windows.

  • Result: SLO is not met for those windows; consider paging if breach is sustained (avoid single-window noise).
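
A small Python sketch of this "sustained breach" rule; the two-window requirement and the values below are illustrative assumptions:

# Sketch: only flag an SLO breach when p95 exceeds the target for
# several consecutive windows (values are illustrative).

def sustained_breach(p95_per_window_ms, target_ms, windows_required=2):
    streak = 0
    for p95 in p95_per_window_ms:
        streak = streak + 1 if p95 > target_ms else 0
        if streak >= windows_required:
            return True
    return False

# Two consecutive 5-minute windows at 300 ms against a 250 ms target:
print(sustained_breach([300, 300], target_ms=250))  # True -> consider paging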

Common mistakes and self-check

  • Using average latency: Averages hide tail slowness. Prefer p95/p99.
  • Mixing 4xx with errors: 4xx reflect client behavior; usually exclude from server error rate.
  • Alerting on p99 spikes over 1 minute: Too noisy. Use multi-window burn-rate or sustained breaches.
  • Comparing different windows: Always state the window (e.g., 5m, 15m).
  • Percent vs percentage points: An increase from 1% to 2% is +1 percentage point, a 100% relative increase.
  • Not labeling units: Always include ms for latency, req/s for RPS.

Self-check

  • Do your metric names include units or are units documented?
  • Are counters correctly rate-converted over a fixed window?
  • Are you distinguishing 5xx (server) from 4xx (client) in alerts?
  • Do dashboards show p50, p95, and p99 side-by-side?

Exercises

Work through these yourself first, then open the solutions when stuck.

Exercise 1: RPS and error rate

In the last 5 minutes: requests_total increased by 18,000; http_5xx increased by 270. Compute RPS and error rate.

Hint
  • 5 minutes = 300 seconds.
  • Error rate = 5xx_delta / total_delta.
Answer

RPS ≈ 60 req/s; error rate ≈ 1.5%.

Exercise 2: p95 from histogram

Cumulative bucket counts (ms): 50:1500, 100:4300, 200:9200, 400:9800, +Inf:10000. Estimate p95.

Hint
  • Find the bucket containing the 95th percentile rank.
  • Interpolate inside the bucket.
Answer

p95 ≈ 300 ms.

Checklist
  • Include units (ms, req/s).
  • Use fixed time windows (e.g., 5m) for rates.
  • Separate 5xx from 4xx.
  • Show p50, p95, p99 on one chart.

Practical projects

  • Instrument a simple HTTP endpoint with metrics: requests_total, requests_5xx_total, request_duration histogram (ms). Add labels for method, route, status (see the sketch after this list).
  • Create a dashboard with three panels: RPS (5m rate), Error rate (5xx/total, 5m), Latency percentiles (p50/p95/p99).
  • Write two alerts: sustained error rate > 2x budget over 5m and 30m; p95 latency > 1.5x target over 10m.
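
A minimal sketch of the first project, assuming the prometheus_client Python library; the route, port, bucket bounds, and simulated outcomes are illustrative choices:

# Sketch of the first project, assuming the prometheus_client library.
# Route name, port, bucket bounds, and simulated outcomes are illustrative.
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "requests_total", "Total HTTP requests",
    ["method", "route", "status"])
REQUESTS_5XX = Counter(
    "requests_5xx_total", "Total HTTP 5xx responses",
    ["method", "route"])
DURATION_MS = Histogram(
    "request_duration_ms", "Request duration in milliseconds",
    ["method", "route"], buckets=(50, 100, 200, 400))

def handle_request(method="GET", route="/ping"):
    start = time.perf_counter()
    time.sleep(random.uniform(0.01, 0.3))             # simulate work
    status = 500 if random.random() < 0.015 else 200  # simulate failures
    duration_ms = (time.perf_counter() - start) * 1000
    REQUESTS.labels(method, route, str(status)).inc()
    if status >= 500:
        REQUESTS_5XX.labels(method, route).inc()
    DURATION_MS.labels(method, route).observe(duration_ms)
    return status

if __name__ == "__main__":
    start_http_server(8000)   # exposes /metrics for scraping
    while True:
        handle_request()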

Learning path

  • Next: Dashboards and alert rules (practical thresholds, burn-rate).
  • Then: Tracing basics to connect high latency to slow database calls.
  • Then: SLOs and error budgets for product-level commitments.
  • Optional: Capacity planning with queueing intuition for tail latency.

Next steps

  • Finish the exercises below.
  • Take the Quick Test (everyone can take it; only logged-in users get saved progress).
  • Apply one alert improvement to your service this week.

Mini challenge

In 10 minutes you served 120,000 requests; 5xx count = 600; p95 latency = 420 ms. Your targets: error rate ≤ 0.2%, p95 ≤ 350 ms. What’s your call?

Show solution
  • Error rate = 600/120,000 = 0.5% → above 0.2% target.
  • p95 = 420 ms → above 350 ms target.
  • Action: Page if both are sustained; roll back recent changes or enable feature flag mitigation.

Metrics: RPS, Errors, Latency — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
