Why this matters
Every production API must answer three questions at all times: How much traffic are we serving (RPS)? How many requests fail (errors)? How fast are responses (latency)? These metrics guide on-call decisions, capacity planning, and user experience.
- On-call triage: Spot traffic spikes or latency regressions before users complain.
- Release safety: Compare RPS, error rate, and p95 latency before/after a deploy.
- SLI/SLO tracking: Quantify reliability and speed with percentiles and budgets.
- Capacity planning: Ensure headroom for peak RPS without tail latency explosions.
Who this is for
- API Engineers and backend developers responsible for uptime and performance.
- SREs adding alerts and SLOs to services.
- Team leads needing crisp, shared language for incident review.
Prerequisites
- Basic HTTP knowledge (status codes, request/response).
- Comfort with simple math (rates, percentages, percentiles).
- Familiarity with metrics concepts like counters, gauges, and histograms.
Concept explained simply
Three signals: the RED method
- Rate (RPS): How many requests per second your API handles.
- Errors: What fraction of requests fail (usually 5xx).
- Duration (Latency): How long requests take, captured by percentiles (p50, p90, p95, p99).
Mental model
Imagine your service as a highway:
- RPS = number of cars entering per second.
- Errors = cars that break down on the road (5xx across the fleet).
- Latency = travel time. Averages hide traffic jams; percentiles expose tail delays.
Deep dive: Counters, gauges, histograms, summaries
- Counters: ever-increasing numbers (e.g., requests_total). Use rate() over time windows.
- Gauges: current values (e.g., in-flight requests).
- Histograms: bucketed observations; enable server-side percentiles and aggregation.
- Summaries: compute percentiles locally; cannot be aggregated across instances.
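As a toy sketch of the counter/gauge distinction and why counters are only read as rates (the in-process variables here are illustrative, not a real metrics client):

```python
# Counter: monotonically increasing; meaningful only as a rate over a window.
requests_total = 0
# Gauge: a current value that can go up and down (e.g., in-flight requests).
in_flight = 0

def handle_request():
    global requests_total, in_flight
    in_flight += 1        # gauge rises while the request is active
    requests_total += 1   # counter only ever increases
    in_flight -= 1        # gauge falls when the request completes

# rate(): take two counter readings, divide the delta by the elapsed window.
count_before = requests_total
for _ in range(300):
    handle_request()
count_after = requests_total
window_seconds = 5.0  # assume 5 s elapsed between the two readings
rps = (count_after - count_before) / window_seconds  # 300 / 5 = 60 req/s
```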
Key definitions
- RPS (requests per second): RPS = increase_in_requests / seconds_in_window.
- Error rate: Usually 5xx / total_requests over a window. Treat 4xx separately (client behavior).
- Latency percentiles: p50 = typical, p95 = slow-but-common, p99 = tail. Alerting on p99 is noisy; prefer multi-minute windows and burn-rate logic.
When to use which percentile?
- p50: Median experience; useful for regressions, not for paging.
- p95: Good balance for most end-user latency SLOs.
- p99: Critical for low-latency products; use for dashboards and well-tuned alerts.
How to measure
From counters
// Over a 5-minute window (300s)
RPS = (requests_total[t_now] - requests_total[t_5m_ago]) / 300
Error rate = requests_5xx_delta / requests_total_delta
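The two formulas above can be sketched in Python (counter deltas over a fixed window; function and variable names are illustrative):

```python
def rps(total_delta: int, window_seconds: float) -> float:
    """Requests per second from a counter increase over a fixed window."""
    return total_delta / window_seconds

def error_rate(err_5xx_delta: int, total_delta: int) -> float:
    """Fraction of requests in the window that failed with 5xx."""
    return err_5xx_delta / total_delta if total_delta else 0.0

# A 5-minute (300 s) window with 18,000 requests and 270 server errors:
print(rps(18_000, 300))         # 60.0 req/s
print(error_rate(270, 18_000))  # 0.015 → 1.5%
```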
From histograms (percentiles)
Find the bucket that contains the desired percentile, then interpolate inside the bucket if possible.
// Example buckets (cumulative counts)
<=50ms: 1500
<=100ms: 4300
<=200ms: 9200
<=400ms: 9800
<=+Inf: 10000
// p95 target rank = 0.95 * 10000 = 9500
// 200ms bucket has 9200, 400ms has 9800 → p95 is between 200 and 400ms
fraction = (9500-9200)/(9800-9200) = 300/600 = 0.5
p95 ≈ 200ms + 0.5*(400-200) = 300ms
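The bucket-search-plus-interpolation procedure above, as a small sketch (this mirrors the usual linear-interpolation approach for cumulative histogram buckets; the +Inf bucket has no upper edge, so the sketch falls back to the previous bound there):

```python
def percentile_from_buckets(buckets, q):
    """Estimate a percentile from cumulative histogram buckets.

    buckets: list of (upper_bound_ms, cumulative_count), sorted ascending,
             ending with (float('inf'), total_count).
    q: quantile in [0, 1], e.g. 0.95.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # tail bucket: no upper edge to interpolate to
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + fraction * (bound - prev_bound)
        prev_bound, prev_count = bound, count

# The example buckets from this section:
buckets = [(50, 1500), (100, 4300), (200, 9200), (400, 9800), (float("inf"), 10000)]
print(percentile_from_buckets(buckets, 0.95))  # ≈ 300 ms
```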
Worked examples
Example 1: Compute RPS and error rate
In 5 minutes, requests_total increased by 18,000; http_5xx increased by 270.
- RPS = 18,000 / 300 = 60 req/s.
- Error rate = 270 / 18,000 = 0.015 = 1.5%.
Example 2: Estimate p95 latency from buckets
Using the histogram in the previous section, p95 ≈ 300 ms.
Example 3: Check an SLO against observed data
SLO: p95 latency ≤ 250 ms over 30 minutes. Observed p95 = 300 ms for two consecutive 5-minute windows.
- Result: SLO is not met for those windows; consider paging if breach is sustained (avoid single-window noise).
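The "sustained breach" rule from Example 3 can be sketched as a check over consecutive windows (the window length and the two-window threshold are assumptions for illustration, not a standard):

```python
def sustained_breach(p95_windows_ms, target_ms, consecutive=2):
    """True if the last `consecutive` windows all exceed the latency target."""
    recent = p95_windows_ms[-consecutive:]
    return len(recent) == consecutive and all(v > target_ms for v in recent)

# Observed p95 per 5-minute window (ms), against a 250 ms target:
print(sustained_breach([240, 255, 300, 300], 250))  # True → consider paging
print(sustained_breach([240, 300], 250))            # False → single-window noise
```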
Common mistakes and self-check
- Using average latency: Averages hide tail slowness. Prefer p95/p99.
- Mixing 4xx with errors: 4xx reflect client behavior; usually exclude from server error rate.
- Alerting on p99 spikes over 1 minute: Too noisy. Use multi-window burn-rate or sustained breaches.
- Comparing different windows: Always state the window (e.g., 5m, 15m).
- Percent vs percentage points: An increase from 1% to 2% is +1 percentage point, a 100% relative increase.
- Not labeling units: Always include ms for latency, req/s for RPS.
Self-check
- Do your metric names include units or are units documented?
- Are counters correctly rate-converted over a fixed window?
- Are you distinguishing 5xx (server) from 4xx (client) in alerts?
- Do dashboards show p50, p95, and p99 side-by-side?
Exercises
Work through these yourself first, then check the solutions if you get stuck.
Exercise 1: RPS and error rate
In the last 5 minutes: requests_total increased by 18,000; http_5xx increased by 270. Compute RPS and error rate.
Hint
- 5 minutes = 300 seconds.
- Error rate = 5xx_delta / total_delta.
Answer
RPS ≈ 60 req/s; error rate ≈ 1.5%.
Exercise 2: p95 from histogram
Cumulative bucket counts (ms): 50:1500, 100:4300, 200:9200, 400:9800, +Inf:10000. Estimate p95.
Hint
- Find the bucket containing the 95th percentile rank.
- Interpolate inside the bucket.
Answer
p95 ≈ 300 ms.
Checklist
- Include units (ms, req/s).
- Use fixed time windows (e.g., 5m) for rates.
- Separate 5xx from 4xx.
- Show p50, p95, p99 on one chart.
Practical projects
- Instrument a simple HTTP endpoint with metrics: requests_total, requests_5xx_total, request_duration histogram (ms). Add labels for method, route, status.
- Create a dashboard with three panels: RPS (5m rate), Error rate (5xx/total, 5m), Latency percentiles (p50/p95/p99).
- Write two alerts: sustained error rate > 2x budget over 5m and 30m; p95 latency > 1.5x target over 10m.
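A library-agnostic sketch of the first project's instrumentation, with the metric and label names from the bullets above (a real service would use a metrics client such as prometheus_client rather than these in-process dicts):

```python
from collections import defaultdict

# Counter keyed by (method, route, status); 5xx totals fall out of the
# status label rather than needing a separate counter.
requests_total = defaultdict(int)

# Duration histogram: cumulative counts per upper bound (ms) per (method, route).
BOUNDS_MS = [50, 100, 200, 400, float("inf")]
duration_buckets = defaultdict(lambda: [0] * len(BOUNDS_MS))

def record(method, route, status, duration_ms):
    requests_total[(method, route, status)] += 1
    # Cumulative-bucket convention: increment every bucket whose upper
    # bound covers this observation.
    buckets = duration_buckets[(method, route)]
    for i, bound in enumerate(BOUNDS_MS):
        if duration_ms <= bound:
            buckets[i] += 1

record("GET", "/users", 200, 42.0)
record("GET", "/users", 500, 310.0)
print(requests_total[("GET", "/users", 200)])  # 1
print(duration_buckets[("GET", "/users")])     # [1, 1, 1, 2, 2]
```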
Learning path
- Next: Dashboards and alert rules (practical thresholds, burn-rate).
- Then: Tracing basics to connect high latency to slow database calls.
- Then: SLOs and error budgets for product-level commitments.
- Optional: Capacity planning with queueing intuition for tail latency.
Next steps
- Finish the exercises below.
- Take the Quick Test to check your understanding.
- Apply one alert improvement to your service this week.
Mini challenge
In 10 minutes you served 120,000 requests; 5xx count = 600; p95 latency = 420 ms. Your targets: error rate ≤ 0.2%, p95 ≤ 350 ms. What’s your call?
Show solution
- Error rate = 600/120,000 = 0.5% → above 0.2% target.
- p95 = 420 ms → above 350 ms target.
- Action: Page if both are sustained; roll back recent changes or enable feature flag mitigation.