Why this matters
As an API Engineer, you must ensure your services stay fast and reliable when traffic grows or spikes unexpectedly. Load testing helps you:
- Validate SLOs (e.g., p95 latency & error rate) before launches.
- Catch bottlenecks early (DB connections, caches, thread pools, rate limits).
- Right-size infrastructure to control cost without risking outages.
- Prevent regressions during releases and dependency upgrades.
Concept explained simply
Load testing sends many requests to your API to see how it behaves under expected and extreme traffic. You watch latency, errors, and resource usage to decide if it meets your goals.
- Throughput: requests per second (RPS/QPS).
- Concurrency: how many requests are in flight at once.
- Latency: p50/p95/p99 response times.
- Error rate: HTTP 5xx/4xx (as applicable), timeouts.
- Saturation: resource pressure (CPU, memory, DB pool, queues).
Mental model
Imagine a highway. Cars = requests; lanes = server capacity; toll booths = DB/cache calls. When too many cars arrive, queues build and travel time (latency) spikes. Adding lanes (capacity) or speeding up toll booths (optimizations) restores flow.
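This picture also gives a useful rule of thumb, Little's Law: average in-flight requests ≈ throughput × average latency. A quick sketch with illustrative numbers:
// Little's Law (steady state): concurrency = arrival rate x average latency
const targetRps = 200;      // planned throughput, requests per second
const avgLatencySec = 0.05; // average response time, 50 ms
const inFlight = targetRps * avgLatencySec;
console.log(`~${inFlight} requests in flight`); // ~10: size VU and connection pools above this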
Key metrics and practical thresholds
- Pick SLO targets such as: p95 < 200–400 ms and error rate < 1% at expected load; p95 < 500–800 ms at 2× spike. Values vary by product and latency class.
- Track p50/p95/p99, not averages; p99 exposes the tail latency users actually feel at peak.
- Watch CPU, memory, GC pauses, DB pool wait time, cache hit rate, and downstream timeouts.
- Warm up the system (JIT, caches, connection pools) before measuring; one way to keep warm-up out of your thresholds is sketched below.
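In k6, a sketch of this (reusing the local /products endpoint from the examples below): run a short warm-up scenario first, then filter the thresholds to the measured scenario via its built-in scenario tag.
import http from 'k6/http';

export const options = {
  scenarios: {
    // one minute of light traffic to prime JIT, caches, and connection pools
    warmup: { executor: 'constant-arrival-rate', rate: 50, timeUnit: '1s', duration: '1m', preAllocatedVUs: 20 },
    // the measured run starts once the warm-up finishes
    steady: { executor: 'constant-arrival-rate', rate: 200, timeUnit: '1s', duration: '10m', preAllocatedVUs: 50, startTime: '1m' },
  },
  thresholds: {
    'http_req_duration{scenario:steady}': ['p(95)<300'], // warm-up requests excluded
  },
};

export default function () {
  http.get('http://localhost:8080/products');
}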
Designing a test plan
- Define goals: e.g., handle 300 RPS with p95 < 300 ms and < 0.5% errors.
- Pick scenarios:
- Load test: steady expected traffic.
- Spike test: sudden jump in RPS.
- Stress test: push until failure to find the limit (a sketch follows this list).
- Soak test: hours-long run to catch leaks and slow drifts.
- Choose endpoints and data: include read/write mix, common error cases, and realistic payload sizes.
- Environment: use staging that is close to production capacity and configuration. Never overload shared environments unintentionally.
- Observability: enable metrics, logs, and traces. Correlate API latency with downstream calls.
- Acceptance criteria: write explicit pass/fail thresholds before testing.
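Of these shapes, only the stress test lacks a worked example below; a minimal k6 sketch (illustrative numbers, assuming the same local /products endpoint):
import http from 'k6/http';

export const options = {
  scenarios: {
    stress: {
      executor: 'ramping-arrival-rate',
      startRate: 50, timeUnit: '1s',
      preAllocatedVUs: 100, maxVUs: 1000, // head-room as latency climbs
      stages: [
        { target: 200, duration: '5m' },
        { target: 400, duration: '5m' },
        { target: 800, duration: '5m' }, // expect degradation somewhere in here
      ],
    },
  },
};

export default function () {
  http.get('http://localhost:8080/products');
}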
Worked examples
Example 1 — Steady load at 200 RPS for 10 minutes
Goal: GET /products at 200 RPS, p95 < 250 ms, error rate < 0.5%.
// k6 script (example)
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    steady: {
      executor: 'constant-arrival-rate', // holds RPS constant regardless of response time
      rate: 200, timeUnit: '1s', duration: '10m',
      preAllocatedVUs: 50, maxVUs: 200, // head-room so slow responses don't drop iterations
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.005'],  // error rate < 0.5%
    http_req_duration: ['p(95)<250'], // p95 < 250 ms
  },
};

export default function () {
  const res = http.get('http://localhost:8080/products');
  check(res, { 'status is 200': (r) => r.status === 200 });
}
What to inspect: p95/p99 during and after warm-up; CPU around 60–75% at steady state; DB pool utilization not pegged at 100%.
Example 2 — Spike from 50 → 500 RPS in 60s
Goal: API remains stable; p95 < 500 ms; no cascading 5xx burst.
// k6 scenario (spike); drop into options.scenarios as in Example 1
spike: {
  executor: 'ramping-arrival-rate',
  startRate: 50, timeUnit: '1s', preAllocatedVUs: 300,
  stages: [
    { target: 50, duration: '2m' },  // hold baseline
    { target: 500, duration: '1m' }, // ramp to the spike
    { target: 100, duration: '3m' }, // recovery
  ],
},
What to inspect: autoscaling delay, queue length spikes, circuit breaker trips, GC pauses, cold caches. If errors rise, confirm backpressure or rate limiting behaves gracefully.
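If your API sheds load deliberately (e.g., a rate limiter returning 429), it helps to separate graceful shedding from hard failures in your checks; a sketch, assuming a 429-based limiter:
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get('http://localhost:8080/products');
  check(res, {
    'served (200)': (r) => r.status === 200,
    'shed gracefully (429)': (r) => r.status === 429, // throttled, not broken
    'no 5xx': (r) => r.status < 500,                  // should hold even mid-spike
  });
}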
Example 3 — 2-hour soak at 150 RPS
Goal: Detect memory leaks, problems with rotation (logs, credentials) and token refresh, and slow drifts (a scenario sketch follows this list).
- Memory: flat or periodic but bounded. No steady upward slope.
- Latency: stable p95; no long-term creep.
- Errors: near zero; retry logic not masking real failures.
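Structurally, a soak is just a long, moderate constant-arrival run with the same thresholds; a minimal k6 sketch (illustrative values):
import http from 'k6/http';

export const options = {
  scenarios: {
    soak: {
      executor: 'constant-arrival-rate',
      rate: 150, timeUnit: '1s', duration: '2h',
      preAllocatedVUs: 50, maxVUs: 150,
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.001'],  // near-zero errors over the whole run
    http_req_duration: ['p(95)<300'], // flag long-term latency creep
  },
};

export default function () {
  http.get('http://localhost:8080/products');
}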
A minimal local API to test against (optional)
If you need a target, install Flask (pip install flask) and run this minimal API locally. Note that Flask's development server is itself a bottleneck at higher request rates; for anything beyond light load, run the app under a production server such as gunicorn:
from flask import Flask, jsonify
import time

app = Flask(__name__)

@app.get('/products')
def products():
    time.sleep(0.05)  # simulate 50 ms of work
    return jsonify({'items': [1, 2, 3]})

@app.get('/healthz')
def health():
    return 'ok', 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
Exercises
Do these hands-on tasks.
Exercise 1 — Build a steady-load test with thresholds
- Run the minimal API above on port 8080 (or use your own).
- Create a 5-minute constant-arrival load at 100 RPS against GET /products.
- Set thresholds: p95 < 250 ms, error rate < 1%.
- Record p50/p95/p99, error rate, and CPU.
- Deliverable: a short note with measured p95 and whether thresholds passed.
Exercise 2 — Find the knee of the curve (capacity estimate)
- Run a step-load: 50 → 100 → 150 → 200 → 300 RPS, 3 minutes each (a step pattern is sketched after this exercise).
- Mark the first step where p95 exceeds 500 ms or errors > 1%.
- Estimate sustainable RPS (last good step) and list top 3 bottlenecks you observed.
- Deliverable: a table or bullet list with each step’s p95, error rate, and your capacity estimate.
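One way to express flat steps in k6 is back-to-back ramping-arrival-rate stages: a short jump to the next rate, then a hold (a sketch mirroring the exercise's steps):
import http from 'k6/http';

export const options = {
  scenarios: {
    steps: {
      executor: 'ramping-arrival-rate',
      startRate: 50, timeUnit: '1s',
      preAllocatedVUs: 200,
      stages: [
        { target: 50, duration: '3m' },   // hold 50 RPS
        { target: 100, duration: '10s' }, // jump
        { target: 100, duration: '3m' },  // hold 100 RPS
        { target: 150, duration: '10s' },
        { target: 150, duration: '3m' },
        { target: 200, duration: '10s' },
        { target: 200, duration: '3m' },
        { target: 300, duration: '10s' },
        { target: 300, duration: '3m' },
      ],
    },
  },
};

export default function () {
  http.get('http://localhost:8080/products');
}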
Pre-run checklist
- Clear goals and pass/fail thresholds are written down.
- Safe environment and limits confirmed (don’t overload shared systems).
- Realistic data sizes; authentication tokens valid; warm-up included.
- Metrics/tracing/logging dashboards open; time synchronized.
- Rollback or stop condition defined (e.g., error rate > 5% for 2 minutes); an automatic abort is sketched below.
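In k6, a stop condition can live in the threshold itself: abortOnFail stops the run when the threshold is crossed, and delayAbortEval waits before evaluating so startup noise doesn't trip it (an approximation of "for 2 minutes", not a rolling window):
import http from 'k6/http';

export const options = {
  thresholds: {
    http_req_failed: [
      // abort the whole run on sustained failures; evaluation starts after 2 minutes
      { threshold: 'rate<0.05', abortOnFail: true, delayAbortEval: '2m' },
    ],
  },
};

export default function () {
  http.get('http://localhost:8080/products');
}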
Common mistakes and self-check
- Using concurrency as a proxy for RPS: Self-check by measuring actual RPS from the tool’s report.
- No warm-up: Run a short pre-test to prime caches and JIT; compare first-minute latency vs steady state.
- Unrealistic data: Use varied payload sizes and IDs to avoid cache-only hits (a sketch follows this list).
- Ignoring downstreams: Observe DB, cache, and external APIs; slow dependencies dominate tail latency.
- Testing only GETs: Include writes and mixed traffic if your API does both.
- Chasing p50: Make decisions on p95/p99, not averages.
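To vary request data per iteration, randomize identifiers; a sketch assuming a hypothetical /products/{id} route (the minimal API above would need one added):
import http from 'k6/http';

export default function () {
  // spread traffic across many IDs so responses aren't served from one hot cache key
  const id = Math.floor(Math.random() * 10000) + 1; // hypothetical ID range
  http.get(`http://localhost:8080/products/${id}`);
}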
Practical projects
- Build a load test suite for your top 3 endpoints with steady, spike, and soak scenarios.
- Add automated thresholds so CI fails when p95 or the error rate exceeds limits (see the sketch after this list).
- Create a runbook: how to execute tests, read metrics, and roll back safely.
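Wiring thresholds into CI can be as simple as running the script in the pipeline: k6 exits with a non-zero code when any threshold fails, so a plain k6 run step fails the job on its own (limits here are illustrative):
import http from 'k6/http';

// if either threshold fails, `k6 run` exits non-zero and the CI step fails with it
export const options = {
  thresholds: {
    http_req_failed: ['rate<0.01'],
    http_req_duration: ['p(95)<300'],
  },
};

export default function () {
  http.get('http://localhost:8080/products');
}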
Learning path
- Learn core metrics (RPS, latency percentiles, error classes).
- Design test plans with acceptance criteria.
- Run steady-load tests; add spike and stress scenarios.
- Introduce observability and tracing to localize bottlenecks.
- Automate in CI with thresholds and small smoke tests.
Who this is for
- API Engineers and Backend Developers building or operating HTTP/JSON or gRPC services.
- SREs and Platform Engineers validating performance and reliability goals.
Prerequisites
- Basic HTTP knowledge (methods, status codes, headers).
- Ability to run a local API service.
- Familiarity with any load tool (e.g., k6, Locust, JMeter) helps but is optional.
Next steps
- Parameterize tests with environment variables and datasets.
- Add distributed runs to simulate global traffic.
- Link test runs to build artifacts to catch regressions over time.
Mini challenge
In a single page, propose SLOs and a 3-scenario test plan (load, spike, soak) for a public read-heavy API and a separate write-heavy admin API. Include pass/fail thresholds and monitoring signals for each.
Quick Test
Take the quick test below to check your understanding.