Why Performance and Reliability matter for API Engineers
Fast, dependable APIs unlock product features, protect uptime, and reduce costs. As an API Engineer, you translate business goals (such as 99.9% uptime or 300 ms p95 latency) into concrete engineering choices: timeouts, retries, caching, pooling, rate limits, and safe fallbacks. Done right, these choices prevent cascading failures, keep tail latency under control, and make on-call life calmer.
Who this is for
- API and backend engineers building or maintaining web services.
- Developers moving from single-service apps to distributed systems.
- Engineers preparing for production ownership and on-call.
Prerequisites
- Comfort with one backend language (e.g., Java, Go, Python, Node.js) and HTTP basics.
- Familiarity with REST or RPC, JSON, and API authentication.
- Basic database knowledge (transactions, indexes) and HTTP caching concepts.
Learning path
- Define goals: SLAs/SLOs, error budgets, latency budgets.
- Stabilize the request path: timeouts, retries with backoff + jitter, idempotency/dedup.
- Protect systems: rate limiting, circuit breaking, bulkheads.
- Reduce work: connection pooling, caching headers/ETags.
- Validate with load tests: baseline, find bottlenecks, tune, and re-test.
Tip: Start with budgets, not tools
Budgets force trade-offs. If you need p95 < 300 ms, you must give each hop and each dependency a slice of that time and set timeouts accordingly.
Key concepts at a glance
- Latency Budgets & SLAs: Translate top-level latency/availability goals into per-hop budgets.
- Caching & ETags: Save compute and bandwidth with correct cache headers and conditional requests.
- Request Deduplication: Prevent duplicate work from retries and client refreshes.
- Connection Pooling: Reuse TCP/DB connections to cut latency and resource churn.
- Rate Limiting: Protect upstreams and ensure fair use.
- Retries, Backoff, Timeouts: Fail fast, try again safely, and avoid thundering herds.
- Circuit Breakers: Stop hammering unhealthy dependencies and recover gracefully.
- Load Testing: Quantify throughput, latency, and failure modes before production.
Worked examples
1) Latency budget decomposition
Goal: p95 latency <= 300 ms for GET /orders/{id}
- Ingress + auth: 30 ms
- Cache lookup: 10 ms
- DB read: 120 ms
- Downstream service (pricing): 80 ms
- Serialization + network: 40 ms
- Total: 280 ms (p95), leaves ~20 ms headroom
Set timeouts per hop a bit under budget to prevent queue buildup:
- DB timeout: 100–120 ms
- Pricing timeout: 70–80 ms, with a default price fallback
Fallback shape
Return partial data with a degraded: true flag if pricing times out, and include a Retry-After hint if appropriate.
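A minimal sketch of that fallback path, assuming the order and pricing results arrive as plain objects (field names like degraded and price are illustrative, not a standard):

```javascript
// Sketch: build a degraded response when the pricing dependency times out.
// Shapes and field names (degraded, price, Retry-After value) are illustrative.
function buildOrderResponse(order, pricingResult) {
  if (pricingResult.ok) {
    return { status: 200, headers: {}, body: { ...order, price: pricingResult.price } };
  }
  // Pricing timed out: serve partial data and hint when to retry.
  return {
    status: 200,
    headers: { 'Retry-After': '5' }, // seconds; optional hint for the client
    body: { ...order, price: null, degraded: true },
  };
}
```

The client still gets a usable order in the degraded case; it just knows the price is missing and roughly when to ask again.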
2) Exponential backoff with jitter (Node.js)
async function requestWithBackoff(fetchFn, { attempts = 4, baseMs = 100, maxMs = 2000, timeoutMs = 300 } = {}) {
  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
  for (let i = 0; i < attempts; i++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs); // per-attempt timeout
    try {
      const res = await fetchFn({ signal: controller.signal });
      if (res.ok) return res; // 2xx
      if (res.status < 500 && res.status !== 429) {
        const err = new Error(`Non-retryable status ${res.status}`);
        err.retryable = false;
        throw err;
      }
      // 5xx or 429: fall through to the backoff below
    } catch (e) {
      if (e.retryable === false || i === attempts - 1) throw e; // never retry other 4xx; don't exhaust silently
    } finally {
      clearTimeout(timer);
    }
    const backoff = Math.min(maxMs, baseMs * 2 ** i);
    const jitter = Math.random() * backoff * 0.3; // up to 30% extra to desynchronize clients
    await sleep(backoff + jitter);
  }
  throw new Error('Retries exhausted');
}
Notes: set a per-attempt timeout; retry only idempotent requests, and only on 5xx/429 responses (honoring Retry-After when present); use jitter to avoid synchronized retry storms.
3) Conditional caching with ETag
// Pseudocode for a read endpoint
function getOrder(req, res) {
  const order = db.find(req.params.id); // serialize deterministically
  const body = JSON.stringify(order);
  const etag = hash(body); // e.g., SHA-256 of body
  const inm = req.headers['if-none-match'];
  if (inm && inm === etag) {
    res.statusCode = 304;
    res.setHeader('ETag', etag);
    return res.end();
  }
  res.setHeader('Cache-Control', 'public, max-age=60');
  res.setHeader('ETag', etag);
  res.setHeader('Vary', 'Authorization'); // if response may vary by user
  res.statusCode = 200;
  res.end(body);
}
Result: Clients can skip re-downloading unchanged content, reducing bandwidth and load.
4) Token bucket rate limiting (pseudocode)
class TokenBucket {
  constructor(ratePerSec, burst) {
    this.rate = ratePerSec; this.burst = burst;
    this.tokens = burst; this.last = Date.now();
  }
  allow() {
    const now = Date.now();
    const delta = (now - this.last) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + delta * this.rate);
    this.last = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}
// Use per API key, IP, or user ID. On deny: return 429 with Retry-After.
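One way the deny path might translate into an HTTP response; a sketch that works with any limiter exposing allow() and rate (the Retry-After estimate here is a rough heuristic, not a standard):

```javascript
// Sketch: map a rate-limit decision to an HTTP response. The limiter is passed
// in, so this works with the TokenBucket above or any { allow(), rate } object.
function rateLimitResponse(limiter, handler) {
  if (limiter.allow()) return handler();
  // Denied: tell the client roughly when one token will have accrued.
  const retryAfterSec = Math.ceil(1 / limiter.rate);
  return {
    status: 429,
    headers: { 'Retry-After': String(retryAfterSec) },
    body: 'Too Many Requests',
  };
}
```

In a real service you would keep one limiter per API key (e.g., in a Map keyed by the key) and call this in middleware.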
5) Circuit breaker skeleton (pseudocode)
class Circuit {
  constructor({ failThreshold = 5, halfOpenAfterMs = 10000 }) {
    this.state = 'CLOSED'; this.failures = 0; this.openedAt = 0;
    this.failThreshold = failThreshold; this.halfOpenAfterMs = halfOpenAfterMs;
  }
  canPass() {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt > this.halfOpenAfterMs) {
        this.state = 'HALF_OPEN';
        return true; // allow a single trial request
      }
      return false;
    }
    if (this.state === 'HALF_OPEN') return false; // trial in flight; hold other traffic
    return true;
  }
  recordSuccess() { this.failures = 0; this.state = 'CLOSED'; }
  recordFailure() {
    this.failures++;
    if (this.failures >= this.failThreshold) {
      this.state = 'OPEN'; this.openedAt = Date.now();
    }
  }
}
Use with a fallback response and metrics to observe trips and recoveries.
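A sketch of such a wrapper, assuming a breaker object with the canPass/recordSuccess/recordFailure shape above and a caller-supplied fallback (names are illustrative):

```javascript
// Sketch: wrap a dependency call with a breaker and a fallback. Works with the
// Circuit class above or any { canPass(), recordSuccess(), recordFailure() } object.
async function callWithBreaker(breaker, callFn, fallbackFn) {
  if (!breaker.canPass()) return fallbackFn(); // open: skip the call entirely
  try {
    const result = await callFn();
    breaker.recordSuccess();
    return result;
  } catch (e) {
    breaker.recordFailure();
    return fallbackFn(); // degrade instead of propagating the failure
  }
}
```

Emitting a metric or log line in both the catch and the open branch is what makes trips and recoveries observable.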
6) DB connection pool tuning (example)
// Example: PostgreSQL Node.js (pg)
const { Pool } = require('pg');
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,                      // total connections in pool
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
});
// Rule of thumb: total app pools across services should not exceed DB limits.
Rule of thumb
Start with 4–8 connections per CPU core per service instance, then load test and adjust to prevent DB saturation.
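The rule of thumb as arithmetic; a sketch with illustrative numbers (suggestedPoolMax and its parameters are hypothetical names, not a library API):

```javascript
// Sketch: pick a starting pool size from CPU cores, then cap it by the database's
// connection limit shared across all service instances. Numbers are illustrative.
function suggestedPoolMax({ cores, perCore = 6, dbMaxConnections, instances }) {
  const byCpu = cores * perCore;                          // 4-8 per core; 6 is a middle start
  const byDb = Math.floor(dbMaxConnections / instances);  // leave each instance a fair share
  return Math.min(byCpu, byDb);
}
```

For example, 4 cores suggest 24 connections, but with 100 DB connections shared by 5 instances the per-instance cap is 20; take the smaller value and then load test.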
Drills and exercises
- [ ] Compute a latency budget for one endpoint (p95 = 300 ms). Assign budgets per hop and set timeouts.
- [ ] Implement retries with exponential backoff + full jitter. Verify it never retries on 4xx (except 409/429).
- [ ] Add ETag support to one read endpoint. Validate 304 flow with curl.
- [ ] Add idempotency keys to a POST endpoint. Prove duplicate requests return the same result.
- [ ] Introduce a token bucket rate limiter per API key with 429 + Retry-After.
- [ ] Configure DB/HTTP connection pooling and measure cold vs warm latency.
- [ ] Write a simple load script (curl loop or a small program) to hit 50–200 RPS. Chart p50/p95/p99.
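For the idempotency-key drill above, a minimal in-memory sketch of server-side dedup (a production version would need a shared store such as Redis with a TTL; names are illustrative):

```javascript
// Sketch: server-side idempotency-key dedup for a POST handler.
// In-memory Map for illustration only; entries should expire in production.
const seen = new Map(); // idempotencyKey -> stored response

function handlePost(idempotencyKey, createFn) {
  if (seen.has(idempotencyKey)) return seen.get(idempotencyKey); // replay, don't redo work
  const response = createFn(); // perform the side effect exactly once
  seen.set(idempotencyKey, response);
  return response;
}
```

A duplicate request with the same key replays the stored response instead of creating a second resource, which is exactly what the drill asks you to prove.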
Mini project: Resilient aggregator API
Build an endpoint GET /aggregate that calls two downstream services, merges results, and serves cached responses for 60s.
- Set a 300 ms p95 latency budget and per-hop timeouts.
- Use connection pools for outbound HTTP/DB where relevant.
- Implement retries with backoff + jitter on 5xx/429; use a circuit breaker for each downstream.
- Return partial results if one service times out; include degraded: true.
- Add ETag + Cache-Control to GET /aggregate responses.
- Protect with a token bucket rate limiter per client.
- Load test at increasing RPS until p95 exceeds budget. Tune pool sizes and timeouts, then re-test.
Simple bash load loop
for i in {1..2000}; do curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/aggregate & done; wait
Run several loops in parallel to approximate load. Capture latency with -w format options (e.g., %{time_total}).
Common mistakes
- Missing timeouts on outbound calls, causing thread/connection starvation.
- Aggressive retries without jitter, amplifying outages (retry storms).
- Retrying non-idempotent operations without idempotency keys, creating duplicates.
- Using global cache for user-specific data without Vary headers or cache key scoping.
- Unbounded connection pools that overwhelm databases.
- Rate limiting only at the edge, not per-tenant, allowing noisy neighbors.
- Circuit breakers without clear fallbacks or observability, masking silent failures.
Debugging tips
- Check per-hop timing: add server-timing headers or structured logs with durations.
- Inspect saturation: CPU, DB connections in use, queue sizes, GC pauses.
- Look for correlation between traffic spikes and error/latency spikes—rate limits may be too loose.
- Verify cache hit rates and ETag 304 ratios after deployments.
- Trace sample requests end-to-end to find slow dependencies.
Practical projects
- Build a global read cache with ETag and 304 handling across two regions.
- Implement per-user and per-IP rate limits with separate token buckets and compare effectiveness.
- Add a circuit breaker with a metrics dashboard; run failure drills to observe open/half-open transitions.
- Create a load test harness that records p50/p95/p99, throughput, and error breakdown by status code.
Subskills
- Latency Budgets And SLAs: Turn business goals into per-hop budgets and timeouts.
- Caching Headers And ETags: Use Cache-Control, ETag, and 304 to reduce load.
- Request Deduplication: Idempotency keys and server-side dedup to prevent duplicate work.
- Connection Pooling Awareness: Right-size pools, avoid exhaustion, and minimize cold-start latency.
- Rate Limiting Implementation: Token bucket/leaky bucket to protect capacity and ensure fairness.
- Retries Backoff And Timeouts: Safe retry policies with jitter and per-attempt timeouts.
- Circuit Breakers Basics: Prevent cascading failures with open/half-open states and fallbacks.
- Load Testing For APIs: Baseline, bottleneck identification, and regression checks.
Next steps
- Finish all subskills and complete the exam below.
- Apply the mini project in a sandbox environment and measure improvements.
- Move on to the next skill in your API Engineering path and keep iterating on budgets and limits as load grows.