Why this skill matters for Backend Engineers
System Design Basics help you turn requirements into reliable, scalable services. You will estimate capacity, split work into synchronous and asynchronous paths, choose where to cache, balance traffic, and plan for failures. Mastering these basics lets you ship services that survive real traffic and ongoing change.
What this unlocks in your day-to-day
- Design APIs and services that scale as usage grows.
- Reduce latency with caching and smart routing.
- Prevent outages with timeouts, retries, and circuit breakers.
- Ship features faster by offloading heavy work asynchronously.
- Debug production issues using logs, metrics, and traces.
Who this is for
- Backend and platform engineers starting with distributed systems.
- Developers moving from single-service apps to microservices.
- Engineers preparing for system design interviews.
Prerequisites
- Comfortable with one backend language (e.g., Go, Java, Python, Node.js).
- HTTP basics, JSON, REST or RPC familiarity.
- Fundamental data structures and databases (SQL or NoSQL).
Learning path
1) Think in scalability
Estimate capacity (QPS, latency, throughput), identify bottlenecks, pick scale-up vs scale-out.
2) Build stateless frontends
Keep request state external (cookies, tokens, caches) so instances can scale horizontally behind a load balancer.
3) Balance and route traffic
Use L4/L7 load balancing and health checks; add sticky sessions only when a protocol truly needs them.
4) Cache the hot paths
Choose client, CDN, reverse-proxy, or data cache. Define TTLs, invalidation rules, and cache keys.
5) Go async for heavy work
Queue long-running jobs, make handlers idempotent, monitor DLQs (dead-letter queues).
6) Design for failure
Apply timeouts, retries with backoff, circuit breakers, and bulkheads.
7) Observe everything
Emit structured logs, RED/USE metrics, and distributed traces. Add health and readiness checks.
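The sketch below makes step 7 concrete using only the Python standard library: a wrapper that stamps each request with a trace ID, times it, and emits one structured JSON log line from which RED metrics can be derived. handle_request and get_product are illustrative names, not part of any framework.

# Python sketch (stdlib only): structured request log with a trace ID and latency,
# the raw material for RED metrics and distributed traces.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("api")

def handle_request(path, handler):
    trace_id = str(uuid.uuid4())              # pass this ID on to downstream calls
    start = time.monotonic()
    try:
        status, body = handler()
    except Exception:
        status, body = 500, None
    duration_ms = (time.monotonic() - start) * 1000
    # One structured line per request: Rate and Errors come from counting these,
    # Duration from the latency field.
    log.info(json.dumps({"trace_id": trace_id, "path": path,
                         "status": status, "duration_ms": round(duration_ms, 1)}))
    return status, body

def get_product():                            # stand-in for a real endpoint handler
    return 200, {"id": 42, "name": "example"}

print(handle_request("/products/42", get_product))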
Quick reference: core concepts
- Statelessness: any instance can serve any request.
- Load balancing: distribute requests, remove unhealthy nodes.
- Caching: trade memory for speed; invalidation is the hard part.
- Async processing: smooth spikes, isolate failures, improve latency.
- Reliability patterns: timeouts, retries, circuit breakers, idempotency.
- Observability: logs (what happened), metrics (how much/fast), traces (where time went).
Worked examples
Example 1: Read-heavy API with caching
Scenario: Product details API, 90% reads, 10% writes, target p95 latency < 150 ms under 2k RPS.
- Design: CDN or reverse-proxy cache for GETs, data cache (e.g., Redis) behind API, write-through or invalidate-on-write.
- Cache key: product:{id}:v{version}. Bump the version on write to avoid stale reads.
// Pseudocode (handler)
func GetProduct(id) {
  key = "product:" + id + ":v" + version(id)   // version(id) is bumped on every write
  val = cache.get(key)
  if val != nil { return val }
  val = db.query("SELECT ... WHERE id = ?", id)
  cache.set(key, val, ttl=60s)                 // short TTL as a safety net
  return val
}
Why this works
Most traffic is reads. Caching reduces DB load and latency. Versioned keys avoid complex invalidation logic.
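The handler above shows only the read path; versioned keys work because every write bumps a per-product version counter. A minimal sketch of that write path, assuming a Redis client (redis-py), with db_save as a placeholder for the real database write:

# Python sketch (assumes redis-py): the write path that pairs with the cached read path.
import redis

r = redis.Redis()

def db_save(product_id, fields):
    pass                                        # hypothetical stub, not a real library call

def version(product_id):
    # A missing counter means the product has never been written through this path yet.
    return int(r.get(f"product:{product_id}:version") or 0)

def update_product(product_id, fields):
    db_save(product_id, fields)
    # Bumping the version makes every previously cached key unreachable,
    # so readers fall through to the database and repopulate the cache.
    r.incr(f"product:{product_id}:version")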
Example 2: Rate limiter (token bucket)
Goal: Limit each user to 100 requests/min with small bursts.
// Redis token bucket (simplified; run the read-modify-write atomically, e.g. in a Lua script)
now = unix_ms()
state = redis.hgetall("bucket:" + user)
fill = (now - state.last_refill_ms) * rate_per_ms
state.tokens = min(capacity, state.tokens + fill)
state.last_refill_ms = now
if state.tokens >= 1 { state.tokens -= 1; allow }
else { deny with 429 }
redis.hmset("bucket:" + user, state)
Key points
- Use a shared store (e.g., Redis) for consistency across instances.
- Keep limiter instances stateless by externalizing the bucket state to that shared store.
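To make the refill-and-spend logic concrete, here is the same math as a runnable single-process Python sketch; it keeps buckets in a local dict purely for illustration, whereas production code would hold this state in the shared store and update it atomically.

# Python sketch: token-bucket math in-process, for clarity only.
import time

CAPACITY = 100                 # burst size
RATE_PER_MS = 100 / 60_000     # refill: 100 tokens per minute

buckets = {}                   # user -> {"tokens": float, "last_refill_ms": float}

def allow(user):
    now = time.time() * 1000
    state = buckets.setdefault(user, {"tokens": CAPACITY, "last_refill_ms": now})
    fill = (now - state["last_refill_ms"]) * RATE_PER_MS
    state["tokens"] = min(CAPACITY, state["tokens"] + fill)
    state["last_refill_ms"] = now
    if state["tokens"] >= 1:
        state["tokens"] -= 1
        return True            # allow the request
    return False               # deny: respond with 429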
Example 3: Reliable background jobs
Move image processing off the request path.
- API stores metadata and enqueues a job with idempotency_key = image_id.
- Worker pulls the job, processes it, updates status, and acknowledges.
- Failed jobs retried with exponential backoff; after N attempts, move to DLQ.
// Pseudocode
POST /upload -> enqueue({image_id, user_id})
worker:
msg = queue.receive()
if already_processed(msg.image_id): ack()
else: process(); mark_done(); ack()
Operational practices
- Monitor queue depth and processing latency.
- Keep processing idempotent so retries don't duplicate effects.
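A slightly fuller Python sketch of the worker's retry-and-dead-letter decision. Queues are plain lists and generate_thumbnail is a placeholder so the example stays self-contained; a real worker would use a broker client with acks, visibility timeouts, and backoff between attempts.

# Python sketch: idempotent handling with a retry cap and a dead-letter hand-off.
MAX_ATTEMPTS = 3
processed = set()                      # stands in for the "already processed" lookup

def generate_thumbnail(image_id):
    pass                               # placeholder for the real image work (may raise)

def handle(msg, queue, dlq):
    image_id = msg["image_id"]
    if image_id in processed:          # duplicate delivery: effect already applied, skip
        return
    try:
        generate_thumbnail(image_id)
        processed.add(image_id)        # mark done so retries and duplicates become no-ops
    except Exception:
        msg["attempts"] = msg.get("attempts", 0) + 1
        if msg["attempts"] >= MAX_ATTEMPTS:
            dlq.append(msg)            # park for inspection instead of retrying forever
        else:
            queue.append(msg)          # requeue; a real system would add backoff here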
Example 4: Request timeouts, retries, circuit breaker
Service A calls Service B. Require p95 latency < 200 ms.
// Client to Service B
client:
timeout = 150ms
retries = 2 (exponential backoff, jitter)
circuitBreaker:
open after >=50% failures over last 20 calls; half-open to test recovery
Why this helps
Short timeouts prevent thread/connection exhaustion. Retries with backoff avoid thundering herds. Circuit breaker stops cascading failures.
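A minimal Python sketch of the client side, using only the standard library: a per-call timeout plus two retries with exponential backoff and full jitter. The URL is a placeholder, and a circuit breaker would wrap this whole call to short-circuit once failures accumulate.

# Python sketch (stdlib only): timeout + bounded retries with backoff and jitter.
import random
import time
import urllib.error
import urllib.request

def call_service_b(url, timeout_s=0.15, max_retries=2):
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == max_retries:
                raise                                        # retry budget exhausted
            backoff_s = 0.05 * (2 ** attempt)                # 50 ms, 100 ms, ...
            time.sleep(backoff_s + random.uniform(0, backoff_s))  # full jitter spreads retries out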
Drills and exercises
- Compute capacity for an endpoint: expected RPS, avg and p95 latency, required concurrent workers.
- Sketch stateless deployment: N instances behind a load balancer with health/readiness checks.
- Design a cache key and TTL for a read-heavy resource. Write the invalidation rule.
- Add retries with exponential backoff and jitter to one outbound call in your service.
- Instrument one endpoint with a request log, RED metrics, and a trace span.
- Convert a slow endpoint step into an asynchronous job and ensure idempotency.
Common mistakes and debugging tips
- Overusing sticky sessions. Tip: keep services stateless; use sticky sessions only for protocols that require them.
- No timeouts on external calls. Tip: set per-call timeouts; budget total request time.
- Retrying everything. Tip: only retry safe, idempotent operations; use backoff + jitter.
- Cache without invalidation. Tip: choose explicit TTLs and versioned keys or write-through schemes.
- Ignoring partial failures. Tip: define fallbacks and degrade gracefully (serve stale cache, default responses).
- Missing observability. Tip: structured logs, cardinality control, RED metrics, and traces with consistent IDs.
Mini project: Scalable image metadata service
Build a service that stores image metadata and exposes:
- POST /images to submit metadata and enqueue thumbnail generation (async).
- GET /images/{id} to fetch metadata quickly (cached).
Requirements and guidance
- Stateless API behind a load balancer (mocked locally via multiple processes or ports).
- Data cache (in-memory or Redis) with TTL and versioned keys.
- Queue-backed worker to generate thumbnails (simulate work with sleep).
- Idempotency key for POST to avoid duplicate processing.
- Timeouts and retries on worker storage operations.
- Logs for each request, metrics counters for success/error, and a basic trace ID passed through.
- Define capacity assumptions (target RPS, latency budget).
- Implement GET cache with versioned keys and a 60s TTL.
- Enqueue work on POST; worker processes and updates status.
- Add timeouts, retries with exponential backoff and jitter.
- Expose health, liveness, and readiness endpoints.
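One way to satisfy the idempotency requirement is to reserve the key atomically before doing any work. A sketch assuming redis-py, with save_metadata and enqueue_thumbnail_job as placeholders for your own storage and queue code:

# Python sketch (assumes redis-py): idempotent POST via an atomic SET NX reservation.
import redis

r = redis.Redis()

def save_metadata(image_id, metadata):
    pass                                   # placeholder for the database write

def enqueue_thumbnail_job(image_id):
    pass                                   # placeholder for the queue producer

def post_image(image_id, metadata):
    # SET NX succeeds only for the first request with this key; duplicates get None back.
    first = r.set(f"idempotency:{image_id}", "1", nx=True, ex=24 * 3600)
    if not first:
        return {"id": image_id, "status": "already_submitted"}
    save_metadata(image_id, metadata)
    enqueue_thumbnail_job(image_id)
    return {"id": image_id, "status": "queued"}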
Acceptance criteria
- Under a simple load test, p95 for GET remains under your budget.
- Stopping the worker causes queue depth to grow; restarting the worker drains the backlog.
- Duplicate POSTs with same idempotency key do not create duplicates.
Additional practical projects
- Build a URL shortener with read-heavy caching and rate limiting.
- Create a pub/sub notification service with fan-out workers and DLQ.
- Implement a feature flag service with in-memory cache and periodic refresh from DB.
Subskills
- Designing For Scalability — Estimate load, identify bottlenecks, choose scale-up vs scale-out, and plan capacity.
- Stateless Services Principles — Externalize session/state, enable safe horizontal scaling.
- Load Balancing Concepts — L4/L7 routing, health checks, and when to use sticky sessions.
- Caching Layers — Client, CDN, reverse-proxy, and data caches; TTLs and invalidation patterns.
- Asynchronous Processing Basics — Offload slow/fragile tasks; define job contracts and DLQs.
- Message Queues Basics — At-least-once delivery, idempotency, backoff, and visibility timeouts.
- Handling Failures And Timeouts — Timeouts, retries with jitter, circuit breakers, bulkheads.
- Observability Concepts — Logs, metrics (RED/USE), traces, and health endpoints.
Next steps
- Pick one practical project and complete it end-to-end.
- Add observability to one existing service: logs, metrics, and traces.
- Prepare a short design doc for a service you own, including capacity, caching, async, and failure plans.
Skill exam
Take the exam to validate your understanding.
Quick capacity math helper
Concurrent workers ≈ RPS × avg_latency_seconds. Example: 200 RPS × 0.1 s = 20 workers.
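Plugging in illustrative numbers shows why p95 matters for sizing; the 0.30 s p95 below is an assumption, not a figure from the example above.

# Python: the same formula at average and at p95 latency.
expected_rps = 200
avg_latency_s = 0.10
p95_latency_s = 0.30

print(expected_rps * avg_latency_s)   # 20.0 workers busy on average
print(expected_rps * p95_latency_s)   # 60.0 needed if requests ran at p95, so leave head-room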