Why this skill matters for Backend Engineers
System Design Basics help you turn requirements into reliable, scalable services. You will estimate capacity, split work into synchronous and asynchronous paths, choose where to cache, balance traffic, and plan for failures. Mastering these basics lets you ship services that survive real traffic and ongoing change.
What this unlocks in your day-to-day
- Design APIs and services that scale as usage grows.
- Reduce latency with caching and smart routing.
- Prevent outages with timeouts, retries, and circuit breakers.
- Ship features faster by offloading heavy work asynchronously.
- Debug production issues using logs, metrics, and traces.
Who this is for
- Backend and platform engineers starting with distributed systems.
- Developers moving from single-service apps to microservices.
- Engineers preparing for system design interviews.
Prerequisites
- Comfortable with one backend language (e.g., Go, Java, Python, Node.js).
- HTTP basics, JSON, REST or RPC familiarity.
- Fundamental data structures and databases (SQL or NoSQL).
Learning path
1) Think in scalability
Estimate capacity (QPS, latency, throughput), identify bottlenecks, pick scale-up vs scale-out.
2) Build stateless frontends
Keep request state external (cookies, tokens, caches) so instances can scale horizontally behind a load balancer.
3) Balance and route traffic
Use L4/L7 load balancing and health checks; add sticky sessions only when a protocol truly needs them.
4) Cache the hot paths
Choose client, CDN, reverse-proxy, or data cache. Define TTLs, invalidation rules, and cache keys.
5) Go async for heavy work
Queue long-running jobs, make handlers idempotent, monitor DLQs (dead-letter queues).
6) Design for failure
Apply timeouts, retries with backoff, circuit breakers, and bulkheads.
7) Observe everything
Emit structured logs, RED/USE metrics, and distributed traces. Add health and readiness checks.
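The sketch below makes step 7 concrete using only the Python standard library: a wrapper that stamps each request with a trace ID, times it, and emits one structured JSON log line from which RED metrics can be derived. handle_request and get_product are illustrative names, not part of any framework.

# Python sketch (stdlib only): structured request log with a trace ID and latency,
# the raw material for RED metrics and distributed traces.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("api")

def handle_request(path, handler):
    trace_id = str(uuid.uuid4())              # pass this ID on to downstream calls
    start = time.monotonic()
    try:
        status, body = handler()
    except Exception:
        status, body = 500, None
    duration_ms = (time.monotonic() - start) * 1000
    # One structured line per request: Rate and Errors come from counting these,
    # Duration from the latency field.
    log.info(json.dumps({"trace_id": trace_id, "path": path,
                         "status": status, "duration_ms": round(duration_ms, 1)}))
    return status, body

def get_product():                            # stand-in for a real endpoint handler
    return 200, {"id": 42, "name": "example"}

print(handle_request("/products/42", get_product))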
Quick reference: core concepts
- Statelessness: any instance can serve any request.
- Load balancing: distribute requests, remove unhealthy nodes.
- Caching: trade memory for speed; invalidation is the hard part.
- Async processing: smooth spikes, isolate failures, improve latency.
- Reliability patterns: timeouts, retries, circuit breakers, idempotency.
- Observability: logs (what happened), metrics (how much/fast), traces (where time went).
Worked examples
Example 1: Read-heavy API with caching
Scenario: Product details API, 90% reads, 10% writes, target p95 latency < 150 ms under 2k RPS.
- Design: CDN or reverse-proxy cache for GETs, data cache (e.g., Redis) behind API, write-through or invalidate-on-write.
- Cache key: product:{id}:v{version}. Bump the version on write to avoid stale reads.
// Pseudocode (handler)
func GetProduct(id) {
  key = "product:" + id + ":v" + version(id)   // version(id) is bumped on every write
  val = cache.get(key)
  if val != nil { return val }
  val = db.query("SELECT ... WHERE id = ?", id)
  cache.set(key, val, ttl=60s)                 // short TTL as a safety net
  return val
}
Why this works
Most traffic is reads. Caching reduces DB load and latency. Versioned keys avoid complex invalidation logic.
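The handler above shows only the read path; versioned keys work because every write bumps a per-product version counter. A minimal sketch of that write path, assuming a Redis client (redis-py), with db_save as a placeholder for the real database write:

# Python sketch (assumes redis-py): the write path that pairs with the cached read path.
import redis

r = redis.Redis()

def db_save(product_id, fields):
    pass                                        # hypothetical stub, not a real library call

def version(product_id):
    # A missing counter means the product has never been written through this path yet.
    return int(r.get(f"product:{product_id}:version") or 0)

def update_product(product_id, fields):
    db_save(product_id, fields)
    # Bumping the version makes every previously cached key unreachable,
    # so readers fall through to the database and repopulate the cache.
    r.incr(f"product:{product_id}:version")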
Example 2: Rate limiter (token bucket)
Goal: Limit each user to 100 requests/min with small bursts.
// Redis token bucket (simplified; run the read-modify-write atomically, e.g. in a Lua script)
now = unix_ms()
state = redis.hgetall("bucket:" + user)
fill = (now - state.last_refill_ms) * rate_per_ms
state.tokens = min(capacity, state.tokens + fill)
state.last_refill_ms = now
if state.tokens >= 1 { state.tokens -= 1; allow }
else { deny with 429 }
redis.hmset("bucket:" + user, state)
Key points
- Use a shared store (e.g., Redis) for consistency across instances.
- Keep limiter instances stateless by externalizing the bucket state to that shared store.
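To make the refill-and-spend logic concrete, here is the same math as a runnable single-process Python sketch; it keeps buckets in a local dict purely for illustration, whereas production code would hold this state in the shared store and update it atomically.

# Python sketch: token-bucket math in-process, for clarity only.
import time

CAPACITY = 100                 # burst size
RATE_PER_MS = 100 / 60_000     # refill: 100 tokens per minute

buckets = {}                   # user -> {"tokens": float, "last_refill_ms": float}

def allow(user):
    now = time.time() * 1000
    state = buckets.setdefault(user, {"tokens": CAPACITY, "last_refill_ms": now})
    fill = (now - state["last_refill_ms"]) * RATE_PER_MS
    state["tokens"] = min(CAPACITY, state["tokens"] + fill)
    state["last_refill_ms"] = now
    if state["tokens"] >= 1:
        state["tokens"] -= 1
        return True            # allow the request
    return False               # deny: respond with 429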
Example 3: Reliable background jobs
Move image processing off the request path.
- API stores metadata and enqueues a job with idempotency_key = image_id.
- Worker pulls the job, processes it, updates status, and acknowledges.
- Failed jobs retried with exponential backoff; after N attempts, move to DLQ.
// Pseudocode
POST /upload -> enqueue({image_id, user_id})
worker:
msg = queue.receive()
if already_processed(msg.image_id): ack()
else: process(); mark_done(); ack()
Operational practices
- Monitor queue depth and processing latency.
- Keep processing idempotent so retries don't duplicate effects.
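A slightly fuller Python sketch of the worker's retry-and-dead-letter decision. Queues are plain lists and generate_thumbnail is a placeholder so the example stays self-contained; a real worker would use a broker client with acks, visibility timeouts, and backoff between attempts.

# Python sketch: idempotent handling with a retry cap and a dead-letter hand-off.
MAX_ATTEMPTS = 3
processed = set()                      # stands in for the "already processed" lookup

def generate_thumbnail(image_id):
    pass                               # placeholder for the real image work (may raise)

def handle(msg, queue, dlq):
    image_id = msg["image_id"]
    if image_id in processed:          # duplicate delivery: effect already applied, skip
        return
    try:
        generate_thumbnail(image_id)
        processed.add(image_id)        # mark done so retries and duplicates become no-ops
    except Exception:
        msg["attempts"] = msg.get("attempts", 0) + 1
        if msg["attempts"] >= MAX_ATTEMPTS:
            dlq.append(msg)            # park for inspection instead of retrying forever
        else:
            queue.append(msg)          # requeue; a real system would add backoff here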
Example 4: Request timeouts, retries, circuit breaker
Service A calls Service B. Require p95 latency < 200 ms.
// Client to Service B
client:
timeout = 150ms
retries = 2 (exponential backoff, jitter)
circuitBreaker:
open after >=50% failures over last 20 calls; half-open to test recovery
Why this helps
Short timeouts prevent thread/connection exhaustion. Retries with backoff avoid thundering herds. Circuit breaker stops cascading failures.
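A minimal Python sketch of the client side, using only the standard library: a per-call timeout plus two retries with exponential backoff and full jitter. The URL is a placeholder, and a circuit breaker would wrap this whole call to short-circuit once failures accumulate.

# Python sketch (stdlib only): timeout + bounded retries with backoff and jitter.
import random
import time
import urllib.error
import urllib.request

def call_service_b(url, timeout_s=0.15, max_retries=2):
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == max_retries:
                raise                                        # retry budget exhausted
            backoff_s = 0.05 * (2 ** attempt)                # 50 ms, 100 ms, ...
            time.sleep(backoff_s + random.uniform(0, backoff_s))  # full jitter spreads retries out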
Drills and exercises
- Compute capacity for an endpoint: expected RPS, avg and p95 latency, required concurrent workers.
- Sketch stateless deployment: N instances behind a load balancer with health/readiness checks.
- Design a cache key and TTL for a read-heavy resource. Write the invalidation rule.
- Add retries with exponential backoff and jitter to one outbound call in your service.
- Instrument one endpoint with a request log, RED metrics, and a trace span.
- Convert a slow endpoint step into an asynchronous job and ensure idempotency.
Common mistakes and debugging tips
- Overusing sticky sessions. Tip: keep services stateless; use sticky sessions only for protocols that require them.
- No timeouts on external calls. Tip: set per-call timeouts; budget total request time.
- Retrying everything. Tip: only retry safe, idempotent operations; use backoff + jitter.
- Cache without invalidation. Tip: choose explicit TTLs and versioned keys or write-through schemes.
- Ignoring partial failures. Tip: define fallbacks and degrade gracefully (serve stale cache, default responses).
- Missing observability. Tip: structured logs, cardinality control, RED metrics, and traces with consistent IDs.
Mini project: Scalable image metadata service
Build a service that stores image metadata and exposes:
- POST /images to submit metadata and enqueue thumbnail generation (async).
- GET /images/{id} to fetch metadata quickly (cached).
Requirements and guidance
- Stateless API behind a load balancer (mocked locally via multiple processes or ports).
- Data cache (in-memory or Redis) with TTL and versioned keys.
- Queue-backed worker to generate thumbnails (simulate work with sleep).
- Idempotency key for POST to avoid duplicate processing.
- Timeouts and retries on worker storage operations.
- Logs for each request, metrics counters for success/error, and a basic trace ID passed through.
- Define capacity assumptions (target RPS, latency budget).
- Implement GET cache with versioned keys and a 60s TTL.
- Enqueue work on POST; worker processes and updates status.
- Add timeouts, retries with exponential backoff and jitter.
- Expose health, liveness, and readiness endpoints.
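One way to satisfy the idempotency requirement is to reserve the key atomically before doing any work. A sketch assuming redis-py, with save_metadata and enqueue_thumbnail_job as placeholders for your own storage and queue code:

# Python sketch (assumes redis-py): idempotent POST via an atomic SET NX reservation.
import redis

r = redis.Redis()

def save_metadata(image_id, metadata):
    pass                                   # placeholder for the database write

def enqueue_thumbnail_job(image_id):
    pass                                   # placeholder for the queue producer

def post_image(image_id, metadata):
    # SET NX succeeds only for the first request with this key; duplicates get None back.
    first = r.set(f"idempotency:{image_id}", "1", nx=True, ex=24 * 3600)
    if not first:
        return {"id": image_id, "status": "already_submitted"}
    save_metadata(image_id, metadata)
    enqueue_thumbnail_job(image_id)
    return {"id": image_id, "status": "queued"}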
Acceptance criteria
- Under a simple load test, p95 for GET remains under your budget.
- Stopping the worker causes queue depth to grow; restarting the worker drains the backlog.
- Duplicate POSTs with same idempotency key do not create duplicates.
Additional practical projects
- Build a URL shortener with read-heavy caching and rate limiting.
- Create a pub/sub notification service with fan-out workers and DLQ.
- Implement a feature flag service with in-memory cache and periodic refresh from DB.
Subskills
- Designing For Scalability — Estimate load, identify bottlenecks, choose scale-up vs scale-out, and plan capacity.
- Stateless Services Principles — Externalize session/state, enable safe horizontal scaling.
- Load Balancing Concepts — L4/L7 routing, health checks, and when to use sticky sessions.
- Caching Layers — Client, CDN, reverse-proxy, and data caches; TTLs and invalidation patterns.
- Asynchronous Processing Basics — Offload slow/fragile tasks; define job contracts and DLQs.
- Message Queues Basics — At-least-once delivery, idempotency, backoff, and visibility timeouts.
- Handling Failures And Timeouts — Timeouts, retries with jitter, circuit breakers, bulkheads.
- Observability Concepts — Logs, metrics (RED/USE), traces, and health endpoints.
Next steps
- Pick one practical project and complete it end-to-end.
- Add observability to one existing service: logs, metrics, and traces.
- Prepare a short design doc for a service you own, including capacity, caching, async, and failure plans.
Skill exam
Take the exam to validate your understanding.
Quick capacity math helper
Concurrent workers ≈ RPS × avg_latency_seconds. Example: 200 RPS × 0.1 s = 20 workers.
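Plugging in illustrative numbers shows why p95 matters for sizing; the 0.30 s p95 below is an assumption, not a figure from the example above.

# Python: the same formula at average and at p95 latency.
expected_rps = 200
avg_latency_s = 0.10
p95_latency_s = 0.30

print(expected_rps * avg_latency_s)   # 20.0 workers busy on average
print(expected_rps * p95_latency_s)   # 60.0 needed if requests ran at p95, so leave head-room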