Capacity Planning Basics

Learn Capacity Planning Basics for free with explanations, exercises, and a quick test (for Backend Engineers).

Published: January 20, 2026 | Updated: January 20, 2026

Why this matters

As a Backend Engineer, you must ensure systems meet demand without wasting money. Capacity planning helps you predict peak load, right-size infrastructure, and protect service-level objectives (SLOs). Real tasks you will do:

  • Estimate how many service instances are needed for a promotion or product launch.
  • Set autoscaling targets and safety buffers to keep p95–p99 latency under SLO.
  • Forecast database storage, IOPS, and network throughput months ahead.
  • Plan redundancy (N+1, multi-AZ) so you stay within SLO during failures.

Quick glossary

  • Throughput (RPS/jobs/sec): how much work arrives.
  • Latency: time to complete a request (p50/p95/p99).
  • Utilization: percent of a resource used (CPU, memory, IOPS, bandwidth).
  • Saturation: queues building up; waits increase.
  • Headroom: spare capacity reserved for spikes/failover.
  • N+1: have at least one extra unit beyond the minimum to handle failure.

Concept explained simply

Capacity planning answers: Will our system meet peak demand within SLO while staying cost-efficient?

  • Measure current demand and performance.
  • Forecast peak demand (growth, seasonality, events).
  • Map demand to resources (CPU, memory, storage, network, IOPS).
  • Add safety margins (headroom, N+1, multi-AZ).
  • Continuously verify via load tests and observability.

Mental model

  • Little’s Law (simple version): Concurrent work ≈ Arrival rate × Time in system. If 200 requests/sec arrive and each spends 0.2 sec in the system on average, about 40 requests are in flight (see the sketch after this list).
  • Utilization: Keep typical utilization 50–70% so you have headroom for spikes and to protect tail latency.
  • Headroom: plan buffers (e.g., 20–40%) and N+1. Over-provision a bit to avoid SLO breaches; under-provisioning is costlier during incidents.
  • Workload type: CPU-bound, memory-bound, I/O-bound, or network-bound. The tightest constraint dominates capacity.
  • Scale strategy: Vertical (bigger machines) vs horizontal (more instances). Horizontal scaling plus autoscaling is common for stateless services.
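
A minimal sketch of the Little’s Law bullet in Python, using the numbers above:

    arrival_rate = 200        # requests per second
    time_in_system = 0.2      # average seconds per request
    in_flight = arrival_rate * time_in_system
    print(in_flight)          # ~40 requests in flight on average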

Step-by-step capacity planning (repeatable)

  1. Define SLO/SLI: e.g., p95 latency < 200 ms, error rate < 0.5%.
  2. Baseline: Gather RPS/jobs, p95/p99 latency, CPU%, memory, IOPS, network. Note peak vs average.
  3. Forecast: Estimate peak using recent peaks × growth, seasonality, and special events. Prefer p95 peaks over averages.
  4. Find constraints: Identify which resource saturates first (CPU, memory, IOPS, network).
  5. Per-instance capacity: Load test to find sustainable throughput at target utilization (e.g., 65–70%).
  6. Compute instances: Instances = ceil(peak_throughput / per_instance_capacity) × safety_factor, rounded up to a whole instance (see the sketch after this list).
  7. Resilience: Apply N+1 and distribute across AZs. Ensure you can lose 1 unit/AZ and still meet SLO.
  8. Autoscaling: Set min/desired/max and targets (e.g., CPU 60–65%).
  9. Alerts: Alert on SLO burn, saturation, and approaching limits (e.g., 80–90% of capacity sustained).
  10. Review: Revisit after launches/incidents; update forecasts monthly or after big changes.
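
A minimal sketch of steps 6–7 as a Python helper (the function name and default safety factor are illustrative, not a standard API):

    import math

    def instances_needed(peak_rps, per_instance_rps, safety_factor=1.3):
        # Step 6: round up to whole instances, then apply the safety factor.
        base = math.ceil(peak_rps / per_instance_rps)
        return math.ceil(base * safety_factor)

    # The numbers from Example 1 below: 1250 RPS peak, 140 RPS per instance.
    print(instances_needed(1250, 140))   # 12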

Worked examples

Example 1 — API service sizing

Baseline: 500 RPS peak today. Load test shows each instance sustains 140 RPS at ~65% CPU with p95 latency < 180 ms. Next month’s campaign: expect 2.5× peak.

  • Forecast peak: 500 × 2.5 = 1250 RPS.
  • Instances before buffer: ceil(1250 / 140) = ceil(8.93) = 9.
  • Safety buffer 30%: 9 × 1.3 = 11.7 → 12 instances.
  • Multi-AZ (3 AZs) with an N+1 mindset: 12 total = 4 per AZ. Losing 1 instance still leaves 11, which covers 1250 RPS at ~81% of per-instance capacity (acceptable if p95 stays within SLO at that load).
  • Autoscaler: min 3 (one per AZ), desired 12, max 18, target CPU 60–65%.
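
The same arithmetic as a quick Python check (all values come from the example; the ~81% figure is peak load divided across the 11 surviving instances):

    import math

    peak = 500 * 2.5                       # 1250 RPS forecast peak
    base = math.ceil(peak / 140)           # 9 instances before buffer
    total = math.ceil(base * 1.3)          # 12 instances with 30% headroom
    after_loss = peak / (total - 1) / 140  # load vs capacity with 1 instance down
    print(total, round(after_loss, 2))     # 12 instances, 0.81 -> ~81%
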
Example 2 — Batch queue deadline

Goal: process 1,000,000 jobs in 2 hours. Each job needs 0.2 sec CPU and fits in memory per worker.

  • Total CPU-seconds: 1,000,000 × 0.2 = 200,000 sec.
  • Available wall-clock: 2 hours = 7200 sec.
  • Workers needed: 200,000 / 7200 ≈ 27.78 → 28 workers.
  • Add 20% headroom: 28 × 1.2 = 33.6 → 34 workers.
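
A quick Python check of the same numbers:

    import math

    cpu_seconds = 1_000_000 * 0.2                  # 200,000 CPU-seconds of work
    wall_clock = 2 * 3600                          # 7200 seconds available
    workers = math.ceil(cpu_seconds / wall_clock)  # 28 workers
    with_headroom = math.ceil(workers * 1.2)       # 34 with 20% headroom
    print(workers, with_headroom)
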
Example 3 — Storage growth forecast

Current DB: 100 GB. Growth ~8%/month. Volume limit: 2 TB (2048 GB). Plan upgrade when at 75%.

  • Target threshold: 0.75 × 2048 = 1536 GB.
  • How many months to reach 1536 from 100 with 8% growth? 100 × (1.08)^m = 1536 → (1.08)^m = 15.36 → m = ln(15.36)/ln(1.08) ≈ 35.5 months.
  • Plan: schedule the capacity change at ~34 months, earlier if ingestion spikes.
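
The same forecast in Python (math.log is the natural log used above):

    import math

    current_gb, limit_gb, growth = 100, 2048, 0.08
    threshold = 0.75 * limit_gb                    # 1536 GB upgrade trigger
    months = math.log(threshold / current_gb) / math.log(1 + growth)
    print(round(months, 1))                        # ~35.5 months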

Who this is for

  • Backend and platform engineers responsible for service reliability and cost.
  • On-call engineers preparing for traffic spikes or migrations.

Prerequisites

  • Basic understanding of service metrics (RPS, latency, CPU, memory).
  • Familiarity with autoscaling concepts and multi-AZ deployments.
  • Ability to run or read results from load tests.

Learning path

  • Before this: Observability fundamentals (SLIs/SLOs), basic scaling.
  • This lesson: Core formulas, buffers, and a repeatable planning process.
  • After this: Incident response playbooks, cost optimization, advanced forecasting.

Exercises & practice


Exercise 1 — Size a stateless API for a spike

Given: average 350 RPS; expected peak multiplier 3×; SLO p95 < 200 ms; one instance sustains 180 RPS at ~70% CPU (p95 160 ms), degrades after 230 RPS; plan for 30% headroom; 3 AZs with N+1 per AZ. Calculate total instances and per-AZ distribution. Propose autoscaler min/desired/max and a CPU target. Expected output: a plan with total instances and per-AZ counts, autoscaler min/desired/max, and a CPU target percent.

Exercise 2 — Forecast storage and set alerts

Given: DB size 400 GB today; linear growth 2.5 GB/day; volume limit 1.5 TB; set alerts at 75% and 90% of limit. Compute when to trigger each alert and when the volume will be full if nothing changes. Suggest a review cadence.

Practice checklist

  • [ ] I based plans on peak demand and SLOs, not averages.
  • [ ] I identified the dominant bottleneck (CPU/memory/IOPS/network).
  • [ ] I included both headroom and N+1/multi-AZ considerations.
  • [ ] I proposed explicit autoscaling targets and bounds.
  • [ ] I added actionable alerts before hard limits.

Common mistakes and self-check

  • Planning from averages, not peaks. Fix: multiply by peak factors or use recent p95 peaks.
  • No buffer for failover. Fix: add N+1 and headroom (20–40%).
  • Ignoring the real bottleneck. Fix: validate with profiling/load tests.
  • Setting alerts on raw CPU only. Fix: alert on SLO burn and saturation signals too.
  • One-time plan. Fix: schedule monthly reviews and after major launches.

Self-check prompts

  • If one AZ is lost, do I still meet SLO?
  • What metric will saturate first, and how do I know?
  • How quickly can autoscaling respond vs how fast traffic spikes?
  • What’s the rollback if my forecast is wrong by 50%?

Practical projects

  • Create a capacity plan for one critical service: include SLOs, forecast, per-instance capacity, total instances, headroom, N+1, autoscaling, and alerts.
  • Run a load test to find sustainable throughput at 60–70% CPU and update your plan.
  • Build a dashboard showing demand, utilization, latency (p95/p99), and headroom; add alert rules at 80% and 90% of limits.

Next steps

  • Automate weekly reports: peak demand, headroom, and upcoming storage deadlines.
  • Introduce pre-warming or scheduled scaling for known events.
  • Review cost: compare right-sized instances vs over-provisioning; optimize after SLO is safe.

Mini challenge

Your service currently handles 800 RPS at 65% CPU with 8 pods. Marketing expects a 2× traffic spike for 1 hour. You want 25% headroom and can only scale in whole pods. How many pods should you run during the spike?

Reveal answer

Per-pod capacity at 65% ≈ 800 / 8 = 100 RPS. Peak = 800 × 2 = 1600 RPS. Pods before buffer: 1600 / 100 = 16. Add 25% headroom: 16 × 1.25 = 20 pods.
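
The same check in Python:

    import math

    per_pod = 800 / 8                          # ~100 RPS per pod at 65% CPU
    peak = 800 * 2                             # 1600 RPS expected spike
    pods = math.ceil(peak / per_pod * 1.25)    # 20 pods with 25% headroom
    print(pods)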

Capacity Planning Basics β€” Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

