Capacity Planning

Learn capacity planning for free with explanations, worked examples, exercises, and a quick test, aimed at Platform Engineers.

Published: January 23, 2026 | Updated: January 23, 2026

Why this matters

Capacity planning keeps systems reliable and cost-effective. As a Platform Engineer, you will forecast demand, right-size compute and storage, set safe utilization targets, and plan scaling before outages happen. Real tasks include deciding how many API instances to run for a product launch, how much storage to allocate for logs or Kafka, and when to add database replicas to protect SLOs.

  • Prevent outages by staying below saturation.
  • Control costs by avoiding overprovisioning.
  • Hit SLOs by aligning capacity with demand and error budgets.

Who this is for

  • Platform and SRE engineers responsible for reliability and scaling.
  • Backend engineers who own services and on-call rotations.
  • Tech leads who approve capacity/cost plans.

Prerequisites

  • Basic understanding of CPU, memory, I/O, and network limits.
  • Familiarity with service metrics (RPS/QPS, latency percentiles, error rates).
  • Comfort with simple math: percentages, averages, and rounding.

Concept explained simply

Capacity planning predicts how much load will arrive and ensures you have enough resources, with a safety margin, while keeping utilization healthy. It’s a balance: too little causes incidents; too much wastes money.

Mental model: Highways and headroom

Imagine traffic lanes. If a lane runs at 95% full, one small surge causes a traffic jam. Keep traffic around 50–70% so you have room for spikes and delays. In systems, that "room" is headroom: the extra capacity above expected peak.

  • Forecast demand (peak and patterns).
  • Choose safe utilization targets (e.g., CPU 50–70%).
  • Add headroom (e.g., +20–40% above peak).
  • Plan scaling triggers (autoscale thresholds, batch windows).

Core formulas and targets

  • Effective capacity per instance = throughput_at_reference_util × (target_util / reference_util)
  • Required capacity with headroom = peak_demand × (1 + headroom_fraction)
  • Instances needed = ceil(required_capacity / effective_capacity_per_instance)
  • Little’s Law (queues): L = λ × W (in-flight = arrival_rate × wait_time)
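The formulas above can be written as small helper functions. A minimal sketch in Python; the function and parameter names are illustrative, not part of any standard library:

```python
import math

def effective_capacity(throughput_at_ref: float, target_util: float, ref_util: float) -> float:
    """Scale measured throughput from the reference utilization to the target utilization."""
    return throughput_at_ref * (target_util / ref_util)

def required_capacity(peak_demand: float, headroom_fraction: float) -> float:
    """Add headroom on top of the forecast peak demand."""
    return peak_demand * (1 + headroom_fraction)

def instances_needed(required: float, per_instance: float) -> int:
    """Round up: you cannot run a fraction of an instance."""
    return math.ceil(required / per_instance)

def in_flight(arrival_rate: float, wait_time: float) -> float:
    """Little's Law: L = lambda * W (average requests in flight)."""
    return arrival_rate * wait_time
```

For instance, `in_flight(500, 0.2)` estimates roughly 100 requests in flight for a service handling 500 RPS at 200 ms average latency.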

Common targets:

  • CPU target utilization: 50–70% under normal peak.
  • Memory target: leave 20–30% free to avoid GC/OOM risks.
  • Disk utilization: aim for 60–70% with 20% free space policy.
  • Network: keep below 70–80% line rate to limit drops.

Worked examples

Example 1: Web API instances for launch day

Given: forecast peak = 7,000 RPS; each instance sustained 350 RPS at 70% CPU during load test. Target utilization = 60%. Headroom = 30%.

  • Effective per-instance capacity = 350 × (0.60 / 0.70) = 300 RPS
  • Required capacity with headroom = 7,000 × 1.30 = 9,100 RPS
  • Instances needed = ceil(9,100 / 300) = 31

Answer: Run 31 instances. Add autoscaling with warm-up to avoid cold starts.
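The arithmetic above can be checked with a short script; the variable names are illustrative and the numbers come from this example:

```python
import math

measured_rps = 350       # per-instance throughput at 70% CPU (from load test)
reference_util = 0.70
target_util = 0.60
peak_rps = 7_000
headroom = 0.30

per_instance = measured_rps * (target_util / reference_util)  # ~300 RPS at 60% CPU
required = peak_rps * (1 + headroom)                          # 9,100 RPS with headroom
instances = math.ceil(required / per_instance)                # round up to whole instances

print(round(per_instance), required, instances)
```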

Example 2: Storage for a 5-day Kafka retention

Given: avg ingress = 120 MB/s; daily peak 3 hours at 2× (240 MB/s); retention = 5 days; compression = 0.5; replication = 3; overhead = 10%; keep 20% free.

  • Daily volume = (21 h × 120 MB/s + 3 h × 240 MB/s) × 3,600 s/h = 11,664,000 MB = 11.664 TB
  • 5 days raw = 58.32 TB; compressed = 29.16 TB
  • Replicated (×3) = 87.48 TB; +10% overhead = 96.23 TB
  • Provision with 20% free: 96.23 / 0.8 ≈ 120.29 TB

Answer: About 120–121 TB usable provisioned space.
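The storage chain above (raw volume → compression → replication → overhead → free-space reserve) can be reproduced in a few lines. A sketch with this example's inputs; variable names are illustrative:

```python
avg_mb_s = 120        # average ingress
peak_mb_s = 240       # ingress during the 3-hour daily peak
peak_hours = 3
retention_days = 5
compression = 0.5     # ratio of compressed size to raw size
replication = 3
overhead = 0.10       # indexing/metadata overhead
free_fraction = 0.20  # free-space policy

daily_mb = (24 - peak_hours) * avg_mb_s * 3600 + peak_hours * peak_mb_s * 3600  # 11,664,000 MB
raw_tb = daily_mb * retention_days / 1_000_000                                  # 58.32 TB
on_disk_tb = raw_tb * compression * replication * (1 + overhead)                # ~96.23 TB
provision_tb = on_disk_tb / (1 - free_fraction)                                 # ~120.3 TB

print(round(provision_tb, 1))
```

Note the order of operations: compression shrinks the data first, then replication multiplies what is actually written to disk.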

Example 3: Batch window impact on API headroom

Scenario: Nightly batch job adds 1,500 RPS for 45 minutes at 01:00 UTC; normal peak = 4,000 RPS; headroom policy 25%; current capacity = 5,500 RPS. Is it safe?

  • Batch peak demand = 4,000 + 1,500 = 5,500 RPS
  • Required with headroom = 5,500 × 1.25 = 6,875 RPS
  • Current capacity = 5,500 RPS ⇒ shortfall = 1,375 RPS

Action: Add instances or shift the batch to reduce overlapping peak.
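Checks like this are easy to automate before scheduling a batch job. A minimal sketch using the scenario's numbers; names are illustrative:

```python
normal_peak = 4_000       # RPS during the overlapping window
batch_add = 1_500         # extra RPS from the nightly batch
headroom = 0.25           # headroom policy
current_capacity = 5_500  # RPS the current fleet can serve

combined_peak = normal_peak + batch_add          # 5,500 RPS
required = combined_peak * (1 + headroom)        # 6,875 RPS
shortfall = max(0.0, required - current_capacity)

if shortfall:
    print(f"Unsafe: short by {shortfall:.0f} RPS")  # prints "Unsafe: short by 1375 RPS"
else:
    print("Safe within headroom policy")
```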

Step-by-step playbook

  1. Collect demand: Peak RPS/throughput, percentiles, seasonality, batch overlaps.
  2. Define targets: Utilization thresholds, SLOs, error budgets, warm-up times.
  3. Load test: Measure per-instance throughput at a known utilization (e.g., 70%).
  4. Compute: Convert to effective capacity at target utilization; add headroom; round up instances.
  5. Plan scaling: Autoscaling rules, min/max bounds, cooldowns, and manual runbooks.
  6. Validate cost: Estimate monthly spend and review it against budget; prices vary by provider, region, and contract, so treat estimates as rough ranges.
  7. Review: After releases or traffic changes, re-check assumptions.

Learning path

  • Start: Understand utilization, headroom, and forecasting basics.
  • Next: Practice with compute, storage, and network examples.
  • Advance: Apply Little’s Law and error budgets to plan safe limits.
  • Master: Build runbooks and autoscaling policies with realistic thresholds.

Practice exercises

Do these now. They mirror the graded exercises below.

Exercise 1: Right-size API instances

Forecast peak = 7,000 RPS. Per instance: 350 RPS at 70% CPU (from load test). Target util = 60%. Headroom = 30%.

  • Find effective per-instance capacity at 60%.
  • Add headroom to peak.
  • Round up instances needed.

When done, compare with the solution in the Exercises section.

Exercise 2: Plan replicated storage

5-day retention; avg 120 MB/s; 3 hours/day at 240 MB/s; compression 0.5; replication 3; 10% overhead; 20% free space target.

  • Compute daily and 5-day volumes.
  • Apply compression, replication, overhead.
  • Ensure 20% free space.

Self-check for both exercises:

  • I considered peak, not just average.
  • I applied utilization targets correctly.
  • I added headroom before rounding.
  • I accounted for replication/overhead/free space.

Common mistakes and self-check

  • Using average instead of peak or percentile demand.
  • Confusing utilization target with headroom (they are different layers of safety).
  • Ignoring warm-up and scale-out delay.
  • Forgetting replication and free space policies in storage estimates.
  • Not revisiting plans after a feature launch or traffic change.

Self-check prompts

  • Did I model the worst overlapping loads within the same time window?
  • Is my instance capacity based on measured data at a known utilization?
  • Do autoscaling thresholds avoid oscillation (cooldowns, min/max)?
  • If a node fails, does remaining capacity still meet SLOs?
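The last prompt, surviving a node failure, can be sanity-checked numerically. A minimal sketch; the fleet sizes and per-instance capacity below are illustrative numbers, not a recommendation:

```python
def survives_failure(instances: int, per_instance_rps: float,
                     peak_rps: float, failures: int = 1) -> bool:
    """True if remaining capacity still covers peak after `failures` nodes drop out."""
    remaining = (instances - failures) * per_instance_rps
    return remaining >= peak_rps

# Illustrative: 31 instances at 300 RPS each against a 7,000 RPS peak.
print(survives_failure(31, 300, 7_000))  # True: 30 x 300 = 9,000 RPS >= 7,000
print(survives_failure(24, 300, 7_000))  # False: 23 x 300 = 6,900 RPS < 7,000
```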

Practical projects

  • Create a capacity workbook: one sheet each for API, DB, and Kafka/logs with inputs, formulas, and outputs.
  • Design an autoscaling policy for a stateless service, including min/max, cooldowns, and alarm thresholds.
  • Run a synthetic load test and update your per-instance capacity numbers and utilization targets.

Mini challenge

Your service has 2,800 RPS peak, but marketing plans a campaign expected to add +60% traffic for 2 hours. You target 65% CPU, have load-test data of 200 RPS/instance at 70% CPU, and want 25% headroom. Is your current fleet of 20 instances enough?

Hint: Adjust per-instance capacity to 65% utilization, compute the campaign peak, add headroom, then divide and round up.

Next steps

  • Finish the exercises and take the quick test below.
  • Apply these steps to one real service you own this week.
  • Schedule a 30-minute review with your team to validate assumptions.

Quick Test — how it works

The test is available to everyone for free. Logged-in users will have their progress saved automatically.

Practice Exercises

2 exercises to complete

Instructions

Forecast peak = 7,000 RPS. Load test showed each instance sustains 350 RPS at 70% CPU. Your target utilization is 60%. You want 30% headroom above peak.

  • Compute effective per-instance capacity at 60%.
  • Compute required capacity including headroom.
  • How many instances are needed? Round up.

Expected output: 31 instances

Capacity Planning — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

