Why this matters
Capacity planning keeps systems reliable and cost-effective. As a Platform Engineer, you will forecast demand, right-size compute and storage, set safe utilization targets, and plan scaling before outages happen. Real tasks include deciding how many API instances to run for a product launch, how much storage to allocate for logs or Kafka, and when to add database replicas to protect SLOs.
- Prevent outages by staying below saturation.
- Control costs by avoiding overprovisioning.
- Hit SLOs by aligning capacity with demand and error budgets.
Who this is for
- Platform and SRE engineers responsible for reliability and scaling.
- Backend engineers who own services and on-call rotations.
- Tech leads who approve capacity/cost plans.
Prerequisites
- Basic understanding of CPU, memory, I/O, and network limits.
- Familiarity with service metrics (RPS/QPS, latency percentiles, error rates).
- Comfort with simple math: percentages, averages, and rounding.
Concept explained simply
Capacity planning predicts how much load will arrive and ensures you have enough resources, with a safety margin, while keeping utilization healthy. It’s a balance: too little causes incidents; too much wastes money.
Mental model: Highways and headroom
Imagine traffic lanes. If a lane is 95% full, one small surge causes a traffic jam. Keep traffic at around 50–70% of capacity so you have room for spikes and delays. In systems, that "room" is headroom: the extra capacity above expected peak.
- Forecast demand (peak and patterns).
- Choose safe utilization targets (e.g., CPU 50–70%).
- Add headroom (e.g., +20–40% above peak).
- Plan scaling triggers (autoscale thresholds, batch windows).
Core formulas and targets
- Effective capacity per instance = throughput_at_reference_util × (target_util / reference_util)
- Required capacity with headroom = peak_demand × (1 + headroom_fraction)
- Instances needed = ceil(required_capacity / effective_capacity_per_instance)
- Little’s Law (queues): L = λ × W (in-flight = arrival_rate × wait_time)
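These formulas translate directly into a few helper functions. A minimal Python sketch (function and variable names are illustrative, not from any particular library):

```python
import math

def effective_capacity(throughput_at_ref: float, ref_util: float, target_util: float) -> float:
    """Scale throughput measured at a reference utilization to the target utilization."""
    return throughput_at_ref * (target_util / ref_util)

def required_capacity(peak_demand: float, headroom_fraction: float) -> float:
    """Add headroom on top of the forecast peak."""
    return peak_demand * (1 + headroom_fraction)

def instances_needed(required: float, per_instance: float) -> int:
    """Round up; partial instances don't exist."""
    return math.ceil(required / per_instance)

def in_flight(arrival_rate: float, wait_time: float) -> float:
    """Little's Law: L = lambda * W (e.g., requests in flight = RPS x average latency in seconds)."""
    return arrival_rate * wait_time
```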
Common targets:
- CPU target utilization: 50–70% under normal peak.
- Memory target: leave 20–30% free to avoid GC/OOM risks.
- Disk utilization: aim for 60–70%, with a 20% free-space policy.
- Network: keep below 70–80% of line rate to limit drops.
Worked examples
Example 1: Web API instances for launch day
Given: forecast peak = 7,000 RPS; each instance sustained 350 RPS at 70% CPU during load testing. Target utilization = 60%. Headroom = 30%.
- Effective per-instance capacity = 350 × (0.60 / 0.70) = 300 RPS
- Required capacity with headroom = 7,000 × 1.30 = 9,100 RPS
- Instances needed = ceil(9,100 / 300) = 31
Answer: Run 31 instances. Add autoscaling with warm-up to avoid cold starts.
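The same arithmetic as a quick sanity check in Python (numbers taken from the example):

```python
import math

per_instance = 350 * (0.60 / 0.70)         # ≈ 300 RPS at the 60% CPU target
required = 7_000 * 1.30                     # 9,100 RPS with 30% headroom
print(math.ceil(required / per_instance))   # 31
```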
Example 2: Storage for a 5-day Kafka retention
Given: avg ingress = 120 MB/s; daily peak 3 hours at 2× (240 MB/s); retention = 5 days; compression = 0.5; replication = 3; overhead = 10%; keep 20% free.
- Daily volume = (21 h × 120 MB/s + 3 h × 240 MB/s) × 3,600 s/h = 11,664,000 MB = 11.664 TB
- 5 days raw = 58.32 TB; compressed = 29.16 TB
- Replicated (×3) = 87.48 TB; +10% overhead = 96.23 TB
- Provision with 20% free: 96.23 / 0.8 ≈ 120.29 TB
Answer: Provision about 120–121 TB of disk capacity so the retained data fits with 20% free.
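The same calculation as a short script (inputs from the example; TB here is decimal, 1 TB = 1,000,000 MB):

```python
avg_mb_s, peak_mb_s = 120, 240      # ingress rates
offpeak_hours, peak_hours = 21, 3   # hours per day at each rate
retention_days = 5
compression = 0.5                   # compressed size = 0.5 x raw
replication = 3
overhead = 0.10                     # indexes, metadata, etc.
free_fraction = 0.20                # keep 20% of provisioned space free

daily_mb = (offpeak_hours * avg_mb_s + peak_hours * peak_mb_s) * 3_600
raw_tb = daily_mb * retention_days / 1e6                          # ≈ 58.32 TB
stored_tb = raw_tb * compression * replication * (1 + overhead)   # ≈ 96.23 TB
provisioned_tb = stored_tb / (1 - free_fraction)                  # ≈ 120.3 TB
print(f"provision ≈ {provisioned_tb:.1f} TB")
```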
Example 3: Batch window impact on API headroom
Scenario: Nightly batch job adds 1,500 RPS for 45 minutes at 01:00 UTC; normal peak = 4,000 RPS; headroom policy 25%; current capacity = 5,500 RPS. Is it safe?
- Batch peak demand = 4,000 + 1,500 = 5,500 RPS
- Required with headroom = 5,500 × 1.25 = 6,875 RPS
- Current capacity = 5,500 RPS ⇒ shortfall = 1,375 RPS
Action: Add instances or shift the batch to reduce overlapping peak.
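A quick check of the overlap with the example's numbers:

```python
normal_peak, batch_extra = 4_000, 1_500   # RPS
headroom = 0.25
current_capacity = 5_500                   # RPS

required = (normal_peak + batch_extra) * (1 + headroom)   # 6,875 RPS
shortfall = max(0.0, required - current_capacity)          # 1,375 RPS
print(f"required={required:.0f} RPS, shortfall={shortfall:.0f} RPS")
```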
Step-by-step playbook
- Collect demand: Peak RPS/throughput, percentiles, seasonality, batch overlaps.
- Define targets: Utilization thresholds, SLOs, error budgets, warm-up times.
- Load test: Measure per-instance throughput at a known utilization (e.g., 70%).
- Compute: Convert to effective capacity at target utilization; add headroom; round up instances.
- Plan scaling: Autoscaling rules, min/max bounds, cooldowns, and manual runbooks.
- Validate cost: Estimate monthly spend and review it against budget; prices vary by provider, region, and discounts, so treat estimates as rough ranges (see the sketch after this list).
- Review: After releases or traffic changes, re-check assumptions.
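For the cost-validation step, a rough monthly estimate is usually enough. A minimal sketch; the hourly price below is a made-up placeholder, so substitute your provider's actual rate:

```python
instances = 31            # from the sizing calculation above
hourly_price = 0.20       # hypothetical on-demand price per instance-hour (placeholder)
hours_per_month = 730     # average hours in a month

monthly_cost = instances * hourly_price * hours_per_month
print(f"~${monthly_cost:,.0f} per month before discounts")   # ~$4,526
```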
Learning path
- Start: Understand utilization, headroom, and forecasting basics.
- Next: Practice with compute, storage, and network examples.
- Advance: Apply Little’s Law and error budgets to plan safe limits.
- Master: Build runbooks and autoscaling policies with realistic thresholds.
Practice exercises
Do these now. They mirror the graded exercises below.
Exercise 1: Right-size API instances
Forecast peak = 7,000 RPS. Per instance: 350 RPS at 70% CPU (from load test). Target util = 60%. Headroom = 30%.
- Find effective per-instance capacity at 60%.
- Add headroom to peak.
- Round up instances needed.
When done, compare with the solution in the Exercises section.
Exercise 2: Plan replicated storage
5-day retention; avg 120 MB/s; 3 hours/day at 240 MB/s; compression 0.5; replication 3; 10% overhead; 20% free space target.
- Compute daily and 5-day volumes.
- Apply compression, replication, overhead.
- Ensure 20% free space.
Checklist before you compare answers:
- I considered peak, not just average
- I applied utilization targets correctly
- I added headroom before rounding
- I accounted for replication/overhead/free space
Common mistakes and self-check
- Using average instead of peak or percentile demand.
- Confusing utilization target with headroom (they are different layers of safety).
- Ignoring warm-up and scale-out delay.
- Forgetting replication and free space policies in storage estimates.
- Not revisiting plans after a feature launch or traffic change.
Self-check prompts
- Did I model the worst overlapping loads within the same time window?
- Is my instance capacity based on measured data at a known utilization?
- Do autoscaling thresholds avoid oscillation (cooldowns, min/max)?
- If a node fails, does remaining capacity still meet SLOs?
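The last prompt, the N-1 failure check, is easy to script. A minimal sketch with illustrative names:

```python
def survives_node_loss(instances: int, per_instance_rps: float,
                       peak_rps: float, headroom: float = 0.0,
                       failures: int = 1) -> bool:
    """True if the fleet still carries peak (plus headroom) after losing `failures` instances."""
    remaining = (instances - failures) * per_instance_rps
    return remaining >= peak_rps * (1 + headroom)

print(survives_node_loss(instances=31, per_instance_rps=300, peak_rps=7_000))                 # True
print(survives_node_loss(instances=31, per_instance_rps=300, peak_rps=7_000, headroom=0.30))  # False
```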
Practical projects
- Create a capacity workbook: one sheet each for API, DB, and Kafka/logs with inputs, formulas, and outputs.
- Design an autoscaling policy for a stateless service, including min/max, cooldowns, and alarm thresholds.
- Run a synthetic load test and update your per-instance capacity numbers and utilization targets.
Mini challenge
Your service has 2,800 RPS peak, but marketing plans a campaign expected to add +60% traffic for 2 hours. You target 65% CPU, have load-test data of 200 RPS/instance at 70% CPU, and want 25% headroom. Is your current fleet of 20 instances enough?
Hint: Adjust per-instance capacity to 65% CPU, compute the campaign peak, add headroom, then divide and round up.
Next steps
- Finish the exercises and take the quick test below.
- Apply these steps to one real service you own this week.
- Schedule a 30-minute review with your team to validate assumptions.
Quick Test — how it works
The test is available to everyone for free. Logged-in users will have their progress saved automatically.