Capacity Planning for Production Systems
A practical framework for forecasting load, setting headroom, and scaling capacity ahead of incidents. Covers demand modeling, uncertainty bands, and cost-reliability tradeoffs.
Capacity Planning Is Reliability Engineering with Economics
Under-provisioning causes incidents; over-provisioning burns margin. Good capacity planning balances reliability targets and cost constraints using explicit demand models and uncertainty bands.
Capacity Planning Loop
1. Demand forecast: model baseline + seasonality + event spikes.
2. Service envelope: translate forecast into CPU/memory/IO/network by tier.
3. Headroom policy: set target utilization and reserve margin by criticality.
4. Stress and failure tests: validate assumptions through load and degradation tests.
5. Review cadence: update forecasts monthly/quarterly with observed error.
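As a rough illustration of the first and last steps of the loop, the sketch below combines baseline, seasonality, and an event spike into a point forecast, then widens it into a band using past forecast errors. The function names, the multiplicative seasonality, and the `z = 2.0` band width are assumptions made for this example, not a prescribed model.

```python
from statistics import mean, stdev


def forecast_demand(baseline_rps, seasonal_factor, event_spike_rps=0.0):
    """Point forecast: baseline scaled by seasonality, plus any planned event spike.

    Multiplicative seasonality is an assumption for this sketch.
    """
    return baseline_rps * seasonal_factor + event_spike_rps


def uncertainty_band(point_forecast, past_relative_errors, z=2.0):
    """Turn observed forecast error into a (low, high) band.

    past_relative_errors holds (actual - forecast) / forecast from earlier
    review cycles. The mean corrects for systematic bias; the standard
    deviation, scaled by z, sets the band width.
    """
    bias = mean(past_relative_errors)
    spread = stdev(past_relative_errors)
    low = point_forecast * (1 + bias - z * spread)
    high = point_forecast * (1 + bias + z * spread)
    return low, high


# Hypothetical inputs: 1000 rps baseline, 20% seasonal uplift, 300 rps launch spike.
point = forecast_demand(1000, 1.2, event_spike_rps=300)
low, high = uncertainty_band(point, [0.05, -0.02, 0.08, 0.01])
print(f"point={point:.0f} rps, band=({low:.0f}, {high:.0f}) rps")
```

Provisioning against the upper edge of the band rather than the point forecast is one way to make the uncertainty explicit. Each review cycle then adds a new error observation, which tightens or widens the band.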
Typical Headroom Policy
| Tier | Target Utilization | Reserved Headroom | Reason |
|---|---|---|---|
| Critical online path | 50-60% | 40-50% | absorbs bursts and partial failures |
| Important but non-critical | 65-75% | 25-35% | balanced cost and reliability |
| Batch/offline | 75-85% | 15-25% | can tolerate queueing delay |
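To show how a target utilization turns into an instance count, here is a minimal sketch. It takes the upper end of each tier's range from the table as a ceiling; the tier keys, the `instances_needed` helper, and the per-instance throughput figure are hypothetical.

```python
import math

# Upper end of each tier's target utilization range, taken from the table above.
TARGET_UTILIZATION = {
    "critical": 0.60,
    "important": 0.75,
    "batch": 0.85,
}


def instances_needed(peak_rps, per_instance_rps, tier):
    """Smallest instance count that keeps peak load at or below the tier's target."""
    usable_rps_per_instance = per_instance_rps * TARGET_UTILIZATION[tier]
    return math.ceil(peak_rps / usable_rps_per_instance)


# Same hypothetical peak (10,000 rps) and instance size (400 rps) for every tier.
for tier in TARGET_UTILIZATION:
    print(tier, instances_needed(10_000, 400, tier))
# critical 42
# important 34
# batch 30
```

The same load costs 42 instances on the critical path and 30 for batch work. That 40% difference is the price of the extra headroom.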
Forecast to Provisioning Flow
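One way to read this flow: take the forecast peak, translate it into demand per resource, and provision for whichever resource runs out first. The sketch below assumes linear per-request costs and covers only CPU and network. The resource names, costs, and instance shape are made up for illustration.

```python
import math


def instances_per_resource(peak_rps, cost_per_request, instance_capacity, target_utilization):
    """Instance count each resource would require on its own.

    cost_per_request and instance_capacity are dicts keyed by resource name,
    expressed in matching units (e.g. cores per rps and cores per instance).
    """
    counts = {}
    for resource, cost in cost_per_request.items():
        demand = peak_rps * cost
        usable = instance_capacity[resource] * target_utilization
        counts[resource] = math.ceil(demand / usable)
    return counts


def provision(peak_rps, cost_per_request, instance_capacity, target_utilization):
    """Return (instance_count, binding_resource) for the forecast peak."""
    counts = instances_per_resource(
        peak_rps, cost_per_request, instance_capacity, target_utilization
    )
    binding = max(counts, key=counts.get)
    return counts[binding], binding


# Hypothetical numbers: 5,000 rps peak, 0.01 core and 0.16 Mbps per rps,
# instances with 8 cores and 1,000 Mbps, 50% target utilization.
count, binding = provision(
    5_000,
    {"cpu_cores": 0.01, "network_mbps": 0.16},
    {"cpu_cores": 8, "network_mbps": 1_000},
    target_utilization=0.5,
)
print(count, binding)
```

Here CPU is the binding resource, so buying more network capacity would not reduce the instance count. Memory and IO can be added as further dictionary entries if their cost scales with request rate.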
Common Failure
Using average load for planning while incidents occur at p95/p99 bursts and during partial dependency failures.
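The gap between average and tail load is easy to see on a small sample. The sketch below uses a nearest-rank percentile and a synthetic traffic sample, then checks utilization after losing one instance. The sample values and helper names are assumptions for illustration.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def utilization_after_failure(load_rps, instances, per_instance_rps, failed=1):
    """Utilization of the surviving instances when `failed` instances are lost."""
    surviving = instances - failed
    return load_rps / (surviving * per_instance_rps)


# Synthetic sample: mostly quiet traffic with occasional bursts.
rps_samples = [400] * 90 + [900] * 9 + [1500]

print("mean:", mean(rps_samples))
print("p95: ", percentile(rps_samples, 95))
print("p99: ", percentile(rps_samples, 99))

# 15 instances at 100 rps each, carrying the p99 load with one instance down.
print("utilization with one instance down:",
      round(utilization_after_failure(900, 15, 100), 3))
```

In this sample the p95 load is roughly double the mean, so a fleet sized on the mean would already be saturated during bursts, before any instance fails.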
Interview Summary
Include uncertainty bands and explicit headroom policy. Point estimates alone are not production planning.