Note: Everyone can use the quick test below. Logged-in users also get saved progress.
Why this matters
As a Data Architect, you design systems that handle many jobs at once: ingesting events, running transformations, serving analytics, and powering data products. Concurrency and capacity planning prevent slowdowns, outages, and surprise costs by ensuring you have the right resources and limits for peak and steady loads.
- Size Kafka topics, partitions, and consumer groups to meet event SLAs.
- Set warehouse concurrency slots/queues to keep BI queries responsive.
- Right-size Spark clusters and Airflow concurrency to meet batch windows.
- Define safe limits: connection pools, thread pools, query slots, rate limits.
- Plan headroom for spikes, backfills, reprocessing, and schema changes.
Concept explained simply
Concurrency is how many operations your system handles at the same time. Throughput is how many operations it completes per unit time. Latency is how long a single operation takes. Capacity is the maximum sustainable throughput at an acceptable latency (your SLO).
Mental model: a highway
Lanes = concurrency. Cars passing per minute = throughput. The time a car spends on the road = latency. When too many cars enter (high arrival rate) or lanes are too few, queues form and latency spikes. Adding lanes (more concurrency) helps until on-ramps, tolls, or exits (downstream bottlenecks) become the new constraint.
Key metrics and quick formulas
- Arrival rate λ (requests/s, events/s)
- Service time W (s) and service rate μ = 1/W (ops/s per worker)
- Concurrency (in-flight work) L
- Utilization U ≈ λ / (workers × μ). Keep U well below 100% to avoid queuing.
Little’s Law (steady state)
L = λ × W. Given a latency target W and an expected arrival rate λ, you need capacity for roughly L concurrent operations to keep work flowing without queues. Example: λ = 2000 events/s, W = 0.05 s ⇒ L ≈ 100 in-flight events.
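As a quick sanity check, here is a minimal Python sketch of the two formulas above, using the example numbers (the variable names and the 150-worker count are illustrative; substitute your own measurements):

```python
# Little's Law and utilization, using the example numbers above.
arrival_rate = 2000        # λ: events per second
service_time = 0.05        # W: seconds per event

# Little's Law: average amount of in-flight work at steady state.
in_flight = arrival_rate * service_time
print(f"In-flight work (L = λ × W): {in_flight:.0f}")             # ≈ 100

# Utilization for a given worker count: U ≈ λ / (workers × μ), μ = 1/W.
workers = 150
service_rate = 1 / service_time                                   # μ: ops/s per worker
utilization = arrival_rate / (workers * service_rate)
print(f"Utilization with {workers} workers: {utilization:.0%}")   # ≈ 67%
```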
Setting safe headroom
Operate at 50–70% utilization at peak to absorb bursts and tail latencies. A common rule: provision 30–50% headroom above expected peak.
Connection/slot sizing shortcut
Max concurrent operations ≈ target throughput × target latency. Convert latency to seconds. Example: 600 rps × 0.08 s ≈ 48 concurrent operations.
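The shortcut plus the headroom rule above fit in a small helper. This is a minimal sketch; the function name and the 30% default are illustrative choices, not from any library:

```python
import math

def size_limit(target_throughput: float, target_latency_s: float,
               headroom: float = 0.3) -> int:
    """Concurrent-operation limit ≈ throughput × latency, plus burst headroom."""
    base = target_throughput * target_latency_s
    return math.ceil(base * (1 + headroom))

# Example from the text: 600 rps at 0.08 s latency.
print(size_limit(600, 0.08, headroom=0.0))   # 48: the raw shortcut
print(size_limit(600, 0.08))                 # 63: with 30% headroom, rounded up
```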
Worked examples
1) Kafka partitions and consumers
Goal: Ingest 120k events/min (2000/s). Each consumer processes ~250 messages/s at target latency. Choose partitions and consumer instances.
- Needed consumer capacity = 2000 / 250 = 8 consumers.
- Kafka throughput scales roughly with partitions (to a point), and each partition is read by at most one consumer in a group, so partitions ≥ consumers ⇒ start with 8–12 partitions.
- Add 30% headroom ⇒ 8 × 1.3 ≈ 10.4 ⇒ choose 12 partitions and 10–12 consumers (spread across instances).
- Monitor consumer lag and P99 latency. If lag rises at steady traffic, add consumers or optimize processing.
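The same arithmetic as a short sketch. The per-consumer rate and the 30% headroom are the example's assumptions, not measured values:

```python
import math

events_per_sec = 120_000 / 60        # 120k events/min ⇒ 2000 events/s
per_consumer_rate = 250              # messages/s one consumer handles at target latency
headroom = 0.3

consumers_needed = events_per_sec / per_consumer_rate              # 8
consumers_planned = math.ceil(consumers_needed * (1 + headroom))   # 11 ⇒ run 10–12

# A partition is read by at most one consumer per group, so provision at
# least as many partitions as the consumers you expect to run.
partitions = max(consumers_planned, 12)                            # 12 in the example

print(int(consumers_needed), consumers_planned, partitions)
```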
2) Data warehouse concurrency slots
Goal: Keep dashboard queries under 2 s P95 during peak: 1800 queries/min (30/s). Each slot can complete ~5 queries/s (with current workload mix).
- Required slots at 100% utilization = 30 / 5 = 6.
- Operate at ~65% utilization ⇒ 6 / 0.65 ≈ 9.2 ⇒ provision 10 slots.
- Create queues/classes: BI interactive (high priority, small concurrency), ETL (lower priority, larger concurrency). Example: 10 slots total: 6 BI, 4 ETL.
- Set per-user/query limits to prevent one query from consuming all slots.
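A sketch of the slot math, using the example's assumed per-slot throughput and the ~65% utilization target:

```python
import math

queries_per_sec = 1800 / 60      # 1800 queries/min ⇒ 30 queries/s at peak
slot_throughput = 5              # queries/s one slot completes with the current mix
target_utilization = 0.65        # stay well below saturation

slots_at_saturation = queries_per_sec / slot_throughput                   # 6
slots_provisioned = math.ceil(slots_at_saturation / target_utilization)   # 10

# Split the pool so one workload class cannot starve the other (example split).
bi_slots = 6
etl_slots = slots_provisioned - bi_slots                                  # 4

print(slots_provisioned, bi_slots, etl_slots)
```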
3) Spark batch window sizing
Goal: Process 6 TB within a 1-hour window. Current job benchmarks at 200 GB/min cluster-wide with 80 executors.
- Throughput needed = 6 TB / 60 min = 100 GB/min.
- Current cluster does 200 GB/min ⇒ 100% headroom. Assuming roughly linear scaling, you could reduce to ~50 executors (≈125 GB/min) and still meet the window with margin.
- If adding complex joins reduces throughput by 40%, adjusted throughput ≈ 120 GB/min with 80 executors ⇒ still above 100 GB/min target. Keep 20–30% extra executors for skew/backfills.
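A quick check of the window math (1 TB treated as 1000 GB to match the 100 GB/min figure above; the 40% join penalty is the example's assumption):

```python
data_gb = 6 * 1000               # 6 TB to process
window_min = 60                  # batch window
baseline_gb_per_min = 200        # benchmarked throughput with 80 executors
join_penalty = 0.40              # assumed slowdown from the heavier joins

required_gb_per_min = data_gb / window_min                      # 100 GB/min
adjusted_gb_per_min = baseline_gb_per_min * (1 - join_penalty)  # 120 GB/min

margin = adjusted_gb_per_min / required_gb_per_min - 1
print(f"Meets window: {adjusted_gb_per_min >= required_gb_per_min}, margin: {margin:.0%}")
```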
4) Connection pool sizing
OLTP API writes to a metadata DB. Peak 600 requests/s, average write latency 80 ms (0.08 s). What pool size?
- Concurrency need ≈ 600 × 0.08 = 48 connections.
- Add 30% headroom ⇒ ≈ 62 ⇒ cap at 60–64 connections.
- Set queue length and request timeouts to avoid pile-ups. If DB CPU >70% at peak, scale vertically/horizontally or add a write buffer.
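One way to apply the cap is in the client's pool configuration. Below is a minimal sketch using SQLAlchemy's pool options, assuming SQLAlchemy is the client; the DSN is a placeholder and the numbers come from the sizing above:

```python
from sqlalchemy import create_engine

# Placeholder DSN; replace with your metadata DB connection string.
DB_URL = "postgresql+psycopg2://app:secret@metadata-db:5432/metadata"

engine = create_engine(
    DB_URL,
    pool_size=48,       # steady-state connections ≈ 600 rps × 0.08 s
    max_overflow=14,    # burst allowance: 48 + 14 ≈ the 60–64 cap
    pool_timeout=2,     # seconds to wait for a free connection before failing fast
    pool_recycle=1800,  # recycle long-lived connections to avoid stale sockets
)
```

Whatever the client, the idea is the same: a hard pool cap near the computed limit, a bounded wait, and a timeout so bursts fail fast instead of piling up.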
A simple capacity planning process
- Gather workload: arrival rates, peaks, seasonality, payload sizes, read/write mix.
- Define SLOs: P95/P99 latency, throughput, freshness, and failure budgets.
- Baseline: measure service time per component; find bottlenecks.
- Model: use Little’s Law and utilization to estimate needed concurrency and workers.
- Experiment: load test; vary concurrency; record latency and error curves.
- Set limits: pools, slots, partitions, autoscaling rules, backpressure, and rate limits.
- Document: assumptions, charts, and upgrade triggers (e.g., if P99 > target for 3 days, add N workers).
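For the experiment step in the list above, here is a minimal, self-contained sketch of a concurrency sweep; fake_request is a hypothetical stand-in to replace with a call to the service under test:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_: int) -> float:
    """Stand-in for one call to the service under test; returns latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.02)                      # pretend the call takes ~20 ms
    return time.perf_counter() - start

def sweep(concurrency_levels, requests_per_level: int = 200) -> None:
    """Run the same request count at each concurrency level and record the curve."""
    for workers in concurrency_levels:
        t0 = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = list(pool.map(fake_request, range(requests_per_level)))
        elapsed = time.perf_counter() - t0
        throughput = requests_per_level / elapsed
        p95 = statistics.quantiles(latencies, n=20)[18]   # 95th-percentile latency
        print(f"concurrency={workers:3d}  throughput={throughput:7.1f} req/s  p95={p95 * 1000:.1f} ms")

if __name__ == "__main__":
    sweep([1, 2, 4, 8, 16, 32])
```

Against a real service the throughput curve flattens and P95 climbs past the knee; the sleep-based stand-in only demonstrates the measurement loop and will scale almost linearly.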
Who this is for and prerequisites
Who this is for
- Data Architects designing platforms and SLAs.
- Data Engineers owning pipelines and warehouses.
- Analytics Engineers tuning BI concurrency.
Prerequisites
- Basic understanding of data pipelines (batch/streaming).
- Comfort with metrics: throughput, latency, CPU, memory, I/O.
- Familiar with your stack (e.g., Kafka, Spark, Airflow, warehouse).
Exercises you can do now
Exercise 1: Concurrency from SLO
You expect 300 rps at peak and want P95 latency under 120 ms. Calculate required concurrent workers and propose a safe limit.
Hint
Workers ≈ throughput × latency (in seconds). Add 30–50% headroom.
Self-check checklist
- I converted latency to seconds before multiplying.
- I added at least 30% headroom to handle bursts.
- I considered downstream limits (DB, network, partitions).
Common mistakes and how to self-check
- Running near 100% utilization: queues explode near saturation. Aim for 50–70% at peak.
- Ignoring tail latency (P99): average latency can look fine while P99 violates SLOs. Track tail metrics.
- Assuming linear scaling: doubling threads/partitions doesn’t always double throughput due to locks, I/O, and coordination.
- No backpressure: fast producers overwhelm slow consumers. Add rate limits, bounded queues, or flow control.
- One-size-fits-all pools: BI queries vs ETL have different needs. Separate queues/classes.
- Forgetting reprocessing/backfills: plan extra capacity and off-peak windows.
Self-check mini audit
- Do I have headroom targets written down?
- Are P95 and P99 monitored with alerts?
- Do producers have max in-flight limits and retries with backoff?
- Is there a throttle or admission control for spikes?
Practical projects
- Throughput curve test: Use a load generator to sweep concurrency from 1 to N, graph throughput vs latency, and find the knee of the curve. Document the operating point and limits.
- Kafka partition plan: Propose partitions/consumers for two topics: 1) 50k/s small events, 2) 2k/s heavy events. Include headroom, monitoring, and upgrade triggers.
- Warehouse WLM policy: Define resource classes for interactive, ELT, and ad-hoc. Set max concurrency and queueing rules to meet BI SLOs.
Mini challenge
During a promo, ingest jumps 3× for 20 minutes. Your stream processing P99 rises above SLO, but CPU averages 60%. What’s your next move?
Reveal considerations
- CPU isn’t the only bottleneck—check partitions, network, and downstream sinks.
- Increase partitions/consumers if each consumer is saturated.
- Enable backpressure or rate limiting to protect sinks.
- Consider short-term autoscale and long-term headroom increase.
Learning path
- Master metrics: latency distributions, throughput, utilization, and saturation.
- Apply Little’s Law and headroom planning to your pipelines and warehouse.
- Introduce backpressure, rate limits, and resource classes.
- Practice load testing and read scaling behavior from curves.
- Evolve autoscaling policies and upgrade triggers based on SLOs.
Next steps
- Run Exercise 1 on a real service you own and propose new limits.
- Implement monitoring for P95/P99, queue depth, and consumer lag.
- Take the quick test below to confirm understanding.