Why this matters
As a Platform Engineer, you are responsible for making apps fast, stable, and cost-efficient. Resource requests and limits control how much CPU and memory a Pod reserves and can use; autoscaling adds or removes capacity based on load. Get these wrong and you get throttling, OOMKills, noisy neighbors, or wasteful clusters. Get them right and you ship reliable, scalable platforms.
- Real tasks you will do:
- Right-size CPU/memory for Deployments to stop throttling and OOMKills.
- Configure HPA to scale replicas based on CPU/memory or custom metrics.
- Work with Cluster Autoscaler to add nodes when requests cannot be scheduled.
- Set sensible defaults/limits in namespaces so teams avoid outages and cost spikes.
Concept explained simply
Each container in Kubernetes can declare two numbers for each resource:
- request: how much you ask the scheduler to reserve. Determines placement and capacity planning.
- limit: the maximum the container may use. Exceeding CPU limit causes throttling; exceeding memory limit kills the container (OOMKilled).
Units:
- CPU: cores or millicores (500m = 0.5 of a core). CPU is compressible: hitting the limit throttles the container but doesn't kill it.
- Memory: bytes (e.g., 256Mi, 1Gi). Memory is not compressible: exceeding the limit OOMKills the container.
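Both resources accept several equivalent spellings; a quick illustration (values are arbitrary):

resources:
  requests:
    cpu: "500m"      # same as 0.5 (half a core)
    memory: "256Mi"  # mebibytes: 256 * 1024 * 1024 bytes
  limits:
    cpu: "1"         # one full core, same as 1000m
    memory: "1Gi"    # note 1G (10^9 bytes) is slightly smaller than 1Gi (2^30 bytes)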
Pod QoS classes (derived from requests/limits):
- Guaranteed: every container sets requests equal to limits for both CPU and memory. Most protected from eviction.
- Burstable: at least one container sets a request or limit, but the Pod doesn't meet the Guaranteed criteria; containers can burst above their requests up to their limits. Middle protection.
- BestEffort: no requests/limits. Least protected; first to be evicted under pressure.
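To make the classes concrete, here is a Pod-spec fragment (container name and image are placeholders) that lands in the Guaranteed class because requests equal limits for both resources; drop the limits and it becomes Burstable, drop requests and limits entirely and it becomes BestEffort:

containers:
- name: cache          # placeholder name
  image: redis:7       # placeholder image
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"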
Autoscaling layers:
- HPA (Horizontal Pod Autoscaler): changes replica count based on metrics (e.g., CPU utilization). More replicas = more parallelism.
- VPA (Vertical Pod Autoscaler): recommends or applies larger/smaller requests/limits per Pod (restarts Pods when applying).
- Cluster Autoscaler: adds/removes nodes when Pods can’t be scheduled due to insufficient requested resources.
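VPA is an add-on rather than part of the core API, so the exact schema depends on the installed version; a sketch assuming the autoscaling.k8s.io/v1 CRD in recommendation-only mode:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa              # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"        # only publish recommendations; "Auto" applies them and restarts Pods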
Mental model: budget and speed limit
Think of request as your reserved budget (a desk on the floor) and limit as the speed limit (how fast you can go). The scheduler places you based on your reserved budget. When traffic grows, HPA adds more workers (more desks), and if there’s no room, Cluster Autoscaler rents more floor space (new nodes).
Worked examples
Example 1: Right-size a latency-sensitive API
Observed p95 CPU per Pod is ~150m during peak with spikes to 400m. Memory steady at 180Mi with occasional peaks to 300Mi.
- Pick headroom: request CPU 200m, limit 500m; request memory 256Mi, limit 512Mi.
- Why: request covers typical p95 so the Pod schedules reliably; limit allows burst without throttling too quickly; memory limit above peak to avoid OOMKills.
resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
Example 2: Configure HPA for CPU
Target average CPU utilization per Pod at 60%, replicas 2–10.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
Note: HPA utilization is calculated against requests, not limits. If requests are too small, the same raw CPU load shows up as a high utilization percentage and triggers over-scaling.
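A quick calculation with Example 1's numbers: a Pod using 300m of CPU reports 150% utilization against a 200m request but only 60% against a 500m request. The controller roughly computes desiredReplicas = ceil(currentReplicas × currentUtilization / target), so 4 replicas at 150% utilization with a 60% target become ceil(4 × 150 / 60) = 10 replicas, while at 60% utilization nothing changes.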
Example 3: Memory-bound worker avoiding OOMKills
Worker uses 600–700Mi with spikes to 900Mi when processing large batches.
- Set memory request and limit both to 1Gi (keeping the memory request equal to the limit leaves headroom above the 900Mi peak and reduces eviction risk); CPU request 250m, limit 1000m to allow bursts.
- Use an HPA on memory utilization at 70%: steady usage of 600–700Mi sits at roughly 59–68% of the 1Gi request, so replicas are added only when large batches push usage toward the 900Mi peaks.
resources:
  requests:
    cpu: "250m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
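If replicas flap as batches start and finish, autoscaling/v2 also supports an optional behavior stanza under spec to slow scale-down; a sketch with illustrative values that could be added to either HPA above:

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of lower usage before removing replicas
      policies:
      - type: Pods
        value: 1                        # remove at most one Pod per period
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # add replicas immediately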
What if Pods don't scale even when HPA says they should?
- Check events on Pending Pods for messages like "Insufficient cpu" or "Insufficient memory". Cluster Autoscaler may need to add nodes.
- If Cluster Autoscaler is enabled but not scaling, the per-Pod request may be larger than any available node type can fit, so even a fresh node won't help. Reduce the per-Pod request or use larger nodes.
Exercises you can run
Do these after reading the examples. They mirror the graded exercises below.
- Exercise 1: Plan requests/limits and an HPA for a spiky web service. Target 60% CPU utilization; expected YAML includes Deployment resources and HPA 2–10 replicas.
- Exercise 2: Diagnose throttling and memory eviction from given logs; propose resource tweaks and HPA settings.
- Checklist before you move on:
- You can explain the difference between request and limit in one sentence.
- You can choose CPU/memory values from observed p95 and peak usage.
- You can configure an autoscaling/v2 HPA with CPU or memory targets.
- You know how QoS classes change with request/limit settings.
Common mistakes and how to self-check
- Too-low requests inflating HPA utilization and causing over-scaling. Self-check: compare raw CPU usage (cores) with the utilization percentage; if raw CPU is steady but the percentage is high, requests are probably too small.
- CPU limits too tight, causing throttling. Self-check: look for throttling metrics or "throttling" messages in logs; increase the CPU limit or remove it for latency-critical apps (see the sketch after this list).
- Memory limits below peak, causing OOMKilled containers. Self-check: check container restarts with reason OOMKilled; set the limit above the known peak and the request closer to steady state.
- Expecting HPA to fix bad per-Pod sizing. Self-check: if each Pod instantly OOMs or throttles, HPA won't help; fix per-Pod requests/limits first.
- Ignoring Cluster Autoscaler when Pods stay Pending. Self-check: Pending with "Insufficient" reasons means the cluster lacks resources; reduce the per-Pod request or make sure Cluster Autoscaler can add nodes.
Practical projects
- Autoscaling API: Deploy a sample API with proper requests/limits and HPA on CPU. Perform a basic load test and tune targets to hit a latency SLO.
- Memory-heavy batch worker: Configure memory-first sizing and HPA on memory utilization; verify zero OOMKills across a large batch.
- Cost-aware namespace defaults: Create a LimitRange and ResourceQuota that prevent BestEffort Pods and cap over-provisioning.
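For the namespace-defaults project, a starting point might look like the following (names and values are illustrative and should be tuned per namespace): the LimitRange injects default requests and limits so nothing lands as BestEffort, and the ResourceQuota caps the namespace's total requests and limits.

apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults     # placeholder name
  namespace: team-a            # placeholder namespace
spec:
  limits:
  - type: Container
    defaultRequest:            # applied when a container omits requests
      cpu: "100m"
      memory: "128Mi"
    default:                   # applied when a container omits limits
      cpu: "500m"
      memory: "512Mi"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-caps
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi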
Learning path
- Before: Containers basics, Pod/Deployment, metrics collection fundamentals.
- Now: Requests, limits, QoS; HPA v2 configuration; Cluster Autoscaler behavior.
- After: Custom/external metrics for HPA, VPA recommendations, pod topology spread, priority and preemption.
Who this is for
- Platform and DevOps engineers owning multi-tenant clusters.
- Backend engineers deploying services who need predictable performance.
- SREs responsible for latency/error budgets and cost controls.
Prerequisites
- Comfort with Kubernetes Deployments, Pods, and basic YAML editing.
- Basic understanding of CPU/memory metrics and Pod logs.
Next steps
- Apply these settings to one real service and watch metrics for 24–48 hours.
- Tune requests/limits and HPA targets to meet latency/error budgets.
- Add namespace defaults (LimitRange) to guide teams toward safe values.
Mini challenge
Pick an existing Deployment that occasionally OOMKills. Raise the memory limit to 20–30% above the observed peak, set the request near steady state, and add an HPA on memory at 70%. Verify there are no OOMKills over a full traffic cycle and note the replica behavior.
Quick Test: what to expect
10 short questions on requests, limits, QoS, and autoscaling. Everyone can take the test; only logged-in users get saved progress.