Why this skill matters for Platform Engineers
Containers give you reproducible builds and predictable runtime environments. Kubernetes schedules, scales, and heals those containers across clusters. As a Platform Engineer, this enables you to ship reliable platforms, reduce drift, and provide self-service infrastructure for product teams.
- Speed: consistent builds and fast rollouts.
- Reliability: health checks, rollbacks, autoscaling.
- Security: image hardening, isolation, policy controls.
- Cost control: right-size workloads with requests/limits and autoscaling.
What you’ll be able to do
- Build, harden, and publish container images for apps and platforms.
- Model services with Deployments, Services, Ingress, and autoscaling.
- Operate clusters: upgrades, maintenance, quotas, and basic multi-tenancy.
- Debug issues quickly with kubectl, logs, exec, events, and metrics.
Who this is for
- Platform and DevOps engineers building internal platforms for multiple teams.
- Backend engineers deploying microservices to Kubernetes.
- SREs who need strong, reproducible service foundations.
Prerequisites
- Comfortable using a terminal and Git.
- Basics of Linux processes, filesystems, and networking (ports, DNS, IPs).
- At least one programming language to package an example service (e.g., Python, Go, Node.js, Java).
Learning path
- Container build and hardening
Goals & practice
Build a minimal, non-root image using multi-stage builds. Pin base images by digest. Scan and minimize attack surface.
# Example: multi-stage build for Go
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o app ./cmd/app

FROM gcr.io/distroless/static:nonroot
USER 65532
COPY --from=build /src/app /app
ENTRYPOINT ["/app"]
- Kubernetes core objects
Goals & practice
Deploy an app with a Deployment, expose it with a Service, and add probes and resource requests/limits.
- Namespaces and multi-tenancy
Goals & practice
Create isolated namespaces, apply ResourceQuotas and LimitRanges, and test RBAC roles for least privilege.
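A minimal sketch of that namespace scaffolding (the team-a name and the quota values are illustrative; tune them to your teams):
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default: {cpu: 200m, memory: 256Mi}
      defaultRequest: {cpu: 100m, memory: 128Mi}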
- Ingress and networking basics
Goals & practice
Route external traffic to Services using an Ingress controller. Test DNS and service discovery inside the cluster.
- Scaling and resiliency
Goals & practice
Set requests/limits, configure HorizontalPodAutoscaler, and verify scale up/down under load.
- Helm basics
Goals & practice
Template your app with Helm, use values files for environments, and do a safe rolling upgrade.
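As a sketch, a chart template can reference per-environment values like this (an excerpt of a hypothetical templates/deployment.yaml; the field names follow the values file shown in worked example 5 below):
# templates/deployment.yaml (excerpt; a sketch, not a complete chart)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels: {app: {{ .Chart.Name }}}
  template:
    metadata:
      labels: {app: {{ .Chart.Name }}}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          resources:
            {{- toYaml .Values.resources | nindent 12 }}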
- Cluster maintenance
Goals & practice
Plan upgrades, cordon/drain nodes, validate add-on compatibility, and perform post-upgrade checks.
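A typical node-maintenance sequence looks like this (node-1 is a placeholder name):
# take the node out of scheduling and evict its pods safely
kubectl cordon node-1
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# ...perform the node upgrade or maintenance...
kubectl uncordon node-1
kubectl get nodes   # confirm the node is Ready and schedulable again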
- Debugging and troubleshooting
Goals & practice
Use kubectl logs, exec, describe, events, and port-forward. Check pod scheduling, DNS, and network policies.
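A quick command reference for that workflow (angle brackets mark placeholders):
kubectl get events -n <ns> --sort-by=.lastTimestamp   # recent events, newest last
kubectl describe pod <pod> -n <ns>                    # scheduling decisions and probe failures
kubectl logs <pod> -n <ns> --previous                 # logs from the last crashed container
kubectl exec -it <pod> -n <ns> -- sh                  # shell into a running container
kubectl port-forward svc/<svc> 8080:80 -n <ns>        # reach a Service from your machine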
Worked examples
1) Harden a container image
Build a small, non-root image and pin the base image by digest to reduce the attack surface.
# Dockerfile (Node.js example)
# 1) Build stage: install all deps (the build step needs devDependencies), then prune
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev
# 2) Runtime stage (distroless) pinned by digest (placeholder digest shown; pin to a real one)
FROM gcr.io/distroless/nodejs20-debian12@sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
WORKDIR /app
USER 65532
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
ENV NODE_ENV=production
CMD ["/app/dist/server.js"]
Key checks: no root, minimal base, pinned digest, no build tools in final image.
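To verify those checks locally, one option is Docker plus an external scanner such as Trivy (the image tag is illustrative; any scanner works):
docker build -t registry.example.com/api:1.2.3 .
docker inspect --format '{{.Config.User}}' registry.example.com/api:1.2.3   # expect 65532, not root or empty
trivy image registry.example.com/api:1.2.3                                  # scan for known CVEs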
2) Deployment + Service with probes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.2.3
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet: {path: /healthz, port: 8080}
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet: {path: /live, port: 8080}
            initialDelaySeconds: 15
            periodSeconds: 10
          resources:
            requests: {cpu: "100m", memory: "128Mi"}
            limits: {cpu: "300m", memory: "256Mi"}
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector: {app: api}
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
  type: ClusterIP
Result: Rolling updates, health-checked pods, and stable cluster IP.
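One way to apply and verify the rollout (assuming the manifests above are saved as api.yaml):
kubectl apply -f api.yaml
kubectl rollout status deployment/api   # waits until all 3 replicas are ready
kubectl get endpoints api               # only pods passing readiness appear here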
3) HorizontalPodAutoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Ensure Metrics Server is healthy so the HPA can read CPU metrics.
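A sketch for checking metrics and generating load (the busybox loop is a crude but common approach; run it in the same namespace as the api Service):
kubectl top pods                  # returns data only if Metrics Server is working
kubectl get hpa api-hpa --watch   # watch replica counts react to load
kubectl run load --rm -it --restart=Never --image=busybox:1.36 -- /bin/sh -c "while true; do wget -q -O- http://api; done"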
4) Ingress for path routing
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-gateway
spec:
  ingressClassName: nginx
  rules:
    - host: example.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api
                port: {number: 80}
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port: {number: 80}
Test with curl --resolve example.local:80:<ingress-ip> http://example.local/api to hit the Ingress without editing /etc/hosts.
5) Helm values override and upgrade
# values-prod.yaml
replicaCount: 4
image:
  repository: registry.example.com/api
  tag: 1.2.4
resources:
  requests:
    cpu: 150m
    memory: 192Mi
  limits:
    cpu: 400m
    memory: 384Mi
# Upgrade (idempotent)
# helm upgrade --install api ./chart -n prod -f values-prod.yaml
Chart templating keeps environments consistent while enabling safe overrides.
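If an upgrade misbehaves, Helm's revision history makes rollback straightforward (release and namespace match the upgrade command above):
helm history api -n prod      # list recorded revisions
helm rollback api 1 -n prod   # roll back to revision 1
helm status api -n prod       # confirm the release is deployed and healthy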
Drills and exercises
- Build a multi-stage image for your app and verify it runs as a non-root user.
- Add readiness and liveness probes; simulate a failure and observe rollout behavior.
- Set requests/limits and validate scheduling with kubectl describe pod.
- Create a Namespace + ResourceQuota + LimitRange; confirm over-limit deployments fail.
- Configure an Ingress route; confirm TLS termination locally (self-signed is fine for practice).
- Add an HPA and load test; watch it scale up and back down.
- Package your app with Helm and perform a zero-downtime upgrade.
Common mistakes and debugging tips
Using root images or bloated base images
Symptom: security scans report high CVEs. Fix: use minimal/distroless, drop capabilities, run as non-root, pin by digest.
Missing readiness probes
Symptom: traffic reaches pods before they’re ready, causing errors. Fix: always define a readinessProbe; confirm pods appear in the Service’s Endpoints before they receive traffic.
No resource requests/limits
Symptom: noisy neighbors, throttling, OOMKills. Fix: set realistic requests/limits; monitor to tune.
Confusing liveness vs readiness
Liveness restarts a bad container; readiness gates traffic. Don’t make the liveness probe too strict, or you’ll create restart loops.
DNS and service discovery issues
Use kubectl exec and tools like nslookup or dig (install in a debug pod) to confirm service names and cluster DNS suffix.
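For example, a throwaway debug pod (busybox’s nslookup is basic but usually sufficient; api.default assumes the Service from the worked examples lives in the default namespace):
kubectl run -it --rm dnstest --restart=Never --image=busybox:1.36 -- nslookup api.default.svc.cluster.local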
RBAC too broad
Follow least privilege. Use Roles/RoleBindings per namespace, and only ClusterRoles when needed.
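A least-privilege sketch (the namespace, names, and subject are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: team-a
subjects:
  - kind: User
    name: dev-user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io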
Mini project: Production-ready microservice on Kubernetes
Goal: Deploy a small HTTP API with safe defaults, routing, scaling, and configuration management.
- Build a minimal, non-root container image; pin the base image digest.
- Create a Namespace with ResourceQuota and LimitRange for the service team.
- Deploy the API with a Deployment, Service, probes, and resources.
- Expose routes via an Ingress; add path-based routing to a second service (e.g., a static site).
- Add an HPA targeting 70% CPU; validate with a simple load test.
- Package with Helm; create values files for dev and prod; perform a rolling upgrade with changed values.
- Write a short runbook: startup checks, rollback steps, and common troubleshooting commands.
Acceptance checklist
- Image runs as non-root and is <150MB.
- Pods show 0 restarts under steady load for 15 minutes.
- HPA scales from 2 to at least 4 replicas under load, then returns to 2.
- Ingress routes / and /api correctly.
- helm upgrade is idempotent and preserves uptime.
Subskills
- Build small, non-root, pinned images; remove build tools; scan and reduce attack surface.
- Model workloads with Deployments, expose with Services, and use health probes and rollouts.
- Isolate teams with Namespaces, quotas, default limits, and RBAC.
- Route external traffic, understand ClusterIP/NodePort/LoadBalancer, and service discovery.
- Right-size pods and enable HPA to scale under load while controlling costs.
- Template Kubernetes manifests, separate config from code, and deliver safe upgrades.
- Plan and execute safe cluster and node upgrades with health checks.
- Use kubectl, events, logs, and exec to quickly find and fix issues.
Next steps
- Automate image builds and scans in CI; enforce non-root and digest pins with policies.
- Introduce GitOps for cluster state, and add canary/blue-green strategies for safer releases.
- Add observability: Pod metrics, traces, dashboards, and alerting tied to SLOs.