Who this is for
- MLOps Engineers deploying ML inference APIs or batch services on Kubernetes.
- Data/ML Engineers who need stable rollout, scaling, and simple external access.
- Developers moving from local Docker to production Kubernetes.
Prerequisites
- Comfort with Docker images.
- Basic kubectl usage (apply, get, describe, logs).
- Know Pods and containers (what a Pod is, basic YAML structure).
Why this matters
Your day-to-day MLOps tasks often include:
- Rolling out a new model version with zero downtime.
- Keeping a stable endpoint for clients even as Pods restart or scale.
- Routing external HTTP traffic to the right service, often with path-based rules like /predict or /metrics.
- Quickly reverting a bad release without breaking traffic.
Deployments, Services, and Ingress are the trio that make this reliable:
- Deployment = desired state, rolling updates, scaling for your Pods.
- Service = stable virtual IP and DNS name for accessing Pods.
- Ingress = HTTP(S) routing from outside the cluster to internal Services.
Concept explained simply
Think of Kubernetes networking like a building with three layers of doors:
- Deployment: The operations team that ensures a certain number of identical rooms (Pods) are always ready and handles swapping occupants during renovations (rolling updates).
- Service: A receptionist with a permanent phone number (ClusterIP) who forwards calls to any available room that matches certain labels.
- Ingress: The main entrance for visitors from the street; it reads the visitor's request (host/path) and directs them to the right receptionist (Service).
Mental model
- Labels and selectors are the glue. Deployment labels Pods; Service selects Pods via those labels; Ingress points to the Service.
- Ports must align. Traffic flows Ingress backend port -> Service port -> Service targetPort -> containerPort in the Pod (see the alignment sketch after this list).
- Health checks matter. Readiness gates traffic until the app is ready; liveness restarts stuck containers.
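Condensed, the port chain from the three manifests below lines up like this (a sketch of the relevant fields only, not complete manifests):

# Deployment Pod template: the app listens on 8080
ports:
- name: http
  containerPort: 8080
# Service: cluster-facing port 80 forwards to the container port (by number or name)
ports:
- name: http
  port: 80
  targetPort: 8080
# Ingress backend: must name the Service and its port
service:
  name: infer-svc
  port:
    number: 80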
Worked examples
Example 1: Deployment for an inference API
Goal: run 2 replicas of a FastAPI/Flask model server with safe rollouts and health checks.
Deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: infer-deploy
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: infer
  template:
    metadata:
      labels:
        app: infer
    spec:
      containers:
      - name: infer
        image: ghcr.io/example/infer:1.0
        ports:
        - containerPort: 8080
        env:
        - name: MODEL_NAME
          value: resnet50
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1"
            memory: "1Gi"
What this gives you: consistent scaling, safe rolling updates, readiness gating that keeps unready Pods out of traffic, and liveness restarts for hung containers.
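To see the rolling update in action, push a new image tag and watch the rollout; the 1.1 tag below is a hypothetical next version of the example image:

kubectl set image deploy/infer-deploy infer=ghcr.io/example/infer:1.1
kubectl rollout status deploy/infer-deploy
# If the new version misbehaves, one command reverts to the previous ReplicaSet:
kubectl rollout undo deploy/infer-deploy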
Example 2: Service for stable access
Goal: create a stable virtual IP to reach the Pods.
Service YAML:
apiVersion: v1
kind: Service
metadata:
  name: infer-svc
spec:
  type: ClusterIP
  selector:
    app: infer
  ports:
  - name: http
    port: 80
    targetPort: 8080
Traffic flow: Service port 80 forwards to Pod port 8080. Inside the cluster, the DNS name is infer-svc.default.svc.cluster.local (replace default with your namespace).
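A quick way to verify DNS and routing from inside the cluster is a throwaway curl Pod (a sketch; assumes the Service is in the default namespace):

# Runs curl once inside the cluster, then deletes the Pod.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://infer-svc.default.svc.cluster.local/health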
Example 3: Ingress for external HTTP routing
Goal: expose /predict and /health from the outside world.
Ingress YAML (generic; an Ingress controller is required):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: infer-ing
  # Controller-specific annotations (nginx.ingress.kubernetes.io/*, etc.)
  # would go under metadata.annotations. No rewrite annotation is needed
  # here because /predict and /health reach the backend unchanged.
spec:
  rules:
  - host: ml.example.local
    http:
      paths:
      - path: /predict
        pathType: Prefix
        backend:
          service:
            name: infer-svc
            port:
              number: 80
      - path: /health
        pathType: Prefix
        backend:
          service:
            name: infer-svc
            port:
              number: 80
Note: An Ingress controller must be installed in the cluster. The YAML declares intent; the controller does the actual routing.
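Once the controller has assigned an address, a smoke test from outside looks like this (assumes the controller publishes an IP; some publish a hostname instead):

# Read the address the controller assigned to the Ingress.
INGRESS_IP=$(kubectl get ing infer-ing -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# The Host header must match the rule's host.
curl -s -H "Host: ml.example.local" "http://$INGRESS_IP/health"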
How to apply: step-by-step
- Apply the Deployment. Wait until all Pods are Ready.
- Apply the Service. Confirm it has a ClusterIP and endpoints.
- Apply the Ingress. Verify the controller admits the rule and endpoints are ready.
- Smoke test /health first, then /predict to validate functionality.
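Scripted, the same sequence might look like this (manifest file names are assumptions; use whatever you saved them as):

kubectl apply -f deployment.yaml
kubectl rollout status deploy/infer-deploy --timeout=120s            # wait for Ready Pods
kubectl apply -f service.yaml
kubectl get endpointslices -l kubernetes.io/service-name=infer-svc   # expect 2 Pod IPs
kubectl apply -f ingress.yaml
kubectl describe ing infer-ing                                       # confirm the rule and backends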
Useful kubectl commands
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
kubectl get deploy,po,svc,ing
kubectl describe deploy infer-deploy
kubectl describe svc infer-svc
kubectl describe ing infer-ing
kubectl get endpointslices -l kubernetes.io/service-name=infer-svc
kubectl logs deploy/infer-deploy
Exercises (your turn)
These map to the graded exercises below. You can complete them in any conformant Kubernetes cluster. If you cannot run a cluster now, write the YAML and self-check against the checklist.
Exercise 1: Create a Deployment and Service for an inference API
Requirements:
- Deployment named infer-deploy with 2 replicas.
- Image: ghcr.io/example/infer:1.0
- Container port: 8080; readiness on /health; liveness on /live.
- Env: MODEL_NAME=beta
- Service infer-svc exposes port 80 -> targetPort 8080.
Self-check checklist
- kubectl get deploy shows 2/2 READY.
- kubectl get svc shows infer-svc with a ClusterIP and port 80.
- kubectl get endpointslices -l kubernetes.io/service-name=infer-svc shows addresses for 2 Pods.
- kubectl port-forward svc/infer-svc 8080:80, then curl localhost:8080/health, returns a healthy response.
Exercise 2: Add an Ingress for /predict
Requirements:
- Ingress name: infer-ing
- Host: ml.example.local
- Path: /predict routes to infer-svc port 80
Self-check checklist
- kubectl get ing shows ADDRESS populated (once the controller assigns one).
- curl -H "Host: ml.example.local" http://<ADDRESS>/predict reaches the Service.
- Readiness failures do not receive traffic (test by breaking readiness temporarily).
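One way to break readiness temporarily is to point the probe at a path that does not exist; /nope below is a deliberately fake path. With maxUnavailable: 0, the old Pods keep serving while the never-ready new Pods receive no traffic:

# Repoint the readiness probe at a nonexistent path (this triggers a rollout).
kubectl patch deploy infer-deploy --type=json -p \
  '[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/httpGet/path","value":"/nope"}]'
kubectl rollout status deploy/infer-deploy --timeout=60s   # stalls: new Pods never become Ready
kubectl rollout undo deploy/infer-deploy                   # restore the working probe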
Common mistakes and how to self-check
- Mismatched labels: Service selector must match Pod labels. Self-check: kubectl get pods -l app=infer and ensure non-empty.
- Port mismatch: Service targetPort must equal containerPort (or named port). Self-check: kubectl describe svc and verify.
- Only liveness probe: Using liveness without readiness can send traffic to unready Pods. Add readiness to gate traffic.
- Forgetting Ingress controller: Ingress YAML without a controller does nothing. Self-check: kubectl get pods -n ingress-controller-namespace (varies by setup).
- Zero maxUnavailable without surge: maxSurge: 0 combined with maxUnavailable: 0 is rejected by the API server, since no update could ever proceed. Use maxSurge: 1 and maxUnavailable: 0 for zero-downtime updates.
Practical projects
- Blue/Green switch: Deploy infer-v1 and infer-v2. Point the Service selector to one at a time and switch over (see the patch sketch after this list).
- Path router: Expose /predict to infer-svc and /metrics to a separate metrics-svc via Ingress.
- A/B experiment: Two Deployments behind two Services, then manually route 10% of test traffic to B using a separate path (/predict-b).
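For the blue/green project, the cutover itself can be a single Service patch; the version label below assumes you added it to each Deployment's Pod template:

# Repoint infer-svc from v1 Pods to v2 Pods in one step.
kubectl patch svc infer-svc -p '{"spec":{"selector":{"app":"infer","version":"v2"}}}'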
Learning path
- Start: Deployments/Services/Ingress basics (this lesson).
- Next: ConfigMaps/Secrets for model configs and API keys.
- Then: Horizontal Pod Autoscaler for load-based scaling of inference.
- Advanced: Canary rollouts with Progressive Delivery tools; TLS and cert rotation; Multi-tenancy and network policies.
Next steps
- Finish the exercises and compare with the provided solutions.
- Take the quick test below to validate your knowledge.
- Apply to a real service in your environment and capture before/after SLOs (latency, error rate) during a rollout.
Mini challenge
Create two Deployments (infer-v1, infer-v2) and two Services (infer-v1-svc, infer-v2-svc). Use an Ingress to route:
- /v1/predict -> infer-v1-svc
- /v2/predict -> infer-v2-svc
Bonus: Switch your main /predict to point to v2 by updating only the Ingress. Measure downtime (should be near zero if readiness is correct).