Why this matters
Most ML systems need scheduled or one-off batch inference: nightly scoring, weekly backfills, or temporary reprocessing after a model update. Kubernetes Jobs and CronJobs give you reliable execution, retries, scaling, resource limits, and automatic cleanup—all essential for robust MLOps.
- Run large predictions as parallel shards.
- Schedule off-peak jobs to save cost.
- Retry failures safely without re-running everything.
- Track status, logs, and outcomes for auditability.
Who this is for
- MLOps engineers deploying batch inference on Kubernetes.
- Data scientists moving prototypes to production in clusters.
- Platform engineers standardizing ML batch patterns.
Prerequisites
- Comfort with basic Kubernetes objects (Pods, Deployments, Namespaces).
- Can build and push a container image with your inference code.
- Basic YAML editing and using kubectl.
Concept explained simply
A Job runs a finite task until it completes. Think: “Do this work and finish.” A CronJob runs a Job on a schedule. Think: “Do this same task every night at 02:00.”
Mental model
- Job = a checklist with N boxes (completions). parallelism = how many boxes you check at once.
- CronJob = a calendar that creates a new Job at specific times.
- Retries/backoff = what happens if a box fails; Kubernetes tries again up to backoffLimit.
- Cleanup (ttlSecondsAfterFinished) = auto-remove completed Jobs after a delay.
Key Kubernetes objects for batch ML
- Job: one-off batch run; supports parallelism, completions, retries.
- CronJob: scheduled Jobs; supports concurrencyPolicy and history limits.
- Pod template: command to run inference, resources, volumes, secrets.
Worked examples
Example 1 — One-off Job for a single dataset
apiVersion: batch/v1
kind: Job
metadata:
  name: infer-once
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: infer
        image: your-batch-infer:latest
        args: ["--input", "/data/input.parquet", "--output", "/data/preds.parquet"]
        resources:
          requests: { cpu: "1", memory: "2Gi" }
          limits: { cpu: "2", memory: "4Gi" }
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: ml-bucket-pvc
Use this when you have a single file to score and just need it to finish once.
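The Job above only works if the container accepts the same flags its args pass. A minimal entrypoint sketch, assuming Python inside the image; the `score` function and the in-memory rows are hypothetical stand-ins for your real Parquet I/O and model call:

```python
import argparse

def score(rows):
    # Hypothetical model call; replace with your real inference logic.
    return [{"id": r["id"], "pred": 0.5} for r in rows]

def main(argv=None):
    parser = argparse.ArgumentParser(description="One-off batch inference entrypoint")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args(argv)
    # In production, read args.input and write args.output as Parquet
    # (e.g. with pandas or pyarrow); two dummy rows keep the sketch self-contained.
    rows = [{"id": 1}, {"id": 2}]
    preds = score(rows)
    print(f"scored {len(preds)} rows from {args.input} -> {args.output}")
    return preds

if __name__ == "__main__":
    main()
```

Let the process exit nonzero on any unhandled error: that exit code is what triggers the Job's retry/backoff behavior.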
Example 2 — Nightly CronJob with no overlap
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-batch-infer
spec:
  schedule: "0 2 * * *"         # run every day at 02:00
  concurrencyPolicy: Forbid     # do not start a new run while the previous one is still running
  startingDeadlineSeconds: 600  # a missed run may still be started within 10 minutes
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: infer
            image: your-batch-infer:stable
            env:
            - name: MODEL_VERSION
              value: "v3"
            # Note: Kubernetes does not expand {{yesterday}}; resolve the
            # target date inside the container, or inject it another way.
            args: ["--date", "{{yesterday}}", "--model", "$(MODEL_VERSION)"]
            resources:
              requests: { cpu: "500m", memory: "1Gi" }
              limits: { cpu: "1", memory: "2Gi" }
Use this for predictable, scheduled scoring. Forbid avoids overlapping runs.
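Since Kubernetes does not template values like {{yesterday}}, a common pattern is to resolve the run date inside the container at startup. A small sketch, assuming UTC dates and an optional explicit --date override for backfills:

```python
from datetime import datetime, timedelta, timezone

def resolve_run_date(override=None):
    """Return the partition date to score as an ISO string.

    An explicit --date value (override) wins; otherwise default to
    yesterday in UTC, since the 02:00 nightly run scores the prior day.
    """
    if override:
        return override
    yesterday = datetime.now(timezone.utc).date() - timedelta(days=1)
    return yesterday.isoformat()
```

Keeping the override path makes the same image usable for both the nightly schedule and ad-hoc reprocessing of a specific day.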
Example 3 — Sharded Job with Indexed completions
apiVersion: batch/v1
kind: Job
metadata:
  name: infer-sharded
spec:
  completions: 100
  parallelism: 10
  completionMode: Indexed
  backoffLimit: 2
  ttlSecondsAfterFinished: 7200
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: your-batch-infer:latest
        env:
        # In Indexed mode the kubelet also sets JOB_COMPLETION_INDEX
        # automatically; this fieldRef just makes the index explicit.
        - name: SHARD_INDEX
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
        args: ["--shard-index", "$(SHARD_INDEX)", "--num-shards", "100"]
        resources:
          requests: { cpu: "1", memory: "2Gi" }
          limits: { cpu: "2", memory: "4Gi" }
Each Pod processes one shard, indexed 0..99, enabling deterministic re-runs for failed shards.
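Inside the worker, the shard index must map deterministically to a slice of the data so a re-run of shard k reprocesses exactly the same records. One simple sketch using contiguous row ranges; the output path layout is an assumption, not a Kubernetes convention:

```python
def shard_bounds(item_count, num_shards, shard_index):
    """Return the half-open range of item positions owned by one shard.

    Early shards absorb the remainder, so every item belongs to exactly
    one shard and a retried shard always sees the same slice.
    """
    base, extra = divmod(item_count, num_shards)
    start = shard_index * base + min(shard_index, extra)
    size = base + (1 if shard_index < extra else 0)
    return range(start, start + size)

def output_path(shard_index):
    # A unique per-shard output path prevents parallel workers colliding.
    return f"/data/preds/shard-{shard_index:03d}.parquet"
```

Because the mapping depends only on (item_count, num_shards, shard_index), re-running a single failed shard is safe and cheap.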
Patterns and parameters cheat-sheet
- parallelism: number of Pods running at once.
- completions: total successful Pods needed before Job is complete.
- completionMode: Indexed enables stable shard indices; NonIndexed treats Pods as interchangeable.
- backoffLimit: maximum retries for failed Pods before Job fails.
- activeDeadlineSeconds: overall time limit for the Job.
- ttlSecondsAfterFinished: auto-delete a finished Job (and its Pods) N seconds after it completes.
- CronJob concurrencyPolicy: Allow, Forbid, Replace (choose to avoid overlap).
- startingDeadlineSeconds: how long past a missed schedule time a run may still start; beyond this window the run is skipped.
- History limits: successfulJobsHistoryLimit and failedJobsHistoryLimit for retention.
- restartPolicy: Never (common; the Job replaces failed Pods) or OnFailure (containers restart inside the same Pod).
- Resources: set requests/limits to match model memory/CPU needs.
- Data access: use volumes, object-store gateways, or service endpoints.
- Secrets/config: mount credentials and model versions safely via Secrets/ConfigMaps.
Monitoring and debugging
- kubectl get jobs and kubectl describe job <name> show status, conditions, and succeeded/failed counts.
- kubectl logs for Pod output; add structured logs (JSON) to simplify parsing.
- Check events in describe output for scheduling or image pull issues.
- For CronJobs: inspect lastScheduleTime and lastSuccessfulTime in status.
- Emit metrics (duration, processed rows, errors) from your container logs.
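One JSON record per event keeps those logs trivially parseable by log pipelines. A minimal sketch; the field names (shard, rows, errors, duration_s) are illustrative, not a standard:

```python
import json
import time

def log_run(event, **fields):
    """Emit one structured log line to stdout and return it.

    Downstream collectors can parse each line as a standalone JSON object.
    """
    record = {"ts": time.time(), "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line

# Example: record the outcome of one shard at the end of a run.
log_run("shard_done", shard=7, rows=12500, errors=0, duration_s=41.2)
```

Printing to stdout is enough: kubectl logs and cluster log collectors pick it up without extra wiring.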
Exercises (do these hands-on)
Note: The quick test is available to everyone. Only logged-in users have their exercise and test progress saved.
Exercise 1 — Sharded Job for 100 splits
- Create a Job with completions=100, parallelism=10, completionMode=Indexed.
- Expose the shard index to the container as SHARD_INDEX via the Pod annotation batch.kubernetes.io/job-completion-index.
- Command should look like: --shard-index $(SHARD_INDEX) --num-shards 100.
- Set backoffLimit=2, restartPolicy=Never, and ttlSecondsAfterFinished=3600.
Exercise 2 — Nightly CronJob without overlap
- Create a CronJob scheduled at 02:00 daily.
- Use concurrencyPolicy=Forbid and startingDeadlineSeconds=600.
- Keep only 3 successful and 2 failed Job histories.
- Set Job backoffLimit=1 and ttlSecondsAfterFinished=86400.
- [ ] I validated my YAML with kubectl apply --dry-run=client -f file.yaml
- [ ] I confirmed the Job/CronJob status fields behave as expected in a test cluster
- [ ] I saw retries happen when I forced a failure
Common mistakes and self-check
- Overlapping CronJobs: Fix with concurrencyPolicy=Forbid or Replace.
- Infinite retries: Set backoffLimit and/or activeDeadlineSeconds.
- Memory OOM kills: Increase memory limits or reduce parallelism; check Pod OOMKilled status.
- Data collisions: In sharded jobs, ensure each shard writes to unique output paths.
- Leaving clutter: Use ttlSecondsAfterFinished and history limits.
Self-check prompts
- Does a failed shard re-run only that shard (Indexed), not all work?
- Can your CronJob survive a missed schedule within the deadline?
- Are logs sufficient to audit inputs, model version, and outputs?
Practical projects
- Build a sharded batch inference pipeline that processes 1M rows split into 200 shards with Indexed Jobs and writes shard outputs to unique paths.
- Create a nightly CronJob that scores new data, computes summary metrics, and posts a compact report to a storage location.
- Add auto-cleanup (TTL) and a simple retention policy for successful and failed runs, and document recovery steps for failed shards.
Learning path
- Now: Jobs and CronJobs for batch inference (this page).
- Next: Resource tuning and autoscaling for batch Pods; advanced scheduling (affinity, taints/tolerations).
- Then: Orchestrating multi-step batch flows (Pipelines) and integrating model registries and feature stores.
Next steps
- Implement the exercises in a sandbox namespace.
- Add structured logging and success/failure metrics to your container.
- Review alerting for missed or failed CronJobs.
Mini challenge
Scenario: Backfill last 14 days safely
Create a CronJob that runs every hour but only processes one missing day at a time using a parameter (e.g., --date). Avoid overlap, retry twice on failure, and auto-clean completed Jobs after 12 hours. Hint: Use Forbid, startingDeadlineSeconds, and an idempotent container that checks which dates are pending.
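The idempotency hinted at above can live in a small date-selection step: each hourly run picks the oldest unprocessed day in the window and exits cleanly as a no-op once the backfill is done. A sketch, where the completed set stands in for however you record finished dates (done-markers, a manifest file, etc.):

```python
from datetime import date, timedelta

def next_pending_date(today, completed, window_days=14):
    """Return the oldest ISO date in the backfill window not yet processed,
    or None when the backfill is finished (the container then exits 0)."""
    for offset in range(window_days, 0, -1):  # oldest day first
        day = (today - timedelta(days=offset)).isoformat()
        if day not in completed:
            return day
    return None
```

Combined with concurrencyPolicy=Forbid, this guarantees at most one day is in flight at a time, and re-running the CronJob never double-processes a completed day.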
Ready to test yourself?
Scroll to the Quick Test below. Not logged in? Your answers are not saved, but you can still practice for free.