CI/CD for Platform Engineers: what and why
CI/CD platform work turns code into reliable, repeatable releases. As a Platform Engineer, you build the pipelines, standards, and tooling that let teams ship safely and quickly. This includes pipeline design, artifact management, secrets, promotion flows, GitOps, and automated rollback strategies.
- Your impact: faster lead time, fewer failed deploys, easy rollbacks, auditability.
- Daily tasks: creating pipeline templates, setting up artifact repos, enforcing release standards, integrating security scans, and enabling GitOps for environments.
What you'll be able to do
- Design multi-stage pipelines (build, test, scan, package, deploy) with caching and parallelization.
- Implement artifact versioning and immutability policies.
- Create reusable pipeline templates for teams with guardrails.
- Manage secrets safely with short-lived credentials.
- Automate promotions and rollbacks with health checks and approval gates.
- Adopt GitOps for declarative, auditable deployments.
- Handle both monorepo and multi-repo workflows at scale.
- Ship with documented release automation standards.
Who this is for
- Platform/DevOps/Backend engineers owning build-and-release workflows.
- Engineers introducing standards across multiple teams and services.
- Tech leads improving reliability, speed, and compliance of releases.
Prerequisites
- Comfortable with Git (branches, tags, PRs).
- Basic Docker and container registry usage.
- Familiar with at least one CI system (GitHub Actions, GitLab CI, or Jenkins).
- Basic Kubernetes experience is helpful (for GitOps and rollbacks), but not required.
Learning path
- Build & Test: Create a fast, cache-aware build and unit-test pipeline. Add static analysis.
- Scan & Package: Add SCA and container scanning. Package artifacts with immutable tags.
- Deploy to Staging: Introduce environment configs and smoke tests. Capture artifacts and logs.
- Promotion Gates: Add quality/approval gates and automated checks for production promotion.
- Rollback Automation: Detect unhealthy deploys and roll back safely.
- Template & Standardize: Extract common steps into templates/reusable workflows for teams.
- GitOps: Move environment state to Git; sync with a controller for auditability.
- Scale to Many Repos: Support monorepo path filters or multi-repo triggers and dependency rules.
Milestone checks
- You can set up a multi-stage pipeline with cache and parallel jobs.
- Artifacts are versioned, immutable, and traceable back to commits.
- Secrets are not stored in plaintext; access is scoped and short-lived.
- Production promotions require clear signals (tests, metrics, approvals).
- Rollbacks are one command or fully automated on failing health checks.
- Teams can use a template with minimal configuration.
- Environment drift is minimized via GitOps.
Worked examples
1) GitHub Actions: build, test, scan, and push image
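name: ci
on:
  push:
    branches: [ main ]
  pull_request:
permissions:
  contents: read
  packages: write
  id-token: write # for OIDC to cloud/registry if needed
jobs:
  build_test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm' # caches the npm download cache, keyed on package-lock.json
      - name: Install deps (with cache)
        run: npm ci
      - name: Unit tests
        run: npm test -- --ci
  image:
    needs: build_test
    # Publish only from pushes to main; PR builds stop after tests
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history and tags, so git describe can find the last tag
      - name: Compute image name and version
        id: v
        run: |
          # Registry repository paths must be lowercase
          echo "image=ghcr.io/${GITHUB_REPOSITORY,,}" >> "$GITHUB_OUTPUT"
          echo "version=$(git describe --tags --always)" >> "$GITHUB_OUTPUT"
      - name: Build image
        run: docker build -t "${{ steps.v.outputs.image }}:${{ steps.v.outputs.version }}" .
      - name: Scan image
        uses: aquasecurity/trivy-action@master # pin to a release tag or SHA in real pipelines
        with:
          image-ref: ${{ steps.v.outputs.image }}:${{ steps.v.outputs.version }}
          exit-code: '1'
          severity: 'CRITICAL,HIGH'
      - name: Login to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
      - name: Push image
        run: docker push "${{ steps.v.outputs.image }}:${{ steps.v.outputs.version }}"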
Key points:
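- Separate build/test from image packaging with needs.
- Unique, traceable tag from git describe: the release tag on a tagged commit, otherwise the nearest tag plus commit count and short SHA (or just the short SHA).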
- Minimal required permissions and OIDC ready.
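The naming logic in the image job can live in a small shell function that several workflows share. A minimal POSIX sh sketch, assuming GHCR as the registry; image_ref is a hypothetical helper name, not part of any action:

```shell
# image_ref OWNER/REPO VERSION  (hypothetical helper)
# Prints a GHCR image reference. Registry repository paths must be
# lowercase, so OWNER/REPO is lowercased. An empty VERSION is refused
# because it would produce an invalid reference ending in ":".
image_ref() {
  local repo version="$2"
  repo=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  if [ -z "$version" ]; then
    echo "image_ref: empty version" >&2
    return 1
  fi
  printf 'ghcr.io/%s:%s\n' "$repo" "$version"
}
```

For example, image_ref "$GITHUB_REPOSITORY" "$(git describe --tags --always)" prints the full reference to pass to docker build and docker push.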
2) GitLab CI: staged promotions with manual prod
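# Assumes runner images that provide node, trivy, docker and kubectl.
stages: [build, test, scan, package, deploy]
variables:
  IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
# Shared rule: only release tags like v1.2.3 package and deploy,
# so $IMAGE is never used with an empty tag.
.release_tags:
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
build:
  stage: build
  script: ["npm ci", "npm run build"]
test:
  stage: test
  script: ["npm ci", "npm test -- --ci"]
scan:
  stage: scan
  script: ["trivy fs --exit-code 1 --severity HIGH,CRITICAL ."]
package:
  extends: .release_tags
  stage: package
  script:
    - docker build -t "$IMAGE" .
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker push "$IMAGE"
deploy_staging:
  extends: .release_tags
  stage: deploy
  environment:
    name: staging
  script: ["kubectl set image deploy/app app=$IMAGE", "kubectl rollout status deploy/app"]
  needs: [package]
deploy_prod:
  stage: deploy
  environment:
    name: production
  rules:
    # when: manual goes inside rules; job-level when cannot be combined with rules
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
      when: manual
  allow_failure: false # blocking gate: the pipeline waits for someone to run it
  script: ["kubectl set image deploy/app app=$IMAGE", "kubectl rollout status deploy/app"]
  needs: [deploy_staging]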
Key points:
- Tag-based packaging to keep prod images immutable and traceable.
- Manual prod gate; staging must pass first.
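Keeping the release-tag rule in a script as well lets local tooling and other CI systems apply the same check as the GitLab rules. A sketch in POSIX sh; is_release_tag is a hypothetical name, and the pattern is the POSIX character-class equivalent of /^v\d+\.\d+\.\d+$/:

```shell
# is_release_tag TAG  (hypothetical helper)
# Succeeds only for tags shaped like vMAJOR.MINOR.PATCH, e.g. v1.4.0.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example gate: only package when CI_COMMIT_TAG is a release tag.
if is_release_tag "${CI_COMMIT_TAG:-}"; then
  echo "release tag ${CI_COMMIT_TAG}: packaging"
else
  echo "not a release tag: skipping package"
fi
```

Pre-release tags such as v1.2.3-rc1 are rejected by this pattern; widen it only if such tags should publish images.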
3) Jenkins: shared library and credentials
@Library('ci-lib') _   // loads the 'ci-lib' shared library configured in Jenkins
pipeline {
  agent any
  options { timestamps() }
  stages {
    stage('Build & Test') {
      steps {
        checkout scm
        sh 'npm ci && npm test -- --ci'
      }
    }
    stage('Package') {
      steps {
        // docker.build is provided by the Docker Pipeline plugin
        script { docker.build("app:${env.BUILD_NUMBER}") }
      }
    }
    stage('Push Image') {
      steps {
        // reg-creds is injected only inside this block, and Jenkins masks it in logs
        withCredentials([usernamePassword(credentialsId: 'reg-creds', usernameVariable: 'U', passwordVariable: 'P')]) {
          // Single quotes: the shell, not Groovy, expands $P and $U
          sh 'echo "$P" | docker login registry.example.com -u "$U" --password-stdin'
          sh 'docker tag app:${BUILD_NUMBER} registry.example.com/app:${BUILD_NUMBER}'
          sh 'docker push registry.example.com/app:${BUILD_NUMBER}'
        }
      }
    }
  }
}
Key points:
- Shared library helps standardize steps across teams.
- Credentials are scoped via Jenkins credentials store.
4) GitOps: Argo CD Application for staging
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-staging
  namespace: argocd # by default Argo CD only watches Applications in its own namespace
spec:
  project: default
  source:
    repoURL: https://git.example.com/org/env-configs.git
    targetRevision: main
    path: apps/app/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: app-staging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Promotion flow: merge a PR that bumps the image tag in apps/app/staging/values.yaml. Argo CD syncs the change automatically, and because every promotion is a merged PR, Git history records who promoted what, and when.
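The promotion PR is usually a one-line change to the tag. A sketch of the edit step in POSIX sh with awk, assuming a hypothetical values layout with a top-level image: block that contains a tag: key (Helm charts vary); bump_tag is a hypothetical name. Committing, pushing a branch, and opening the PR are left to the Git host's CLI or API:

```shell
# bump_tag FILE NEW_TAG  (hypothetical helper)
# Rewrites the first "tag:" key inside the top-level "image:" block of a
# values file, keeping its indentation and leaving other lines untouched.
# Fails, leaving FILE unchanged, if no such key exists.
bump_tag() {
  local file="$1" new="$2" tmp
  tmp=$(mktemp) || return 1
  if awk -v new="$new" '
      /^[^ \t#]/ { in_image = ($0 ~ /^image:/) }   # a top-level key starts a new block
      in_image && !done && /^[ \t]+tag:/ {
        sub(/tag:.*/, "tag: " new)
        done = 1
      }
      { print }
      END { if (!done) exit 1 }
    ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place to keep the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Tracking the enclosing block in awk, rather than running a global sed substitution, keeps tag: keys under other blocks (for example a sidecar) untouched.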
5) Semantic versioning helper
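#!/usr/bin/env bash
# Prints the next version from the last vX.Y.Z tag and Conventional Commit subjects:
# "type!:" bumps major, "feat" bumps minor, "fix"/"chore"/"refactor" bump patch.
set -euo pipefail
LAST=$(git describe --tags --abbrev=0 --match 'v[0-9]*' 2>/dev/null || true)
if [ -n "$LAST" ]; then
  RANGE="$LAST..HEAD"
else
  RANGE="HEAD" # no release tag yet: consider the whole history
fi
VER=${LAST#v}
IFS=. read -r MAJOR MINOR PATCH <<< "${VER:-0.0.0}"
COMMITS=$(git log "$RANGE" --pretty=%s)
if grep -qE '^[a-z]+(\([^)]*\))?!:' <<< "$COMMITS"; then
  MAJOR=$((MAJOR+1)); MINOR=0; PATCH=0
elif grep -qiE '^feat' <<< "$COMMITS"; then
  MINOR=$((MINOR+1)); PATCH=0
elif grep -qiE '^(fix|chore|refactor)' <<< "$COMMITS"; then
  PATCH=$((PATCH+1))
fi
echo "v$MAJOR.$MINOR.$PATCH"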
Run this in a release job to compute the next tag, then create the tag and publish the artifact in that same job, so every published artifact has exactly one matching tag.
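The bump rules are easier to review and unit-test when they are separated from Git. A sketch of the decision logic as a pure POSIX sh function over the last version and the commit subjects since then; next_version is a hypothetical name, and the rules follow the Conventional Commits convention, where a ! before the colon marks a breaking change:

```shell
# next_version LAST_VERSION SUBJECTS  (hypothetical helper)
# LAST_VERSION: e.g. v1.4.2 (the leading "v" is optional)
# SUBJECTS: newline-separated commit subjects since that version
# Prints the next version with a "v" prefix, or the same version when no
# subject matches a releasable type.
next_version() {
  local ver="${1#v}" subjects="$2" major minor patch
  major=${ver%%.*}
  minor=${ver#*.}; minor=${minor%%.*}
  patch=${ver##*.}
  if printf '%s\n' "$subjects" | grep -Eq '^[a-z]+(\([^)]*\))?!:'; then
    major=$((major + 1)); minor=0; patch=0
  elif printf '%s\n' "$subjects" | grep -Eq '^feat(\([^)]*\))?:'; then
    minor=$((minor + 1)); patch=0
  elif printf '%s\n' "$subjects" | grep -Eq '^(fix|chore|refactor)(\([^)]*\))?:'; then
    patch=$((patch + 1))
  fi
  printf 'v%s.%s.%s\n' "$major" "$minor" "$patch"
}
```

In a release job, the subjects would come from git log "$LAST..HEAD" --pretty=%s, as in the helper above.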
6) Automatic rollback on failed smoke test
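# Steps fragment: assumes IMAGE is set, kubectl points at staging, and the runner
# can resolve cluster-internal DNS (e.g. a self-hosted runner inside the cluster).
- name: Deploy to staging
  run: |
    kubectl set image deploy/app app="$IMAGE"
    kubectl rollout status deploy/app --timeout=90s
- name: Smoke test
  run: |
    set -e
    curl -fsS http://app.staging.svc.cluster.local/healthz | grep 'ok'
- name: Rollback if failed
  if: failure()
  run: |
    kubectl rollout undo deploy/app
    kubectl rollout status deploy/app --timeout=90s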
Key points:
- Gate deployment with a quick health endpoint.
- On failure, automatically revert to the previous ReplicaSet revision.
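A single curl can fail while new pods are still warming up, so a smoke test often retries a few times before declaring the deploy broken. A minimal sketch in POSIX sh; wait_healthy is a hypothetical helper that takes the check command as its remaining arguments:

```shell
# wait_healthy ATTEMPTS DELAY_SECONDS CMD [ARGS...]  (hypothetical helper)
# Runs CMD up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_healthy() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "health check failed (attempt $i/$attempts)" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_healthy 5 6 curl -fsS http://app.staging.svc.cluster.local/healthz || kubectl rollout undo deploy/app tries the health endpoint five times, six seconds apart, before rolling back.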
Drills and exercises
- [ ] Add caching to a pipeline step and measure time saved before/after.
- [ ] Replace a long-lived registry token with OIDC or a short-lived token.
- [ ] Enforce immutable tags (reject pushes to an existing tag) in your registry.
- [ ] Create a reusable pipeline template that runs tests, SAST, and uploads coverage.
- [ ] Add a smoke test step that fails fast if the service is unhealthy.
- [ ] Implement a promotion job that only runs on signed tags like vX.Y.Z.
- [ ] Configure a GitOps app with auto-sync and a manual sync window for prod.
- [ ] Set up path filters for a monorepo so only changed services build.
- [ ] Capture SBOMs for builds and store them next to artifacts.
- [ ] Add a rollback command and practice it on a test cluster.
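Registry-side immutability settings are the real enforcement for the immutable-tags drill; a pipeline guard adds a second line of defense. A sketch that checks a candidate tag against already-published tags read from stdin; how that list is fetched depends on the registry and is assumed here, and tag_is_free is a hypothetical name:

```shell
# tag_is_free TAG  (hypothetical helper)
# Reads already-published tags (one per line) on stdin and succeeds only
# if TAG is not among them, so a pipeline can refuse to overwrite a release.
tag_is_free() {
  if grep -Fqx -- "$1"; then
    echo "tag $1 already exists; refusing to overwrite" >&2
    return 1
  fi
  return 0
}
```

The -x flag matches whole lines only, so an existing v1.1.0-rc1 does not block v1.1.0.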
Common mistakes and debugging tips
Using latest tags
Problem: deployments drift and rollbacks are unpredictable. Fix: always use immutable tags or digests; store tag->commit mapping as build metadata.
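A deploy step can refuse mutable references before they reach a cluster. A POSIX sh sketch; require_pinned_ref is a hypothetical name, and it assumes references of the form name:tag or name@sha256:digest, where an untagged name means :latest:

```shell
# require_pinned_ref IMAGE_REF  (hypothetical helper)
# Accepts references pinned by digest or by an explicit, non-"latest" tag.
require_pinned_ref() {
  local ref="$1" last
  case $ref in
    *@sha256:*) return 0 ;;        # a digest is immutable by definition
  esac
  last=${ref##*/}                  # drop registry/namespace; a registry port also contains ":"
  case $last in
    *:latest|*:) echo "mutable or empty tag: $ref" >&2; return 1 ;;
    *:*) return 0 ;;
    *) echo "no tag (defaults to latest): $ref" >&2; return 1 ;;
  esac
}
```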
Secrets in plaintext
Problem: secrets in repo or environment variables without masking. Fix: use secret managers and OIDC; scope access by environment; rotate regularly.
No cache strategy
Problem: slow pipelines. Fix: cache dependencies and Docker layers; reorder Dockerfile to maximize cache hits.
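Dependency caches work best when the key changes only when dependencies change, which is why CI systems usually derive it from a hash of the lockfile (GitHub Actions' hashFiles() and GitLab's cache:key:files follow this idea). A portable sketch, assuming sha256sum from GNU coreutils on the runner; cache_key is a hypothetical name:

```shell
# cache_key PREFIX LOCKFILE...  (hypothetical helper)
# Prints PREFIX-<first 16 hex chars of a SHA-256 over the lockfiles>, so the
# key changes whenever a lockfile's contents change.
cache_key() {
  local prefix="$1" f hash
  shift
  if [ "$#" -eq 0 ]; then
    echo "cache_key: no lockfiles given" >&2
    return 1
  fi
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "cache_key: missing lockfile $f" >&2
      return 1
    fi
  done
  hash=$(cat -- "$@" | sha256sum | cut -c1-16)
  printf '%s-%s\n' "$prefix" "$hash"
}
```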
Skipping security scans
Problem: vulnerable artifacts are shipped. Fix: add SAST/SCA/container scans with severity thresholds and fail the build on high severity.
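Most scanners can fail the build on their own (Trivy, for example, with --exit-code 1 --severity HIGH,CRITICAL), but when a pipeline combines several tools it helps to apply one threshold in one place. A sketch that reads one severity per line, assuming an earlier step has already extracted them from the scanners' reports; severity_gate and severity_rank are hypothetical names:

```shell
# severity_rank LEVEL: CRITICAL=4, HIGH=3, MEDIUM=2, LOW=1, anything else=0.
severity_rank() {
  case $1 in
    CRITICAL) echo 4 ;;
    HIGH) echo 3 ;;
    MEDIUM) echo 2 ;;
    LOW) echo 1 ;;
    *) echo 0 ;;
  esac
}

# severity_gate THRESHOLD  (hypothetical helper)
# Reads one upper-case finding severity per line on stdin and fails if any
# finding is at or above THRESHOLD; HIGH fails on HIGH and CRITICAL.
severity_gate() {
  local limit count=0 sev
  limit=$(severity_rank "$1")
  if [ "$limit" -eq 0 ]; then
    echo "severity_gate: unknown threshold $1" >&2
    return 2
  fi
  while IFS= read -r sev || [ -n "$sev" ]; do
    if [ "$(severity_rank "$sev")" -ge "$limit" ]; then
      count=$((count + 1))
    fi
  done
  if [ "$count" -gt 0 ]; then
    echo "severity_gate: $count finding(s) at or above $1" >&2
    return 1
  fi
  echo "severity_gate: no findings at or above $1"
}
```

An unknown threshold is an error rather than a silent pass, so a typo in the pipeline config cannot disable the gate.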
Manual, undocumented promotions
Problem: inconsistent releases. Fix: codify promotions with tags/PRs, approvals, and automated checks. Use GitOps for audit trails.
One-pipeline-fits-all
Problem: fragile pipelines across teams. Fix: create templates with inputs and guardrails; allow teams to extend via well-defined hooks.
No rollback plan
Problem: long outages. Fix: predefine rollback commands and health checks; test them regularly.
Mini project: CI/CD with GitOps and automated rollback
Goal: Build a pipeline for a demo service that packages a Docker image, scans it, deploys to staging via GitOps, runs smoke tests, and rolls back on failure. Promotion to prod is gated by approval.
- Build and test: run unit tests; upload coverage artifact.
- Scan: run dependency and container scans with a failing threshold.
- Package: build image with immutable tag ${GIT_SHA} and push.
- GitOps staging: open a PR to the env repo bumping the image tag; auto-merge on green checks.
- Staging verify: smoke test; auto-rollback if failing.
- Promotion: create a signed tag vX.Y.Z; require approval to merge prod PR.
- Prod verify: post-deploy checks; record release notes and SBOM URL in metadata.
Acceptance criteria
- All steps are reproducible from a clean runner.
- Artifacts and images are traceable to commit and build logs.
- Staging and prod states are visible in Git; rollbacks are one command.
- At least one reusable template is used by the pipeline.
Practical projects
- Policy-driven pipelines: create a template that enforces code coverage, scan thresholds, and required approvals.
- SBOM & provenance: generate SBOMs and attach provenance metadata to artifact uploads.
- Monorepo accelerator: build a workflow that detects changed services and runs matrix builds with path filters.
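The core of the monorepo accelerator is mapping changed files to the services that own them. A sketch, assuming a hypothetical layout with each service under services/<name>/ and a file list such as the output of git diff --name-only "$BASE" HEAD, where BASE is a placeholder for the comparison commit; changed_services is a hypothetical name:

```shell
# changed_services  (hypothetical helper)
# Reads changed file paths (one per line) on stdin and prints each affected
# service name once, sorted. Paths outside services/<name>/ are ignored.
changed_services() {
  sed -n 's|^services/\([^/][^/]*\)/.*|\1|p' | sort -u
}
```

Real setups usually also treat changes to shared code or build tooling as affecting every service, and convert the list to JSON before feeding it to a CI matrix.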
Subskills
- Build And Deploy Pipelines — Design multi-stage pipelines across CI tools with caching and parallelization. Estimated time: 60–120 min.
- Artifact Repositories And Versioning — Configure registries, enforce immutability, retention, and semantic versioning. Estimated time: 45–90 min.
- Pipeline Templates For Teams — Create reusable templates/workflows with inputs and policy checks. Estimated time: 45–90 min.
- Secrets In CI CD — Use OIDC/secret managers; avoid plaintext; enable rotation and scoping. Estimated time: 45–90 min.
- Automated Rollbacks And Promotions — Health gates, progressive delivery, and safe revert strategies. Estimated time: 60–120 min.
- GitOps Concepts — Declarative delivery with sync policies and PR-based promotions. Estimated time: 45–90 min.
- Managing Monorepos And Multi Repo Flows — Path filters, matrix builds, and cross-repo triggers. Estimated time: 60–120 min.
- Release Automation Standards — Branching, tagging, changelogs, approvals, and compliance evidence. Estimated time: 45–90 min.
Next steps
- Study each subskill listed above and complete the drills.
- Build the mini project and validate acceptance criteria.
- Take the skill exam to check readiness.