Why this matters
ML pipelines touch many systems: data warehouses, model registries, artifact stores, cloud compute, and monitoring tools. Each needs credentials (API keys, tokens, passwords). Poor secrets management leads to outages, data leaks, or compromised cloud accounts. As an MLOps Engineer, you must inject secrets safely into CI/CD and runtime jobs, rotate them, and keep them out of code and logs.
- [ ] Avoid hardcoding keys in repos
- [ ] Use short-lived credentials where possible
- [ ] Separate secrets per environment (dev/stage/prod)
- [ ] Redact secrets from logs and scan repos for leaks
- [ ] Enforce least privilege and rotation policies
Who this is for
Engineers building or maintaining ML CI/CD pipelines, platform engineers, and data scientists deploying models who need a practical, safe approach to handling credentials.
Prerequisites
- Basic CI/CD familiarity (e.g., a YAML-based pipeline)
- Containers and environment variables
- High-level cloud IAM concepts (roles, policies)
Concept explained simply
A secret is sensitive data needed by your pipeline or jobs—like a database password or cloud token. Secrets management means storing them encrypted, granting minimal access, injecting them at runtime only, rotating them regularly, and keeping them out of code and logs.
Mental model
Think of secrets like one-time visitor badges: they are issued just before entry, expire quickly, and cannot be reused elsewhere. Your pipeline requests a badge when needed, uses it, and it becomes useless soon after. This reduces blast radius if leaked.
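The badge analogy can be sketched in plain shell: a credential bundles an expiry, and every use checks it first. The names and badge format here are purely illustrative.

```shell
# Issue a "badge" that expires after ttl_seconds.
issue_badge() {
  local ttl_seconds=$1
  # badge format (illustrative): "<opaque-token>:<unix-expiry>"
  echo "badge-$$:$(( $(date +%s) + ttl_seconds ))"
}

# A badge is only valid before its embedded expiry timestamp.
badge_is_valid() {
  local expiry=${1##*:}
  [ "$(date +%s)" -lt "$expiry" ]
}

badge=$(issue_badge 3600)   # valid for one hour
badge_is_valid "$badge" && echo "badge accepted"
badge_is_valid "badge-0:1" || echo "expired badge rejected"
```

Real short-lived credentials (STS sessions, Vault leases) work the same way: the expiry is enforced by the issuer, so a leaked credential stops working on its own.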
Design principles checklist
- [ ] Source of truth: a secret manager (Vault/Cloud Secret Manager/KMS-backed storage)
- [ ] Zero hardcoding in code or pipeline YAML
- [ ] Short-lived creds via OIDC/workload identity whenever possible
- [ ] Per-environment isolation and namespacing
- [ ] Least privilege roles and scoped access
- [ ] Encryption at rest and in transit
- [ ] Automatic rotation with clear owners
- [ ] Masking in logs and CI output
- [ ] Repo secret scanning pre-commit and in CI
- [ ] Audit trails and access logs reviewed regularly
Worked examples
Example 1 — CI runner to Cloud: OIDC for short-lived credentials
Goal: A pipeline needs to upload a model artifact to a cloud bucket without storing static keys.
# pipeline.yaml (generic example)
permissions:
  id-token: write   # allow OIDC token minting
  contents: read
steps:
  - name: Authenticate to Cloud via OIDC
    run: |
      # Exchange the OIDC token minted by CI for a short-lived cloud credential
      # (use your cloud's OIDC federation endpoint)
      OIDC_TOKEN="<provided-by-ci>"
      # Request role assumption scoped to the artifacts bucket
      creds=$(cloud-cli sts assume-role-with-oidc \
        --role-arn arn:cloud:iam::ACCOUNT_ID:role/ml-artifact-writer \
        --audience pipeline \
        --oidc-token "$OIDC_TOKEN" \
        --duration-seconds 3600)
      # Persist for later steps via the CI's env-file mechanism;
      # a plain `export` does not survive across steps
      echo "ACCESS_KEY=$(echo "$creds" | jq -r .AccessKeyId)" >> "$GITHUB_ENV"
      echo "SECRET_KEY=$(echo "$creds" | jq -r .SecretAccessKey)" >> "$GITHUB_ENV"
      echo "SESSION_TOKEN=$(echo "$creds" | jq -r .SessionToken)" >> "$GITHUB_ENV"
  - name: Upload model artifact
    env:
      ACCESS_KEY: ${{ env.ACCESS_KEY }}
      SECRET_KEY: ${{ env.SECRET_KEY }}
      SESSION_TOKEN: ${{ env.SESSION_TOKEN }}
    run: |
      cloud-storage cp model.pkl s3://ml-artifacts/prod/model.pkl --acl private
Notes:
- CI grants an OIDC token; the cloud issues a 1-hour credential limited to the artifacts bucket.
- No static keys are stored in the repo or CI variables.
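Before trusting a freshly minted credential, it is worth confirming how long it remains valid. This sketch assumes the response JSON carries an `Expiration` timestamp, mirroring common STS-style output; adjust field names for your provider.

```shell
# Mock credential payload; in the pipeline this comes from the role-assumption call.
creds='{"AccessKeyId":"ASIAEXAMPLE","Expiration":"2030-01-01T00:00:00Z"}'
expiry=$(echo "$creds" | jq -r .Expiration)
now=$(date -u +%s)
# Convert the ISO timestamp to epoch seconds (GNU date first, BSD date fallback)
exp_ts=$(date -u -d "$expiry" +%s 2>/dev/null \
  || date -u -j -f '%Y-%m-%dT%H:%M:%SZ' "$expiry" +%s)
echo "credential valid for $(( (exp_ts - now) / 3600 )) more hours"
```

Logging the remaining lifetime (never the credential itself) makes expiry-related pipeline failures much easier to diagnose.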
Example 2 — Pipeline fetches DB password from a secret manager (Vault-like)
Goal: A training job needs a temporary PostgreSQL password.
# pipeline snippet
steps:
  - name: Authenticate CI to Secret Manager
    run: |
      jwt="$CI_JOB_JWT"   # CI provides a signed JWT in this variable
      token=$(vault login -method=jwt -format=json role=ml-ci jwt="$jwt" | jq -r .auth.client_token)
      echo "VAULT_TOKEN=$token" >> "$GITHUB_ENV"
  - name: Fetch temporary DB creds
    env:
      VAULT_TOKEN: ${{ env.VAULT_TOKEN }}
    run: |
      # Dynamic secret with a TTL (e.g., 1h)
      creds=$(vault read -format=json database/creds/featurestore-ro)
      PG_USER=$(echo "$creds" | jq -r .data.username)
      PG_PASS=$(echo "$creds" | jq -r .data.password)
      echo "::add-mask::$PG_PASS"   # mask in logs
      psql "host=db.internal user=$PG_USER password=$PG_PASS dbname=features sslmode=require" -c "SELECT 1;"
Notes:
- Dynamic credentials are generated on demand and expire automatically.
- Masking prevents accidental exposure in logs.
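The jq extraction above can be exercised against a mock response, so the parsing logic is testable without a live Vault. The payload shape follows `vault read -format=json` output for database secrets engines.

```shell
# Mock of a dynamic-credential response (username/password plus lease TTL).
mock='{"lease_duration":3600,"data":{"username":"v-ml-ci-abc123","password":"s3cr3t"}}'
PG_USER=$(echo "$mock" | jq -r .data.username)
ttl=$(echo "$mock" | jq -r .lease_duration)
# Log the username and TTL for debugging; never echo the password.
echo "user=$PG_USER ttl=${ttl}s"
```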
Example 3 — Kubernetes job with mounted secret and log redaction
Goal: An inference job pulls a model from a private registry using a token mounted as a file.
# k8s manifest fragment
apiVersion: v1
kind: Secret
metadata:
  name: model-registry-token
  namespace: prod
stringData:
  token: "<opaque>"  # synced from the secret manager, not committed in plain YAML
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pull-model
  namespace: prod
spec:
  template:
    spec:
      containers:
        - name: worker
          image: ml-worker:stable
          env:
            - name: TOKEN_FILE
              value: /var/run/secret/token
          volumeMounts:
            - name: regtok
              mountPath: /var/run/secret
              readOnly: true
          command: ["bash", "-lc"]
          args:
            - |
              set -euo pipefail
              tok=$(cat "$TOKEN_FILE")
              echo 'Token loaded (content masked)'
              model-cli pull --token "$tok" --model prod/latest
      volumes:
        - name: regtok
          secret:
            secretName: model-registry-token
            defaultMode: 0400   # owner read-only
      automountServiceAccountToken: false
      restartPolicy: Never
Notes:
- Token is a file with restricted permissions, not printed.
- Service account token mounting disabled to minimize exposure.
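A defensive entrypoint can verify the mount before using it. This is a sketch: it checks that the token file exists and carries restrictive permissions (`TOKEN_FILE` matches the env var in the manifest; the accepted modes are an assumption).

```shell
# Fail fast if the token file is missing or readable by group/other.
check_token_file() {
  local f=$1
  [ -f "$f" ] || { echo "token file missing: $f" >&2; return 1; }
  local mode
  # GNU stat first, BSD stat as a fallback
  mode=$(stat -c %a "$f" 2>/dev/null || stat -f %Lp "$f")
  case "$mode" in
    400|600) return 0 ;;
    *) echo "token file $f has loose permissions: $mode" >&2; return 1 ;;
  esac
}
```

Calling `check_token_file "$TOKEN_FILE"` at the top of the container command makes a misconfigured mount fail the job immediately instead of leaking a readable secret.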
Step-by-step: set up secrets in your ML pipeline
- Choose a source of truth
Use a dedicated secret manager (Vault or cloud-native). Enable encryption at rest and audit logging.
- Define access policies
Create roles with least privilege: read-only for CI, scoped to specific paths or secret names; separate per environment.
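Least-privilege policies can be guarded by a simple lint in CI. This sketch assumes an IAM-style JSON policy document with a `Statement` array; `lint_policy` and the example ARN are illustrative names, not a real provider's schema.

```shell
# Fail if any statement grants wildcard actions or resources.
lint_policy() {
  jq -e '[.Statement[] | select(.Action == "*" or .Resource == "*")] | length == 0' "$1" >/dev/null \
    || { echo "policy contains wildcard grants" >&2; return 1; }
}

# Demo: a policy scoped to writing one artifact prefix passes the lint.
p=$(mktemp)
cat > "$p" <<'EOF'
{"Statement":[{"Action":"storage:PutObject","Resource":"arn:cloud:storage:::ml-artifacts/prod/*"}]}
EOF
lint_policy "$p" && echo "policy is scoped"
```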
- Enable workload identity
Configure OIDC/workload identity so CI jobs or K8s workloads exchange their identity for short-lived credentials.
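When federation misbehaves, inspecting the token's claims (issuer, audience, expiry) is the fastest way to tell whether the CI side or the cloud side is misconfigured. This sketch decodes the payload segment of a JWT; the demo token is fabricated and unsigned.

```shell
# Decode the (base64url) payload segment of a JWT without verifying it.
jwt_claims() {
  local payload
  payload=$(echo "$1" | cut -d. -f2 | tr '_-' '/+')
  # re-pad base64 to a multiple of 4 characters
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  echo "$payload" | base64 -d
}

# Demo with a fabricated token: header and signature are ignored.
payload=$(printf '%s' '{"aud":"pipeline","iss":"https://ci.example"}' | base64 | tr -d '=\n' | tr '/+' '_-')
jwt_claims "eyFake.$payload.sig"
```

Only decode tokens this way for debugging; never skip signature verification in the actual trust path.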
- Inject at runtime
Expose secrets as env vars or mounted files only in the steps/containers that need them. Keep lifetime short.
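Narrow scoping can be as simple as prefixing the environment assignment so the secret is visible to exactly one process. The training command is simulated with `sh -c` here; the variable names are illustrative.

```shell
secret_value="s3cr3t-example"   # in practice: fetched at runtime, never hardcoded

# The prefix assignment scopes DB_PASS to this single child process:
DB_PASS="$secret_value" sh -c 'test -n "$DB_PASS" && echo "job sees the secret"'

# The surrounding shell never exported it:
[ -z "${DB_PASS:-}" ] && echo "secret absent from parent environment"
unset secret_value
```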
- Mask and redact
Turn on CI secret masking. Avoid echoing secrets; scrub command output.
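As a last line of defense behind CI-native masking (like the `::add-mask::` used above), output from chatty tools can be piped through a scrubber. A minimal sketch, assuming the secret contains no `&` or newline:

```shell
redact() {
  # Escape sed metacharacters in the secret, then replace every
  # occurrence on stdin with ***.
  sed "s/$(printf '%s' "$1" | sed 's/[.[\*^$/]/\\&/g')/***/g"
}

secret="p4ssw0rd"
echo "connecting with password=$secret" | redact "$secret"
# prints: connecting with password=***
```

This only catches exact matches; it will not redact encoded or split fragments, which is why masking at the CI level and not printing secrets in the first place come first.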
- Rotate and expire
Use dynamic secrets or rotate static ones on a schedule. Document owners and rotation cadence.
- Scan and audit
Run secret scanning on commits and pipelines. Review access logs regularly.
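A dedicated scanner (gitleaks, trufflehog, and similar tools) should be the real gate; a grep-based check like this sketch can still fail fast in CI on the most obvious patterns. The regexes cover AWS-style access key IDs and PEM private key headers only.

```shell
# Return non-zero (and list offending files) if obvious secrets appear.
scan_dir() {
  local dir=$1
  if grep -rEl 'AKIA[0-9A-Z]{16}|-----BEGIN (RSA |EC )?PRIVATE KEY-----' "$dir"; then
    echo "potential secrets found in the files above" >&2
    return 1
  fi
  return 0
}
```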
Exercises
These mirror the graded exercises below. Do them now, then take the quick test at the end of the lesson.
- Exercise 1: Write a secrets inventory and environment mapping (see Exercises section).
- Exercise 2: Draft a pipeline snippet to get short-lived cloud creds via OIDC and upload an artifact.
- Exercise 3: Add log masking and a basic secret scanning policy to your pipeline.
Common mistakes (and self-check)
- Hardcoding secrets in code or YAML
Self-check: Search for patterns like "AKIA", "-----BEGIN PRIVATE KEY-----", or "password=" in your repo. Ensure scanners run pre-commit and in CI.
- Long-lived static keys shared across environments
Self-check: Verify each environment uses separate principals and policies; prefer dynamic, short-lived credentials.
- Overly broad permissions
Self-check: Review IAM policies: are they resource-scoped? Are write permissions actually necessary?
- Secrets leaking to logs
Self-check: Confirm masking is enabled. Grep pipeline logs for known secret fragments (in a safe environment).
- Forgetting rotation ownership
Self-check: Each secret needs an owner, a rotation interval, and a runbook. If any are missing, add them.
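The rotation-ownership check can itself be automated. This sketch assumes a simple CSV inventory (`name,owner,rotation_days`); the file format is an illustration, not a standard.

```shell
# Flag inventory rows with a missing owner or rotation interval.
check_inventory() {
  awk -F, 'NR>1 && ($2=="" || $3=="") { print "missing metadata: " $1; bad=1 }
           END { exit bad }' "$1"
}

# Demo inventory: every secret has an owner and a rotation cadence.
f=$(mktemp)
printf 'name,owner,rotation_days\ndb-pass,ml-platform,90\nregistry-token,mlops,30\n' > "$f"
check_inventory "$f" && echo "inventory complete"
```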
Practical projects
- Project 1: Convert a pipeline from static cloud keys to OIDC short-lived credentials.
- Project 2: Introduce a secret manager path for DB credentials and remove passwords from CI variables.
- Project 3: Implement repo + CI secret scanning with fail-on-detection and add a suppression process.
- Project 4: Create environment-specific secret namespaces and enforce least privilege via policy tests.
Learning path
- Before this: CI/CD basics for ML, container fundamentals.
- This lesson: Safe secret storage, injection, masking, rotation, scanning.
- Next: Policy-as-code for pipelines, artifact signing, supply chain security (SBOM/attestations).
Next steps
- Complete the exercises and take the quick test below.
- Create a small demo repo where all secrets are fetched dynamically.
- Schedule a rotation drill: rotate one secret end-to-end and verify no downtime.
Mini challenge
Your pipeline trains a model and writes to a feature store, then deploys to a cluster. List all secrets involved, the minimal scope each needs, and which can be short-lived. Propose a rotation plan that doesn’t break deployments.