Secrets Management Basics

Who this is for

You build, train, or deploy ML systems and need to keep credentials, tokens, and keys safe in local dev, training jobs, and CI/CD pipelines.

Machine Learning Engineers and Data Scientists automating training/deployments
Platform/DevOps engineers supporting ML infrastructure
Beginners who need a clean, practical baseline for handling secrets

Prerequisites

Basic Git and command line
Familiarity with environment variables
You have run at least one ML training job or pipeline step

Why this matters

Real tasks you will face:

Fetching private datasets (object storage, databases) during training
Pushing models to a private registry or tracking server
Deploying services that need API keys at runtime (feature stores, vector DBs, telemetry)
Rotating keys without code changes or downtime

Good secrets management prevents leaks, audit failures, and outages. It also speeds up onboarding and reduces brittle, hand-edited configs.

Concept explained simply

A secret is anything you would not post publicly: API keys, tokens, passwords, SSH keys, encryption keys, and connection strings.

Basic rule: keep secrets out of source code and logs; inject them at runtime from a secure store with least privilege and rotation.

Mental model

Think of three layers:

Storage: a safe (secret manager, vault, key store)
Delivery: a courier that brings a sealed envelope to your job (CI/CD secret injection, runtime mount, short-lived credentials)
Use: your code opens the envelope, reads the secret from an environment variable or file, and never copies it anywhere else (no printing to logs, no writing to artifacts)

When rotated, only the envelope contents change; your code and pipelines stay the same.
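The "use" layer can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the helper name `read_secret`, the `MissingSecretError` class, and the `NAME_FILE` convention for pointing at a mounted file are all assumptions made for this example.

```python
import os
from pathlib import Path


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not delivered to the process."""


def read_secret(name: str) -> str:
    """Return a secret from an environment variable or a mounted file.

    Checks NAME first, then NAME_FILE, which is assumed to hold the path
    of a file containing the value (a convention chosen for this sketch).
    """
    value = os.environ.get(name)
    if value:
        return value

    file_path = os.environ.get(f"{name}_FILE")
    if file_path:
        # Mounted secret files often end with a trailing newline.
        return Path(file_path).read_text(encoding="utf-8").strip()

    # The error names the secret but never includes a value.
    raise MissingSecretError(f"secret {name!r} was not provided")
```

Because the code only knows the secret's name, rotating the value in the store requires no code change, which is the point of the envelope model.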

Core building blocks

Storage options: repository/CI secrets, cloud secret managers, vault services, Kubernetes Secrets, parameter stores
Injection methods: environment variables, mounted files, runtime API fetch
Principles: least privilege (scoped access), rotation, audit, no secrets in code or git
Short-lived credentials: prefer identity-based access (workload identity, service principals) over long-lived keys

Secure defaults to adopt

Never commit .env files with real values; commit a .env.template with placeholders instead
Keep secrets out of build artifacts and model files
Mask secrets in CI logs and avoid echoing them
Rotate on a schedule and on any suspicion of exposure
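CI platforms usually mask registered secrets for you, but application logs are your responsibility. One possible safety net, sketched here with Python's standard `logging` module, is a filter that replaces known secret values before a record is written. The class name `RedactSecretsFilter` is hypothetical, and the approach only catches exact matches of values you pass in.

```python
import logging


class RedactSecretsFilter(logging.Filter):
    """Replace known secret values with '***' in log messages."""

    def __init__(self, secrets):
        super().__init__()
        # Ignore empty strings so we never replace "" with "***".
        self._secrets = [s for s in secrets if s]

    def filter(self, record):
        # Render the final message first, then scrub it.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg = message
        record.args = ()
        return True  # keep the (now redacted) record
```

Attach the filter to each handler with `handler.addFilter(...)`. Treat it as a backstop: the primary defense is still not passing secrets to log calls at all.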

Worked examples

1) Deploying an inference service on Kubernetes

Store the runtime API key in your secret manager and sync it to the cluster as a Kubernetes Secret.
Expose the secret to the pod as an environment variable; do not bake it into the image.
Grant the pod's service account read access to only the secret it needs.

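# Deployment snippet (illustrative): container section of the pod spec
containers:
- name: inference
  image: registry.example.com/inference:latest
  env:
  - name: RUNTIME_API_KEY        # variable name the application reads
    valueFrom:
      secretKeyRef:
        name: runtime-api        # Kubernetes Secret object
        key: api_key             # key inside that Secret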

Outcome: your image is generic; the cluster injects the secret at runtime.

2) Training job needs access to dataset in object storage

Preferred: attach a workload identity/role to the compute job so it gets short-lived access tokens automatically.
Fallback: store access keys in a secret manager; inject as env vars into the training job; the SDK reads them at runtime.
Ensure the role or keys are scoped to read-only for the dataset bucket/prefix.

# Example: environment variables consumed by SDK
DATA_BUCKET=s3://team-datasets/project-42/
AWS_REGION=us-east-1
# Access keys injected by secret manager at runtime, not committed
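On the training-job side, the code only needs the non-secret settings; the credentials themselves are picked up by the storage SDK from the environment or the attached role. The sketch below validates the two variables from the example at startup. The `DatasetConfig` and `load_dataset_config` names are hypothetical, and the parsing assumes the `s3://bucket/prefix/` form shown above.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DatasetConfig:
    bucket: str
    prefix: str
    region: str


def load_dataset_config(env=os.environ) -> DatasetConfig:
    """Read DATA_BUCKET and AWS_REGION, failing fast if either is missing."""
    missing = [key for key in ("DATA_BUCKET", "AWS_REGION") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing configuration: {', '.join(missing)}")

    parsed = urlparse(env["DATA_BUCKET"])
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("DATA_BUCKET must look like s3://bucket/prefix/")

    # No access keys are read here: the storage SDK resolves credentials itself.
    return DatasetConfig(
        bucket=parsed.netloc,
        prefix=parsed.path.lstrip("/"),
        region=env["AWS_REGION"],
    )
```

Failing at startup with a clear message is cheaper than discovering a missing variable an hour into a training run.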

3) CI publishes a model to a private registry

Save a registry access token as a CI secret (masked).
CI job reads it as an environment variable and runs the publish step.
Token has only the permissions needed (e.g., write:models, read:models), not admin.

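# CI step (illustrative)
# REGISTRY_TOKEN is injected into the job environment by the CI secret store.
# The script is assumed to read it from the environment, so the value never
# appears in the command line or the process list.
python ci/publish_model.py --model dist/model.pkl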

Outcome: no tokens in code; CI has controlled access.
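Inside a publish script, one reasonable pattern is to read the token from the environment rather than accept it as a command-line argument, since arguments can show up in process listings and echoed commands. The sketch below only builds the HTTP request with the standard library; the function name `build_publish_request`, the PUT-with-bearer-token protocol, and any registry URL you pass in are assumptions, not the API of a particular registry.

```python
import os
import urllib.request


def build_publish_request(model_path: str, registry_url: str) -> urllib.request.Request:
    """Build an authenticated upload request without exposing the token."""
    token = os.environ.get("REGISTRY_TOKEN")
    if not token:
        # Say what is missing, never what the value is.
        raise RuntimeError("REGISTRY_TOKEN is not set; configure it as a CI secret")

    with open(model_path, "rb") as fh:
        payload = fh.read()

    return urllib.request.Request(
        registry_url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )
```

Sending it would be a call to `urllib.request.urlopen(request)`; keep the request object and its headers out of any log or error output.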

A simple, safe process to adopt

During development: use a .env.template with placeholder keys; keep real values in a local .env excluded by .gitignore.
In CI/CD: store secrets in the platform’s secret store; reference them as masked variables; avoid printing them.
In production: prefer identity-based access (service accounts, workload identity) and rotate any static credentials.
Observability: log that credentials are present (boolean), not their values; set monitors for rotation age.
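The observability step can be sketched as two small helpers: one that logs only whether each secret is present, and one that flags a credential whose last rotation is older than a chosen limit. Both function names are hypothetical, and where the `last_rotated` timestamp comes from (secret manager metadata, a config file) is left open.

```python
import logging
import os
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("secrets")


def log_secret_presence(names, env=os.environ):
    """Log a boolean per secret name and return the same mapping."""
    status = {name: bool(env.get(name)) for name in names}
    for name, present in status.items():
        # Only the name and a boolean are logged, never the value.
        logger.info("secret %s present=%s", name, present)
    return status


def rotation_overdue(last_rotated: datetime, max_age: timedelta, now=None) -> bool:
    """Return True when the credential is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_rotated > max_age
```

A monitor could call `rotation_overdue` on a schedule and alert when it returns True, for example with `max_age=timedelta(days=90)`.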

Rotation playbook

Introduce new secret (v2) alongside old (v1)
Update consumers to use v2
Verify
Revoke v1
Document the change and schedule next rotation
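The reason the playbook avoids downtime is the overlap window in which both versions are valid. The toy model below makes that explicit. `RotatingSecret` is an in-memory stand-in written for this example, not the interface of any real secret manager.

```python
import hmac


class RotatingSecret:
    """Toy verifier that accepts every currently active secret version."""

    def __init__(self, version: str, value: str) -> None:
        self._active = {version: value}

    def introduce(self, version: str, value: str) -> None:
        # Step 1: the new version becomes valid alongside the old one.
        self._active[version] = value

    def revoke(self, version: str) -> None:
        # Final step: refuse to remove the last remaining version.
        if version in self._active and len(self._active) == 1:
            raise RuntimeError("refusing to revoke the only active version")
        self._active.pop(version, None)

    def accepts(self, presented: str) -> bool:
        # compare_digest avoids leaking information through timing.
        return any(
            hmac.compare_digest(presented.encode(), value.encode())
            for value in self._active.values()
        )
```

Between `introduce("v2", ...)` and `revoke("v1")`, consumers on either version keep working, so they can be migrated and verified one at a time.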

Exercises

Do these now. They mirror the graded section.

Exercise 1: Local .env safety baseline

Create .env.template with placeholders and commit it
Create .env with real values and add .env to .gitignore
Load values in a small script without printing secrets

[ ] .env is ignored by Git
[ ] App runs and reads secrets via env
[ ] No secrets printed to logs
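For the loading step, a deliberately simplified loader might look like the sketch below. It handles plain `KEY=VALUE` lines, comments, and optional surrounding quotes only; a maintained library such as python-dotenv covers more cases. The function name `load_env_file` is an assumption for this exercise.

```python
import os
from pathlib import Path


def load_env_file(path=".env", env=os.environ):
    """Load KEY=VALUE pairs into env and return the names that were set.

    Existing variables are not overridden, so values injected by CI or
    the runtime always win over the local file.
    """
    loaded = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in env:
            env[key] = value
            loaded.append(key)
    # Return names only; callers can log these safely.
    return loaded
```

Printing the returned list confirms that the file was read without revealing any value, which satisfies the third checklist item.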

Exercise 2: Use a CI secret in a pipeline step

Add a secret in your CI platform (e.g., MODEL_REGISTRY_TOKEN)
Reference it in a job as an environment variable
Run a dummy step that confirms authenticated access without revealing the token

[ ] Secret appears as masked in logs
[ ] Step succeeds without printing the token
[ ] Principle of least privilege applied

Common mistakes and how to self-check

Committing real .env files or sample notebooks with credentials. Self-check: run a secrets scanner locally before pushing; review git history for accidental commits.
Embedding secrets in Docker images. Self-check: run the container and list env; also inspect image layers for stray files.
Overly broad permissions. Self-check: list permissions granted to the service account; remove unused ones and re-run pipeline.
Printing secrets in logs. Self-check: search logs for keys, tokens, or patterns like "AKIA"; verify your CI masks secret values.
Relying on base64 as "encryption". Tip: base64 is encoding, not encryption; use a secret manager or KMS-backed store.
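Two of these self-checks are easy to demonstrate. The sketch below shows a very small scanner for strings shaped like AWS access key IDs (the "AKIA" prefix followed by 16 uppercase letters or digits) and a reminder that base64 reverses without any key. It is an illustration only; dedicated secret scanners cover far more patterns, and the function name `find_suspect_lines` is hypothetical.

```python
import base64
import re

# Shape of an AWS access key ID: "AKIA" plus 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_suspect_lines(text: str):
    """Return 1-based line numbers that contain an AWS-style key ID."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if AWS_KEY_ID.search(line)
    ]


# base64 is reversible by anyone: no key is involved in decoding.
encoded = base64.b64encode(b"hunter2").decode("ascii")
decoded = base64.b64decode(encoded).decode("ascii")
```

Here `decoded` is the original string again, which is why a base64 value in a config file or manifest should be treated as plaintext.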

Practical projects

Secure data fetcher: a small script that downloads a private dataset using identity-based access, with optional fallback to secret-injected credentials
Model publisher: a CI job that builds, signs, and publishes a model artifact using a masked token and least-privileged role
Rotation drill: introduce a new token, update consumers, and revoke the old one while keeping all pipelines green

Learning path

Before this: basic Git, CI job definitions, environment variables.

Now: master secrets basics (this page), then:

Short-lived credentials and workload identity
Kubernetes secrets and sealed secrets
Policy as code for least privilege

Next steps

Complete the exercises and the quick test below
Apply the checklist to one active project today
Schedule a rotation window on your team calendar

Mini challenge

Refactor one existing pipeline so that no secret is stored in repo or image. Use identity-based access if possible; otherwise, inject from a secret manager. Prove success by showing a green run and confirming no secrets appear in the diff, image layers, or logs.

Deployment-ready checklist

[ ] Secrets never stored in source control or images
[ ] CI secrets masked; logs show no secret values
[ ] Least privilege enforced on all roles/tokens
[ ] Rotation process documented and tested
[ ] Local dev uses .env.template and ignored .env
[ ] Artifacts and configs contain no embedded secrets

Quick Test

The quick test is available to everyone. Only logged-in users get saved progress.
