Why this matters
Secrets (passwords, API keys, tokens, certs, encryption keys) are the keys to your infrastructure. As a Platform Engineer, you will plan, automate, and audit how secrets are stored, distributed, and rotated. Real tasks include setting rotation schedules, integrating apps with a vault, eliminating hardcoded credentials, enabling zero-downtime DB password changes, and proving compliance through audit logs.
- Reduce breach impact by limiting how long stolen secrets remain valid.
- Meet regulatory requirements with defined rotation cadences and records.
- Enable safer incident response: revoke and reissue quickly.
Concept explained simply
A vault is a secure service that stores, issues, and revokes secrets under strict access policies. Rotation means replacing an existing secret with a new one on a schedule or when risk changes.
Key terms
- Static secret: a long-lived value stored and retrieved.
- Dynamic secret: a short-lived credential minted on demand with a lease/TTL.
- Lease/TTL: time after which a secret auto-expires or must be renewed.
- Revocation: explicitly invalidating a secret before its TTL expires.
- Secret engine/provider: plugin or backend that creates or stores secrets (e.g., DB, cloud IAM, PKI).
Mental model
Think of a vault as a credential mint with an access gate. Clients present a token to the gate, get a time-limited pass, do work, and the pass expires. Rotation is replacing locks and keys regularly so old stolen keys no longer work.
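To make the lease idea concrete, here is a minimal in-memory sketch of a credential mint. `ToyMint` and `Lease` are hypothetical names invented for illustration, not the API of any real vault product; the clock is passed in explicitly so the behavior is easy to follow.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Lease:
    secret: str
    expires_at: float
    revoked: bool = False

    def is_valid(self, now: float) -> bool:
        # A lease is usable only if it was not revoked and its TTL has not elapsed.
        return not self.revoked and now < self.expires_at


@dataclass
class ToyMint:
    """In-memory stand-in for a vault (assumption: not a real vault client)."""
    ttl_seconds: float
    leases: dict = field(default_factory=dict)

    def issue(self, now: float) -> str:
        # Mint a fresh random credential and attach a lease to it.
        lease_id = secrets.token_hex(8)
        self.leases[lease_id] = Lease(secrets.token_urlsafe(16), now + self.ttl_seconds)
        return lease_id

    def renew(self, lease_id: str, now: float) -> None:
        lease = self.leases[lease_id]
        if not lease.is_valid(now):
            raise ValueError("cannot renew an expired or revoked lease")
        lease.expires_at = now + self.ttl_seconds

    def revoke(self, lease_id: str) -> None:
        # Revocation invalidates the lease immediately, before its TTL runs out.
        self.leases[lease_id].revoked = True

    def check(self, lease_id: str, now: float) -> bool:
        lease = self.leases.get(lease_id)
        return lease is not None and lease.is_valid(now)
```

With `ttl_seconds=60`, a lease issued at time 0 passes `check` at 59 seconds and fails at 60 unless it was renewed in between; `revoke` makes it fail right away.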
Rotation patterns
- In-place rotate: change the password for the same identity; clients must refresh connections.
- Blue/green secrets: create a new credential alongside the old one; migrate clients; then revoke old.
- Leased dynamic: rely on the vault to issue short-lived credentials; let leases expire.
- Key ring for encryption: keep multiple key versions; encrypt new data with the latest; keep old versions for decryption; gradually re-encrypt.
Vault vs KMS vs config
- Vault: policy-based brokering of secrets, dynamic credentials, audit logs, leases.
- KMS: cryptographic keys and operations (encrypt/decrypt, sign); pair it with a secret store for non-key secrets.
- Config store: general configuration; do not treat it as a security boundary unless it supports encryption, RBAC, audit, and rotation hooks.
Distribution patterns
- On-start pull: app fetches secrets at startup. Simple but needs restart for rotation.
- Sidecar/agent: manages tokens, renewals, files on tmpfs; app hot-reloads.
- Init container + reload hook: write secret, then signal app to reload on changes.
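The sidecar and reload-hook patterns both depend on the app noticing that a secret file changed. A minimal sketch of that idea, assuming the agent writes the secret to a file and the app re-reads it when the file's metadata changes (`ReloadingSecret` and `atomic_write` are illustrative names, not part of any agent's API):

```python
import os
import tempfile
from pathlib import Path


class ReloadingSecret:
    """Re-reads a secret file whenever its on-disk signature changes."""

    def __init__(self, path):
        self.path = Path(path)
        self._signature = None
        self._value = None

    def get(self) -> str:
        st = self.path.stat()
        # mtime alone can be too coarse on some filesystems, so the inode and
        # size are included; an atomic rename always produces a new inode.
        signature = (st.st_mtime_ns, st.st_ino, st.st_size)
        if signature != self._signature:
            self._value = self.path.read_text().strip()
            self._signature = signature
        return self._value


def atomic_write(path, value: str) -> None:
    """Write to a temp file, then rename, so readers never see a partial secret."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent)  # created with owner-only permissions
    with os.fdopen(fd, "w") as f:
        f.write(value)
    os.replace(tmp, path)
```

The app calls `get()` each time it needs the credential (for example, when opening a new connection), so a rotation written with `atomic_write` is picked up without a restart.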
Worked examples
Example 1: Zero-downtime Postgres password rotation
- Plan blue/green: create a new DB user (e.g., app_user_v2) with the same roles.
- Configure the vault to issue the new credential and deliver it to the app (via a sidecar or a file) under a versioned filename.
- Make the app support live reload of DB credentials and connection pool recycle.
- Switch app to app_user_v2; monitor errors and connection counts.
- Revoke app_user_v1; verify no connections remain; remove old user.
Tip: if you must rotate in-place, force a rolling restart to refresh connections safely.
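The blue/green steps above can be sketched against a fake database to show the ordering that avoids downtime: create the new user, connect with it, drain the old connection, and only then drop the old user. `FakeDatabase` and `FakeApp` are hypothetical stand-ins, not a Postgres driver or admin API.

```python
import secrets


class FakeDatabase:
    """Hypothetical stand-in for a database; tracks users and open connections."""

    def __init__(self):
        self.users = {}        # username -> password
        self.connections = {}  # username -> open connection count

    def create_user(self, name: str) -> str:
        password = secrets.token_urlsafe(24)
        self.users[name] = password
        self.connections[name] = 0
        return password

    def connect(self, name: str, password: str) -> None:
        if self.users.get(name) != password:
            raise PermissionError(f"authentication failed for {name}")
        self.connections[name] += 1

    def disconnect(self, name: str) -> None:
        self.connections[name] -= 1

    def drop_user(self, name: str) -> None:
        # Refuse to drop a user that still has live connections.
        if self.connections[name] > 0:
            raise RuntimeError(f"{name} still has open connections")
        del self.users[name]
        del self.connections[name]


class FakeApp:
    """Holds one connection and can swap credentials without restarting."""

    def __init__(self, db: FakeDatabase, user: str, password: str):
        self.db, self.user = db, user
        db.connect(user, password)

    def reload_credentials(self, user: str, password: str) -> None:
        self.db.connect(user, password)  # open the new connection first
        self.db.disconnect(self.user)    # then drain the old one
        self.user = user


def rotate_blue_green(db: FakeDatabase, app: FakeApp, new_user: str) -> None:
    old_user = app.user
    new_password = db.create_user(new_user)         # green credential exists alongside blue
    app.reload_credentials(new_user, new_password)  # cut over while blue still works
    db.drop_user(old_user)                          # fails loudly if anything still uses blue
```

In a real rollout the drop step would follow a monitoring window rather than run immediately; the sketch compresses that wait to keep the ordering visible.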
Example 2: Rotating cloud access keys with dual keys
- Create a second access key for the service account.
- Distribute the new key via vault/agent; update workloads.
- Wait for rollout completion; confirm successful API calls.
- Deactivate and then delete the old key.
Better: use short-lived credentials tied to workload identity so rotation happens automatically.
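One way to make the "wait, deactivate, then delete" sequence explicit is a small decision function. This is a sketch under assumptions: `AccessKey`, its fields, and the single `quiet_period` threshold are invented for illustration and do not mirror any cloud provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AccessKey:
    key_id: str
    active: bool
    last_used: Optional[datetime] = None
    deactivated_at: Optional[datetime] = None


def next_step(old: AccessKey, now: datetime, quiet_period: timedelta) -> str:
    """Return 'wait', 'deactivate', or 'delete' for the OLD key."""
    if old.active:
        # Keep the old key active while any workload is still using it.
        used_recently = old.last_used is not None and now - old.last_used < quiet_period
        return "wait" if used_recently else "deactivate"
    # Once deactivated, keep the key around for a grace period so it can be
    # re-enabled quickly if a forgotten consumer starts failing.
    if old.deactivated_at is not None and now - old.deactivated_at >= quiet_period:
        return "delete"
    return "wait"
```

Deactivating before deleting keeps the rotation reversible: a deactivated key can be switched back on, a deleted one cannot.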
Example 3: TLS certificate renewal without downtime
- Store cert and key under a versioned path; expose to the app as files.
- Automate renewal well ahead of expiry (e.g., 30 days before the certificate expires).
- Signal the server to hot-reload certificates when files change.
- Keep previous cert available until all connections migrate.
Always verify that the full chain and private key match; monitor expiry dates.
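The renewal trigger can be expressed as a small check on the certificate's `notAfter` field. This sketch uses Python's standard `ssl.cert_time_to_seconds` to parse the timestamp; the 30-day default and the function names are assumptions chosen to match the example above.

```python
import ssl
from datetime import datetime, timedelta, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """not_after uses the certificate time format, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now) / timedelta(days=1)


def should_renew(not_after: str, now: datetime, renew_before_days: float = 30) -> bool:
    # Renew once the remaining lifetime drops to the threshold or below.
    return days_until_expiry(not_after, now) <= renew_before_days
```

A scheduled job could call `should_renew` with `datetime.now(timezone.utc)` and feed `days_until_expiry` into an expiry dashboard or alert.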
Example 4: Encryption key rotation with a key ring
- Add a new key version to your key ring; mark as primary for encryption.
- Start encrypting new data with the new key version; keep old versions enabled for decryption.
- Backfill: progressively re-encrypt old data during low-traffic windows.
- After backfill and validation, disable old key versions (do not delete until legal/backup policy allows).
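The key ring steps above can be sketched as follows. Important assumption: `_toy_xor` is a deliberately insecure placeholder so the example runs with the standard library only; a real system would use an authenticated cipher from a KMS or a vetted crypto library. `KeyRing` and its methods are illustrative names.

```python
import hashlib
from dataclasses import dataclass, field


def _toy_xor(key: bytes, data: bytes) -> bytes:
    """TOY cipher for illustration only; NOT secure. Replace with a real AEAD."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class KeyRing:
    keys: dict = field(default_factory=dict)    # version -> key bytes
    enabled: set = field(default_factory=set)   # versions allowed for decryption
    primary: int = 0                            # version used for new encryptions

    def add_version(self, key: bytes) -> int:
        version = max(self.keys, default=0) + 1
        self.keys[version] = key
        self.enabled.add(version)
        self.primary = version  # new data is encrypted with the latest version
        return version

    def encrypt(self, plaintext: bytes):
        # Store the key version next to the ciphertext so decryption can find the key.
        return self.primary, _toy_xor(self.keys[self.primary], plaintext)

    def decrypt(self, version: int, ciphertext: bytes) -> bytes:
        if version not in self.enabled:
            raise KeyError(f"key version {version} is disabled or unknown")
        return _toy_xor(self.keys[version], ciphertext)

    def reencrypt(self, version: int, ciphertext: bytes):
        # Backfill step: decrypt with the old version, encrypt with the primary.
        return self.encrypt(self.decrypt(version, ciphertext))

    def disable(self, version: int) -> None:
        if version == self.primary:
            raise ValueError("cannot disable the primary key version")
        self.enabled.discard(version)  # key material is kept, only decryption is blocked
```

Note that `disable` keeps the key bytes in `keys`, which mirrors the advice to disable old versions first and delete them only when policy allows.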
Pre-rotation checklist
- Inventory: which services consume this secret? Are there batch jobs or offline tools that also use it?
- Make rotation reversible: can you enable two credentials or versions at once?
- Canary: test rotation in staging with production-like traffic.
- App readiness: supports hot-reload or rolling restart without downtime.
- Monitoring: set alerts on auth failures and latencies during rotation window.
- Audit plan: who rotated, when, and which systems were affected.
Post-rotation verification
- No authentication failures in logs for 30–60 minutes after cutover.
- Old secret revoked; attempted use is denied and logged.
- All replicas/pods have refreshed credentials.
- Dashboards show healthy connection pools and request rates.
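Part of this verification can be automated. A minimal sketch, assuming each replica can report which secret version it loaded and how many auth failures it saw since cutover (`ReplicaStatus` and `verify_rotation` are hypothetical names, and the inputs would come from your own metrics or health endpoints):

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    secret_version: int
    auth_failures: int


def verify_rotation(replicas, expected_version: int, old_secret_denied: bool) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    for r in replicas:
        if r.secret_version != expected_version:
            problems.append(f"{r.name}: still on version {r.secret_version}")
        if r.auth_failures > 0:
            problems.append(f"{r.name}: {r.auth_failures} auth failures")
    if not old_secret_denied:
        problems.append("old secret is still accepted")
    return problems
```

Returning every problem, rather than stopping at the first, gives the operator a complete picture before deciding whether to roll back.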
Exercises
Complete the exercise below. Then self-check using the checklist.
- Exercise 1: Design a rotation runbook for a database password using blue/green credentials and a vault agent.
Self-check
- Includes who/when/rollback, monitoring, and audit steps.
- Specifies client update mechanism (reload vs restart).
- Covers revocation and verification of the old secret.
Common mistakes and how to self-check
- Hardcoding secrets: Search code and container images for secrets before rotation; remove each one and replace it with a fetch from the vault.
- Single-credential cutover: Avoid flipping a single password without dual access; use blue/green or dynamic leases.
- No app reload: Verify the app reloads secrets or restarts gracefully; simulate rotation in staging.
- Unencrypted at rest: Ensure the secret store encrypts data at rest and that audit logs are enabled.
- Forgetting non-prod: Rotate in all environments; attackers often pivot from test to prod.
- Neglecting human access: Rotate admin tokens and enforce MFA and short TTLs.
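For the hardcoded-secrets check, a rough scanner can give a first pass before rotation. The patterns below are illustrative assumptions and will produce both false positives and misses; a dedicated secret-scanning tool is the better choice for real audits.

```python
import re

# Illustrative patterns only; extend them for the credential types you actually use.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "assigned_secret": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for each suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

The `assigned_secret` pattern flags a quoted literal assigned to a secret-looking name, but not a lookup such as `password = os.environ["DB_PASSWORD"]`, which is the shape the code should have after the fix.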
Practical projects
- Build a rotation runbook and automation for a database user, including health checks and canary rollout.
- Implement a sidecar-based secret refresh for a web service that hot-reloads TLS certificates.
- Create a key ring with versioning and a gradual re-encryption job; include dashboards for progress and errors.
Learning path
- Understand vault fundamentals: authentication to the vault, policies, audit.
- Practice static secret storage and retrieval with app reload.
- Adopt dynamic secrets for databases and cloud IAM.
- Add rotation automation with scheduled jobs and alerts.
- Scale with sidecars/agents and versioned secrets across environments.
Prerequisites
- Basic networking and Linux familiarity.
- Understanding of RBAC and least privilege.
- Comfort with one programming language for service integration.
Who this is for
- Platform and DevOps engineers responsible for infrastructure security.
- Backend engineers integrating services with secret stores.
- SREs and security engineers improving operational resilience.
Next steps
- Do the exercise and take the quick test to measure understanding.
- Automate one real rotation in a non-production environment.
- Prepare a lightweight incident playbook for emergency revocation.
Mini challenge
You inherit a service with a single hardcoded API token. Design a 1-week plan to move it into a vault and rotate it with near-zero downtime. Include: validation steps, app changes, safe rollout, and audit evidence.