Secrets Rotation

Learn Secrets Rotation for free with explanations, exercises, and a quick test. Written for Data Platform Engineers.

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

Secrets rotation means changing credentials (passwords, API keys, tokens, certificates) on a schedule or in response to an event, so that leaked or old credentials stop working. As a Data Platform Engineer, you run pipelines, orchestrators, and services that touch databases, warehouses, message buses, and cloud storage. Rotating secrets reduces the blast radius of a leak, helps meet compliance requirements, and prevents outages caused by expired credentials.

Real tasks you will face:

  • Rotate warehouse passwords used by Airflow without breaking DAGs.
  • Rotate Kafka SASL credentials for streaming jobs.
  • Replace cloud storage access keys used by batch jobs.
  • Rotate TLS certificates for internal data APIs.
  • Migrate long-lived keys to short-lived, automatically refreshed credentials.

Concept explained simply

A secret is a key to a door. Rotation is changing that key frequently so old copies become useless. Good rotation keeps the door open to your services while denying old keys.

Mental model

  • Secret inventory: know every place a secret lives (source of truth, consumers, caches, config files).
  • Dual key phase: old key and new key both valid while you roll out updates.
  • Cutover: switch consumers to the new key and remove the old one.
  • Verification and revoke: confirm traffic uses the new key, then disable and delete the old one.
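
The dual key phase can be sketched in Python as below. This is a minimal illustration, assuming a service that checks a presented key against its own list of valid keys; the class and method names (DualKeyVerifier, rotate, revoke_previous) are hypothetical and not taken from any specific product.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class DualKeyVerifier:
    """Accepts the current key and, during rotation, one previous key."""

    current: str
    previous: Optional[str] = None  # set only while the dual key phase is open

    def is_valid(self, presented: str) -> bool:
        candidates = [self.current]
        if self.previous is not None:
            candidates.append(self.previous)
        # compare_digest performs a constant-time comparison per candidate
        return any(
            hmac.compare_digest(presented.encode(), c.encode()) for c in candidates
        )

    def rotate(self, new_key: str) -> None:
        """Open the dual key phase: the old key stays valid next to the new one."""
        self.previous, self.current = self.current, new_key

    def revoke_previous(self) -> None:
        """Close the dual key phase once traffic is verified on the new key."""
        self.previous = None
```

Calling rotate() and later revoke_previous() corresponds to the dual key phase and the verification-and-revoke step; cutover happens on the consumer side in between.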

Common rotation strategies

  • Time-based: rotate every N days.
  • Event-driven: rotate after suspected leak, staff changes, or scope changes.
  • On-demand: rotate before a risky deployment.
  • Just-in-time credentials: dynamically issued, short-lived credentials with automatic expiry.

Key principles and safe defaults

  • Use a central secret store with versioning and access audit.
  • Prefer short-lived credentials (tokens, dynamic DB users) over long-lived keys.
  • Automate rotation and propagation; avoid manual updates.
  • Use dual credentials or versioned keys for zero-downtime cutover.
  • Keep a runbook with rollback steps and a break-glass account.

Rotation frequency guidelines

  • High-risk secrets (root access keys, privileged DB users): 7–30 days or move to short-lived tokens.
  • Service-to-service tokens with automatic refresh: token TTL minutes–hours; rotate signing keys every 30–90 days.
  • Certificates: renew before two-thirds of the validity period has elapsed; automate renewal.
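
The frequency guidelines reduce to simple date arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the thresholds should come from your own policy.

```python
from datetime import datetime, timedelta, timezone


def cert_renew_at(not_before: datetime, not_after: datetime) -> datetime:
    """Instant at which two-thirds of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return not_before + lifetime * 2 / 3


def is_rotation_due(last_rotated: datetime, max_age_days: int, now: datetime) -> bool:
    """Time-based rule: rotation is due once the secret is max_age_days old."""
    return now - last_rotated >= timedelta(days=max_age_days)


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(days=90)
    # For a 90-day certificate the renewal point falls 60 days after issuance.
    print(cert_renew_at(issued, expires))
```

A scheduler or alerting job could call these daily and open a rotation ticket when either condition is met.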

Worked examples (step-by-step)

Example 1 — Zero-downtime rotation of a Postgres warehouse password used by Airflow

  1. Prepare dual credentials:
    • Create a second DB user with the same roles. A Postgres role holds only one password at a time, so a new user is the usual way to get overlap. If your database supports a grace window in which the old and new passwords are both accepted, changing the password in place is an alternative.
  2. Update secret store:
    • Write the new password as a new version (e.g., secret warehouse_password v2).
  3. Update consumers safely:
    • Update the Airflow Connection to reference the secret by name and version (not a literal value), or trigger a secret refresh.
    • Deploy, then restart only the workers and components that cache secrets.
  4. Verify:
    • Check that connection tests and logs show successful auth with the new user/password.
  5. Revoke old:
    • Disable or drop the old user/password. Remove fallback grants.

Verification checklist

  • All DAG tasks using the warehouse succeed.
  • No connection retries due to auth failures.
  • DB auth logs show connections from the new user only.
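
The verification checklist can be turned into a gate that must pass before the old user is revoked. This is a sketch under assumptions: the RolloutSignals fields are hypothetical, and you would populate them from your own Airflow task metrics and database auth logs.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class RolloutSignals:
    """Observations collected after consumers were switched (illustrative fields)."""

    failed_warehouse_tasks: int  # DAG tasks touching the warehouse that failed
    auth_failure_retries: int    # connection retries caused by auth errors
    users_in_auth_log: Set[str]  # distinct DB users seen in recent auth logs


def safe_to_revoke_old(signals: RolloutSignals, new_user: str) -> bool:
    """True only when every item on the verification checklist holds."""
    return (
        signals.failed_warehouse_tasks == 0
        and signals.auth_failure_retries == 0
        and signals.users_in_auth_log == {new_user}
    )
```

If the old user still appears in the auth log, at least one consumer has not picked up the new secret and revocation should wait.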

Example 2 — Rotating cloud object storage access keys for batch jobs

  1. Issue new key pair in secret store or IAM.
  2. Publish as new secret version; keep old keys active during rollout.
  3. Roll through jobs:
    • Update job configs to reference the secret name, not hardcoded values.
    • Redeploy in waves; confirm a canary job passes.
  4. Cut over, then disable the old key once all consumers are confirmed on the new one.

Signals of success

  • Batch jobs read/write without elevated 403/401 rates.
  • Access logs show only the new access key in use after cutover.
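
Both signals can be computed from storage access logs. The sketch below assumes the logs have already been parsed into records; the AccessRecord shape and its field names are hypothetical, since log formats differ between providers.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Set


@dataclass(frozen=True)
class AccessRecord:
    """One parsed access-log entry (field names are illustrative)."""

    timestamp: datetime
    access_key_id: str
    job: str
    status: int  # HTTP status code of the request


def stale_consumers(
    records: Iterable[AccessRecord], old_key_id: str, cutover: datetime
) -> Set[str]:
    """Jobs that still used the old access key at or after the cutover time."""
    return {
        r.job
        for r in records
        if r.access_key_id == old_key_id and r.timestamp >= cutover
    }


def auth_error_rate(records: Iterable[AccessRecord]) -> float:
    """Fraction of requests rejected with 401 or 403."""
    items = list(records)
    if not items:
        return 0.0
    return sum(r.status in (401, 403) for r in items) / len(items)
```

An empty result from stale_consumers, together with an error rate near the pre-rotation baseline, suggests the old key can be disabled.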

Example 3 — Rotating Kafka SASL/SCRAM credentials

  1. Create a new SCRAM user (preferred, because it gives a clean overlap) or update the password, and retain the old user during the transition.
  2. Update secret version for clients (producers/consumers).
  3. Perform a rolling restart of clients, one consumer group at a time, to avoid lag spikes.
  4. Verify group lag stays stable and auth errors do not spike.
  5. Remove old user.

Tip: phased restarts

  • Restart 20–30% of clients at a time; watch lag and error metrics.
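
A phased restart can be sketched as below. The restart and healthy callables are placeholders for whatever your deployment tooling and monitoring provide; nothing here calls a real Kafka API.

```python
import math
from typing import Callable, Iterator, List, Sequence


def restart_batches(clients: Sequence[str], fraction: float = 0.25) -> Iterator[List[str]]:
    """Yield clients in batches of roughly `fraction` of the fleet."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    size = max(1, math.ceil(len(clients) * fraction))
    for start in range(0, len(clients), size):
        yield list(clients[start:start + size])


def phased_restart(
    clients: Sequence[str],
    restart: Callable[[str], None],
    healthy: Callable[[], bool],
    fraction: float = 0.25,
) -> List[str]:
    """Restart batch by batch; stop after the first batch that fails the health check.

    Returns the clients that were restarted, so the caller knows where it stopped.
    """
    done: List[str] = []
    for batch in restart_batches(clients, fraction):
        for client in batch:
            restart(client)
        done.extend(batch)
        if not healthy():  # e.g. consumer lag or auth errors above threshold
            break
    return done
```

With ten clients and fraction=0.25 the batches contain 3, 3, 3, and 1 clients. The health check should wait long enough for lag and auth-error metrics to reflect the restarted batch.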

Implementation patterns

  • Versioned secrets: consumers always fetch the latest version by name; store handles versions.
  • Dynamic credentials: for databases, issue per-service users with short TTL; renew automatically.
  • Token refresh: OAuth/OIDC service accounts with short-lived access tokens and longer-lived refresh tokens; rotate signing keys regularly.
  • Certificates: automate CSR, issuance, and renewal; deploy with overlapping validity and proactive reloads.
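
The versioned-secrets pattern can be illustrated with an in-memory stand-in. This is a teaching sketch, not the API of any real secret store; VersionedSecretStore and its methods are hypothetical.

```python
from typing import Dict, List, Optional


class VersionedSecretStore:
    """In-memory stand-in for a secret store that keeps every version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def put(self, name: str, value: str) -> int:
        """Append a new version and return its 1-based version number."""
        self._versions.setdefault(name, []).append(value)
        return len(self._versions[name])

    def get(self, name: str, version: Optional[int] = None) -> str:
        """Return the latest version by default, or a pinned one (useful for rollback)."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


if __name__ == "__main__":
    store = VersionedSecretStore()
    store.put("warehouse_password", "first-value")
    store.put("warehouse_password", "second-value")
    print(store.get("warehouse_password"))             # latest version
    print(store.get("warehouse_password", version=1))  # pinned for rollback
```

Because consumers ask for the secret by name, a rotation only needs a new put(); pinning a version is reserved for rollback.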

Rollout and rollback runbook template

  • Pre-rotation: inventory consumers, identify caches, prepare monitoring and a fallback path.
  • Rotation: create new secret version, update consumers, verify, revoke old.
  • Rollback: if failures occur, revert consumers to the old version, re-enable the old credentials if they were already disabled, investigate, and retry later.
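
The runbook's control flow can be sketched as one function. Each step is passed in as a callable, because the real actions depend on your secret store and deployment tooling; all names here are hypothetical.

```python
from typing import Callable


def rotate_with_rollback(
    publish_new: Callable[[], None],
    switch_consumers: Callable[[str], None],
    verify: Callable[[], bool],
    revoke_old: Callable[[], None],
) -> str:
    """Run rotation; fall back to the old version if verification fails.

    The old credential is revoked only after verification passes, so the
    rollback path never depends on re-enabling anything.
    """
    publish_new()             # new secret version; the old one stays valid
    switch_consumers("new")   # point consumers at the new version
    if verify():
        revoke_old()
        return "rotated"
    switch_consumers("old")   # old credential is still active at this point
    return "rolled_back"
```

Keeping revocation as the last step is the design choice that makes rollback cheap: until verify() passes, switching consumers back is enough.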

Who this is for and prerequisites

Who this is for

  • Data Platform Engineers maintaining warehouses, orchestration (e.g., Airflow), streaming platforms, and data services.
  • Engineers responsible for platform security and reliability.

Prerequisites

  • Know how your secret store works (versioning, access control).
  • Basic DB, messaging, and storage auth models.
  • Familiarity with your deployment method (CI/CD, rolling restarts).

Learning path

  • Before this: secret storage basics, least privilege, service identity.
  • This module: rotation patterns, zero-downtime rollouts, verification and rollback.
  • After this: audit logging for secret usage, monitoring auth failures, incident response for leaked credentials.

Common mistakes and self-check

  • Single cutover without dual keys: causes downtime if any consumer lags. Self-check: do you have overlap where both old and new work?
  • Hardcoded secrets in code or env files: consumers won’t update. Self-check: can you rotate by changing only the secret store?
  • Ignoring caches: long-lived process caches cause auth failures. Self-check: list all components that need restart or refresh.
  • No verification: revoking old too early breaks late or batch jobs. Self-check: have you checked logs/metrics that only the new secret is used?
  • Untracked consumers: shadow scripts break silently. Self-check: maintain an inventory and scan repos for secret references.
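
The self-check for hardcoded secrets and untracked consumers can start with a simple scan. The pattern below is illustrative and deliberately naive; it will miss many cases and produce false positives, so treat it as a starting point rather than a replacement for a dedicated secret scanner.

```python
import re
from typing import Iterable, List, Tuple

# Flags assignments such as PASSWORD = "..." or api_key: '...' whose quoted
# literal is at least 8 characters long. Illustrative pattern only.
SUSPECT = re.compile(
    r"(password|passwd|secret|api[_-]?key|token)\w*\s*[:=]\s*[\"']([^\"']{8,})[\"']",
    re.IGNORECASE,
)


def find_hardcoded(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, matched keyword) for lines that look like literal secrets."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        match = SUSPECT.search(line)
        if match:
            hits.append((lineno, match.group(1).lower()))
    return hits
```

A line that reads the value from the environment or a secret store, such as password = os.environ["WAREHOUSE_PASSWORD"], is not flagged because no quoted literal follows the assignment.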

Practical projects

  • Build a rotation runbook for your warehouse credentials, including dual-user rollout and automated verification queries.
  • Implement short-lived tokens for a data API with automatic refresh and key rotation.
  • Create a dashboard that alerts when any service uses an old secret version after cutover.

Exercises

Exercise 1 — Design a zero-downtime rotation for a warehouse credential

Scenario: Airflow connects to a Postgres warehouse via a central secret name warehouse_password. You need to rotate today.

  1. Draft the steps to rotate with no downtime.
  2. List what to monitor during rollout.
  3. Define rollback steps.

Hints

  • Think dual credentials and versioned secrets.
  • Consider Airflow’s secret caching and connection reloads.

Expected output

  • A step list including new version creation, phased consumer updates, verification, and revocation.
  • A monitoring list: auth errors, task success rate, DB auth logs.
  • Rollback: revert to old version, re-enable old user.

Solution

  1. Create new DB user or set new password; grant same roles.
  2. Publish as warehouse_password version v2.
  3. Update Airflow to fetch latest version; restart scheduler and affected workers.
  4. Run a canary DAG; check logs and DB auth for new user.
  5. Roll through remaining DAGs; watch auth errors.
  6. Disable old user and delete old secret version after 24h grace window.
  7. Rollback: point Airflow back to v1, re-enable old user if needed.

Exercise 2 — Map rotation method to scenario

Match each scenario to the most suitable rotation method:

  • A) TLS cert for internal data API expiring in 10 days
  • B) Long-lived access key found in a public repo
  • C) DB creds used by many batch jobs with weekly schedule
  • D) Service-to-service auth using OIDC tokens

Methods: 1) Automated renewal with overlap, 2) Immediate revoke and reissue, 3) Dual credentials with staged rollout, 4) Short-lived tokens with automatic refresh.

Hints

  • Think urgency vs. safety.
  • Consider how often tokens naturally expire.

Expected output

A)1, B)2, C)3, D)4

Solution

  • A→1: Automate certificate renewal and deploy with overlap.
  • B→2: Treat as incident, revoke immediately, reissue, rotate dependents.
  • C→3: Use dual users and staged rollout to avoid breaking weekly jobs.
  • D→4: Use short-lived OIDC tokens with automatic refresh and rotate signing keys periodically.

Rotation checklist

  • Inventory all consumers of the secret.
  • Create new version or new principal with identical permissions.
  • Roll out to a canary first; monitor errors.
  • Update all consumers; handle caches and restarts.
  • Verify only new secret is used.
  • Revoke old and delete safely.
  • Document results and next scheduled rotation date.

Mini challenge

Pick one production secret and write a one-page rotation plan using the checklist. Include exact verification queries or metrics and a 10-minute rollback plan.

Next steps

  • Automate one rotation end-to-end with your CI/CD and secret store.
  • Set alerts for secrets nearing expiration or still using old versions after cutover.
  • Adopt short-lived credentials where possible to reduce manual rotations.
