Why this matters
As a Data Engineer, you move sensitive data across networks and store it in filesystems, object stores, and databases. Proper encryption protects customers, satisfies regulations, and prevents costly incidents.
- Real tasks you will do: enable TLS on services, set up KMS-managed keys, implement envelope encryption, rotate keys, and verify encryption with audits.
- Outcomes: confidentiality in motion (TLS) and at rest (AES with key management), reduced breach impact, and easier compliance sign-offs.
Concept explained simply
Encryption in transit protects data while it travels between systems. Encryption at rest protects data stored on disks, object storage, or databases.
Mental model
- Transit = a private, tamper-evident tunnel (TLS). Even if someone watches the road, they cannot read the message.
- At rest = a locked box (AES). The data is unreadable without the right key. Keys are guarded by a stronger safe (KMS/HSM).
Key terms (plain English)
- TLS: protocol that encrypts and authenticates network connections.
- mTLS: mutual TLS where both client and server present certificates.
- AES-GCM: a modern, fast symmetric cipher mode that provides both confidentiality and integrity (authenticated encryption); the usual choice for data at rest.
- Envelope encryption: data is encrypted with a Data Encryption Key (DEK); the DEK is encrypted with a Key Encryption Key (KEK) managed by a KMS/HSM.
- KMS: key management service that creates, stores, rotates, and audits keys.
- Perfect Forward Secrecy (PFS): protects past sessions even if a long-term key is compromised later.
How it works in practice
Encryption in transit
- Use TLS 1.2+ (prefer TLS 1.3). Disable weak ciphers. Prefer AES-GCM or CHACHA20-POLY1305.
- Use server certificates from a trusted CA; prefer mTLS within internal data platforms for strong service identity.
- Enable PFS (e.g., ECDHE key exchange). Rotate certificates regularly (90 days is a good default).
- Terminate TLS at the service itself; if a load balancer or proxy terminates TLS, re-encrypt the hop to the backend. For brokers (e.g., Kafka), enable TLS on every broker listener and for all clients.
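To make these rules concrete, here is a minimal sketch of a TLS server context built with Python's standard ssl module: TLS 1.2 as the floor, only ECDHE (forward-secret) AEAD suites for TLS 1.2, and client certificates required for mTLS. The certificate paths are hypothetical placeholders, and your service framework may expose the same settings under different names.

```python
import ssl

# OpenSSL cipher string: ECDHE key exchange (PFS) with AES-GCM or ChaCha20-Poly1305.
# This only governs TLS 1.2; OpenSSL manages TLS 1.3 suites separately (all are AEAD).
TLS12_CIPHERS = "ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL:!eNULL"


def make_server_context(require_client_cert=True):
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSLv3, TLS 1.0, TLS 1.1
    ctx.set_ciphers(TLS12_CIPHERS)
    if require_client_cert:
        # mTLS: the handshake fails unless the client presents a cert from a trusted CA.
        ctx.verify_mode = ssl.CERT_REQUIRED
    # Hypothetical paths; uncomment with your real files.
    # ctx.load_cert_chain("server.crt", "server.key")
    # ctx.load_verify_locations("internal-ca.pem")
    return ctx
```

Setting the floor and the cipher list in code (rather than relying on library defaults) keeps the policy explicit and reviewable.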
Encryption at rest
- Encrypt with AES-256-GCM (authenticated), or with AES-256-CTR combined with an HMAC, since CTR mode alone provides no integrity.
- Use envelope encryption: generate a unique DEK per object/file/partition; wrap it with a KEK in KMS; store the encrypted DEK alongside the data.
- Different layers: disk-level encryption, filesystem encryption, database TDE, and application-level encryption. Combine layers as needed.
- Rotate KEKs on a schedule; re-wrap DEKs instead of re-encrypting entire datasets.
Worked examples
Example 1 — Secure a message pipeline (Producers ↔ Kafka ↔ Consumers)
- Create a private CA or use your enterprise CA to issue broker and client certificates.
- Configure each broker to require TLS 1.2+ and client auth (mTLS).
- Distribute client certs via a secure secret store; deny plaintext ports.
Minimal broker config snippet
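# ${SECRET} is a placeholder: inject real passwords from your secret store (e.g., via a Kafka config provider); never commit them.
listeners=SSL://0.0.0.0:9093
# With only an SSL listener, inter-broker traffic must also use SSL.
security.inter.broker.protocol=SSL
ssl.keystore.location=/opt/kafka/keystore.jks
ssl.keystore.password=${SECRET}
ssl.truststore.location=/opt/kafka/truststore.jks
ssl.truststore.password=${SECRET}
ssl.client.auth=required
ssl.enabled.protocols=TLSv1.2,TLSv1.3
# TLS 1.3 suites plus ECDHE (PFS) AES-GCM suites for TLS 1.2; listing only TLS 1.3 suites would leave TLS 1.2 clients without a usable cipher.
ssl.cipher.suites=TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384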
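Verification: from a client host, run openssl s_client -connect broker:9093 -tls1_3 -cert client.pem -key client.key -CAfile ca.pem (placeholder file names) and confirm the certificate chain, negotiated protocol, and cipher suite. Because ssl.client.auth=required, the same command without -cert and -key should be rejected.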
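If you prefer scripted checks over reading s_client output, the sketch below uses Python's ssl module to open a mutually authenticated connection and report what was negotiated. The file paths are placeholders, and the is_compliant rule (TLS 1.3, or TLS 1.2 with an ECDHE AEAD suite) is an assumption mirroring this page's policy, not a Kafka requirement.

```python
import socket
import ssl

ALLOWED_VERSIONS = {"TLSv1.2", "TLSv1.3"}


def negotiated_params(host, port, cafile, certfile, keyfile):
    """Open an mTLS connection and return (protocol_version, cipher_name)."""
    ctx = ssl.create_default_context(cafile=cafile)  # verifies broker cert and host name
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile, keyfile)  # client cert for ssl.client.auth=required
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


def is_compliant(version, cipher_name):
    """TLS 1.3, or TLS 1.2 with an ECDHE (forward-secret) AEAD suite."""
    if version not in ALLOWED_VERSIONS:
        return False
    if version == "TLSv1.3":
        # TLS 1.3 full handshakes use ephemeral (EC)DHE and AEAD-only suites.
        return True
    return cipher_name.startswith("ECDHE") and (
        "GCM" in cipher_name or "CHACHA20" in cipher_name
    )
```

Usage, with placeholder names: version, cipher = negotiated_params("broker", 9093, "ca.pem", "client.pem", "client.key"), then print(version, cipher, is_compliant(version, cipher)). Keeping the policy check separate from the network call makes it easy to reuse in CI against recorded results.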
Example 2 — Encrypt an object store data lake with envelope encryption
- Enable default bucket encryption using a KMS-managed key (KEK).
- For application uploads, generate a DEK per object; encrypt the object with DEK; store the DEK encrypted (wrapped) by KMS.
- Tag sensitive prefixes (e.g., pii/) to require stronger policies and tighter access.
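Declarative bucket policy example (generic, provider-neutral schema; map the fields to your cloud's equivalent)
{
  "bucket": "analytics-raw",
  "default_encryption": {
    "algorithm": "AES-256-GCM",
    "kms_key_id": "kms/keys/analytics-kek-1"
  },
  "enforce_tls_incoming": true
}
Application-level envelope encryption pseudocode
# Many KMS APIs (e.g., AWS KMS GenerateDataKey) return the DEK both in plaintext and already wrapped by the KEK.
dek = KMS.generate_data_key(key_id="kms/keys/analytics-kek-1", key_spec="AES_256")
iv = random_bytes(12)  # fresh 96-bit nonce; never reuse an IV with the same DEK
ct, tag = AES_GCM.encrypt(key=dek.plaintext, iv=iv, plaintext=data)
wrapped_dek = dek.ciphertext  # no second KMS.encrypt call needed
store(object_path, ct, metadata={"wrapped_dek": base64(wrapped_dek), "iv": base64(iv), "tag": base64(tag)})
wipe(dek.plaintext)  # drop the plaintext DEK from memory as soon as possible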
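For a runnable counterpart to the pseudocode, the sketch below uses the third-party cryptography package (AESGCM plus RFC 3394 AES key wrap). To keep it self-contained, a locally generated KEK stands in for KMS; that substitution is an assumption for illustration only, because in production the KEK stays inside KMS/HSM and wrap/unwrap are KMS API calls. KEK_ID is a hypothetical identifier.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_unwrap, aes_key_wrap

# ASSUMPTION: a local KEK stands in for the KMS-held key so the sketch runs offline.
# In production the KEK never leaves KMS/HSM and wrap/unwrap are KMS API calls.
LOCAL_KEK = AESGCM.generate_key(bit_length=256)
KEK_ID = "local-dev-kek"  # hypothetical key identifier recorded in metadata


def encrypt_object(plaintext):
    dek = AESGCM.generate_key(bit_length=256)  # fresh DEK for every object
    iv = os.urandom(12)  # 96-bit nonce, unique per (DEK, message)
    ciphertext = AESGCM(dek).encrypt(iv, plaintext, None)  # 16-byte GCM tag appended
    wrapped_dek = aes_key_wrap(LOCAL_KEK, dek)  # stands in for a KMS wrap call
    return {
        "ciphertext": ciphertext,
        "metadata": {
            "kek_id": KEK_ID,
            "wrapped_dek": base64.b64encode(wrapped_dek).decode("ascii"),
            "iv": base64.b64encode(iv).decode("ascii"),
        },
    }


def decrypt_object(stored):
    meta = stored["metadata"]
    dek = aes_key_unwrap(LOCAL_KEK, base64.b64decode(meta["wrapped_dek"]))
    iv = base64.b64decode(meta["iv"])
    # decrypt() checks the GCM tag and raises InvalidTag if ciphertext or IV changed
    return AESGCM(dek).decrypt(iv, stored["ciphertext"], None)
```

Unlike the pseudocode, AESGCM appends the tag to the ciphertext, so no separate tag field is stored. Because each call generates its own DEK, two uploads of identical data yield different wrapped DEKs, and rotating the KEK only requires re-wrapping metadata, not re-encrypting objects.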
Example 3 — Secure Postgres in transit and at rest
- Transit: enable TLS on the server, require hostssl entries in pg_hba.conf, and use client certificates for mTLS where possible.
- At rest: enable database or disk encryption. If disk-level encryption is used, ensure backups and snapshots are also encrypted.
Postgres TLS snippet
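# postgresql.conf
ssl = on
ssl_cert_file = 'server.crt'    # placeholder paths
ssl_key_file = 'server.key'
ssl_ca_file = 'root.crt'        # CA used to verify client certificates (needed for cert auth)
ssl_min_protocol_version = 'TLSv1.2'
# ssl_ciphers is an OpenSSL cipher list for TLS 1.2 and older; it does not select TLS 1.3 suites
ssl_ciphers = 'ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL'
# pg_hba.conf (first matching line wins; the cert method always verifies the client certificate)
hostssl all all 0.0.0.0/0 cert
hostnossl all all 0.0.0.0/0 reject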
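On the client side, libpq-based drivers take the matching settings as connection parameters. The helper below is a hypothetical sketch that builds a keyword/value connection string with sslmode=verify-full (verify the server certificate chain and host name) plus a client certificate and key for mTLS; the host, database, and file paths in the usage are placeholders.

```python
def pg_conninfo(host, dbname, user, sslrootcert, sslcert, sslkey):
    """Build a libpq keyword/value string that enforces verified TLS + client certs."""
    params = {
        "host": host,
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",  # verify server cert chain AND that it matches host
        "sslrootcert": sslrootcert,  # CA that signed the server certificate
        "sslcert": sslcert,  # client certificate (mTLS)
        "sslkey": sslkey,  # client private key
    }

    def quote(value):
        # libpq rule: wrap in single quotes; backslash-escape backslashes and quotes
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return " ".join(f"{key}={quote(value)}" for key, value in params.items())
```

The resulting string can be passed to a libpq-based driver such as psycopg. With the cert method in pg_hba.conf, the client certificate's common name must match the database user unless a user name map is configured.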
Step-by-step playbooks
Playbook: Turn on TLS for an internal service
- Request a DNS name and certificate (wildcards only if justified).
- Configure server to use TLS 1.2+; disable weak ciphers; prefer PFS suites.
- Optionally require mTLS for internal calls; distribute client certs via a secret manager.
- Set auto-renewal and alerts at 30/14/7 days before expiry.
- Verify with openssl s_client and a canary call; block plaintext ports.
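The 30/14/7-day alert schedule can be expressed as a small check. This Python sketch assumes you already have the certificate's notAfter string (for example from ssl.SSLSocket.getpeercert() on a verified connection, or from your certificate inventory); ALERT_DAYS, days_until_expiry, and alert_tier are illustrative names, not part of any tool.

```python
import ssl
import time

ALERT_DAYS = (30, 14, 7)  # thresholds from the playbook, in days


def days_until_expiry(not_after, now=None):
    """not_after uses the OpenSSL text format, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def alert_tier(days_left):
    """Return the tightest threshold already crossed (30, 14, or 7), or None."""
    crossed = [d for d in ALERT_DAYS if days_left <= d]
    return min(crossed) if crossed else None
```

An already-expired certificate (negative days_left) falls into the 7-day tier, so it keeps alerting until renewal succeeds.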
Playbook: Enable encryption at rest for a data lake
- Create or choose a KMS key (KEK) with least-privilege access policies.
- Enable default bucket encryption with that KEK.
- Require server-side encryption headers; reject unencrypted uploads.
- For application-managed encryption, implement envelope encryption and store encrypted DEKs with objects.
- Encrypt backups, logs, and query result caches.
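One way to enforce this playbook is a CI check over bucket settings. The sketch below audits a policy written in the generic schema from Example 2; audit_bucket_policy and its messages are hypothetical, and a real check would read your provider's actual bucket configuration instead.

```python
import json


def _normalize(algorithm):
    # Accept spelling variants such as "AES-256-GCM" and "AES256-GCM".
    return algorithm.replace("-", "").replace("_", "").upper()


def audit_bucket_policy(policy):
    """Return a list of problems; an empty list means the policy passes."""
    problems = []
    enc = policy.get("default_encryption") or {}
    if _normalize(enc.get("algorithm") or "") != "AES256GCM":
        problems.append("default encryption is not AES-256-GCM")
    if not enc.get("kms_key_id"):
        problems.append("no KMS-managed KEK configured")
    if policy.get("enforce_tls_incoming") is not True:
        problems.append("TLS is not enforced for incoming requests")
    return problems


if __name__ == "__main__":
    policy = json.loads(
        '{"bucket": "analytics-raw",'
        ' "default_encryption": {"algorithm": "AES-256-GCM", "kms_key_id": "kms/keys/example-kek"},'
        ' "enforce_tls_incoming": true}'
    )
    print(audit_bucket_policy(policy) or "OK")
```

Returning a list of findings rather than a boolean lets a CI job print every gap at once instead of failing on the first.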
Playbook: Rotate keys without downtime
- Create a new KEK version in KMS.
- Update producers to use the new KEK version for new writes.
- Run a background job that re-wraps existing DEKs from the old KEK version to the new one.
- After the re-wrap completes and is validated, disable the old KEK version.
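The re-wrap step can be sketched as follows. FakeKms is an in-memory test double, not a real KMS and not encryption: wrap() returns a random opaque handle and remembers the DEK, so the sketch exercises only the rotation logic. The kek_version and wrapped_dek metadata fields and all function names are hypothetical; swap in your KMS's wrap/unwrap calls.

```python
import secrets


class FakeKms:
    """In-memory TEST DOUBLE for a KMS. It performs no encryption: wrap() returns
    a random opaque handle and remembers the DEK. Replace with real KMS calls."""

    def __init__(self):
        self._wrapped = {}  # (kek_version, handle) -> DEK
        self._disabled = set()

    def _check(self, kek_version):
        if kek_version in self._disabled:
            raise PermissionError(f"KEK version {kek_version} is disabled")

    def wrap(self, kek_version, dek):
        self._check(kek_version)
        handle = secrets.token_bytes(16)
        self._wrapped[(kek_version, handle)] = dek
        return handle

    def unwrap(self, kek_version, handle):
        self._check(kek_version)
        return self._wrapped[(kek_version, handle)]

    def disable(self, kek_version):
        self._disabled.add(kek_version)


def rewrap_all(kms, objects, old_version, new_version):
    """Move every DEK wrapped under old_version to new_version.

    Only metadata changes; encrypted object bodies are never read or rewritten.
    Returns the number of objects re-wrapped."""
    count = 0
    for meta in objects:
        if meta["kek_version"] != old_version:
            continue
        dek = kms.unwrap(old_version, meta["wrapped_dek"])
        new_wrapped = kms.wrap(new_version, dek)
        # Validate the new wrapping before committing the metadata change.
        if kms.unwrap(new_version, new_wrapped) != dek:
            raise RuntimeError("re-wrap validation failed")
        meta["wrapped_dek"] = new_wrapped
        meta["kek_version"] = new_version
        count += 1
    return count


def safe_to_disable(objects, old_version):
    """True once no object metadata still references old_version."""
    return all(meta["kek_version"] != old_version for meta in objects)
```

Gating disable() on safe_to_disable() encodes the playbook's last step: the old KEK version is retired only after nothing still depends on it.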
Playbook: Verify and monitor
- Transit: inspect connections with openssl s_client; verify TLS versions and ciphers.
- At rest: sample objects/files and confirm encryption metadata or key IDs.
- Enable KMS audit logs; alert on key misuse, denied attempts, or disabled rotation.
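Sampling objects and checking their metadata can be automated. The sketch below assumes metadata fields named as in Example 2's pseudocode (wrapped_dek, iv) and flags objects missing them, plus any wrapped DEK blob shared by several objects. That second check only catches copy-pasted metadata: a KMS that wraps with randomized encryption produces different blobs even for the same DEK, so treat it as a coarse signal. audit_objects is a hypothetical name.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("wrapped_dek", "iv")  # field names follow Example 2's metadata


def audit_objects(objects, required=REQUIRED_FIELDS):
    """objects maps object path -> metadata dict. Returns a list of findings."""
    findings = []
    paths_by_wrapped_dek = defaultdict(list)
    for path, meta in sorted(objects.items()):
        missing = [name for name in required if not meta.get(name)]
        if missing:
            findings.append(f"{path}: missing {', '.join(missing)}")
        if meta.get("wrapped_dek"):
            paths_by_wrapped_dek[meta["wrapped_dek"]].append(path)
    for paths in paths_by_wrapped_dek.values():
        if len(paths) > 1:
            findings.append(f"wrapped DEK shared by {len(paths)} objects: {', '.join(paths)}")
    return findings
```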
Quick checklist
- [ ] All data-plane endpoints enforce TLS 1.2+ (prefer 1.3).
- [ ] Internal services requiring identity use mTLS.
- [ ] Weak ciphers/protocols disabled; PFS enabled.
- [ ] Storage buckets/disks/databases use KMS-backed encryption.
- [ ] DEKs per object/partition; DEKs wrapped with KEK; rotation documented.
- [ ] Backups, logs, and caches encrypted.
- [ ] Certificate and key rotation schedules with alerts.
- [ ] Regular verification steps documented.
Common mistakes and how to self-check
- Only encrypting disks but not backups/snapshots. Self-check: pick a snapshot and confirm encryption metadata.
- Allowing older TLS versions. Self-check: run nmap --script ssl-enum-ciphers (or a similar scanner) and ensure TLS 1.0/1.1 are blocked.
- Sharing one DEK across many files. Self-check: inspect metadata; ensure per-object DEKs.
- Terminating TLS at a proxy but using plaintext to the backend. Self-check: packet capture on backend network should show TLS handshakes, not plaintext.
- Forgetting rotation or alerts. Self-check: show next rotation date for each key; ensure alert rules exist.
Exercises
Complete the exercises below.
- Exercise ex1: Design a TLS plan for a 3-service pipeline (ingest API, Kafka, Spark jobs). Deliver a one-page YAML policy with cert sources, cipher policy, mTLS scope, rotation cadence, and verification steps.
- Exercise ex2: Implement envelope encryption for object storage uploads. Deliver config for default bucket encryption, application pseudocode for DEK generation/wrap, and a rotation playbook.
Mini challenge
Pick one running pipeline and harden both layers in one day: enforce TLS 1.2+ end-to-end and enable default encryption on its storage. Capture command outputs or screenshots as verification evidence.
Who this is for
- Data Engineers and Platform Engineers responsible for data pipelines and storage.
- Analytics Engineers integrating with secured data sources.
Prerequisites
- Comfort with networking basics (TCP/IP, DNS).
- Ability to edit service configs and use a command line.
- Basic understanding of cloud storage or database administration.
Learning path
- Understand TLS and certificates.
- Enable encryption at rest via your storage/database.
- Implement envelope encryption for sensitive data.
- Automate rotation and verification.
Practical projects
- Harden a message pipeline with mTLS and audited cert rotation.
- Build an ingest microservice that performs envelope encryption before writing to object storage.
- Create a compliance dashboard that surfaces TLS versions, cipher usage, and KMS key age.
Next steps
- Automate certificate issuance and renewal.
- Adopt per-tenant or per-dataset keys to limit blast radius.
- Expand verification into CI/CD checks.