Why this matters
Platform engineers are accountable for guardrails that keep systems safe and compliant without blocking delivery. Policy enforcement and compliance provide:
- Secure defaults: deny risky configs (e.g., public buckets, privileged pods).
- Evidence for audits: automated proofs of controls and exceptions.
- Consistent behavior: same checks in dev, CI, and prod.
- Fast feedback: issues are caught early, when fixes are cheaper.
Typical tasks you will do:
- Write policies as code to enforce security standards.
- Integrate checks into CI/CD, admission controllers, and cloud org policies.
- Manage exceptions with time-bound approvals and audit trails.
- Continuously monitor and remediate drift.
Concept explained simply
Policies are rules like “No public S3 buckets” or “No privileged containers.” Enforcement is where these rules are checked and violations are blocked. Compliance proves the rules are in place and working.
Mental model: Define → Enforce → Verify → Improve
- Define: Translate standards (CIS, SOC 2, internal baselines) into policies as code.
- Enforce: Put checks at key control points (pre-commit, CI, admission, runtime).
- Verify: Collect evidence, scan continuously, and alert on drift.
- Improve: Track exceptions, fix root causes, and tighten defaults.
Control types:
- Preventive: block a risky change before it lands (e.g., admission policy).
- Detective: find issues after the fact (e.g., nightly scan).
- Corrective: auto-remediate or guide a fix.
Worked examples
Example 1: Kubernetes — block privileged pods
Goal: Prevent pods from running privileged containers except in a break-glass namespace.
Policy intent (plain language): “Deny privileged containers everywhere, allow only in namespace emergency-ops with a time-bound waiver.”
# OPA/Gatekeeper-style intent (illustrative)
deny if: container.securityContext.privileged == true
exception if: namespace == "emergency-ops" AND has_waiver == true AND waiver_expiry > now
message: "Privileged containers are not allowed (use emergency-ops with approved waiver)."
- Enforcement point: K8s admission controller.
- Evidence: Admission logs + waiver record with expiry.
- Detective backup: Daily cluster policy report to catch drift.
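The deny-with-exception logic above can be sketched in Python to make the decision order explicit. This is a minimal illustration, not Gatekeeper code: the pod is a plain dict shaped like a Kubernetes manifest, and the waiver format (a dict with an `expiry` datetime) and the function names are assumptions made for this example.

```python
from datetime import datetime, timezone

BREAK_GLASS_NAMESPACE = "emergency-ops"
DENY_MESSAGE = (
    "Privileged containers are not allowed "
    "(use emergency-ops with approved waiver)."
)


def has_valid_waiver(waiver, now):
    """A waiver counts only if it exists and has not yet expired.

    The waiver shape ({"expiry": datetime}) is hypothetical.
    """
    return waiver is not None and waiver["expiry"] > now


def review_pod(pod, waiver=None, now=None):
    """Return (allowed, message) for a simplified pod manifest dict."""
    now = now or datetime.now(timezone.utc)
    spec = pod.get("spec", {})
    # Init containers can request privileged mode too, so check both lists.
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    privileged = [
        c.get("name", "<unnamed>")
        for c in containers
        if (c.get("securityContext") or {}).get("privileged") is True
    ]
    if not privileged:
        return True, ""
    namespace = pod.get("metadata", {}).get("namespace", "default")
    # The exception needs BOTH the break-glass namespace and a live waiver.
    if namespace == BREAK_GLASS_NAMESPACE and has_valid_waiver(waiver, now):
        return True, ""
    return False, f"{DENY_MESSAGE} Offending containers: {', '.join(privileged)}"
```

Passing `now` explicitly makes the expiry branch easy to unit test. In a real cluster the same decision would live in the admission policy itself, with the waiver looked up from wherever approvals are recorded.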
Example 2: Terraform — deny public object storage
Goal: Prevent creation of publicly accessible buckets.
# Pseudo Rego for Terraform plan evaluation
package policy.bucket

deny[msg] {
    some b
    input.resources[b].type == "storage_bucket"
    input.resources[b].config.public == true
    msg := sprintf("Public bucket denied: %s", [input.resources[b].name])
}
- Enforcement point: CI checks on terraform plan.
- Preventive: Fails the pipeline before apply.
- Evidence: CI job artifacts (policy report) retained for audit.
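For readers who want to trace the same check outside a policy engine, here is a rough Python equivalent of the pseudo-Rego above. It assumes the same simplified input shape (`resources[].type`, `.name`, `.config.public`), which is illustrative only; real `terraform show -json` output is structured differently, so a real check would need to map plan entries into this shape first. The function names are hypothetical.

```python
import json


def public_bucket_violations(plan):
    """Return one denial message per public storage bucket in the plan.

    `plan` uses the simplified, hypothetical shape from the policy above:
    {"resources": [{"type": ..., "name": ..., "config": {"public": ...}}]}
    """
    messages = []
    for resource in plan.get("resources", []):
        if resource.get("type") != "storage_bucket":
            continue
        if (resource.get("config") or {}).get("public") is True:
            name = resource.get("name", "<unnamed>")
            messages.append(f"Public bucket denied: {name}")
    return messages


def run_check(plan_path):
    """Load a plan file, print violations, and return a CI exit code."""
    with open(plan_path) as handle:
        plan = json.load(handle)
    violations = public_bucket_violations(plan)
    for message in violations:
        print(message)
    # A non-zero exit code fails the pipeline before apply.
    return 1 if violations else 0
```

A CI step could call `run_check` on the exported plan file and pass its return value to `sys.exit`, keeping the printed messages as the job's policy report.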
Example 3: CI — only signed images may deploy
Goal: Ensure only signed, trusted images are used.
# Admission policy pseudo-logic
require: image.signature.trusted == true
on_fail: deny with "Image must be signed by trusted key"
- Enforcement points: CI image signing + cluster admission check.
- Detective: Registry scan for unsigned images to alert and quarantine.
- Evidence: Signing logs, admission decisions, registry reports.
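The admission decision in this example can be sketched as a small Python function. It deliberately leaves out the cryptography: it assumes a signing tool has already verified signatures and produced a mapping from image reference to the key IDs that verified. The key ID, the image references, and the function name are all hypothetical.

```python
# Hypothetical identifiers of keys the platform team trusts.
TRUSTED_KEY_IDS = {"platform-release-key"}


def review_images(images, verified_signers):
    """Return a denial message for each image lacking a trusted signature.

    images: list of image references requested by the workload.
    verified_signers: dict mapping image reference -> set of key IDs whose
        signatures were verified upstream (an assumption of this sketch).
    """
    denials = []
    for image in images:
        signers = verified_signers.get(image, set())
        # Deny unless at least one verified signer is in the trusted set.
        if not (signers & TRUSTED_KEY_IDS):
            denials.append(f"Image must be signed by trusted key: {image}")
    return denials
```

An image that is missing from `verified_signers` is treated the same as an unsigned one, so the check fails closed.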
How to implement policies (practical steps)
- List critical risks and map to rules (e.g., public data, privilege escalation, missing encryption).
- Choose control points: pre-commit, CI, IaC plan, admission, runtime, cloud org policies.
- Write policies as code with clear messages and owners.
- Add exceptions workflow: ticket ID, approver, expiry, scope.
- Collect evidence automatically: CI artifacts, audit logs, periodic reports.
- Continuously scan and alert on drift; auto-remediate where safe.
- Review metrics: number of blocks, exception age, MTTR for policy issues.
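The exceptions step is easier to enforce when a waiver is a structured record that can be validated automatically. The sketch below models the four fields listed above (ticket ID, approver, expiry, scope). The field names, the 30-day ceiling, and the `waiver_problems` helper are assumptions for illustration, not a standard format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed ceiling on waiver lifetime; pick a value that fits your process.
MAX_WAIVER_LIFETIME = timedelta(days=30)


@dataclass(frozen=True)
class Waiver:
    ticket_id: str
    approver: str
    expiry: datetime  # must be timezone-aware
    scope: str        # e.g. a namespace or resource address


def waiver_problems(waiver, now=None):
    """Return a list of reasons the waiver is invalid (empty means valid)."""
    now = now or datetime.now(timezone.utc)
    problems = []
    for field_name in ("ticket_id", "approver", "scope"):
        if not getattr(waiver, field_name).strip():
            problems.append(f"{field_name} is required")
    if waiver.expiry.tzinfo is None:
        # Naive datetimes cannot be compared safely with an aware `now`.
        problems.append("expiry must be timezone-aware")
    elif waiver.expiry <= now:
        problems.append("waiver has expired")
    elif waiver.expiry - now > MAX_WAIVER_LIFETIME:
        problems.append("expiry exceeds the maximum waiver lifetime")
    return problems
```

Returning every problem at once, rather than stopping at the first, gives the requester a complete list to fix in one pass.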
Checklist: a good policy
- Precise condition and scope (what is denied/allowed).
- Human-friendly denial message with fix guidance.
- Documented exceptions process with expiry.
- Test cases for allowed/denied/exception flows.
- Evidence location and retention period.
Exercises you can do now
Exercise 1: Write a deny rule for hostPath volumes
Policy intent: Disallow Kubernetes hostPath volumes everywhere except namespace "ops-tools".
- Deliverable: A short policy-as-code snippet with a denial message and the one-namespace exception.
- Edge cases: What if the pod has multiple volumes? What if the namespace label says "approved"?
Need a nudge?
- Iterate volumes and detect hostPath.
- Allow when namespace == ops-tools; deny otherwise.
- Include a message that tells the user how to fix.
Exercise 2: Enforcement plan for storage encryption + no public access
Goal: "All storage must be encrypted at rest; public access is not allowed." Create a plan mapping each control point to a specific check, plus evidence and exception handling.
- Deliverable: A short plan covering CI, IaC plan, cloud org policy, and monitoring.
- Include: Who reviews exceptions, expiry, and how evidence is stored.
Need a nudge?
- Preventive first: org policies and IaC checks.
- Detective later: continuous cloud scans.
- Evidence: CI artifacts, audit logs, scan reports.
Exercise checklist
- Clear rule(s) and scope.
- Defined control points.
- Exception workflow with expiry.
- Evidence sources and retention.
- How to test the policy.
Common mistakes and how to self-check
- Vague rules: If a developer cannot tell what to fix, the rule is unclear. Self-check: Does the message name the exact field and a safe alternative?
- Only detective controls: If you detect but don't block, risk persists. Self-check: Do you have at least one preventive control per critical risk?
- Forever exceptions: Waivers without expiry become permanent holes. Self-check: Do all exceptions have end dates and owners?
- Policy drift: Policies differ across stages. Self-check: Is the same policy bundle used in CI and prod?
- No evidence: Passing checks but nothing stored. Self-check: Can you retrieve last month’s reports in minutes?
Practical projects
- Project A: Stand up a policy bundle that denies public buckets and unsigned images. Integrate with CI and admission; publish a daily compliance report.
- Project B: Exception management: implement a simple waiver format (ticket ID, approver, expiry) and validate it in policies.
- Project C: Drift detection: schedule scans that compare desired policies vs. live state and open tickets on violations.
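As a possible starting point for Project C, drift detection can be reduced to comparing two dictionaries: the settings the policy requires and the settings observed in the live environment. The sketch below assumes both are already available as `resource_id -> settings` dicts; how they are collected, and how tickets are opened, is left out. The resource IDs and setting names are made up for the example.

```python
def find_drift(desired, live):
    """Compare desired settings against live state and describe each mismatch.

    desired, live: dicts mapping a resource ID to a dict of settings.
    Only keys present in `desired` are checked; extra live keys are ignored.
    """
    findings = []
    for resource_id in sorted(desired):
        actual_settings = live.get(resource_id)
        if actual_settings is None:
            findings.append(f"{resource_id}: resource not found in live state")
            continue
        for key in sorted(desired[resource_id]):
            expected = desired[resource_id][key]
            actual = actual_settings.get(key)
            if actual != expected:
                findings.append(
                    f"{resource_id}: {key} is {actual!r}, expected {expected!r}"
                )
    return findings


# Example inputs (hypothetical resource IDs and settings).
desired_state = {"bucket/logs": {"public": False, "encrypted": True}}
live_state = {"bucket/logs": {"public": True, "encrypted": True}}
drift = find_drift(desired_state, live_state)
```

Each string in `drift` could become the body of a ticket. Sorting the resource IDs and keys keeps the report stable between runs, which makes daily reports easier to diff.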
Learning path
- Learn policy basics: preventive vs. detective vs. corrective.
- Pick a policy engine and write 3 core rules (privileged pods, public storage, required tags).
- Integrate in CI (plan evaluation) and in-cluster admission.
- Add exceptions with expiry and audit logging.
- Automate evidence collection and reporting.
- Scale with versioned policy bundles and unit tests.
Who this is for
- Platform/DevOps engineers enabling safe, fast delivery.
- SREs and security engineers building guardrails.
- Backend engineers contributing to infrastructure code.
Prerequisites
- Basic Kubernetes, CI/CD, and Infrastructure as Code (e.g., Terraform).
- Familiarity with cloud IAM concepts and audit logging.
- Comfort reading simple policy/pseudo-code.
Mini challenge
Write a short policy intent and enforcement plan for: "All resources must have cost-center and owner tags; deny creation otherwise, allow exceptions for migrations for 14 days." Include control points, exception fields, and evidence.
Next steps
- Refine your two exercise outputs with clearer messages and tests.
- Add one detective control (daily scan) and one corrective control (auto-fix tag) to your plan.
- Take the quick test to validate understanding.