Why this matters
Secure defaults and guardrails make the safest path the easiest path. As a Platform Engineer, you bake security into templates, policies, and pipelines so product teams ship quickly without creating avoidable risk.
- Spin up a new service: it should have logging, encryption, and least-privilege by default.
- Deploy to Kubernetes: unsafe pods should be blocked automatically.
- Provision cloud resources: public access and unencrypted storage should be prevented unless there is a reviewed exception.
Real tasks you will do
- Create golden templates (IaC, CI/CD) with safe defaults.
- Write policy-as-code to block risky changes before merge/deploy.
- Design an exception process with time-bound approvals and audit.
- Measure coverage and drift to keep guardrails effective.
Concept explained simply
Secure defaults are the pre-set configurations everyone starts with (e.g., encryption on, private networking). Guardrails are controls that prevent or warn on dangerous deviations (e.g., policy checks that fail builds).
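To make the distinction concrete, here is a minimal Python sketch. The `StorageConfig` type and its field names are hypothetical and not tied to any real platform: the dataclass carries the secure defaults, and `guardrail_violations` plays the role of a policy check that a build could fail on.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical storage settings; every field starts at its safe value."""
    encrypted: bool = True
    public_access: bool = False
    private_networking: bool = True


def apply_overrides(defaults: StorageConfig, **overrides) -> StorageConfig:
    """Secure default: callers only state what they want to change."""
    return replace(defaults, **overrides)


def guardrail_violations(cfg: StorageConfig) -> list[str]:
    """Guardrail: list dangerous deviations so a build can fail on them."""
    problems = []
    if not cfg.encrypted:
        problems.append("encryption must stay on")
    if cfg.public_access:
        problems.append("public access requires a reviewed exception")
    return problems
```

A team that calls `apply_overrides(StorageConfig())` gets a compliant configuration without thinking about it; a team that passes `public_access=True` gets a non-empty violation list that a pipeline can turn into a failed check.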
Mental model
Think of a highway with speed limits and lane barriers:
- The speed limit signs are secure defaults: clear, consistent baseline.
- The lane barriers are guardrails: they stop you from driving off the road.
- The emergency gates are controlled exceptions: they open only with extra checks and time limits.
Design principles
- Opt-out, not opt-in: start secure, require justification to loosen.
- Golden paths: publish easy, well-documented templates that pass all checks.
- Shift left: block issues in PR or pre-merge, not in production.
- Low friction: give actionable errors and remediation hints.
- Traceable exceptions: documented reason, owner, expiry, and scope.
- Measure everything: coverage, block rates, false-positive rates, and mean time to exception closure.
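The "measure everything" principle can be sketched in a few lines of Python. The `Evaluation` record and its fields are assumptions made for illustration; a real platform would fill them from policy-engine logs. The sketch covers coverage, block rate, and false-positive rate only.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """One policy decision for one resource (illustrative field names)."""
    resource: str
    on_golden_path: bool
    blocked: bool
    false_positive: bool = False  # set after a human reviews a block


def guardrail_metrics(evals: list[Evaluation]) -> dict[str, float]:
    """Compute coverage, block rate, and false-positive rate as fractions."""
    total = len(evals)
    if total == 0:
        return {"coverage": 0.0, "block_rate": 0.0, "false_positive_rate": 0.0}
    blocked = [e for e in evals if e.blocked]
    false_positives = sum(1 for e in blocked if e.false_positive)
    return {
        # share of resources created from golden templates
        "coverage": sum(1 for e in evals if e.on_golden_path) / total,
        # share of evaluations that a guardrail blocked
        "block_rate": len(blocked) / total,
        # share of blocks that review judged to be wrong
        "false_positive_rate": false_positives / len(blocked) if blocked else 0.0,
    }
```

A rising false-positive rate is usually a sign that a policy needs tuning before teams start routing around it.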
Worked examples
Example 1 — Kubernetes namespace baseline (secure defaults)
Objective: ensure any pod is limited, isolated, and non-root by default.
# Namespace template values (Helm) produce:
# (metadata names below are placeholders)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: default-quota
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: 512Mi
      defaultRequest:
        cpu: "200m"
        memory: 256Mi
---
# Default deny egress + ingress; teams must explicitly open flows
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
  ingress: []
  egress: []
---
# Pod Security: enforce the "restricted" profile using the built-in
# Pod Security Admission labels at the namespace level
apiVersion: v1
kind: Namespace
metadata:
  name: team-a  # placeholder namespace name
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
Guardrail: an admission policy (OPA/Gatekeeper or Kyverno) that denies pods that do not set runAsNonRoot and readOnlyRootFilesystem, or that keep capabilities other than those explicitly allowed.
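# Kyverno example policy snippet
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-seccontext  # placeholder policy name
spec:
  # Enforce rejects non-compliant pods; the default (Audit) only reports them
  validationFailureAction: Enforce
  rules:
    - name: require-seccontext
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Pods must run as non-root with a read-only root FS and no privilege escalation"
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
            containers:
              - securityContext:
                  readOnlyRootFilesystem: true
                  allowPrivilegeEscalation: false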
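The same checks can also run before admission, for example as a unit test over rendered manifests in CI. The following is a rough Python sketch and not part of Kyverno or Gatekeeper; `pod_violations` is a hypothetical helper that takes a Pod manifest already parsed into a dict.

```python
def pod_violations(pod: dict) -> list[str]:
    """Return human-readable violations for a Pod manifest given as a dict."""
    spec = pod.get("spec") or {}
    problems = []
    pod_sc = spec.get("securityContext") or {}
    if pod_sc.get("runAsNonRoot") is not True:
        problems.append("spec.securityContext.runAsNonRoot must be true")
    for container in spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        if sc.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem must be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        dropped = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            problems.append(f"{name}: capabilities.drop must include ALL")
    return problems
```

Each message names the exact field to change, which keeps the failure actionable. The sketch is stricter than a cluster might need (it does not model an allow-list of added capabilities), so treat it as a starting point.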
Example 2 — Terraform S3 module (secure-by-default)
Objective: all buckets encrypted, private, and logged by default.
# Module variables default to the safe setting; nothing is left off by default
data "aws_kms_key" "s3" {
  key_id = "alias/org-s3"
}

resource "aws_s3_bucket" "this" {
  bucket        = var.name
  force_destroy = false
}

resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = data.aws_kms_key.s3.arn
      sse_algorithm     = "aws:kms"
    }
  }
}

# Access logs on by default (module creates a central logging bucket)
# Consumers can override, but not disable without an exception
Guardrail: policy-as-code (e.g., OPA/Conftest) that fails the PR if an S3 bucket's public access block is missing or disabled, or if server-side encryption (SSE) is not configured.
# OPA Rego sketch. Assumes each input document wraps a single resource as
# input.resource with flattened values; adapt the paths to your plan JSON.
package terraform.s3

import rego.v1

deny contains msg if {
    input.resource.type == "aws_s3_bucket_public_access_block"
    not input.resource.values.block_public_acls
    msg := "S3 buckets must block public ACLs"
}

deny contains msg if {
    input.resource.type == "aws_s3_bucket_server_side_encryption_configuration"
    not input.resource.values.rule.apply_server_side_encryption_by_default.sse_algorithm == "aws:kms"
    msg := "SSE-KMS must be enabled"
}
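Rules like the ones above only fire when the companion resource exists, so a bucket declared with no public access block at all would slip through. One way to catch the missing case is to scan the plan JSON produced by `terraform show -json`. The Python sketch below is an illustration under a simplifying assumption: it groups resources by `module_address` and expects each module that creates an `aws_s3_bucket` to also create both companion resources. `missing_s3_companions` is a hypothetical helper name.

```python
REQUIRED_COMPANIONS = {
    "aws_s3_bucket_public_access_block": "public access block",
    "aws_s3_bucket_server_side_encryption_configuration": "SSE configuration",
}


def missing_s3_companions(plan: dict) -> list[str]:
    """Flag modules that create a bucket without its companion resources."""
    types_by_module: dict[str, set[str]] = {}
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions == ["delete"]:
            continue  # a resource being removed no longer protects anything
        module = rc.get("module_address", "root")  # absent for root resources
        types_by_module.setdefault(module, set()).add(rc["type"])

    problems = []
    for module, types in sorted(types_by_module.items()):
        if "aws_s3_bucket" not in types:
            continue
        for rtype, label in REQUIRED_COMPANIONS.items():
            if rtype not in types:
                problems.append(f"{module}: aws_s3_bucket has no {label}")
    return problems
```

Grouping by module is coarse: a module with two buckets and one public access block would pass. Matching companions to individual buckets needs the configuration references in the plan, which is left out here to keep the sketch short.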
Example 3 — CI/CD pipeline guardrails
Objective: stop risky artifacts before deployment.
# Pseudocode for a shared pipeline template (e.g., GitHub Actions, GitLab CI)
# code-scan and dep-audit stand in for your SAST and dependency scanners;
# org.key is a placeholder for the private key that matches org.pub
jobs:
  sast:
    steps:
      - run: code-scan --min-severity=medium --fail-on=found
  deps:
    steps:
      - run: dep-audit --deny="critical|high" --fail-on=found
  image_scan:
    steps:
      - run: trivy image --severity HIGH,CRITICAL --exit-code 1 "$IMAGE"
  sign:
    needs: [sast, deps, image_scan]  # sign only artifacts that passed every scan
    steps:
      - run: cosign sign --key org.key "$IMAGE"
  verify:
    needs: [sign]
    steps:
      - run: cosign verify --key org.pub "$IMAGE"
  policy_check:
    needs: [verify]
    steps:
      - run: conftest test ./ --policy ./policy
  deploy:
    needs: [policy_check]
    if: ${{ success() }}
Defaults: All projects inherit this pipeline. Teams can add extra checks, but removing steps requires a documented exception.
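Inheriting the pipeline only helps if deployment actually waits for every required job. A check along the following lines could run against each project's resolved pipeline. It is a Python sketch with assumed names: `REQUIRED_JOBS` mirrors the job names in the template, and the pipeline is taken as a dict already parsed from YAML.

```python
REQUIRED_JOBS = ("sast", "deps", "image_scan", "sign", "verify", "policy_check")


def transitive_needs(jobs: dict, name: str) -> set[str]:
    """Collect every job that `name` depends on, directly or indirectly."""
    seen: set[str] = set()
    stack = list((jobs.get(name) or {}).get("needs", []))
    while stack:
        job = stack.pop()
        if job in seen:
            continue
        seen.add(job)
        stack.extend((jobs.get(job) or {}).get("needs", []))
    return seen


def ungated_jobs(pipeline: dict, excepted: frozenset = frozenset()) -> list[str]:
    """List required jobs that deploy does not wait for and that have no exception."""
    gates = transitive_needs(pipeline.get("jobs", {}), "deploy")
    return [job for job in REQUIRED_JOBS if job not in gates and job not in excepted]
```

A job that was deleted and a job that still exists but no longer sits upstream of `deploy` are reported the same way, since in both cases the deployment is not gated on it.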
Example 4 — Egress guardrail with controlled exceptions
Default: all workloads go through an egress gateway/proxy; direct internet egress is denied.
Guardrail: network policy and firewall rules enforce proxy usage. Exception: a short-lived allow rule with ticket, owner, scope (FQDNs), and expiry.
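As an illustration of what such an exception record might carry, here is a small Python sketch. `EgressException` and `direct_egress_allowed` are hypothetical names; real enforcement would live in the firewall or proxy configuration generated from records like these.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EgressException:
    """A short-lived allow rule for direct egress (illustrative fields)."""
    ticket: str
    owner: str
    fqdns: frozenset  # lowercase FQDNs this exception covers
    expires_at: datetime  # timezone-aware expiry

    def allows(self, host: str, now: datetime) -> bool:
        # Normalize case and a trailing dot before the exact-match lookup
        return now < self.expires_at and host.lower().rstrip(".") in self.fqdns


def direct_egress_allowed(host: str, exceptions: list, now: datetime = None) -> bool:
    """Default deny: allow only if some unexpired exception names the host."""
    now = now or datetime.now(timezone.utc)
    return any(exc.allows(host, now) for exc in exceptions)
```

With an empty exception list every host is denied, which matches the default. Exact FQDN matching is a deliberate choice in this sketch; wildcard scopes would widen an exception beyond what was reviewed.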
How to implement in your platform
- Inventory risks and pick defaults
Decide baselines for identity, network, storage, compute, CI, and observability. Write them down as testable rules.
- Provide golden templates
Publish IaC modules and service templates that already pass scans and policies. Include examples and remediation messages.
- Enforce with policy-as-code
Run policy checks in PRs and at deploy time. Make failure messages clear and link to the default template or fix instructions.
- Create an exception process
Define who can approve, max duration, and what compensating controls apply. Store exceptions as code (YAML) and auto-expire.
- Measure and iterate
Track coverage (% resources on golden paths), block rates, mean time to exception approval/closure, and number of expired exceptions auto-revoked.
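The exception step lends itself to automation. The Python sketch below validates exception records (as they might look after loading from a YAML file) and lists the ones due for revocation. The 30-day limit, the approver names, and the field names are assumptions chosen for the example.

```python
from datetime import date, timedelta

MAX_DURATION = timedelta(days=30)  # assumed policy limit
APPROVERS = {"security-oncall", "platform-lead"}  # assumed approver groups
REQUIRED_FIELDS = ("id", "owner", "reason", "scope", "approved_by", "granted", "expires")


def validate_exception(exc: dict) -> list[str]:
    """Return validation errors for one exception record; empty means valid."""
    errors = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in exc]
    if errors:
        return errors
    if exc["approved_by"] not in APPROVERS:
        errors.append(f"approver not authorized: {exc['approved_by']}")
    granted = date.fromisoformat(exc["granted"])
    expires = date.fromisoformat(exc["expires"])
    if expires <= granted:
        errors.append("expires must be after granted")
    elif expires - granted > MAX_DURATION:
        errors.append(f"duration exceeds {MAX_DURATION.days} days")
    return errors


def expired(exceptions: list, today: date) -> list[str]:
    """IDs of exceptions whose expiry date has been reached."""
    return [exc["id"] for exc in exceptions if date.fromisoformat(exc["expires"]) <= today]
```

Running `validate_exception` in the PR that adds an exception and `expired` on a schedule gives both halves of the process: review at creation and automatic revocation at the end.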
Self-check questions
- Can a new team deploy without talking to security and still be safe?
- Do your policies fail early with helpful messages?
- Are exceptions time-bound, visible, and auditable?
- Can you show metrics that guardrails are reducing risk and friction?
Exercises
Exercise 1: Kubernetes secure namespace baseline
Goal: define secure defaults for a new namespace and a guardrail that blocks non-compliant pods.
- Create a namespace template that applies resource quotas, default requests/limits, and Pod Security restricted mode.
- Add a default deny NetworkPolicy for ingress and egress.
- Write a Kyverno or Gatekeeper policy that denies pods without runAsNonRoot and readOnlyRootFilesystem.
- Describe an exception path (who approves, max duration, how it’s recorded).
Hints
- Use Pod Security Admission labels on the namespace to enforce the restricted profile.
- A NetworkPolicy that selects all pods, lists both Ingress and Egress in policyTypes, and defines no rules denies all traffic by default.
- Admission policy should provide a clear failure message and an example patch.
- Exceptions should include owner, scope, and expiry.
Expected output: YAML for namespace resources and policy, plus an exception description.
- Quotas and limits set
- Pod Security restricted enabled
- Default deny network policy present
- Admission policy blocks unsafe pods
- Exception process documented
Common mistakes and how to self-check
- Only documenting defaults, not enforcing them
Fix: add policy checks in CI and at admission time. Self-check: break the rule on a test branch. Does it fail?
- Overly strict with no escape hatch
Fix: add a clear exception process with expiry and compensating controls. Self-check: can a team request a time-bound exception in one file/change?
- Silent failures
Fix: make error messages actionable with copy-pasteable patches. Self-check: do logs and CI output show exactly what to change?
- Defaults that fight developers
Fix: co-design golden paths with product teams, measure friction. Self-check: count how many teams adopt templates without alteration.
- Drift over time
Fix: schedule periodic conformance scans. Self-check: do you have a drift report by project/namespace?
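A drift report can be as simple as comparing the expected baseline with what each namespace actually has. The Python sketch below assumes the live settings have already been collected into plain dicts keyed by namespace; `drift_report` is a hypothetical helper, not an existing tool.

```python
def drift_report(expected: dict, actual_by_namespace: dict) -> dict:
    """Map each drifted namespace to the settings that differ from the baseline."""
    report = {}
    for namespace, actual in sorted(actual_by_namespace.items()):
        diffs = [
            f"{key}: expected {want!r}, found {actual.get(key)!r}"
            for key, want in expected.items()
            if actual.get(key) != want
        ]
        if diffs:
            report[namespace] = diffs
    return report
```

Namespaces that match the baseline are left out, so an empty report means no drift was found for the settings being tracked.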
Practical projects
Project 1 — Golden service template
- Deliverables: repo template with CI checks (SAST, deps, image scan, signature), service manifest with resource limits, liveness/readiness, restricted security context, telemetry pre-wired.
- Acceptance: a new repo from the template deploys cleanly with zero policy violations.
Project 2 — Policy pack for cloud storage
- Deliverables: OPA/Conftest rules for S3/Blob storage enforcing encryption, no public access, access logging, lifecycle policies.
- Acceptance: PR with a non-compliant bucket fails with a clear message and suggested fix.
Project 3 — Egress control with exceptions
- Deliverables: default deny egress, allow via proxy; exception CRD/YAML with owner, reason, allowed domains, expiry.
- Acceptance: exception auto-revokes on expiry and is visible in audit logs.
Who this is for and prerequisites
Who this is for
- Platform and DevOps engineers building shared infrastructure.
- Security engineers implementing policy-as-code.
- Developers who own service templates.
Prerequisites
- Basic Kubernetes (namespaces, deployments, policies).
- Intro Terraform or other IaC.
- Familiarity with CI/CD concepts.
Learning path
- Before: Identity and access basics, networking fundamentals.
- Now: Secure defaults and guardrails.
- Next: Policy-as-code deeper dive, secrets management automation, runtime protections.
Next steps
- Adopt golden templates for new services.
- Add policy checks to PRs for your top three risky resources.
- Roll out a lightweight exception process with auto-expiry.
Mini challenge
Pick one default that is currently optional in your org (e.g., encryption at rest). Make it the default in your template and add a policy that blocks disabling it without an exception. Measure adoption after two weeks.