Why this matters
As a Data Platform Engineer, you provision and operate cloud resources that store sensitive data and run critical pipelines. Policy as Code (PaC) turns your security, compliance, and cost rules into automated checks that run on every change, catching risky configurations (public buckets, overly broad IAM policies, missing encryption) before they reach production.
- Real tasks you will do: enforce encryption on data storage, block public exposure, require tagging for cost allocation, restrict IAM wildcards, and validate network egress rules.
- Outcomes: fewer incidents, faster reviews, consistent guardrails across teams, and auditable compliance.
Concept explained simply
Policy as Code is just rules written in a programming language, evaluated against your infrastructure definitions (e.g., Terraform plan) or runtime resources (e.g., Kubernetes). If a rule is violated, the change is blocked or flagged.
- Inputs: the thing to check (Terraform plan JSON, cloud resource JSON, Kubernetes admission request).
- Policy: rules that must be true (e.g., "all buckets must be encrypted"), typically in languages like Rego (OPA) or Sentinel.
- Decision: allow/deny or a list of violations.
- Enforcement point: pre-commit, CI, pull request, or runtime admission controller.
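To make the four pieces concrete, here is a minimal Rego sketch (the package and field names are illustrative, not tied to any particular provider):
package example

# Policy: every resource must be encrypted.
deny[msg] {
  not input.resource.encrypted
  msg := "resource must be encrypted"
}
Given the input {"resource": {"encrypted": false}}, the decision is {"deny": ["resource must be encrypted"]}, and the enforcement point (a CI job, say) blocks or flags the change because the set is non-empty.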
Mental model
Think of PaC as unit tests for your infrastructure. The policy is the test file; the plan or resource is the subject under test. CI runs the tests on every change and fails fast when a rule breaks.
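The analogy is literal: OPA ships a test runner (opa test). A small sketch, assuming the bucket policy from Example 1 below is saved in the same package and directory:
package infra.guardrails

# Positive case: a compliant bucket produces no denies.
test_private_encrypted_bucket_passes {
  ok := {"resource": {"type": "aws_s3_bucket", "name": "ok", "public_access": false, "encryption": {"enabled": true}}}
  count(deny) == 0 with input as ok
}

# Negative case: a known violation produces the expected message.
test_unencrypted_bucket_fails {
  bad := {"resource": {"type": "aws_s3_bucket", "name": "raw-data", "public_access": false, "encryption": {"enabled": false}}}
  deny["Bucket raw-data must enable default encryption"] with input as bad
}
Run the suite with opa test policies/; it fails the same way a broken unit test would.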
Core building blocks
- Policy language: Rego (Open Policy Agent) is a common, vendor-neutral choice.
- Decision API: evaluates input JSON against policies and returns allow/deny + reasons.
- Policy bundles: collections of rules versioned with Git.
- Enforcement modes: soft-fail (warn) for onboarding; hard-fail (block) for strict environments.
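One common way to express the two enforcement modes (the warn/deny split popularized by tools such as Conftest) is to keep them as separate rule sets; this is a sketch, not a fixed convention:
package infra.guardrails

# Hard-fail: a deny blocks the change outright.
deny[msg] {
  input.resource.public_access == true
  msg := "public access is not allowed"
}

# Soft-fail: a warn is reported but does not block; useful while onboarding teams.
warn[msg] {
  not input.resource.tags.owner
  msg := "owner tag is recommended"
}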
Worked examples
Example 1: S3 bucket must be private and encrypted
Rego policy
package infra.guardrails

deny[msg] {
  input.resource.type == "aws_s3_bucket"
  not input.resource.encryption.enabled
  msg := sprintf("Bucket %s must enable default encryption", [input.resource.name])
}

deny[msg] {
  input.resource.type == "aws_s3_bucket"
  input.resource.public_access == true
  msg := sprintf("Bucket %s must block public access", [input.resource.name])
}
Sample input
{
  "resource": {
    "type": "aws_s3_bucket",
    "name": "raw-data",
    "public_access": true,
    "encryption": {"enabled": false}
  }
}
Expected decision
{
  "deny": [
    "Bucket raw-data must enable default encryption",
    "Bucket raw-data must block public access"
  ]
}
Example 2: Require tags for cost and ownership
Rego policy
package infra.guardrails

required_tags := {"env", "owner"}

deny[msg] {
  r := input.resource
  t := required_tags[_]
  not r.tags[t]
  msg := sprintf("Resource %s missing tag: %s", [r.id, t])
}
Sample input
{
  "resource": {
    "id": "vpc-1234",
    "type": "aws_vpc",
    "tags": {"env": "prod"}
  }
}
Expected decision
{
  "deny": [
    "Resource vpc-1234 missing tag: owner"
  ]
}
Example 3: Forbid wildcard IAM policies
Rego policy
package infra.guardrails

deny[msg] {
  input.resource.type == "aws_iam_policy"
  stmt := input.resource.document.Statement[_]
  stmt.Effect == "Allow"
  stmt.Action[_] == "*"
  msg := sprintf("IAM policy %s allows wildcard Action", [input.resource.name])
}

deny[msg] {
  input.resource.type == "aws_iam_policy"
  stmt := input.resource.document.Statement[_]
  stmt.Effect == "Allow"
  wildcard_resource(stmt)
  msg := sprintf("IAM policy %s allows wildcard Resource", [input.resource.name])
}

# Rego has no infix "or"; define one helper rule per alternative instead.
wildcard_resource(stmt) {
  stmt.Resource == "*"
}

wildcard_resource(stmt) {
  is_array(stmt.Resource)
  stmt.Resource[_] == "*"
}
Sample input
{
  "resource": {
    "type": "aws_iam_policy",
    "name": "data-admin",
    "document": {
      "Statement": [
        {"Effect": "Allow", "Action": ["*"], "Resource": "*"}
      ]
    }
  }
}
Expected decision
{
  "deny": [
    "IAM policy data-admin allows wildcard Action",
    "IAM policy data-admin allows wildcard Resource"
  ]
}
How to wire this into your workflow
- Local: run policy checks on Terraform plan before pushing.
- CI: on pull request, generate plan JSON and evaluate policies; fail if any deny.
- Environments: start with warn-only in dev; enforce hard fail in prod.
- Runtime (optional): use an admission controller to validate Kubernetes manifests before they are admitted to the cluster, as sketched below.
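A minimal sketch of that runtime pattern, assuming plain OPA is deployed as a validating webhook that receives Kubernetes AdmissionReview objects under input.request:
package kubernetes.admission

# Reject Pods whose containers use the mutable :latest image tag.
deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  endswith(container.image, ":latest")
  msg := sprintf("container %s must pin an image tag", [container.name])
}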
Minimal CI steps (conceptual)
# 1) Generate plan JSON
terraform init
terraform plan -out=tf.plan
terraform show -json tf.plan > plan.json
# 2) Evaluate policies (example with OPA)
opa eval -i plan.json -d policies 'data.infra.guardrails.deny'
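Note that the worked examples above evaluate a simplified resource shape. Real terraform show -json output nests each resource in a top-level resource_changes array, with the planned state under change.after, so rules must iterate that structure. A sketch of a bucket rule adapted to plan JSON (the acl check is illustrative):
package infra.guardrails

deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  rc.change.after.acl == "public-read"
  msg := sprintf("Bucket %s must not be public", [rc.address])
}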
Minimal starter policy set for data platforms
- Data storage: encryption at rest, block public access, lifecycle rules for object retention.
- Networking: no internet-facing subnets for the data plane; egress only through approved endpoints.
- Identity: forbid wildcard IAM; require least-privilege patterns.
- Tagging: env, owner, cost-center for every resource.
- Backups: require automated backups/snapshots on stateful services.
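As one concrete instance, the backup rule could look like this sketch; backup_retention_period is the real Terraform attribute on aws_db_instance, while the flat input shape mirrors the simplified examples above:
package infra.guardrails

# Stateful services must have automated backups enabled.
deny[msg] {
  input.resource.type == "aws_db_instance"
  not input.resource.backup_retention_period > 0
  msg := sprintf("Database %s must enable automated backups", [input.resource.name])
}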
Exercises
Complete the exercise below.
Exercise 1 — S3 guardrails policy
Write a Rego policy that denies:
- Buckets with public_access == true
- Buckets without encryption.enabled == true
- Buckets without versioning.enabled == true
Use the provided input and return clear messages including the bucket name.
Input to validate
{
  "resource": {
    "type": "aws_s3_bucket",
    "name": "raw-data",
    "public_access": true,
    "encryption": {"enabled": false},
    "versioning": {"enabled": false}
  }
}
Expected output
{
  "deny": [
    "Bucket raw-data must block public access",
    "Bucket raw-data must enable default encryption",
    "Bucket raw-data must enable versioning"
  ]
}
Hints
- Create three deny rules; each checks one condition.
- Use sprintf to include the bucket name in messages.
- Check boolean fields carefully; missing fields should also trigger a deny.
Self-check checklist
- Policy returns zero denies when all conditions are satisfied.
- Messages are human-readable and actionable.
- Each rule is focused (one responsibility per rule).
- Edge cases covered: missing fields, nulls, unexpected types.
Common mistakes and how to catch them
- Only checking one resource type: ensure policies generalize or clearly scope to target resources.
- Weak messages: write messages that tell engineers exactly what to fix.
- All-or-nothing rollout: start in warn-only mode to build trust, then enforce.
- No tests for policies: add simple unit tests with positive and negative cases.
- Drift between policy and architecture: review policies whenever platform patterns change.
Mini challenge
Extend the tagging policy to also require tag values to match a regex (e.g., env in {dev,stg,prod}). Add negative and positive test inputs and verify messages.
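A possible starting point, using Rego's regex.match built-in and deliberately left partial so the challenge stays yours:
package infra.guardrails

deny[msg] {
  v := input.resource.tags.env
  not regex.match("^(dev|stg|prod)$", v)
  msg := sprintf("Resource %s has invalid env tag: %s", [input.resource.id, v])
}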
Who this is for
- Data Platform Engineers building Terraform modules and platform guardrails.
- DevOps/SREs collaborating on secure-by-default infrastructure for data workloads.
Prerequisites
- Basic Terraform knowledge (resources, plan).
- JSON familiarity.
- Command-line basics.
Learning path
- Start: understand inputs and decisions (this lesson).
- Next: write small, targeted policies and add them to CI.
- Scale: group policies by domain (identity, network, storage) and add tests.
Practical projects
- Project 1: Policy bundle for data storage (encryption, public access, lifecycle).
- Project 2: Identity bundle (no wildcards, restricted assume-role, MFA conditions).
- Project 3: CI integration that fails PRs on policy violations and posts messages.
Next steps
- Convert your top 3 platform guardrails into policies.
- Run them in warn-only for a week, capture feedback, refine, then enforce.
- Document how to fix each violation directly in the message text.