
Infrastructure As Code

Learn Infrastructure as Code for Data Platform Engineers for free: a roadmap, worked examples, subskills, and a skill exam.

Published: January 11, 2026 | Updated: January 11, 2026

What is Infrastructure as Code (IaC) for Data Platform Engineers?

Infrastructure as Code lets you define and provision data platform infrastructure using code. Instead of clicking through consoles, you write declarative files (for example, Terraform) that create cloud resources for data lakes, warehouses, orchestration, networking, and access. This reduces manual errors, enables repeatable environments (Dev/Stage/Prod), and makes changes auditable through version control.

What you will be able to do

  • Spin up complete data platform stacks (storage, compute, orchestration, networking, IAM) in minutes.
  • Standardize environments with reusable modules and policies.
  • Manage secrets and configs safely across Dev/Stage/Prod.
  • Detect drift, review changes via pull requests, and roll back safely.

Who this is for

  • Data Platform Engineers building and operating cloud data platforms.
  • Data Engineers moving from ad-hoc setup to repeatable, governed infrastructure.
  • Platform/DevOps engineers collaborating on data-specific stacks.

Prerequisites

  • Basic cloud familiarity (AWS, Azure, or GCP concepts).
  • Git fundamentals (branching, pull requests).
  • Command line basics and a code editor.
  • Optional but helpful: understanding of data components (S3/GCS/Azure Storage, Redshift/BigQuery/Synapse, Airflow/Kubernetes).

Learning path (practical roadmap)

Milestone 1 — Terraform basics and state
  • Install Terraform; authenticate to one cloud provider.
  • Create a simple resource (a bucket) with variables and outputs.
  • Move state to a remote backend with state locking.
Milestone 2 — Environments: Dev, Stage, Prod
  • Parameterize resources by environment (variables, workspaces, or directories).
  • Introduce tags/labels and naming conventions (a minimal sketch follows this roadmap).
  • Adopt a minimal folder structure per environment.
Milestone 3 — Reusable modules and standards
  • Extract common patterns (e.g., data lake bucket) into modules.
  • Pin provider and module versions; add READMEs and examples.
  • Add tests or validations (terraform validate, fmt, and basic unit checks).
Milestone 4 — Networking and IAM
  • Provision VPC/VNet, subnets, routing, and security groups.
  • Create IAM roles/policies or equivalent for data services.
  • Restrict access using least privilege.
Milestone 5 — Secrets, configs, and policy as code
  • Fetch secrets from a manager (never hard-code secrets in Terraform).
  • Validate plans with policy checks (OPA/Conftest, Sentinel, or equivalent).
  • Redact sensitive outputs; avoid logging secrets.
Milestone 6 — Drift detection, change management, and CI
  • Detect drift with periodic plans and alerts.
  • Use pull requests to run plan; require approvals for apply.
  • Tag releases, document rollbacks, and maintain a change log.
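
To make Milestone 2 concrete, here is a minimal sketch of per-environment naming and tagging with variables and locals; the prefix and tag values are illustrative assumptions, not required conventions.

variable "env" {
  type = string

  validation {
    condition     = contains(["dev", "stage", "prod"], var.env)
    error_message = "env must be dev, stage, or prod."
  }
}

locals {
  name_prefix = "acme-${var.env}"
  common_tags = {
    env   = var.env
    owner = "data-platform"
  }
}

# Resources then reuse the prefix and tags, for example:
#   bucket = "${local.name_prefix}-data-lake"
#   tags   = local.common_tags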

Worked examples

Example 1 — AWS S3 data lake bucket + IAM role
# providers.tf
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

# variables.tf
variable "region" { type = string }
variable "env" { type = string }

# main.tf
resource "aws_s3_bucket" "data_lake" {
  bucket = "acme-${var.env}-data-lake"
  tags = { env = var.env, owner = "data-platform" }
}

resource "aws_s3_bucket_versioning" "v" {
  bucket = aws_s3_bucket.data_lake.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_iam_role" "etl_role" {
  name               = "acme-${var.env}-etl-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Action = "sts:AssumeRole",
      Effect = "Allow",
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "etl_bucket_access" {
  name = "acme-${var.env}-etl-bucket-access"
  role = aws_iam_role.etl_role.id
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Action = ["s3:GetObject","s3:PutObject","s3:ListBucket"],
      Resource = [aws_s3_bucket.data_lake.arn, "${aws_s3_bucket.data_lake.arn}/*"]
    }]
  })
}

# outputs.tf
output "bucket_name" { value = aws_s3_bucket.data_lake.bucket }
output "etl_role_arn" { value = aws_iam_role.etl_role.arn }

Run terraform init, terraform plan, and then terraform apply. In real projects, keep state in a remote backend with state locking, as sketched below.
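
A minimal sketch of a remote backend with locking, assuming an S3 state bucket and a DynamoDB lock table that already exist (both names here are placeholders):

# backend.tf
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"               # pre-created state bucket (placeholder name)
    key            = "data-platform/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"                     # pre-created lock table (placeholder name)
    encrypt        = true
  }
}

Run terraform init -migrate-state after adding the backend so any existing local state is moved into it.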

Example 2 — GCP GCS bucket + service account
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.project
  region  = var.region
}

variable "project" { type = string }
variable "region"  { type = string }
variable "env"     { type = string }

resource "google_storage_bucket" "data_lake" {
  name          = "acme-${var.env}-data-lake"
  location      = var.region
  force_destroy = false
  uniform_bucket_level_access = true
}

resource "google_service_account" "etl" {
  account_id   = "etl-${var.env}"
  display_name = "ETL SA ${var.env}"
}

resource "google_storage_bucket_iam_member" "access" {
  bucket = google_storage_bucket.data_lake.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.etl.email}"
}
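
To mirror Example 1, you could add outputs for the values downstream jobs need; the output names here are illustrative:

output "bucket_name"  { value = google_storage_bucket.data_lake.name }
output "etl_sa_email" { value = google_service_account.etl.email }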

Example 3 — AWS VPC + subnet + security group for warehouse
variable "cidr_block" { default = "10.20.0.0/16" }
variable "env" { type = string }

resource "aws_vpc" "main" {
  cidr_block = var.cidr_block
  tags = { Name = "acme-${var.env}-vpc" }
}

resource "aws_subnet" "private_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.20.1.0/24"
  map_public_ip_on_launch = false
  availability_zone       = "us-east-1a"
}

resource "aws_security_group" "warehouse" {
  name   = "acme-${var.env}-warehouse-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    description = "Allow app subnet"
    from_port   = 5439
    to_port     = 5439
    protocol    = "tcp"
    cidr_blocks = ["10.20.0.0/16"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Attach this security group to a data warehouse (e.g., Redshift) module so that only traffic from your application subnets inside the VPC can reach it, as in the sketch below.
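
A hedged sketch of wiring the security group into a Redshift cluster; the resource and argument names come from the AWS provider, while the identifiers, node type, and username are illustrative:

variable "warehouse_password" {
  type      = string
  sensitive = true                   # supply from a secrets manager; never hard-code
}

resource "aws_redshift_subnet_group" "warehouse" {
  name       = "acme-${var.env}-warehouse-subnets"
  subnet_ids = [aws_subnet.private_a.id]
}

resource "aws_redshift_cluster" "warehouse" {
  cluster_identifier        = "acme-${var.env}-warehouse"
  node_type                 = "ra3.xlplus"
  master_username           = "etl_admin"
  master_password           = var.warehouse_password
  cluster_subnet_group_name = aws_redshift_subnet_group.warehouse.name
  vpc_security_group_ids    = [aws_security_group.warehouse.id]   # only the rules above allow traffic in
  skip_final_snapshot       = true                                # fine for a sandbox, not for prod
}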

Example 4 — Reusable module for a versioned data bucket

Module structure:

modules/
  data_bucket/
    main.tf
    variables.tf
    outputs.tf

# modules/data_bucket/variables.tf
variable "name" { type = string }
variable "tags" { type = map(string) default = {} }

# modules/data_bucket/main.tf
resource "aws_s3_bucket" "this" {
  bucket = var.name
  tags   = var.tags
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id
  rule { apply_server_side_encryption_by_default { sse_algorithm = "AES256" } }
}

# modules/data_bucket/outputs.tf
output "bucket" { value = aws_s3_bucket.this.bucket }

# root usage
module "buckets" {
  source = "./modules/data_bucket"
  for_each = { dev = {}, stage = {}, prod = {} }
  name = "acme-${each.key}-data-lake"
  tags = { env = each.key, owner = "data-platform" }
}
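
Once a module like this stabilizes, pin its version when consuming it from a registry (or a git tag); the source below is a hypothetical private registry path, shown only to illustrate the version constraint:

module "data_lake" {
  source  = "app.terraform.io/acme/data-bucket/aws"   # hypothetical registry address
  version = "~> 1.2"                                  # pinned range; bump deliberately via PR
  name    = "acme-dev-data-lake"
}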

Example 5 — Policy as Code (OPA/Rego) to prevent public buckets

Policy file: policy/deny_public_buckets.rego

package terraform.s3

public_acls := {"public-read", "public-read-write"}

deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  acl := rc.change.after.acl
  public_acls[acl]
  msg := sprintf("Public ACL not allowed on %s", [rc.address])
}

Export the plan as JSON (for example, terraform plan -out=tfplan, then terraform show -json tfplan > plan.json) and run conftest against it in CI to block merges when a public ACL is detected.

Drills and exercises

  • Create a bucket and enable versioning and encryption using variables only.
  • Refactor a repeated resource into a module; add inputs/outputs and a README.
  • Provision a VPC/VNet and restrict inbound traffic to a single CIDR.
  • Fetch a secret from a secrets manager and pass it as a variable to a service without printing it.
  • Run terraform plan in a pull request and write a short change summary.
  • Introduce a policy that blocks creation of public resources; test that it fails the plan appropriately.

Common mistakes and debugging tips

  • Hard-coding environment values: use variables, locals, and naming conventions.
  • Unpinned versions: pin provider and module versions to avoid surprise changes.
  • Secrets in code or state: never commit secrets; use a secrets manager and avoid writing them to state or outputs.
  • Missing state locking: use a remote backend with locking to prevent concurrent applies.
  • Overusing depends_on: rely on implicit dependencies via references; add depends_on only when necessary.
  • Plan noise: separate data and control planes into smaller stacks to keep plans readable.
  • Not testing destroy: periodically run plan -destroy in non-prod to ensure clean teardown.

Mini project — Minimal Data Platform with IaC

Goal: Provision a minimal data platform stack in a sandbox environment.

Scope
  • Networking: VPC/VNet, private subnet, and security group.
  • Storage: versioned data lake bucket with encryption.
  • Compute/orchestration: managed Airflow or container runtime placeholder (module stub).
  • Access: IAM role/service account with least privilege.
  • Secrets: database password stored and fetched from a secrets manager.
  • Policy: block public buckets.
Steps
  1. Set up remote state and locking.
  2. Create networking and storage with modules.
  3. Create IAM role and attach least-privilege policy for bucket access.
  4. Store a fake password in a secrets manager; reference it as a sensitive input (see the sketch after these steps).
  5. Add a policy test to prevent public storage.
  6. Open a PR: include the plan output, reviewer checklist, and a rollback plan.
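
For step 4, a minimal sketch assuming AWS Secrets Manager; the secret name and module path are hypothetical:

data "aws_secretsmanager_secret_version" "warehouse" {
  secret_id = "acme-${var.env}-warehouse-password"   # assumed pre-created secret
}

module "warehouse" {
  source          = "./modules/warehouse"            # hypothetical module stub
  master_password = data.aws_secretsmanager_secret_version.warehouse.secret_string
}

Pass the value straight into resource or module arguments rather than outputs; note that Terraform still records the resolved value in state, which is one more reason to encrypt and restrict access to the state backend.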

Subskills

  • Terraform Basics — Write and apply Terraform code with variables, outputs, and remote state.
  • Environment Provisioning Dev Stage Prod — Structure code to produce consistent, isolated environments.
  • Reusable Modules And Standards — Build versioned modules with clear inputs/outputs and docs.
  • Secrets And Config Management — Manage sensitive values via a secrets manager and avoid state leaks.
  • Networking And IAM Provisioning — Provision VPC/VNet, subnets, security groups, and least-privilege identities.
  • Policy As Code Basics — Enforce guardrails that block risky infra changes before merge.
  • Drift Detection Basics — Detect and remediate configuration drift using plans and scheduled checks.
  • Change Management For Infra — Run plans in PRs, require approvals, and document rollbacks.

Practical projects

  • Data Lake Starter: module set for storage, logging, and lifecycle with per-environment config.
  • Warehouse Landing Zone: VPC/VNet + subnet + security group + route tables + NAT where needed.
  • Airflow on Kubernetes: create a namespace, service account, and secrets; integrate with storage.
  • Policy Pack: write 3–5 policies preventing public endpoints, weak encryption, and tag omissions.

Next steps

  • Add CI/CD: validate, format, plan-on-PR, and manual approve apply.
  • Introduce testing: unit tests for modules and smoke tests post-apply.
  • Add cost and security checks as part of the policy gate.
  • Document standards and publish your modules for team reuse.
