Who this is for
- Platform and backend engineers who need reproducible cloud networking and access controls.
- Anyone moving manual console work for VPCs, security groups, and IAM into code and CI.
Prerequisites
- Basic Terraform or similar IaC tool familiarity (providers, resources, variables, state).
- Cloud fundamentals: what a VPC/subnet is, what IAM roles/policies are.
- Command line basics and a test cloud account (sandbox).
Why this matters
As a Platform Engineer, you’ll repeatedly:
- Create and evolve VPCs, subnets, route tables, NAT/IGW, and load balancers for multiple environments.
- Harden network boundaries with security groups and NACLs, and avoid accidental open access.
- Model IAM roles, policies, and trust relationships for apps, people, and CI systems with least privilege.
- Review plans, run automated checks, and roll out changes safely across accounts and regions.
Concept explained simply
Networking as code: you describe network components in files (e.g., VPCs, subnets, route tables, gateways, security groups). The tool creates them the same way every time.
IAM as code: you describe who can do what (policies/roles) and who trusts whom (assume-role policies). The tool applies consistent permissions across environments.
Mental model
- Blueprints: Files are the blueprint; the cloud is the construction site.
- Desired state: You write what you want. The tool figures out creates/updates/destroys.
- Least privilege by default: Start with no access. Add only what the app or team needs.
- Composable modules: Package repeatable VPCs and IAM roles as modules for dev/stage/prod.
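As a rough sketch of the "composable modules" idea, the same network module could be instantiated once per environment. The module path `./modules/network` and its input names (`name`, `cidr_block`, `tags`) are assumptions for illustration, not an existing module.

```hcl
# Hypothetical local module: the source path and input names are assumptions.
module "network_dev" {
  source     = "./modules/network"
  name       = "dev"
  cidr_block = "10.10.0.0/16"
  tags       = { Environment = "dev" }
}

module "network_prod" {
  source     = "./modules/network"
  name       = "prod"
  cidr_block = "10.20.0.0/16" # non-overlapping with dev
  tags       = { Environment = "prod" }
}
```

Keeping the per-environment differences in the inputs means the module body stays identical across dev, stage, and prod.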
Key building blocks
- VPC, subnets (public/private), route tables, Internet/NAT Gateways, VPC endpoints.
- Security Groups vs NACLs: SGs are stateful and attached to ENIs; NACLs are stateless on subnets.
- Load balancers (ALB/NLB), DNS records, and certificates.
- IAM principals (users, roles), policies (permission policies), trust policies (who may assume), STS, and OIDC for CI.
- Tags and naming conventions for governance and cost allocation.
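One way to apply tags consistently is the AWS provider's `default_tags` block, which adds the listed tags to every taggable resource the provider creates. This is a sketch: the tag keys and values are placeholders, and the block would take the place of a plain `provider "aws"` block rather than sit beside it.

```hcl
provider "aws" {
  region = var.region

  # Applied to all taggable resources managed through this provider.
  default_tags {
    tags = {
      Owner       = "platform"
      Environment = "dev"
      CostCenter  = "example-1234" # placeholder value
    }
  }
}
```

Resource-level `tags` can still add or override individual keys, such as `Name`.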
Worked examples
Example 1: Minimal VPC with public + private subnets
# Terraform-style HCL (illustrative; var.region and var.az_a are declared elsewhere)
provider "aws" {
  region = var.region
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags                 = { Name = "demo-vpc" }
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
  availability_zone       = var.az_a
  tags                    = { Tier = "public" }
}

resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = var.az_a
  tags              = { Tier = "private" }
}

# Public route table: default route to the Internet Gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_eip" "nat" {
  domain = "vpc" # AWS provider v5+; older versions used `vpc = true`
}

# The NAT Gateway lives in the public subnet and needs the IGW to exist first.
resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public_a.id
  depends_on    = [aws_internet_gateway.igw]
}

# Private route table: default route to the NAT Gateway (egress only).
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}

resource "aws_route_table_association" "private_a" {
  subnet_id      = aws_subnet.private_a.id
  route_table_id = aws_route_table.private.id
}
Why it’s good
- Public subnet gets Internet via IGW. Private subnet egresses via NAT.
- Clear separation enables private app servers without public IPs.
Example 2: Security group with restricted ingress
resource "aws_security_group" "web_sg" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    description = "HTTPS from office"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.0/24"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Owner = "platform" }
}
Why it’s good
- Ingress is locked to a known CIDR block, not 0.0.0.0/0.
- Descriptive tags support audits and cost tracking.
Example 3: IAM role with least privilege and OIDC trust (CI)
# OIDC provider (example: GitHub Actions). Thumbprint placeholder.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [var.github_thumbprint]
}

# Trust policy: who may assume the role, and under which token claims.
data "aws_iam_policy_document" "assume_oidc" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:your-org/your-repo:*"]
    }
  }
}

resource "aws_iam_role" "ci" {
  name               = "ci-deploy"
  assume_role_policy = data.aws_iam_policy_document.assume_oidc.json
}

# Permission policy: what the role may do once assumed.
data "aws_iam_policy_document" "deploy" {
  statement {
    sid       = "DescribeMinimal"
    actions   = ["ec2:Describe*", "eks:Describe*"]
    resources = ["*"]
  }

  # iam:PassRole on "*" is a privilege-escalation risk, so it is scoped here.
  # var.passable_role_arn is a hypothetical input: the one role CI may pass.
  statement {
    sid       = "PassDeployRole"
    actions   = ["iam:PassRole"]
    resources = [var.passable_role_arn]
  }
}

resource "aws_iam_policy" "deploy" {
  name   = "ci-deploy-policy"
  policy = data.aws_iam_policy_document.deploy.json
}

resource "aws_iam_role_policy_attachment" "attach" {
  role       = aws_iam_role.ci.name
  policy_arn = aws_iam_policy.deploy.arn
}
Why it’s good
- No long-lived access keys. CI receives short-lived credentials via OIDC.
- Conditions restrict which repo can assume the role. Start minimal; expand only as needed.
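If the role should only be assumable from a single branch, the `sub` condition can be an exact match instead of a wildcard. The sketch below assumes GitHub's `repo:<org>/<repo>:ref:refs/heads/<branch>` subject format for branch-triggered workflows and reuses the placeholder repo name; the data source name `assume_oidc_main` is hypothetical.

```hcl
data "aws_iam_policy_document" "assume_oidc_main" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Exact match: only workflows running on the main branch of this repo.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:your-org/your-repo:ref:refs/heads/main"]
    }
  }
}
```

Workflows triggered by pull requests or environments carry a different `sub` value, so they would not match this condition.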
Bonus example: IAM role for EC2 instances
# Trust policy: lets the EC2 service assume the role on behalf of instances.
data "aws_iam_policy_document" "assume_ec2" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ec2" {
  name               = "app-ec2-role"
  assume_role_policy = data.aws_iam_policy_document.assume_ec2.json
}

resource "aws_iam_role_policy" "ec2_inline" {
  name = "read-ssm-params"
  role = aws_iam_role.ec2.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect   = "Allow",
      Action   = ["ssm:GetParameter", "ssm:GetParameters"],
      Resource = "*" # illustrative; narrow to specific parameter ARNs in real use
    }]
  })
}
Step-by-step: turn requirements into code
- Write inputs: regions, AZs, CIDRs, allowed CIDRs, names, and tags.
- Build smallest viable network (VPC, one public and one private subnet, routing) before adding extras.
- Create security groups per workload. Start deny-all and open only what’s required.
- Model IAM roles with separate trust and permission policies. Scope resources to specific ARNs and add conditions (e.g., tags) where possible.
- Plan, review, apply in a sandbox, then promote via environments.
- Codify outputs (IDs, ARNs) for downstream modules.
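The first and last steps above (inputs and outputs) might look like the following sketch. The variable names `vpc_cidr` and `allowed_cidrs` are assumptions chosen for illustration, and the output refers to the `aws_vpc.main` resource from the worked examples.

```hcl
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC"
  default     = "10.0.0.0/16"

  # cidrhost() errors on malformed input, so can() turns that into true/false.
  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, e.g. 10.0.0.0/16."
  }
}

variable "allowed_cidrs" {
  type        = list(string)
  description = "CIDR blocks allowed to reach the workload over HTTPS"

  # Reject the open-to-the-world range at plan time.
  validation {
    condition     = !contains(var.allowed_cidrs, "0.0.0.0/0")
    error_message = "allowed_cidrs must not contain 0.0.0.0/0."
  }
}

output "vpc_id" {
  description = "ID of the VPC, for downstream modules"
  value       = aws_vpc.main.id
}
```

Validating inputs this way surfaces mistakes during plan rather than after resources exist.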
Review checklist
- [ ] Provider, region, and versions are pinned.
- [ ] Non-overlapping CIDR blocks; clear public/private separation.
- [ ] No 0.0.0.0/0 ingress unless justified and time-bound.
- [ ] IAM policies least-privilege with conditions.
- [ ] Tags applied consistently.
- [ ] Outputs expose only what downstream components need.
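For the first checklist item, version pinning is typically declared in a `terraform` block. The version constraints below are example values, not recommendations for any particular project.

```hcl
terraform {
  # Example constraint: any 1.x release at or above 1.5.0.
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # example: any 5.x release
    }
  }
}
```

Committing the generated `.terraform.lock.hcl` file alongside this keeps provider versions identical between laptops and CI.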
Exercises
Exercise 1: Minimal VPC + IAM role
Goal: Create a small VPC with one public and one private subnet, basic routing, and an EC2 role with least privilege to read SSM parameters.
- Define provider, variables (region, AZs, CIDRs), and a VPC with DNS enabled.
- Add a public subnet + IGW + route table to 0.0.0.0/0.
- Add a private subnet + NAT Gateway + route table to 0.0.0.0/0 via NAT.
- Create a security group allowing HTTPS from a single CIDR.
- Create an IAM role for EC2 with an inline policy for ssm:GetParameter.
- Export outputs: vpc_id, public_subnet_id, private_subnet_id, role_arn.
Hints
- Associate subnets to correct route tables using route table associations.
- Enable map_public_ip_on_launch on public subnets.
- Use a trust policy for EC2 service principal.
Expected result
- A plan creating your VPC, two subnets, IGW, NAT, two route tables with associations, one security group, and one IAM role with inline policy.
- Outputs show VPC and subnet IDs and the role ARN.
Solution
variable "region" { type = string }
variable "az_a" { type = string }

variable "public_cidr" {
  type    = string
  default = "10.0.1.0/24"
}

variable "private_cidr" {
  type    = string
  default = "10.0.2.0/24"
}

provider "aws" {
  region = var.region
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags                 = { Name = "exercise-vpc" }
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_cidr
  availability_zone       = var.az_a
  map_public_ip_on_launch = true
  tags                    = { Tier = "public" }
}

resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_cidr
  availability_zone = var.az_a
  tags              = { Tier = "private" }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_eip" "nat" {
  domain = "vpc" # AWS provider v5+; older versions used `vpc = true`
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public_a.id
  depends_on    = [aws_internet_gateway.igw]
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}

resource "aws_route_table_association" "private_a" {
  subnet_id      = aws_subnet.private_a.id
  route_table_id = aws_route_table.private.id
}

resource "aws_security_group" "web_sg" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    description = "HTTPS from office"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.0/24"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

data "aws_iam_policy_document" "assume_ec2" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ec2" {
  name               = "exercise-ec2-role"
  assume_role_policy = data.aws_iam_policy_document.assume_ec2.json
}

resource "aws_iam_role_policy" "ssm_read" {
  name = "read-ssm-params"
  role = aws_iam_role.ec2.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect   = "Allow",
      Action   = ["ssm:GetParameter", "ssm:GetParameters"],
      Resource = "*" # acceptable for the exercise; narrow to parameter ARNs in real use
    }]
  })
}

output "vpc_id" { value = aws_vpc.main.id }
output "public_subnet_id" { value = aws_subnet.public_a.id }
output "private_subnet_id" { value = aws_subnet.private_a.id }
output "role_arn" { value = aws_iam_role.ec2.arn }
Common mistakes and self-checks
- Overly broad ingress rules. Self-check: Search for 0.0.0.0/0 in ingress rules; replace it with specific source CIDRs or security group references.
- Forgetting route table associations. Self-check: Each subnet should have one explicit association; a subnet without one silently falls back to the VPC's main route table.
- IAM trust without constraints. Self-check: Ensure conditions (e.g., repo or audience) exist for OIDC trust policies.
- No version pinning. Self-check: Pin provider and module versions; commit the lock file.
- Mixed public/private workloads. Self-check: Confirm map_public_ip_on_launch is only true for public subnets.
- Leaky permissions. Self-check: Review actions and resources; prefer ARNs over "*" where possible.
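To illustrate the last self-check, the SSM read permission could be narrowed from "*" to a parameter path. This is a sketch under assumptions: the `/app/` parameter prefix and the data source name `ssm_read_scoped` are hypothetical, `var.region` is assumed to be declared, and the ARN hard-codes the standard `aws` partition.

```hcl
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "ssm_read_scoped" {
  statement {
    sid     = "ReadAppParameters"
    actions = ["ssm:GetParameter", "ssm:GetParameters"]

    # Only parameters under the hypothetical /app/ prefix in this account and region.
    resources = [
      "arn:aws:ssm:${var.region}:${data.aws_caller_identity.current.account_id}:parameter/app/*",
    ]
  }
}
```

The resulting JSON (`data.aws_iam_policy_document.ssm_read_scoped.json`) can be passed to an inline or managed policy in place of a `jsonencode` document with `Resource = "*"`.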
Practical projects
- Reusable network module: Parameterize CIDRs, AZ count, and tags. Use it for dev/stage/prod.
- CI deploy role: OIDC-trusted role restricted to a single repo and environment tags.
- Service segregation: Separate SGs for web, app, and db tiers; only necessary ports open between tiers.
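For the service segregation project, one possible starting point is to let each tier accept traffic only from the security group of the tier in front of it. The ports (8080 for the app tier, 5432 for PostgreSQL) and group names are assumptions, and `aws_vpc.main` refers to the VPC from the worked examples.

```hcl
resource "aws_security_group" "web" {
  name   = "web-tier"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "app" {
  name   = "app-tier"
  vpc_id = aws_vpc.main.id

  ingress {
    description     = "App port from web tier only"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id] # source is a group, not a CIDR
  }
}

resource "aws_security_group" "db" {
  name   = "db-tier"
  vpc_id = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from app tier only"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }
}
```

Because no `egress` blocks are declared, Terraform leaves these groups without outbound rules; the web tier's own ingress and any required egress would still need to be added per workload.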
Learning path
- Start: Minimal VPC + IAM role (this lesson).
- Next: Load balancer + target groups + HTTPS certs.
- Then: VPC endpoints and private-only architectures.
- Finally: Multi-account architecture with cross-account roles.
Mini challenge
Create an OIDC-trusted IAM role for your CI that can only deploy to staging resources tagged Environment=staging. Add a condition on aws:ResourceTag/Environment and verify that actions against non-staging resources are denied.
Next steps
- Refactor today’s code into modules and add variables for names and tags.
- Add automated checks (terraform fmt -check, terraform validate, terraform plan) to your CI before apply.
- Practice drift detection by changing a SG in console, then running plan to catch it.