Networking And IAM Provisioning

Learn Networking And IAM Provisioning for free with explanations, exercises, and a quick test (for Data Platform Engineers).

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

As a Data Platform Engineer, you will provision cloud networks (VPC/VNet), subnets, routing, and identities/roles that control access to data stores, compute, and pipelines. Done right, your platform is secure, reliable, and reproducible. Done poorly, it risks data exposure, outages, and costly rework.

  • Real tasks you will face:
    • Create a private network for Spark clusters with controlled internet egress via NAT.
    • Allow a pipeline role to read a raw data bucket and write only to curated zones.
    • Set up private endpoints to data services so traffic stays on your cloud network.
    • Segment environments (dev/test/prod) with consistent, versioned IaC.

Concept explained simply

Networking defines who can talk to whom. IAM defines who can do what. Infrastructure as Code (IaC) turns both into versioned files you review, test, and apply predictably.

Mental model

  • Network as a town: VPC/VNet is the town, subnets are neighborhoods, route tables are road signs, firewalls/security groups are gatekeepers, NAT/Internet Gateways are bridges to the outside.
  • IAM as a keyring: identities (users/roles/service principals) hold keys; policies define which locks they open; least privilege means you only carry the keys you truly need.
Provider-agnostic mapping
  • AWS: VPC, Subnet, Route Table, Internet/NAT Gateway, Security Group, NACL; IAM users/roles/policies; PrivateLink endpoints.
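  • Azure: VNet, Subnet, Route Table (user-defined routes), NAT Gateway for outbound traffic (Azure has no separate internet gateway resource), NSG; RBAC + custom roles; Private Endpoints; Managed Identities.
  • GCP: VPC, Subnet, Routes (default internet gateway as next hop), Cloud NAT; Firewall rules; IAM roles/policies and service accounts; Private Service Connect.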

Design principles you can reuse

  • Least privilege first: start with deny, open only required actions and resources.
  • Isolate by environment: separate networks and IAM boundaries per env (dev/test/prod).
  • Private by default: keep data planes private; use NAT and private endpoints for managed services.
  • Idempotent IaC: plans are reviewable; apply should be safe to run repeatedly.
  • Tag everything: environment, owner, cost-center, data-classification.
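Tag coverage is easy to check mechanically. The Python sketch below assumes a hypothetical inventory shape (a dict of resource name to tag dict) and uses "env" as the environment key to match Example 1; in practice you would build the inventory from IaC state or a cloud inventory export. The sample values are illustrative.

```python
from typing import Dict, List

# Required tag keys from the "Tag everything" principle; "env" matches the key
# used in Example 1's VPC tags. Adjust to your organization's tagging standard.
REQUIRED_TAGS = ("env", "owner", "cost-center", "data-classification")


def missing_tags(resources: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each resource name to the required tag keys it lacks.

    `resources` is a hypothetical inventory shape: {resource_name: {tag_key: value}}.
    Empty tag values count as missing. Fully tagged resources are omitted.
    """
    report: Dict[str, List[str]] = {}
    for name, tags in resources.items():
        missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    inventory = {
        "dp-vpc": {"env": "dev", "owner": "data-platform",
                   "cost-center": "cc-42", "data-classification": "internal"},
        "dp-workers": {"purpose": "compute"},
    }
    for name, keys in missing_tags(inventory).items():
        print(f"{name}: missing {', '.join(keys)}")
```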

Worked examples

Example 1: Minimal private network with egress via NAT (Terraform, AWS-style)

variable "vpc_cidr" { default = "10.20.0.0/16" }
variable "azs" {
  type    = list(string)
  default = ["a", "b"] # AZ suffixes appended to the region name
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  tags = { Name = "dp-vpc", env = "dev" }
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

# Public subnet (for NAT gateways)
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.20.0.0/24"
  map_public_ip_on_launch = true
  availability_zone       = "${data.aws_region.current.name}${var.azs[0]}"
  tags = { Name = "dp-public" }
}

# Private subnets
resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.20.1.0/24"
  availability_zone = "${data.aws_region.current.name}${var.azs[0]}"
}
resource "aws_subnet" "private_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.20.2.0/24"
  availability_zone = "${data.aws_region.current.name}${var.azs[1]}"
}

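# A single NAT gateway in the public subnet keeps the example small;
# production setups usually run one NAT gateway per AZ for resilience.
resource "aws_nat_gateway" "nat" {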
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
  depends_on    = [aws_internet_gateway.igw]
}

resource "aws_eip" "nat" { domain = "vpc" }

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  # Public subnet: default route goes straight to the internet gateway
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table_association" "public_assoc" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
  # Private subnets: default route goes out through the NAT gateway
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}

resource "aws_route_table_association" "private_a_assoc" {
  subnet_id      = aws_subnet.private_a.id
  route_table_id = aws_route_table.private.id
}
resource "aws_route_table_association" "private_b_assoc" {
  subnet_id      = aws_subnet.private_b.id
  route_table_id = aws_route_table.private.id
}

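# Security group for data workers: no ingress rules, so all inbound is denied.
# Egress is declared explicitly because Terraform removes the default
# allow-all egress rule from security groups it manages.
resource "aws_security_group" "workers" {
  name   = "dp-workers"
  vpc_id = aws_vpc.main.id
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = { purpose = "compute" }
}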

data "aws_region" "current" {}

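What this does: compute runs in the private subnets and reaches the internet only through the NAT gateway. The NAT gateway passes outbound connections and their replies, but nothing on the internet can initiate a connection to the workers.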
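Routing decisions like this follow longest-prefix match: a packet uses the most specific route whose CIDR contains its destination. The Python model below (a simulation, not AWS code) encodes the private route table above, assuming the implicit "local" route that AWS adds for the VPC CIDR. Traffic to another subnet matches the /16 local route; everything else falls through to 0.0.0.0/0 and the NAT gateway. A table with no default route raises an error, which corresponds to the missing-egress mistake in the common mistakes list.

```python
import ipaddress
from typing import List, Tuple

# Model of Example 1's private route table (a simulation, not AWS code).
# AWS adds the "local" route for the VPC CIDR automatically; the default
# route 0.0.0.0/0 is the one Example 1 points at the NAT gateway.
PRIVATE_ROUTES: List[Tuple[str, str]] = [
    ("10.20.0.0/16", "local"),
    ("0.0.0.0/0", "nat"),
]


def next_hop(routes: List[Tuple[str, str]], destination: str) -> str:
    """Return the target of the most specific route containing `destination`."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        # No route at all, e.g. a private table without a default route.
        raise LookupError(f"no route to {destination}")
    # Longest prefix (largest prefixlen) wins.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


if __name__ == "__main__":
    for dest in ("10.20.2.15", "8.8.8.8"):
        print(dest, "->", next_hop(PRIVATE_ROUTES, dest))
```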

Example 2: Least-privilege policy for a data pipeline role (AWS IAM)

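# Role assumed by EC2 instances (attach it to them via an aws_iam_instance_profile).
# For another compute service, change the Service principal in the trust policy below.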
resource "aws_iam_role" "pipeline_role" {
  name = "dp-pipeline-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action   = "sts:AssumeRole"
    }]
  })
}

# Allow read-only on raw bucket, write-only on curated bucket
resource "aws_iam_policy" "pipeline_policy" {
  name   = "dp-pipeline-policy"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect   = "Allow",
        Action   = ["s3:GetObject","s3:ListBucket"],
        Resource = [
          "arn:aws:s3:::raw-data-bucket",
          "arn:aws:s3:::raw-data-bucket/*"
        ]
      },
      {
        Effect   = "Allow",
        Action   = ["s3:PutObject"],
        Resource = ["arn:aws:s3:::curated-data-bucket/*"]
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "attach" {
  role       = aws_iam_role.pipeline_role.name
  policy_arn = aws_iam_policy.pipeline_policy.arn
}

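Note the explicit actions and resource ARNs: the only wildcard is the trailing "/*", which scopes object access to keys inside each named bucket. Broad grants such as "s3:*" or Resource = "*" are avoided, which follows least privilege.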
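To see why the explicit ARNs matter, here is a simplified Python model of identity-policy evaluation: everything is denied by default, a matching Allow grants access, and a matching explicit Deny always wins. It is a sketch under stated assumptions, not the full AWS evaluation logic: it ignores conditions, NotAction/NotResource, resource-based policies, SCPs, and permission boundaries, and it uses fnmatch-style matching for the "*" wildcard.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

# Example 2's pipeline policy, written as a Python dict.
PIPELINE_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:ListBucket"],
         "Resource": ["arn:aws:s3:::raw-data-bucket", "arn:aws:s3:::raw-data-bucket/*"]},
        {"Effect": "Allow",
         "Action": ["s3:PutObject"],
         "Resource": ["arn:aws:s3:::curated-data-bucket/*"]},
    ],
}


def _as_list(value: Any) -> List[Any]:
    """IAM accepts a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy: Dict[str, Any], action: str, resource: str) -> bool:
    """Simplified check: default deny, any matching Allow grants, explicit Deny wins.

    Actions match case-insensitively, resource ARNs case-sensitively.
    """
    allowed = False
    for statement in _as_list(policy["Statement"]):
        action_hit = any(fnmatchcase(action.lower(), pattern.lower())
                         for pattern in _as_list(statement["Action"]))
        resource_hit = any(fnmatchcase(resource, pattern)
                           for pattern in _as_list(statement["Resource"]))
        if action_hit and resource_hit:
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


if __name__ == "__main__":
    obj = "arn:aws:s3:::raw-data-bucket/a.json"
    for action in ("s3:GetObject", "s3:PutObject", "s3:DeleteObject"):
        print(action, is_allowed(PIPELINE_POLICY, action, obj))
```

With these permissions the role can read the raw bucket and write to the curated bucket, but cannot write raw data, read curated data, or delete anything.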

Example 3: Private endpoint to a managed data service

Pattern (agnostic):

  • Create a private endpoint interface in your VPC/VNet.
  • Place it in private subnets and attach to the target service (object storage, database, etc.).
  • Update security groups/NSGs to allow only required ports from your compute subnets.
  • Use the service's private DNS zone to resolve to the private IPs.
Why private endpoints?

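Traffic to the service stays on the provider's network instead of crossing the public internet. This reduces exposure, lets private subnets reach the service without a NAT path, and keeps routing simple for internal services. It can also cut NAT data-processing charges, although interface endpoints carry their own hourly and per-GB charges.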
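The security-group step of this pattern can be checked the same way you would reason about it by hand. This Python sketch assumes an HTTPS (port 443) endpoint and a hypothetical IngressRule shape; it confirms that only Example 1's private compute subnets can reach the endpoint and lists any rule whose source range is wider than those subnets.

```python
import ipaddress
from typing import Iterable, List, NamedTuple


class IngressRule(NamedTuple):
    """One inbound rule: destination port and allowed source CIDR (hypothetical shape)."""
    port: int
    source_cidr: str


# Assumed endpoint security group: HTTPS only, from Example 1's private compute subnets.
COMPUTE_SUBNETS = ["10.20.1.0/24", "10.20.2.0/24"]
ENDPOINT_RULES = [IngressRule(443, cidr) for cidr in COMPUTE_SUBNETS]


def permits(rules: Iterable[IngressRule], source_ip: str, port: int) -> bool:
    """True if any rule allows `source_ip` to reach `port` (AWS security groups only allow)."""
    address = ipaddress.ip_address(source_ip)
    return any(rule.port == port and address in ipaddress.ip_network(rule.source_cidr)
               for rule in rules)


def too_broad(rules: Iterable[IngressRule], allowed_cidrs: Iterable[str]) -> List[IngressRule]:
    """Rules whose source range is not contained in one of the allowed compute subnets."""
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    return [rule for rule in rules
            if not any(ipaddress.ip_network(rule.source_cidr).subnet_of(net) for net in allowed)]


if __name__ == "__main__":
    print(permits(ENDPOINT_RULES, "10.20.1.25", 443))   # worker in private_a
    print(permits(ENDPOINT_RULES, "10.20.0.10", 443))   # host in the public subnet
    print(too_broad(ENDPOINT_RULES + [IngressRule(443, "0.0.0.0/0")], COMPUTE_SUBNETS))
```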

How to structure your IaC for networking and IAM

  • Separate modules: network, security, identities, and data services.
  • Expose safe outputs: e.g., subnet IDs, security group IDs, role ARNs.
  • Parameterize per environment via variables; avoid hard-coding.
  • Add guardrails: run format, validate, and plan before every apply (for Terraform: terraform fmt -check, terraform validate, terraform plan); use policy checks where available.
Minimal module interface example
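# variables.tf
variable "env" { type = string }
variable "vpc_cidr" { type = string }
variable "private_subnet_cidrs" { type = list(string) }

# main.tf (excerpt): one private subnet per input CIDR instead of hard-coded copies
resource "aws_subnet" "private" {
  count      = length(var.private_subnet_cidrs)
  vpc_id     = aws_vpc.main.id
  cidr_block = var.private_subnet_cidrs[count.index]
}

# outputs.tf
output "private_subnet_ids" { value = aws_subnet.private[*].id }
output "worker_sg_id" { value = aws_security_group.workers.id }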
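One simple policy check for the guardrail step is to block applies that would delete or replace resources. Terraform can save a plan (terraform plan -out=tfplan) and print it as JSON (terraform show -json tfplan); the Python sketch below reads that JSON and returns the addresses whose planned actions include "delete". The sample data is illustrative. It also shows the rename problem listed under common mistakes: switching from aws_subnet.private_a to a count-based aws_subnet.private[0] without a state move plans a delete plus a create.

```python
import json
from typing import List


def destructive_changes(plan_json: str) -> List[str]:
    """Return addresses of resources a Terraform plan would delete or replace.

    Expects the JSON printed by `terraform show -json <planfile>`, where
    resource_changes[].change.actions is e.g. ["create"], ["update"], ["delete"],
    or ["delete", "create"] / ["create", "delete"] for a replacement.
    """
    plan = json.loads(plan_json)
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if "delete" in change["change"]["actions"]
    ]


if __name__ == "__main__":
    # Inline sample; in CI you would read the JSON produced from your saved plan.
    sample = json.dumps({"resource_changes": [
        {"address": "aws_subnet.private_a", "change": {"actions": ["delete"]}},
        {"address": "aws_subnet.private[0]", "change": {"actions": ["create"]}},
    ]})
    print(destructive_changes(sample))
```

A CI job can fail when the list is non-empty and require an explicit approval for intentional deletions.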

Common mistakes and self-checks

  • Mistake: Public subnets for compute that handle sensitive data.
    • Self-check: Are your worker nodes in private subnets with no inbound from the internet?
  • Mistake: Wildcard IAM policies (Action="*", Resource="*").
    • Self-check: Can you list exact actions and resource ARNs? If not, you likely over-granted.
  • Mistake: Missing egress path from private subnets.
    • Self-check: Does a default route point to a NAT or egress gateway?
  • Mistake: Forgetting tags.
    • Self-check: Do all resources have env/owner/cost-center tags?
  • Mistake: Breaking state when renaming resources in IaC.
    • Self-check: If you rename a resource, do you map the old state address to the new one (with a moved block or terraform state mv) so Terraform does not destroy and recreate it?
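The wildcard self-check can be automated as a lint over policy documents. This Python sketch flags Allow statements with Action "*" or "<service>:*" and Resource "*". Scoped patterns such as an object-key "/*" are deliberately not flagged, and NotAction/NotResource are out of scope.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM accepts a single string/object or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy: Dict[str, Any]) -> List[str]:
    """List Allow statements that use global or service-wide wildcards."""
    findings: List[str] = []
    for index, statement in enumerate(_as_list(policy.get("Statement", []))):
        if statement.get("Effect") != "Allow":
            continue  # Broad Deny statements restrict access, so they are fine here.
        for action in _as_list(statement.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: broad action {action!r}")
        for resource in _as_list(statement.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {index}: applies to all resources")
    return findings


if __name__ == "__main__":
    admin_like = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
    print(wildcard_findings(admin_like))
```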

Who this is for

  • Engineers building or operating cloud data platforms.
  • Data engineers who need secure, repeatable environments.
  • Platform/SRE engineers integrating data services with private networking and IAM.

Prerequisites

  • Basic understanding of cloud concepts (compute, storage, networking).
  • Comfort with a CLI and a declarative IaC tool (e.g., Terraform syntax basics).
  • Familiarity with JSON/YAML and version control.

Learning path

  1. Review core network concepts: subnetting, routing, security groups/firewalls.
  2. Learn IAM building blocks: identities, roles, policies, trust relationships.
  3. Build a minimal private network with NAT and egress-only rules.
  4. Add private endpoints to managed data services.
  5. Implement least-privilege roles for pipelines and analytics jobs.
  6. Harden with tagging, policy checks, and environment separation.

Exercises

Do these hands-on tasks on your own first, then verify your work with the checklist that follows.

Exercise 1: Provision a private network with NAT and worker security group

  • Create a VPC/VNet with two private subnets and one public subnet.
  • Ensure private subnets route 0.0.0.0/0 through a NAT gateway (or equivalent).
  • Create a security group/firewall rule allowing egress only.
  • Output the private subnet IDs and the security group ID.

Exercise 2: Create a least-privilege pipeline role

  • Create an IAM role for a compute service (e.g., instances or jobs) with an assume-role policy.
  • Attach a policy that grants read-only to a raw bucket and write-only to a curated bucket (or equivalent storage paths).
  • Restrict actions and resources explicitly; avoid wildcards.
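For the assume-role part of Exercise 2, a quick check is that only the intended principal appears in the trust policy. This Python sketch flags Principal "*" (or an AWS principal of "*") and any service principal outside an expected list. It does not evaluate conditions or account/role principals, so treat it as a first pass rather than a full review.

```python
from typing import Any, Dict, Iterable, List


def _as_list(value: Any) -> List[Any]:
    """IAM accepts a single string/object or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def trust_findings(trust_policy: Dict[str, Any], expected_services: Iterable[str]) -> List[str]:
    """Flag Allow statements in a role trust policy that admit unexpected principals."""
    expected = set(expected_services)
    findings: List[str] = []
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if principal == "*" or (isinstance(principal, dict)
                                and "*" in _as_list(principal.get("AWS", []))):
            findings.append("any principal can assume this role")
            continue
        if isinstance(principal, dict):
            for service in _as_list(principal.get("Service", [])):
                if service not in expected:
                    findings.append(f"unexpected service principal {service!r}")
    return findings


if __name__ == "__main__":
    # Same shape as Example 2's trust policy.
    ec2_trust = {"Version": "2012-10-17", "Statement": [
        {"Effect": "Allow", "Principal": {"Service": "ec2.amazonaws.com"},
         "Action": "sts:AssumeRole"}]}
    print(trust_findings(ec2_trust, ["ec2.amazonaws.com"]))
```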

Self-check checklist

  • [ ] Private subnets exist with routes to NAT; no direct internet ingress.
  • [ ] Security group allows egress but no inbound from the internet.
  • [ ] IAM role trust policy limits who can assume it.
  • [ ] Permissions list specific actions and resource ARNs/IDs.
  • [ ] All resources have env/owner/cost-center tags.

Practical projects

  • Data landing zone: A reusable module that deploys network, private endpoints to object storage and warehouse, plus pipeline roles.
  • Environment factory: Parameterized stack that spins up dev/test/prod with identical network shapes and IAM patterns.
  • Access boundary audit: Generate a report of who can access which buckets/tables and compare to a desired access matrix.
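At its core, the access boundary audit is a set difference between observed grants and the desired access matrix. This Python sketch assumes you have already collected grants as (principal, asset, access) tuples, a hypothetical shape; the role and bucket names come from Example 2, and the "actual" sample is illustrative.

```python
from typing import Dict, Set, Tuple

# A grant is (principal, data asset, access level); the shape is hypothetical.
Grant = Tuple[str, str, str]


def audit_access(actual: Set[Grant], desired: Set[Grant]) -> Dict[str, Set[Grant]]:
    """Compare observed grants to the desired access matrix.

    "excess" grants are candidates for revocation; "missing" grants block legitimate work.
    """
    return {"excess": actual - desired, "missing": desired - actual}


if __name__ == "__main__":
    desired = {
        ("dp-pipeline-role", "raw-data-bucket", "read"),
        ("dp-pipeline-role", "curated-data-bucket", "write"),
    }
    actual = {
        ("dp-pipeline-role", "raw-data-bucket", "read"),
        ("dp-pipeline-role", "raw-data-bucket", "write"),
    }
    report = audit_access(actual, desired)
    for kind in ("excess", "missing"):
        for grant in sorted(report[kind]):
            print(kind, grant)
```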

Mini challenge

Harden a network by removing all public ingress paths while keeping outbound software updates possible. Document exactly which routes and security rules changed and why.

Hint
  • Move compute to private subnets.
  • Replace internet gateway routes on private subnets with NAT routes.
  • Restrict security groups to required egress ports only.
  • Confirm package mirrors are reachable via NAT.
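To document the changes the challenge asks for, you can diff the before and after state of each route table and security group. This Python sketch assumes routes are recorded as destination CIDR mapped to target, and rules as descriptive strings; both formats and the sample rules are hypothetical.

```python
from typing import Dict, List, Set


def diff_routes(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Describe route changes; each route maps destination CIDR -> target."""
    changes: List[str] = []
    for cidr in sorted(before.keys() | after.keys()):
        old, new = before.get(cidr), after.get(cidr)
        if old == new:
            continue
        if old is None:
            changes.append(f"added route {cidr} -> {new}")
        elif new is None:
            changes.append(f"removed route {cidr} -> {old}")
        else:
            changes.append(f"changed route {cidr}: {old} -> {new}")
    return changes


def diff_rules(before: Set[str], after: Set[str]) -> List[str]:
    """Describe security-rule changes; each rule is a descriptive string."""
    return ([f"removed rule {rule}" for rule in sorted(before - after)]
            + [f"added rule {rule}" for rule in sorted(after - before)])


if __name__ == "__main__":
    private_before = {"10.20.0.0/16": "local", "0.0.0.0/0": "igw"}
    private_after = {"10.20.0.0/16": "local", "0.0.0.0/0": "nat"}
    rules_before = {"ingress tcp/22 from 0.0.0.0/0", "egress all to 0.0.0.0/0"}
    rules_after = {"egress tcp/443 to 0.0.0.0/0"}
    for line in diff_routes(private_before, private_after) + diff_rules(rules_before, rules_after):
        print(line)
```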

Next steps

  • Complete the quick test below.
  • Extend your IaC with private endpoints to your most-used managed data services.
  • Review IAM policies against least privilege and remove unused permissions.

Practice Exercises


Instructions

Using your IaC tool of choice, create:

  • A VPC/VNet with CIDR 10.20.0.0/16.
  • Two private subnets (10.20.1.0/24 and 10.20.2.0/24) and one public subnet (10.20.0.0/24).
  • A NAT gateway (or Cloud NAT) so the private subnets can reach the internet; the public subnet routes to the internet gateway (or equivalent).
  • A security group/firewall that allows all egress, no inbound from the internet.
  • Outputs: private subnet IDs and the security group ID.
Expected Output
Plan shows creation of VPC/VNet, 3 subnets, NAT + routes, security group. Outputs display two private subnet IDs and one security group ID. Instances launched in private subnets have outbound internet access via NAT and no direct inbound from the internet.
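Before applying, you can sanity-check the exercise's address plan with Python's ipaddress module: every subnet should sit inside the VPC range and no two subnets should overlap. The helper aws_usable_addresses encodes AWS's rule of reserving five addresses in every subnet, so each /24 here leaves 251 usable addresses.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List

# Address plan from the exercise above.
VPC_CIDR = "10.20.0.0/16"
SUBNETS = {"public": "10.20.0.0/24", "private_a": "10.20.1.0/24", "private_b": "10.20.2.0/24"}


def plan_problems(vpc_cidr: str, subnets: Dict[str, str]) -> List[str]:
    """Report subnets outside the VPC range and pairs of overlapping subnets."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = [f"{name} {net} is outside {vpc}"
                for name, net in nets.items() if not net.subnet_of(vpc)]
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} {net_a} overlaps {name_b} {net_b}")
    return problems


def aws_usable_addresses(cidr: str) -> int:
    """AWS reserves 5 addresses in every subnet (the first four and the last)."""
    return ipaddress.ip_network(cidr).num_addresses - 5


if __name__ == "__main__":
    print(plan_problems(VPC_CIDR, SUBNETS) or "layout OK")
    for name, cidr in SUBNETS.items():
        print(name, cidr, aws_usable_addresses(cidr), "usable")
```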

Networking And IAM Provisioning — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

