Cloud And Networking Basics

Learn Cloud And Networking Basics for Platform Engineers for free: roadmap, worked examples, subskills, and a skill exam.

Published: January 23, 2026 | Updated: January 23, 2026

Why this skill matters for Platform Engineers

Cloud and networking basics are the foundation of reliable platforms. As a Platform Engineer, you will design VPCs, control traffic, expose services safely, and keep costs predictable. Mastering these topics unlocks tasks like building secure private networks, setting up load balancers and DNS, connecting services across accounts and regions, and troubleshooting outages quickly.

Who this is for

  • Backend and platform engineers moving into cloud infrastructure work.
  • Developers who need a strong mental model of VPCs, DNS, and load balancers.
  • Ops/SRE engineers who want a structured refresher in networking fundamentals.

Prerequisites

  • Comfortable with Linux basics (shell, files, processes).
  • Basic understanding of IP addressing and routing (CIDR, subnets).
  • Familiarity with at least one cloud provider helps, but examples are provider-neutral.

Learning path

1) Design your first VPC

Plan IP space with room for growth. Create public and private subnets, route tables, an Internet Gateway (IGW), and a NAT for egress.
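
As a rough illustration of the planning step, the sketch below uses Python's standard ipaddress module to carve one public and one private /24 per AZ out of a /16. The function name plan_subnets and the AZ labels are made up for this example; the layout (public first, then private) is just one possible convention.

```python
import ipaddress

def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Carve one public and one private subnet per AZ out of vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of equal-size blocks
    plan = []
    for az in azs:
        for tier in ("public", "private"):
            plan.append({"az": az, "tier": tier, "cidr": str(next(blocks))})
    return plan

# Hypothetical AZ labels; substitute your provider's zone names.
plan = plan_subnets("10.0.0.0/16", ["az-a", "az-b", "az-c"])
for row in plan:
    print(row["az"], row["tier"], row["cidr"])
```

A /16 holds 256 blocks of size /24, so this plan uses 6 and leaves the rest free for growth.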

2) Add DNS and load balancing

Use DNS to name services. Put a load balancer in front of a stateless app. Configure health checks and reasonable TTLs.

3) Connect privately

Link VPCs with peering or a transit layer; avoid overlapping CIDRs. For producer-consumer patterns, consider PrivateLink-style endpoints.
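
Overlap is easy to check before you create anything. The following is a minimal sketch, assuming you keep an inventory of VPC names and CIDRs; the names and ranges are invented for the example.

```python
import ipaddress
from itertools import combinations

def find_overlaps(cidrs):
    """Return pairs of names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]

# Hypothetical inventory for illustration.
vpcs = {
    "vpc-a": "10.10.0.0/16",
    "vpc-b": "10.20.0.0/16",
    "vpc-c": "10.10.128.0/17",  # sits inside vpc-a's range
}
print(find_overlaps(vpcs))  # [('vpc-a', 'vpc-c')]
```

Running a check like this in CI against your IP plan catches conflicts before a peering request does.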

4) Service discovery

Use DNS-based discovery (A/SRV records) or platform-native discovery (e.g., Kubernetes services). Standardize naming.
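
SRV records carry a priority, weight, port, and target, and the client is expected to choose among them. The sketch below shows one simplified way a client might do that: prefer the lowest priority, then pick by weight. It is not the exact selection procedure from RFC 2782, and the record values are hypothetical.

```python
import random
from typing import NamedTuple

class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str

def pick_target(records, rng=random):
    """Pick one record: lowest priority wins, then weighted random choice."""
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical records as a resolver might return them.
records = [
    SrvRecord(10, 60, 8080, "api-0.example.internal"),
    SrvRecord(10, 40, 8080, "api-1.example.internal"),
    SrvRecord(20, 100, 8080, "api-backup.example.internal"),
]
chosen = pick_target(records)
print(chosen.target, chosen.port)
```

The backup record is only considered if no priority-10 record is present in the list, so a real client would also drop targets it has found unreachable before calling pick_target.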

5) Think multi‑region

Decide active-active vs. active-passive. Define RTO/RPO and the failover mechanism (DNS, Anycast, or application-level).

6) Keep costs in check

Tag resources, enable budgets/alerts, and understand data transfer pricing. Right-size NAT, LB, and egress paths.
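
To get a feel for how fixed and per-GB charges combine, here is a small estimator. Every rate in it is a made-up placeholder, not a real provider price; look up current pricing for your provider and region before using numbers like these.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing math

def monthly_network_cost(rates, nat_gb, egress_gb, nat_count=1, lb_count=1):
    """Split a monthly estimate into fixed (hourly) and variable (per-GB) parts."""
    fixed = HOURS_PER_MONTH * (
        rates["nat_hourly"] * nat_count + rates["lb_hourly"] * lb_count
    )
    variable = rates["nat_per_gb"] * nat_gb + rates["egress_per_gb"] * egress_gb
    return {
        "fixed": round(fixed, 2),
        "variable": round(variable, 2),
        "total": round(fixed + variable, 2),
    }

# PLACEHOLDER rates (hypothetical, for illustration only).
PLACEHOLDER_RATES = {
    "nat_hourly": 0.05,
    "nat_per_gb": 0.05,
    "lb_hourly": 0.03,
    "egress_per_gb": 0.10,
}
estimate = monthly_network_cost(PLACEHOLDER_RATES, nat_gb=500, egress_gb=1000)
print(estimate)
```

Separating fixed from variable cost makes it easier to see whether right-sizing (fewer NATs or LBs) or traffic reduction (less egress) will move the bill more.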

7) Manage quotas and limits

Track IP space, ENI limits, and LB targets. Plan ahead for scale events; request increases early.
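
IP exhaustion is one limit you can compute yourself. The sketch below assumes the provider reserves a fixed number of addresses per subnet; the default of 5 matches AWS subnets, and other providers differ, so treat it as a parameter. The 80% alert threshold is an arbitrary example value.

```python
import ipaddress

def usable_ips(cidr, reserved=5):
    """Addresses left for workloads after the provider's reserved ones.

    reserved=5 matches AWS subnets; adjust for other providers.
    """
    return max(ipaddress.ip_network(cidr).num_addresses - reserved, 0)

def utilization(cidr, in_use, reserved=5):
    """Fraction of usable addresses currently allocated."""
    capacity = usable_ips(cidr, reserved)
    return 1.0 if capacity == 0 else in_use / capacity

ALERT_THRESHOLD = 0.8  # example value, not a recommendation

print(usable_ips("10.0.1.0/24"))  # 251
if utilization("10.0.1.0/24", in_use=210) >= ALERT_THRESHOLD:
    print("subnet 10.0.1.0/24 is running low on addresses")
```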

8) Troubleshoot methodically

Use dig, traceroute, curl, and flow logs. Trace traffic hop by hop and verify each layer.

Worked examples

Example 1: Minimal VPC with public and private subnets (Terraform)
variable "vpc_cidr" { default = "10.0.0.0/16" }

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  tags = { Name = "demo-vpc" }
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
  availability_zone       = "us-east-1a"
}

resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1a"
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route" "public_inet" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.igw.id
}

resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}

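# NAT Gateway lives in the public subnet; private subnet egress routes through it.
# Note: `domain = "vpc"` assumes AWS provider v5 or later (older versions use `vpc = true`).
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public_a.id
  depends_on    = [aws_internet_gateway.igw]
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route" "private_egress" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.nat.id
}

resource "aws_route_table_association" "private_a" {
  subnet_id      = aws_subnet.private_a.id
  route_table_id = aws_route_table.private.id
}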

Key idea: public subnet routes to IGW; private subnet routes to NAT for egress-only internet.

Example 2: DNS + load balancer health checks
# Pseudo-commands
# 1) Register targets on LB, enable health checks at /healthz
# 2) Create DNS record: app.example.com -> LB address
# 3) Test
curl -I http://app.example.com/healthz
# Expect: 200 OK from healthy target

# TTL guidance
# - Start with TTL=60s for dynamic apps
# - Increase to 300s when stable to reduce DNS traffic

Key idea: DNS names the service; LB handles distribution and health-based removal of bad targets.
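
Health-based removal usually depends on consecutive check results rather than a single probe. The class below is a simplified model of that behaviour, not any particular load balancer's implementation; the threshold defaults are example values.

```python
class TargetHealth:
    """Track a target's state from consecutive health check results."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed one check result; return the target's state afterwards."""
        if check_passed == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy

target = TargetHealth()
states = [target.record(r) for r in [False, False, False, True, True]]
print(states)  # [True, True, False, False, True]
```

The target is only removed after the third consecutive failure and only returns after two consecutive passes, which is why a single slow response does not cause flapping.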

Example 3: VPC peering routes
# VPC-A: 10.10.0.0/16  <->  VPC-B: 10.20.0.0/16
# After creating a peering connection, add routes in both VPCs:
# In VPC-A route table: destination 10.20.0.0/16 via pcx-... (peering)
# In VPC-B route table: destination 10.10.0.0/16 via pcx-... (peering)
# Ensure no overlapping CIDRs, or routing will fail.

Key idea: peering is non-transitive; both sides must route explicitly; overlapping ranges block connectivity.
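
Non-transitivity can be modelled as "only direct edges count". This small sketch uses hypothetical VPC names; it also lists which peerings a full mesh would still need, which shows why a transit layer becomes attractive as the number of VPCs grows.

```python
from itertools import combinations

def can_reach(src, dst, peerings):
    """True only if src and dst share a direct peering; no multi-hop."""
    return src == dst or frozenset((src, dst)) in peerings

def missing_for_full_mesh(vpcs, peerings):
    """Pairs that would still need their own peering connection."""
    return [
        pair
        for pair in combinations(sorted(vpcs), 2)
        if frozenset(pair) not in peerings
    ]

# Hypothetical topology: A <-> B and B <-> C are peered.
peerings = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}

print(can_reach("vpc-a", "vpc-b", peerings))  # True
print(can_reach("vpc-a", "vpc-c", peerings))  # False: B does not forward for A
print(missing_for_full_mesh(["vpc-a", "vpc-b", "vpc-c"], peerings))
```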

Example 4: Service discovery with Kubernetes
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  clusterIP: None  # headless service for discovery
  selector:
    app: api
  ports:
  - name: http
    port: 80
    targetPort: 8080

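Key idea: with a headless service (clusterIP: None), cluster DNS returns the pod IPs directly, so clients resolve A/SRV records such as api.default.svc (api.default.svc.cluster.local with the default cluster domain) to reach pods. Use headless services for library-based discovery or sidecars.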

Example 5: Active-passive failover with DNS
# Two regions, primary and secondary
# Health check endpoint: /ready
# Weighted or failover DNS records:
# - Primary: type A, set to PRIMARY with health check
# - Secondary: type A, set to SECONDARY
# During outage, health check fails and traffic shifts automatically.

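Key idea: DNS failover is only as good as the health check and the TTL. Worst-case failover time is roughly the health check's detection time plus the record TTL, so choose both to fit inside your RTO.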
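
A back-of-the-envelope budget for this kind of failover can be written down directly. The sketch assumes detection takes interval times failure threshold and that clients honour the TTL; real resolvers and long-lived connections can stretch this, so treat the result as a lower bound on the worst case. All numbers are example values.

```python
def worst_case_failover_seconds(check_interval, failure_threshold, ttl):
    """Detection time plus the time cached DNS answers may keep being used."""
    detection = check_interval * failure_threshold
    return detection + ttl

def meets_rto(rto_seconds, check_interval, failure_threshold, ttl):
    return worst_case_failover_seconds(check_interval, failure_threshold, ttl) <= rto_seconds

print(worst_case_failover_seconds(check_interval=10, failure_threshold=3, ttl=60))  # 90
print(meets_rto(120, check_interval=10, failure_threshold=3, ttl=60))   # True
print(meets_rto(120, check_interval=10, failure_threshold=3, ttl=300))  # False
```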

Drills and exercises

  • Plan a /16 VPC and carve at least 6 /24 subnets across 3 AZs.
  • Set up a public ALB/NLB with health checks and verify fail-out of an unhealthy instance.
  • Create a private subnet with no inbound internet and confirm outbound via NAT works while inbound is blocked.
  • Simulate VPC peering between two CIDR blocks; document routes and security rules required.
  • Configure DNS TTL at 60s, change a record, and measure how long clients take to see the change.
  • Use dig, traceroute, and curl to trace a request from client to service; capture each hop and response code.
  • Enable cost tags on networking resources; estimate monthly costs for NAT, load balancers, and data transfer under a sample traffic pattern.

Common mistakes and debugging tips

Mistake: Overlapping CIDRs between VPCs

Symptom: Peering cannot be established or routes appear but traffic blackholes.

Fix: Re-plan IP space. Use non-overlapping ranges (e.g., 10.0.0.0/8 subdivided carefully). Consider a transit architecture to centralize routing.

Mistake: DNS TTL too long during migrations

Symptom: Clients keep hitting old targets after a change.

Fix: Lower TTL before cutover (e.g., 60s), perform change, then raise TTL after stability.

Mistake: NAT vs. IGW confusion

Symptom: Private instances cannot reach the internet.

Fix: Point the private subnets' 0.0.0.0/0 route at a NAT that sits in a public subnet, and confirm that the public subnet itself routes 0.0.0.0/0 to an IGW attached to the VPC.
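
The same check can be automated against an exported route inventory. The data shape below (name, tier, default_route_target) is invented for this sketch, and the IDs are placeholders; it also assumes every private subnet is meant to have NAT egress, which is not true for fully isolated subnets.

```python
def check_default_routes(subnets):
    """Flag subnets whose 0.0.0.0/0 route does not match their tier."""
    expected = {"public": "igw", "private": "nat"}
    problems = []
    for subnet in subnets:
        want = expected[subnet["tier"]]
        target = subnet.get("default_route_target")
        if target is None:
            problems.append(f"{subnet['name']}: no 0.0.0.0/0 route (expected {want})")
        elif not target.startswith(want):
            problems.append(f"{subnet['name']}: 0.0.0.0/0 -> {target} (expected {want})")
    return problems

# Hypothetical inventory; IDs are placeholders.
inventory = [
    {"name": "public-a", "tier": "public", "default_route_target": "igw-111"},
    {"name": "private-a", "tier": "private", "default_route_target": "igw-111"},
    {"name": "private-b", "tier": "private", "default_route_target": None},
    {"name": "private-c", "tier": "private", "default_route_target": "nat-222"},
]
for problem in check_default_routes(inventory):
    print(problem)
```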

Mistake: Ignoring data transfer costs

Symptom: Unexpected monthly spikes.

Fix: Keep traffic within the same AZ when possible, use internal load balancers, compress payloads, and cache at edges. Tag and monitor costs.

Debug recipe: Is it DNS, network, or app?
# DNS
export NAME=app.example.com
dig +short "$NAME"

# Network path
traceroute "$NAME"

# App health
curl -s -o /dev/null -w "%{http_code}\n" "http://$NAME/healthz"

# If dig returns nothing: DNS.
# If DNS resolves but traceroute fails: network (some hops drop traceroute probes, so confirm with curl).
# If traceroute works but curl fails: app or LB target health.
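
The same triage can be scripted. This sketch uses only the Python standard library and swaps traceroute for a plain TCP connect, which is a simpler reachability test, not an equivalent one. The function names, the /healthz default, and the rule that any status outside 200-399 counts as an app problem are assumptions for this example.

```python
import http.client
import socket
import urllib.error
import urllib.request

def classify(dns_ok, tcp_ok, http_status):
    """Map probe results to the layer that most likely failed."""
    if not dns_ok:
        return "dns"
    if not tcp_ok:
        return "network"
    if http_status is None or not 200 <= http_status < 400:
        return "app-or-lb"
    return "ok"

def probe(host, port=80, path="/healthz", timeout=3.0):
    """Run DNS, TCP, and HTTP checks in order and classify the outcome."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return classify(False, False, None)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return classify(True, False, None)
    try:
        url = f"http://{host}:{port}{path}"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # server answered, but with an error status
    except (OSError, http.client.HTTPException):
        status = None  # connection dropped or response was malformed
    return classify(True, True, status)
```

Calling probe("app.example.com") returns one of "dns", "network", "app-or-lb", or "ok", which tells you where to look first.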

Mini project: Secure two‑tier app with private egress

Goal: Deploy a simple web tier behind a load balancer that talks to a private backend. Public internet access is only via the LB; backend has no inbound internet exposure.

Acceptance criteria

  • VPC with at least one public and one private subnet per AZ.
  • Public LB terminates client traffic; targets in private subnets.
  • NAT provides egress for OS updates; no inbound internet to private subnets.
  • DNS name points to the LB; health checks at /healthz.
  • Cost tags on all resources and a monthly budget alert.

Build steps

  1. Plan CIDR: /16 with two /24 subnets per AZ (public/private).
  2. Create VPC, IGW, subnets, and route tables.
  3. Provision NAT in a public subnet; route private subnets' 0.0.0.0/0 to NAT.
  4. Deploy the backend (e.g., a mock API) and a stateless web tier, both in private subnets.
  5. Create an internal security group for east-west traffic and a public one for LB ingress.
  6. Attach a public LB to the web tier; configure health checks.
  7. Create a DNS record for the LB; set TTL to 60s while iterating.
  8. Enable cost allocation tags and a budget alert.
  9. Run a smoke test and capture a network diagram and routes.

Practical projects

  • Blue/Green switch with DNS: Deploy two versions behind separate target groups and move traffic using weighted DNS.
  • VPC-to-VPC data sync: Set up peering and route tables; sync data over a private path and measure throughput and cost.
  • Service discovery lab: Compare DNS SRV-based discovery vs. a service mesh; document pros/cons and failure modes.

Subskills

  • VPC Design Basics — Plan CIDR blocks, public/private subnets, routing, IGW/NAT.
  • DNS And Load Balancing — Map names to services, tune TTLs, health checks, and traffic policies.
  • Private Connectivity And Peering — Connect VPCs/accounts securely; avoid overlapping IP spaces.
  • Service Discovery Concepts — Use DNS A/SRV records or platform-native discovery patterns.
  • Multi Region Concepts — Choose active-active or active-passive; define failover and data strategies.
  • Cost Management And FinOps Basics — Tagging, budgets, and data transfer cost awareness.
  • Quotas And Limits Management — Track and request increases for IPs, ENIs, LB targets, and more.
  • Networking Troubleshooting Basics — Apply dig, traceroute, curl, flow logs, and packet captures.

Next steps

  • Deepen automation with Infrastructure as Code and reusable network modules.
  • Standardize naming, tagging, and guardrails for multi-account environments.
  • Practice chaos drills: disable a target or AZ and verify your failover works as intended.

Cloud And Networking Basics — Skill Exam

This exam checks your understanding of VPC design, DNS and load balancing, private connectivity, service discovery, multi-region fundamentals, cost awareness, quotas, and troubleshooting. You can take it for free, as many times as you want. Progress and results are saved for logged-in users; guests can still complete the exam without saved progress.

Tips: read carefully, select all correct choices when prompted, and use real-world reasoning. Good luck!

12 questions | 70% to pass
