Why this matters
As a Platform Engineer, you route traffic reliably and safely. DNS maps names to services, and load balancers spread traffic, keep apps available during failures, and enable zero-downtime deploys. You will use these skills to:
- Expose internal and external services with stable hostnames.
- Design failover and disaster recovery using DNS and health checks.
- Roll out blue/green or canary releases without breaking users.
- Scale horizontally and keep response times predictable.
Who this is for
- Platform and DevOps engineers starting with networking fundamentals.
- Backend engineers who need to ship services behind stable endpoints.
- SREs improving availability and release safety.
Prerequisites
- Comfort with the command line and editing config files.
- Basic TCP/IP understanding (IP, ports, HTTP).
- Ability to run simple tests using curl, dig, or nslookup.
Concept explained simply
DNS (Domain Name System)
DNS is the phonebook of the internet. It translates human-readable names (like api.example.com) into IP addresses that computers use. It’s distributed and cached so lookups are fast and scalable.
- Common records: A (IPv4), AAAA (IPv6), CNAME (alias to another name), TXT (metadata), MX (mail), NS (nameservers), SRV (service location).
- TTL: Time-to-Live controls caching. Lower TTLs let you change answers faster but increase DNS query load.
- Resolvers: Your device asks a recursive resolver, which finds authoritative nameservers and returns the final answer, caching along the way.
- Gotcha: You cannot put a CNAME at the zone apex (example.com) because the apex must also have SOA and NS records. Use ALIAS/ANAME features if your DNS provider supports them.
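Resolver caching with TTLs can be sketched as a tiny cache keyed by name, where each answer carries an expiry time. A minimal Python sketch (the name and IP are illustrative):

```python
import time

class TtlCache:
    """Minimal DNS-style cache: each answer expires after its TTL."""
    def __init__(self):
        self._store = {}  # name -> (answer, expires_at)

    def put(self, name, answer, ttl):
        self._store[name] = (answer, time.monotonic() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None  # miss: a real resolver would now query authoritative servers
        answer, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[name]  # expired: treated exactly like a miss
            return None
        return answer

cache = TtlCache()
cache.put("api.example.com", "203.0.113.12", ttl=300)
print(cache.get("api.example.com"))  # 203.0.113.12 (served from cache)
```

This is also why lowering a TTL only helps after the old, longer TTL has expired everywhere: caches keep serving the old answer until its original expiry.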
Load Balancing
Load balancers distribute traffic across multiple backends to improve reliability and performance.
- L4 vs L7: Layer 4 (TCP/UDP) routes by IP/port only. Layer 7 (HTTP/HTTPS) can route by path, host, headers, cookies, etc.
- Algorithms: round-robin, weighted round-robin, least connections, IP hash (simple stickiness).
- Health checks: Automatically remove unhealthy backends (e.g., HTTP 200 on /healthz).
- Session affinity: Keep a user on the same backend if needed (cookies, IP hash).
- Global vs local: DNS-based distribution across regions (global), load balancer-based distribution within a region (local).
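Two of the algorithms above are easy to sketch. A minimal Python illustration of round-robin and least-connections selection (backend addresses are made up):

```python
import itertools

backends = ["10.0.1.10:8080", "10.0.1.11:8080", "10.0.1.12:8080"]

# Round-robin: hand out backends in a fixed rotation, ignoring load.
rotation = itertools.cycle(backends)
def pick_round_robin():
    return next(rotation)

# Least connections: pick the backend with the fewest in-flight
# requests (tracked here in a plain dict; the caller decrements
# the count when its request completes).
active = dict.fromkeys(backends, 0)
def pick_least_conn():
    choice = min(active, key=active.get)
    active[choice] += 1
    return choice

print([pick_round_robin() for _ in range(4)])  # wraps back to the first backend
active["10.0.1.10:8080"] = 5                   # simulate one busy backend
print(pick_least_conn())                       # avoids the busy backend
```

Round-robin is predictable and stateless; least connections adapts to uneven request durations at the cost of tracking per-backend state.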
Mental model
Imagine DNS as road signs that point users toward a city (your region or load balancer). Inside the city, a traffic officer (the load balancer) directs cars to different parking lots (your instances) based on rules and current congestion. The signs change slowly (TTL), while the officer reacts quickly (health checks, algorithms).
Core concepts to know
- DNS caching and TTL trade-offs.
- Authoritative vs recursive resolvers; SOA/NS records.
- Record selection: A/AAAA vs CNAME; ALIAS/ANAME at apex.
- Anycast and DNS-based global routing, and why resolver caching limits their precision.
- L4 vs L7 load balancing; when to choose each.
- Health checks, draining, surge protection, timeouts.
- Strategies: blue/green, canary, weighted routing, failover.
Worked examples
Example 1: Map a service and plan TTLs
- Goal: Expose api.example.com to an L7 load balancer at 203.0.113.12.
- Records:
```
; zone: example.com
; TTL 300 (5 minutes) allows faster changes during rollout
api  300  IN  A     203.0.113.12
api  300  IN  AAAA  2001:db8::12
```
- Trade-off: During a migration, use TTL 300 so you can switch targets quickly. Once stable, raise it to 3600 to reduce DNS query load.
- Note: TTL is not a record type; it is a field that precedes the class and type on each record line.
Example 2: Weighted DNS between two regions
- Goal: Send 80% traffic to us-east, 20% to eu-west. Providers implement weights differently; conceptually it looks like:
```
api.example.com. 300 IN A 198.51.100.10   ; us-east, weight=80
api.example.com. 300 IN A 203.0.113.20    ; eu-west, weight=20
```
- Limitations: DNS caches mean users might not immediately see new weights. Use DNS weights for coarse global distribution, not real-time traffic shifting.
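Conceptually, a weighted DNS answer behaves like a weighted random choice across resolvers. A Python sketch of the 80/20 split (remember that each resolver then caches its answer for the full TTL, so real traffic shifts more slowly than the raw weights suggest):

```python
import random

# Weighted answers from the example: us-east weight 80, eu-west weight 20.
answers = {"198.51.100.10": 80, "203.0.113.20": 20}

def weighted_pick(weighted, rng):
    ips = list(weighted)
    return rng.choices(ips, weights=[weighted[ip] for ip in ips], k=1)[0]

# Across many independent resolvers the split approaches 80/20, but each
# resolver caches whichever answer it got, so changing the weights moves
# real traffic only as those caches expire.
rng = random.Random(42)  # seeded for a repeatable demo
picks = [weighted_pick(answers, rng) for _ in range(10_000)]
share = picks.count("198.51.100.10") / len(picks)
print(f"us-east share: {share:.1%}")  # close to 80%
```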
Example 3: L7 load balancer (NGINX) with health checks
- Upstreams and routes:
```nginx
http {
    upstream app_backend {
        least_conn;
        server 10.0.1.10:8080 max_fails=2 fail_timeout=10s;
        server 10.0.1.11:8080 max_fails=2 fail_timeout=10s;
    }

    server {
        listen 80;

        location /healthz {
            return 200 'ok';
        }

        location / {
            proxy_set_header Host $host;
            proxy_pass http://app_backend;
        }
    }
}
```
- Effect: NGINX prefers backends with fewer active connections and temporarily stops sending traffic to backends that fail repeatedly.
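The max_fails/fail_timeout pair acts as a passive circuit breaker: after max_fails failures, the backend is skipped for fail_timeout seconds. A rough Python sketch of that bookkeeping (this mirrors the idea, not NGINX's actual implementation):

```python
import time

MAX_FAILS, FAIL_TIMEOUT = 2, 10.0  # mirrors max_fails=2 fail_timeout=10s

class Backend:
    def __init__(self, addr):
        self.addr = addr
        self.fails = 0
        self.down_until = 0.0

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.fails += 1
        if self.fails >= MAX_FAILS:
            # Too many failures: take it out of rotation for the window.
            self.down_until = now + FAIL_TIMEOUT
            self.fails = 0

    def available(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.down_until

b = Backend("10.0.1.10:8080")
b.record_failure(now=0.0)
print(b.available(now=1.0))   # True: one failure is tolerated
b.record_failure(now=2.0)
print(b.available(now=5.0))   # False: marked down until the window expires
print(b.available(now=13.0))  # True: window elapsed, backend is retried
```

With only passive checks, a few real requests must fail before a backend is ejected; an active prober hitting /healthz catches failures without user impact.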
Example 4: Blue/Green switch with weights
- Goal: Move from v1 to v2 gradually via L7 weights.
```nginx
upstream app_backend {
    # weighted round-robin via server weight
    server 10.0.1.10:8080 weight=9;  # v1 gets ~90% of requests
    server 10.0.1.20:8080 weight=1;  # v2 gets ~10% of requests
}
```
- Increase v2's weight step by step, monitor errors and latency, then drain v1.
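The step-by-step weight shift can be automated as a ramp loop that rolls back when the canary's error rate exceeds a budget. A Python sketch (the ramp steps, error budget, and metrics source are assumptions, not part of the NGINX config above):

```python
RAMP = [1, 5, 25, 50, 100]   # percent of traffic to v2 at each step
ERROR_BUDGET = 0.01          # abort if v2's error rate exceeds 1%

def run_canary(ramp, error_rate_for):
    """Advance v2's weight step by step; roll back on a budget breach.

    error_rate_for(percent) -> observed v2 error rate at that weight;
    in real life this would come from your metrics system after a
    bake period at each step.
    """
    for percent in ramp:
        if error_rate_for(percent) > ERROR_BUDGET:
            return 0, "rolled back"   # drop v2's weight, keep v1 serving
    return 100, "promoted"            # v2 takes all traffic; drain v1

# Healthy rollout: the error rate stays low at every step.
print(run_canary(RAMP, lambda p: 0.001))
# Bad build: errors spike once real traffic arrives.
print(run_canary(RAMP, lambda p: 0.05 if p >= 25 else 0.001))
```

The small early steps (1%, 5%) limit the blast radius: most users never see a bad build because it is rolled back before its weight grows.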
Hands-on practice exercises
Complete the exercises below. When ready, take the Quick Test at the bottom. Note: Tests are available to everyone; only logged-in users get saved progress.
Exercise 1: Plan DNS for a service and simulate a change
See details in the Exercises section below (Exercise 1).
Exercise 2: Configure NGINX load balancing with health checks
See details in the Exercises section below (Exercise 2).
Checklist: I can
- Explain recursive vs authoritative DNS in one sentence.
- Choose between A/AAAA vs CNAME vs ALIAS/ANAME at apex.
- Set TTLs appropriately for migrations and steady state.
- Pick L4 vs L7 based on requirements (protocol vs content routing).
- Enable health checks and connection draining on a load balancer.
- Run a blue/green or canary rollout with weights or routing rules.
- Diagnose DNS issues with dig/nslookup and LB issues with curl/logs.
Common mistakes and how to self-check
- Using CNAME at zone apex: Not allowed. Self-check: Verify SOA/NS exist at apex; use ALIAS/ANAME if needed.
- TTL too high during migrations: Changes propagate slowly. Self-check: Lower the TTL at least one old-TTL period before the change, so high-TTL answers have expired from caches by cutover time.
- No health checks: Users hit dead backends. Self-check: Intentionally stop one backend and see if traffic shifts automatically.
- Sticky sessions without reason: Reduces balancing quality. Self-check: Remove stickiness unless the app truly needs it.
- Ignoring IPv6: Some clients resolve AAAA first. Self-check: Add AAAA records and verify with dig AAAA and curl -6.
- Per-request "balancing" via DNS only: DNS is coarse. Self-check: Use L7/L4 LB for fast failover and granular control.
Practical projects
- High-availability web app: Put two app instances behind NGINX with health checks and demonstrate failover.
- Global read distribution: Use DNS weights to send a portion of traffic to a read-only replica in another region; measure cache effects.
- Blue/green release: Automate weight shifts and connection draining; record error rates and rollback steps.
Learning path
- DNS basics: understand records, TTL, caching.
- Hands-on: dig/nslookup to see resolution paths.
- Load balancer fundamentals: algorithms, L4 vs L7.
- Deploy a lab NGINX/HAProxy and configure health checks.
- Practice blue/green and canary with controlled weights.
- Add observability: logs and simple health endpoints.
Next steps
- Finish the Exercises below and verify the Checklist items.
- Take the Quick Test to confirm understanding.
- Apply the Practical projects at work or in a lab.
Exercises
Exercise 1 — Plan DNS and simulate a change
Goal: Design records for api.example.com and simulate a migration with safe TTLs.
Show task
- Create a zone file snippet for example.com that includes:
- SOA/NS placeholders, and records for api.example.com with A and AAAA.
- TTL 300 for api during migration.
- Simulate a change: switch api.example.com from 198.51.100.10 to 203.0.113.12 and note propagation expectations.
- Use dig to show what you expect to see before and after the TTL window.
Exercise 2 — NGINX load balancer with health checks
Goal: Balance across two backends with least connections, add a health endpoint, and test failover.
Show task
- Write an NGINX config with an upstream of two servers, least_conn, and basic fail parameters.
- Add a /healthz that returns 200.
- Use curl in a loop to observe distribution, then stop one backend and verify automatic failover.
Mini challenge
You must move 30% of traffic to a new region for a read-only feature test within 1 hour, minimizing risk. What’s your plan using DNS and the load balancer?
Sample approach
- Lower the DNS TTL to 300 in advance, and wait out at least one old-TTL period so cached answers expire before the shift.
- Add a weighted DNS record for the new region at 0.3 effective weight.
- Keep regional L7 health checks strict; monitor errors and latency.
- If errors rise, drop weight to 0 or remove record; if stable, consider small increments.