DNS And Load Balancing

Learn DNS and load balancing for free with explanations, exercises, and a quick test (for Platform Engineers).

Published: January 23, 2026 | Updated: January 23, 2026

Why this matters

As a Platform Engineer, you route traffic reliably and safely. DNS maps names to services, and load balancers spread traffic, keep apps available during failures, and enable zero-downtime deploys. You will use these skills to:

  • Expose internal and external services with stable hostnames.
  • Design failover and disaster recovery using DNS and health checks.
  • Roll out blue/green or canary releases without breaking users.
  • Scale horizontally and keep response times predictable.

Who this is for

  • Platform and DevOps engineers starting with networking fundamentals.
  • Backend engineers who need to ship services behind stable endpoints.
  • SREs improving availability and release safety.

Prerequisites

  • Comfort with the command line and editing config files.
  • Basic TCP/IP understanding (IP, ports, HTTP).
  • Ability to run simple tests using curl, dig, or nslookup.

Concept explained simply

DNS (Domain Name System)

DNS is the phonebook of the internet. It translates human-readable names (like api.example.com) into IP addresses that computers use. It’s distributed and cached so lookups are fast and scalable.

  • Common records: A (IPv4), AAAA (IPv6), CNAME (alias to another name), TXT (metadata), MX (mail), NS (nameservers), SRV (service location).
  • TTL: Time-to-Live controls caching. Lower TTLs let you change answers faster but increase DNS query load.
  • Resolvers: Your device asks a recursive resolver, which finds authoritative nameservers and returns the final answer, caching along the way.
  • Gotcha: You cannot put a CNAME at the zone apex (example.com). A CNAME cannot coexist with other records at the same name, and the apex must carry SOA and NS records. Use ALIAS/ANAME features if your DNS provider supports them.
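
The TTL behavior described above can be sketched with a toy caching resolver. This is a simplified model, not how any particular resolver is implemented: the `CachingResolver` class, the simulated clock passed as `now`, and the in-memory `zone` dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    address: str
    expires_at: float  # simulated clock time (seconds) when the entry expires


class CachingResolver:
    """Toy recursive resolver: caches an answer until its TTL runs out."""

    def __init__(self, authoritative):
        # authoritative maps name -> (address, ttl_seconds); held by reference,
        # so later edits to the dict act like edits to the zone.
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.address  # cache hit: the zone is not consulted
        address, ttl = self.authoritative[name]
        self.cache[name] = CachedAnswer(address, now + ttl)
        return address


zone = {"api.example.com": ("198.51.100.10", 300)}
resolver = CachingResolver(zone)

before = resolver.resolve("api.example.com", now=0)
zone["api.example.com"] = ("203.0.113.12", 300)       # record changes right after t=0
stale = resolver.resolve("api.example.com", now=299)  # still served from cache
fresh = resolver.resolve("api.example.com", now=300)  # TTL expired, re-fetched
print(before, stale, fresh)
```

The resolver keeps returning the old address until the cached entry expires, which is why a lower TTL shortens the window in which clients see stale answers.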

Load Balancing

Load balancers distribute traffic across multiple backends to improve reliability and performance.

  • L4 vs L7: Layer 4 (TCP/UDP) routes by IP/port only. Layer 7 (HTTP/HTTPS) can route by path, host, headers, cookies, etc.
  • Algorithms: round-robin, weighted round-robin, least connections, IP hash (simple stickiness).
  • Health checks: Automatically remove unhealthy backends (e.g., a backend is healthy only while /healthz returns HTTP 200).
  • Session affinity: Keep a user on the same backend if needed (cookies, IP hash).
  • Global vs local: DNS-based distribution across regions (global), load balancer-based distribution within a region (local).
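
Three of the algorithms listed above can be sketched in a few lines. This is a minimal illustration, not a production balancer: the function names are hypothetical, the backend addresses are borrowed from the worked examples, and SHA-256 is just one possible choice of hash for IP-based stickiness.

```python
import hashlib
from itertools import cycle

backends = ["10.0.1.10:8080", "10.0.1.11:8080"]

# Round-robin: hand out backends in a fixed rotating order.
_rotation = cycle(backends)


def round_robin():
    return next(_rotation)


def least_connections(active):
    """Pick the backend with the fewest active connections."""
    return min(backends, key=lambda b: active.get(b, 0))


def ip_hash(client_ip):
    """Map a client IP to a backend; stable while the backend list is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


rotation = [round_robin() for _ in range(3)]
least_busy = least_connections({"10.0.1.10:8080": 5, "10.0.1.11:8080": 2})
sticky = ip_hash("192.0.2.44") == ip_hash("192.0.2.44")
print(rotation, least_busy, sticky)
```

Note that this simple IP hash remaps most clients whenever a backend is added or removed, which is one reason stickiness should be used only when the application needs it.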

Mental model

Imagine DNS as road signs that point users toward a city (your region or load balancer). Inside the city, a traffic officer (the load balancer) directs cars to different parking lots (your instances) based on rules and current congestion. The signs change slowly (TTL), while the officer reacts quickly (health checks, algorithms).

Core concepts to know

  • DNS caching and TTL trade-offs.
  • Authoritative vs recursive resolvers; SOA/NS records.
  • Record selection: A/AAAA vs CNAME; ALIAS/ANAME at apex.
  • Anycast and DNS-based global routing limitations due to caching.
  • L4 vs L7 load balancing; when to choose each.
  • Health checks, draining, surge protection, timeouts.
  • Strategies: blue/green, canary, weighted routing, failover.

Worked examples

Example 1: Map a service and plan TTLs

  1. Goal: Expose api.example.com to an L7 load balancer at 203.0.113.12.
  2. Records:
    ; zone: example.com
    ; the TTL is a per-record field: 300 seconds (5 minutes) for faster change during rollout
    api 300 IN A    203.0.113.12
    api 300 IN AAAA 2001:db8::12
  3. Trade-off: During a migration, use TTL 300 to switch targets quickly. After stability, raise to 3600 to reduce DNS traffic.
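
The timing behind this trade-off can be worked out with simple arithmetic. The sketch below assumes resolvers honor TTLs exactly (real resolvers may clamp or extend them); the function name `migration_timeline` and its parameters are hypothetical.

```python
def migration_timeline(old_ttl, migration_ttl, lower_at=0):
    """Return key times (in seconds) for a TTL-lowering migration."""
    # A resolver that cached the record just before the TTL was lowered may
    # keep it for up to old_ttl seconds, so the switch should wait that long.
    earliest_switch = lower_at + old_ttl
    # After the switch, a resolver may hold the old address for one short TTL.
    all_caches_updated = earliest_switch + migration_ttl
    return {
        "lower_ttl_at": lower_at,
        "earliest_switch": earliest_switch,
        "all_caches_updated": all_caches_updated,
    }


plan = migration_timeline(old_ttl=3600, migration_ttl=300)
print(plan)
# {'lower_ttl_at': 0, 'earliest_switch': 3600, 'all_caches_updated': 3900}
```

With a steady-state TTL of 3600, lowering to 300 only pays off if it happens at least an hour before the switch.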

Example 2: Weighted DNS between two regions

  1. Goal: Send 80% of traffic to us-east and 20% to eu-west. Standard A records carry no weight field, so weighting is a provider-specific routing policy and providers implement it differently; conceptually it looks like:
    api.example.com. 300 IN A 198.51.100.10 ; us-east weight=80
    api.example.com. 300 IN A 203.0.113.20 ; eu-west  weight=20
  2. Limitations: DNS caches mean users might not immediately see new weights. Use for coarse global distribution, not real-time traffic shifting.
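
A rough simulation shows what weighted answers look like in aggregate. It assumes the provider returns one address per query in proportion to the weights and ignores resolver caching entirely, so real traffic splits will be noisier than this; the helper name `weighted_answer` is hypothetical.

```python
import random
from collections import Counter

records = {"198.51.100.10": 80, "203.0.113.20": 20}  # address -> weight


def weighted_answer(rng):
    """Return one address, chosen in proportion to its weight."""
    addresses = list(records)
    weights = list(records.values())
    return rng.choices(addresses, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
counts = Counter(weighted_answer(rng) for _ in range(10_000))
share_us_east = counts["198.51.100.10"] / 10_000
print(f"us-east share: {share_us_east:.2%}")
```

Over many queries the share approaches 80%, but any single resolver serves whichever answer it cached to all of its clients until the TTL expires.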

Example 3: L7 load balancer (NGINX) with health checks

  1. Upstreams and routes:
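    http {
      upstream app_backend {
        least_conn;  # prefer the backend with the fewest active connections
        # passive checks: 2 failed attempts within 10s mark a server down for 10s
        server 10.0.1.10:8080 max_fails=2 fail_timeout=10s;
        server 10.0.1.11:8080 max_fails=2 fail_timeout=10s;
      }
      server {
        listen 80;
        # liveness endpoint for the load balancer itself
        location /healthz { return 200 'ok'; }
        location / {
          proxy_set_header Host $host;
          proxy_pass http://app_backend;
        }
      }
    }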
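  2. Effect: NGINX sends each request to the backend with the fewest active connections. The max_fails and fail_timeout parameters act as passive health checks: after 2 failed attempts within 10s, a backend is skipped for 10s. The /healthz location reports the health of the load balancer itself, not of the backends.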
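
The max_fails and fail_timeout idea can be modeled in a few lines. This is a rough sketch of the concept only and is not NGINX's actual implementation; the `PassiveHealth` class and the simulated `now` clock are assumptions.

```python
class PassiveHealth:
    """Rough model of max_fails / fail_timeout style passive checks."""

    def __init__(self, max_fails=2, fail_timeout=10.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = []      # timestamps of recent failed attempts
        self.down_until = 0.0   # backend is skipped before this time

    def record_failure(self, now):
        # Keep only failures that fall inside the fail_timeout window.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def available(self, now):
        return now >= self.down_until


backend = PassiveHealth(max_fails=2, fail_timeout=10.0)
backend.record_failure(now=1.0)
after_one = backend.available(now=1.5)    # one failure is tolerated
backend.record_failure(now=2.0)
after_two = backend.available(now=2.5)    # two failures within 10s: skipped
recovered = backend.available(now=12.0)   # eligible again after fail_timeout
print(after_one, after_two, recovered)    # True False True
```

Because passive checks only react to failed client requests, some users see errors before a backend is taken out; active probing avoids that at the cost of extra traffic.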

Example 4: Blue/Green switch with weights

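  1. Goal: Move from v1 (blue) to v2 (green) gradually via L7 weights. A strict blue/green cutover flips all traffic at once; shifting weights in steps is a canary-style path to the same end state.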
    upstream app_backend {
      # weighted round-robin via server weight
      server 10.0.1.10:8080 weight=9;  # v1
      server 10.0.1.20:8080 weight=1;  # v2
    }
  2. Increase v2 weight step-by-step, monitor errors and latency, then drain v1.
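
To see how server weights translate into request shares, here is a sketch of smooth weighted round-robin, a common way to interleave weighted picks evenly. Treat it as an illustration of the technique rather than a statement of what your load balancer does internally; the rollout `schedule` is a hypothetical example.

```python
def smooth_weighted_round_robin(weights, picks):
    """Return the order in which backends are chosen for `picks` requests."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    order = []
    for _ in range(picks):
        # Every backend gains its weight, the leader is chosen and pays the total.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


# Hypothetical rollout schedule: (v1 weight, v2 weight) at each step.
schedule = [(9, 1), (5, 5), (1, 9)]
v2_counts = []
for v1, v2 in schedule:
    order = smooth_weighted_round_robin({"v1": v1, "v2": v2}, picks=10)
    v2_counts.append(order.count("v2"))
    print(f"v1={v1} v2={v2} -> v2 served {order.count('v2')} of 10 requests")
```

Each step should be held long enough to compare v2's error rate and latency against v1 before moving to the next weight.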

Hands-on practice exercises

Complete the exercises below. When ready, take the Quick Test at the bottom. Note: Tests are available to everyone; only logged-in users get saved progress.

Exercise 1: Plan DNS for a service and simulate a change

See details in the Exercises section below (Exercise 1).

Exercise 2: Configure NGINX load balancing with health checks

See details in the Exercises section below (Exercise 2).

Checklist: I can

  • Explain recursive vs authoritative DNS in one sentence.
  • Choose between A/AAAA vs CNAME vs ALIAS/ANAME at apex.
  • Set TTLs appropriately for migrations and steady state.
  • Pick L4 vs L7 based on requirements (protocol vs content routing).
  • Enable health checks and connection draining on a load balancer.
  • Run a blue/green or canary rollout with weights or routing rules.
  • Diagnose DNS issues with dig/nslookup and LB issues with curl/logs.

Common mistakes and how to self-check

  • Using CNAME at zone apex: Not allowed. Self-check: Verify SOA/NS exist at apex; use ALIAS/ANAME if needed.
  • TTL too high during migrations: Changes propagate slowly. Self-check: Lower the TTL at least one full period of the old TTL before the change, so long-lived cached answers expire first.
  • No health checks: Users hit dead backends. Self-check: Intentionally stop one backend and see if traffic shifts automatically.
  • Sticky sessions without reason: Reduces balancing quality. Self-check: Remove stickiness unless the app truly needs it.
  • Ignoring IPv6: Some users prefer AAAA. Self-check: Add AAAA records, confirm them with dig AAAA, and test connectivity with curl -6.
  • Per-request "balancing" via DNS only: DNS is coarse. Self-check: Use L7/L4 LB for fast failover and granular control.

Practical projects

  • High-availability web app: Put two app instances behind NGINX with health checks and demonstrate failover.
  • Global read distribution: Use DNS weights to send a portion of traffic to a read-only replica in another region; measure cache effects.
  • Blue/green release: Automate weight shifts and connection draining; record error rates and rollback steps.

Learning path

  1. DNS basics: understand records, TTL, caching.
  2. Hands-on: dig/nslookup to see resolution paths.
  3. Load balancer fundamentals: algorithms, L4 vs L7.
  4. Deploy a lab NGINX/HAProxy and configure health checks.
  5. Practice blue/green and canary with controlled weights.
  6. Add observability: logs and simple health endpoints.

Next steps

  • Finish the Exercises below and verify the Checklist items.
  • Take the Quick Test to confirm understanding.
  • Apply the Practical projects at work or in a lab.

Exercises

Exercise 1 — Plan DNS and simulate a change

Goal: Design records for api.example.com and simulate a migration with safe TTLs.

  1. Create a zone file snippet for example.com that includes:
    • SOA/NS placeholders, and records for api.example.com with A and AAAA.
    • TTL 300 for api during migration.
  2. Simulate a change: switch api.example.com from 198.51.100.10 to 203.0.113.12 and note propagation expectations.
  3. Use dig to show what you expect to see before and after the TTL window.

Exercise 2 — NGINX load balancer with health checks

Goal: Balance across two backends with least connections, add a health endpoint, and test failover.

  1. Write an NGINX config with an upstream of two servers, least_conn, and basic fail parameters.
  2. Add a /healthz that returns 200.
  3. Use curl in a loop to observe distribution, then stop one backend and verify automatic failover.

Mini challenge

You must move 30% of traffic to a new region for a read-only feature test within 1 hour, minimizing risk. What’s your plan using DNS and the load balancer?

Sample approach
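  • Lower the DNS TTL to 300 at least one old-TTL period before the shift; 5–10 minutes of lead time is enough only if the record's TTL is already that low.
  • Add a weighted DNS record for the new region so it receives about 30% of answers (for example, weights 70/30).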
  • Keep regional L7 health checks strict; monitor errors and latency.
  • If errors rise, drop weight to 0 or remove record; if stable, consider small increments.

Practice Exercises

Instructions

Design records and simulate a migration of api.example.com from old to new IP with safe TTL settings.

  1. Create a zone snippet for example.com with minimal viable SOA/NS (placeholders ok) and these records:
    ; example.com
    $TTL 3600
    @        IN SOA ns1.example.com. admin.example.com. (
                 2026012301 ; serial
                 3600       ; refresh
                 900        ; retry
                 1209600    ; expire
                 300 )      ; minimum (negative-caching TTL)
    @        IN NS  ns1.example.com.
    @        IN NS  ns2.example.com.
    ; steady state would inherit $TTL 3600:
    ;   api  IN A    198.51.100.10
    ;   api  IN AAAA 2001:db8::10
    ; during migration, set an explicit 300-second TTL instead
    ; (do not list the same records twice with different TTLs)
    api      300 IN A    198.51.100.10
    api      300 IN AAAA 2001:db8::10
  2. Simulate switch to new IP 203.0.113.12 by updating the A/AAAA at api and bumping the SOA serial.
  3. Predict dig outputs before and after 5 minutes (300s). Run locally if possible:
    dig +short api.example.com A
    sleep 300
    dig +short api.example.com A
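
Expected Output
A resolver that cached the old answer just before the change keeps returning 198.51.100.10 until its cached TTL expires (at most ~300s), then returns 203.0.113.12. AAAA answers update similarly.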

DNS And Load Balancing — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.
