Who this is for
Platform Engineers and Backend Engineers who need reliable, automated ways for services to find and talk to each other across environments (local, staging, production, multi-cluster, or multi-region).
Prerequisites
- Basic networking: IP, DNS, ports, TCP/HTTP.
- Containers and orchestration basics (Docker, Kubernetes concepts).
- Familiarity with load balancing and health checks.
Why this matters
Real platform tasks depend on service discovery:
- Rolling out new microservice versions without breaking callers.
- Autoscaling instances and letting clients find the new ones quickly.
- Failing over between zones/regions during incidents.
- Routing traffic to healthy instances only.
- De-risking blue/green and canary releases.
Concept explained simply
Service discovery is how one service (the caller) finds the network location of another service (the callee) at runtime. In dynamic environments, IPs change often; discovery gives you a stable name that resolves to the right, healthy endpoints.
Mental model
Think of a phone book that updates itself. Services register their current number (IP:port) and health. Clients look up a name (like payments) and get a current list of healthy numbers. A load balancer or the client chooses one to call.
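To make the phone-book model concrete, the sketch below shows a minimal in-memory registry in Python (illustrative only, not any particular product's API): instances register their address and health, and callers look up healthy endpoints by name.

import random

# name -> list of {"addr": "ip:port", "healthy": bool}
registry = {}

def register(name, addr):
    registry.setdefault(name, []).append({"addr": addr, "healthy": True})

def lookup(name):
    # Like the phone book: return only the currently healthy numbers for a name.
    return [e["addr"] for e in registry.get(name, []) if e["healthy"]]

register("payments", "10.0.12.34:8080")
register("payments", "10.0.12.35:8080")
target = random.choice(lookup("payments"))  # the caller picks one healthy endpoint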
Key building blocks
Naming
Stable names (e.g., payments) map to dynamic endpoints. In Kubernetes, a Service name becomes a DNS name (e.g., payments.default.svc.cluster.local).
Service registry
A database of service instances and their health. Examples in practice: Kubernetes Endpoints/EndpointSlice, Consul catalog, etcd-backed systems, or service mesh catalogs.
Health and liveness
Registries use health checks (HTTP/TCP checks, heartbeats, TTLs) to include only healthy instances. Unhealthy instances are removed until they recover.
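One common mechanism is a TTL on heartbeats: an instance stays in lookups only while it keeps checking in. A rough sketch (illustrative Python; real registries such as Consul implement checks differently):

import time

TTL_SECONDS = 15      # drop an instance if it has not checked in within this window
last_heartbeat = {}   # "ip:port" -> timestamp of last heartbeat

def heartbeat(addr):
    last_heartbeat[addr] = time.time()

def healthy_endpoints():
    now = time.time()
    # Only instances that checked in recently are handed to callers.
    return [addr for addr, t in last_heartbeat.items() if now - t <= TTL_SECONDS]

heartbeat("10.0.12.34:8080")
print(healthy_endpoints())  # the instance disappears once heartbeats stop for TTL_SECONDS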
Discovery patterns
- Client-side discovery: Clients query the registry/DNS, pick an instance, and connect. Pro: flexible; Con: client must implement logic.
- Server-side discovery: Clients call a stable load balancer or proxy; the proxy looks up endpoints and forwards. Pro: centralized logic; Con: extra hop.
- DNS-based discovery: Clients use DNS A/SRV records. Simple and widely supported; use TTLs thoughtfully to control staleness (see the resolution sketch after this list).
- Service mesh: Sidecar proxies (e.g., Envoy) handle discovery, mTLS, retries, and traffic policies on behalf of the app.
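For the DNS-based pattern, clients should honor the record's TTL instead of caching endpoints indefinitely. A minimal sketch, assuming the third-party dnspython package and an in-cluster resolver (the service name is illustrative):

import time
import dns.resolver  # third-party package: dnspython

_cache = {"addrs": [], "expires": 0.0}

def resolve_payments():
    # Re-resolve only after the cached answer's TTL has elapsed.
    if time.time() >= _cache["expires"]:
        answer = dns.resolver.resolve("payments.default.svc.cluster.local", "A")
        _cache["addrs"] = [r.address for r in answer]
        _cache["expires"] = time.time() + answer.rrset.ttl
    return _cache["addrs"]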
Resilience policies
Discovery works best with: timeouts, retries with jitter, circuit breakers, load balancing (round-robin, least-request), and backoff during failures.
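A minimal timeout-plus-retry sketch using only the Python standard library (the URL, attempt count, and backoff values are illustrative):

import random
import time
import urllib.request

def call_with_retries(url, attempts=3, timeout=0.5):
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError:  # URLError, HTTPError, and socket timeouts are all OSError subclasses
            if attempt == attempts - 1:
                raise
            # Exponential backoff with full jitter avoids synchronized retry storms.
            time.sleep(random.uniform(0, 0.1 * (2 ** attempt)))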
Consistency and freshness
Registries may be eventually consistent. Expect brief staleness. Use health checks, short-but-safe TTLs, and retry policies to cope with changes.
Security
Combine discovery with auth/mTLS so only authorized services resolve and connect to backends. Limit who can register endpoints.
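For example, a caller can present a client certificate so backends accept connections only from workloads holding a valid identity. A sketch with the Python standard library (the certificate paths and endpoint are illustrative assumptions):

import ssl
import urllib.request

def mtls_context(ca, cert, key):
    # The client proves its identity with a certificate the backend is configured to trust.
    ctx = ssl.create_default_context(cafile=ca)
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx

# Usage (paths are illustrative):
# ctx = mtls_context("/etc/certs/ca.pem", "/etc/certs/client.pem", "/etc/certs/client-key.pem")
# urllib.request.urlopen("https://payments:8443/health", context=ctx, timeout=0.5)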
Worked examples
Example 1: Kubernetes Service (ClusterIP) with DNS
- Create a Deployment for payments with 3 replicas.
- Create a ClusterIP Service named payments on port 8080.
- Clients call http://payments:8080 inside the same namespace. Kubernetes DNS resolves payments to the Service IP, and kube-proxy forwards to healthy pods.
Sample manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments
    spec:
      containers:
        - name: svc
          image: payments:1.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet: { path: /ready, port: 8080 }
---
apiVersion: v1
kind: Service
metadata:
  name: payments
spec:
  selector:
    app: payments
  ports:
    - name: http
      port: 8080
      targetPort: 8080
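From another pod in the same namespace, a caller can hit the Service name directly; a minimal sketch using the Python standard library (the /health path is an illustrative assumption):

import urllib.request

# "payments" resolves via cluster DNS to the Service's ClusterIP;
# kube-proxy then forwards the connection to a ready pod.
def check_payments():
    with urllib.request.urlopen("http://payments:8080/health", timeout=0.5) as resp:
        return resp.status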
Example 2: Kubernetes headless Service with SRV
For stateful services (e.g., databases), use a headless Service to return pod records directly so clients can connect to specific instances.
Sample manifest
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None
  selector:
    app: db
  ports:
    - name: tcp
      port: 5432
      targetPort: 5432
DNS returns A records for each pod (e.g., db-0.db, db-1.db). Stateful clients can select primaries/replicas as needed.
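A client can enumerate those per-pod records with a plain DNS lookup; a sketch using the Python standard library (assumes it runs in the same namespace as the db Service):

import socket

# For a headless Service, "db" resolves to one A record per ready pod,
# so the client sees every instance instead of a single virtual IP.
def db_pod_ips():
    infos = socket.getaddrinfo("db", 5432, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})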
Example 3: Client-side discovery with a registry (Consul-like)
- Each service registers itself with name, address, port, tags (e.g., version=2), and a health check.
- Clients query the registry (or local sidecar) for healthy instances of payments.
- Client picks one using least-request and calls it. On failures, it retries with jitter (see the selection sketch after the payload below).
Example registration payload
{
  "Name": "payments",
  "Address": "10.0.12.34",
  "Port": 8080,
  "Tags": ["version=2"],
  "Check": {"HTTP": "http://10.0.12.34:8080/health", "Interval": "5s"}
}
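The selection step from the flow above could look like the following sketch: filter to healthy instances and prefer the one with the fewest in-flight requests (illustrative Python, not Consul's client API; the in_flight counter is an assumed field maintained by the caller):

def pick_least_request(instances):
    # instances: list of {"addr": "ip:port", "healthy": bool, "in_flight": int}
    healthy = [i for i in instances if i["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy instances registered")
    # Least-request: prefer the instance currently handling the fewest calls.
    return min(healthy, key=lambda i: i["in_flight"])

instances = [
    {"addr": "10.0.12.34:8080", "healthy": True, "in_flight": 2},
    {"addr": "10.0.12.35:8080", "healthy": True, "in_flight": 0},
    {"addr": "10.0.12.36:8080", "healthy": False, "in_flight": 1},
]
print(pick_least_request(instances)["addr"])  # -> 10.0.12.35:8080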
Hands-on exercises
Exercise 1: Design a safe discovery plan in Kubernetes
Draft a plan for three services (api, payments, notifications) with suitable Service types, selectors, and health probes. Choose DNS names and explain how clients resolve them.
Exercise 2: Implement client-side choice logic
Given a small in-memory registry, decide which instance to call next, considering health and weights. Show your selection algorithm.
Self-check checklist
- You used readiness probes to keep failing pods out of rotation.
- You picked TTLs/refresh intervals that balance freshness and stability.
- You included retry with jitter and a sensible timeout.
- You considered version-aware routing for canaries.
- You planned for partial registry staleness.
Common mistakes and how to self-check
- Using liveness probes instead of readiness probes for traffic gating. Self-check: Are pods that fail startup still receiving traffic? Use readiness for routing decisions.
- Ignoring TTLs. Self-check: Do clients cache DNS for too long and miss new pods? Reduce TTL or increase refresh frequency.
- No timeouts/retries. Self-check: Do rare blips cause long client hangs? Add short timeouts and retry with jitter.
- Hardcoding IPs. Self-check: Any configuration with literal pod IPs? Replace with names and selectors.
- Single-zone thinking. Self-check: Can traffic shift if one zone fails? Use topology-aware routing or cross-zone endpoints.
Practical projects
- Blue/Green via labels:
  - Run payments v1 and v2 behind a single Service.
  - Use labels (version=v1/v2) and a mesh or ingress rule to shift 10% of traffic to v2.
  - Verify discovery updates as replicas scale.
- Headless DB with client pinning:
  - Deploy a 3-replica StatefulSet database.
  - Use a headless Service and teach the app to pin to db-0 for writes and the other replicas for reads.
  - Simulate a pod restart and verify failover.
- Consul-style local registry cache:
  - Write a small sidecar that polls a registry endpoint and exposes /endpoints locally.
  - Have clients query localhost for discovery to reduce central load.
  - Add an ETag or version field to handle incremental updates.
Learning path
- Revise DNS basics (A, AAAA, SRV, TTL).
- Learn Kubernetes Services, Endpoints/EndpointSlice, and readiness probes.
- Study client-side vs server-side discovery; add retries and timeouts.
- Introduce version-aware routing (blue/green, canary).
- Explore service mesh patterns (sidecars, mTLS, traffic policies).
Next steps
- Complete the quick test to validate understanding.
- Implement one practical project in a dev cluster.
- Document your team’s standard for discovery (naming, TTL, probes, retries).
Mini challenge
Your app calls search and profile services. During a partial outage, 30% of search pods fail readiness. Outline, in 5–7 bullet points, how your client should behave (timeouts, retries, fallback) and how the registry/DNS should reflect the change.