Service Discovery Concepts

Service Discovery Concepts for Platform Engineers, with explanations, exercises, and a quick test.

Published: January 23, 2026 | Updated: January 23, 2026

Who this is for

Platform Engineers and Backend Engineers who need reliable, automated ways for services to find and talk to each other across environments (local, staging, production, multi-cluster, or multi-region).

Prerequisites

  • Basic networking: IP, DNS, ports, TCP/HTTP.
  • Containers and orchestration basics (Docker, Kubernetes concepts).
  • Familiarity with load balancing and health checks.

Why this matters

Real platform tasks depend on service discovery:

  • Rolling out new microservice versions without breaking callers.
  • Autoscaling instances and letting clients find new ones within seconds, without config changes.
  • Failing over between zones/regions during incidents.
  • Routing traffic to healthy instances only.
  • De-risking blue/green and canary releases.

Concept explained simply

Service discovery is how one service (the caller) finds the network location of another service (the callee) at runtime. In dynamic environments, IPs change often; discovery gives you a stable name that resolves to the right, healthy endpoints.

Mental model

Think of a phone book that updates itself. Services register their current number (IP:port) and health. Clients look up a name (like payments) and get a current list of healthy numbers. A load balancer or the client chooses one to call.
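The phone book idea can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular registry is implemented; the class names Instance and Registry and the second address are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    address: str
    port: int
    healthy: bool = True


class Registry:
    """Toy in-memory registry: service name -> registered instances."""

    def __init__(self) -> None:
        self._services: dict[str, dict[tuple[str, int], Instance]] = {}

    def register(self, name: str, instance: Instance) -> None:
        # Re-registering the same address:port overwrites the old entry.
        key = (instance.address, instance.port)
        self._services.setdefault(name, {})[key] = instance

    def deregister(self, name: str, address: str, port: int) -> None:
        # Removing an unknown instance is a no-op.
        self._services.get(name, {}).pop((address, port), None)

    def lookup(self, name: str) -> list[Instance]:
        # Callers only ever see instances currently marked healthy.
        return [i for i in self._services.get(name, {}).values() if i.healthy]


registry = Registry()
registry.register("payments", Instance("10.0.12.34", 8080))
registry.register("payments", Instance("10.0.12.35", 8080, healthy=False))
healthy = registry.lookup("payments")  # only the healthy instance is returned
```

A real registry adds persistence, replication, and active health checking on top of this shape, but the register/lookup contract stays the same.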

Key building blocks

Naming

Stable names (e.g., payments) map to dynamic endpoints. In Kubernetes, a Service name becomes a DNS name (e.g., payments.default.svc.cluster.local).
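As a small illustration of how the stable name is composed, the helper below builds the fully qualified name from its parts. The function name service_fqdn is hypothetical, and cluster.local is only the common default cluster domain; your cluster may use a different one.

```python
def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the cluster-internal DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


name = service_fqdn("payments")
```

Clients in the same namespace can usually use the short name (payments); the full name matters when calling across namespaces.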

Service registry

A database of service instances and their health. Examples in practice: Kubernetes Endpoints/EndpointSlice, Consul catalog, etcd-backed systems, or service mesh catalogs.

Health and liveness

Registries use health checks (HTTP/TCP checks, heartbeats, TTLs) to include only healthy instances. Unhealthy instances are removed until they recover.
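One of the mechanisms named above, TTL-based heartbeats, can be sketched as follows. This is a simplified model under stated assumptions: the class HeartbeatTracker and the instance IDs are invented for the example, and the clock is injectable so the behavior can be shown without waiting.

```python
import time


class HeartbeatTracker:
    """Treats an instance as healthy only if it sent a heartbeat within the TTL."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, instance_id: str) -> None:
        # Each heartbeat renews the instance's lease.
        self._last_seen[instance_id] = self._clock()

    def healthy_instances(self) -> list[str]:
        now = self._clock()
        return sorted(i for i, seen in self._last_seen.items()
                      if now - seen <= self._ttl)


# Demonstration with a fake clock (a one-element list we can advance by hand).
fake_now = [0.0]
tracker = HeartbeatTracker(ttl_seconds=10, clock=lambda: fake_now[0])
tracker.heartbeat("payments-a")
tracker.heartbeat("payments-b")
fake_now[0] = 8.0
tracker.heartbeat("payments-a")      # only payments-a renews its lease
fake_now[0] = 15.0
alive = tracker.healthy_instances()  # payments-b has been silent for 15s > TTL
```

An expired instance is not deleted here; it simply drops out of the healthy list and reappears as soon as it heartbeats again, which mirrors "removed until they recover".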

Discovery patterns
  • Client-side discovery: Clients query the registry/DNS, pick an instance, and connect. Pro: flexible; Con: client must implement logic.
  • Server-side discovery: Clients call a stable load balancer or proxy; the proxy looks up endpoints and forwards. Pro: centralized logic; Con: extra hop.
  • DNS-based discovery: Clients use DNS A/SRV records. Simple and widely supported; use TTLs thoughtfully to control staleness.
  • Service mesh: Sidecar proxies (e.g., Envoy) handle discovery, mTLS, retries, and traffic policies on behalf of the app.
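For the DNS-based pattern, a client-side lookup might look like the sketch below. It uses Python's standard socket.getaddrinfo; the function name resolve_endpoints and the injectable resolver parameter are assumptions added so the logic can be exercised without a live DNS server.

```python
import socket


def resolve_endpoints(host: str, port: int,
                      resolver=socket.getaddrinfo) -> list[tuple[str, int]]:
    """Return the distinct (ip, port) pairs that DNS reports for host."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    endpoints: list[tuple[str, int]] = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        endpoint = (sockaddr[0], sockaddr[1])
        if endpoint not in endpoints:  # drop duplicate answers, keep order
            endpoints.append(endpoint)
    return endpoints
```

Inside a cluster, resolve_endpoints("payments", 8080) would return the Service IP for a ClusterIP Service, or one entry per pod for a headless Service. Note that the standard library does not expose record TTLs, so the caller has to decide how often to re-resolve.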
Resilience policies

Discovery works best with: timeouts, retries with jitter, circuit breakers, load balancing (round-robin, least-request), and backoff during failures.
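Retries with backoff and jitter are the piece most often written by hand, so here is a minimal sketch. It assumes the operation signals failure by raising ConnectionError and uses "full jitter" (a random delay between zero and an exponentially growing cap); the function name and default values are illustrative, not recommendations.

```python
import random
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on ConnectionError with jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            if attempt == attempts - 1:
                break  # out of attempts: do not sleep before giving up
            # The cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random fraction of the cap so that many
            # clients failing at once do not retry in lockstep.
            sleep(rng() * cap)
    raise last_error
```

In practice the operation should also carry its own timeout, and retries should be limited to requests that are safe to repeat.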

Consistency and freshness

Registries may be eventually consistent. Expect brief staleness. Use health checks, short-but-safe TTLs, and retry policies to cope with changes.
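One way to cope with staleness on the client is a small cache that refreshes after a TTL and keeps serving the last known list if the registry is briefly unreachable. The sketch below assumes a fetch callable that returns a list of endpoints and raises ConnectionError on failure; CachedLookup and its parameters are hypothetical names.

```python
import time


class CachedLookup:
    """Caches a registry answer for ttl_seconds; serves stale data on errors."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._endpoints = []
        self._fetched_at = None  # None means nothing has been fetched yet

    def get(self):
        now = self._clock()
        fresh = (self._fetched_at is not None
                 and now - self._fetched_at < self._ttl)
        if not fresh:
            try:
                self._endpoints = list(self._fetch())
                self._fetched_at = now
            except ConnectionError:
                if self._fetched_at is None:
                    raise  # nothing cached yet, so the error must surface
                # Registry unreachable: fall through and serve the stale list.
        return list(self._endpoints)
```

A shorter TTL means fresher answers and more load on the registry; serving stale data during an outage only works if the client also retries against other endpoints when one of them has gone away.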

Security

Combine discovery with auth/mTLS so only authorized services resolve and connect to backends. Limit who can register endpoints.

Worked examples

Example 1: Kubernetes Service (ClusterIP) with DNS

  1. Create a Deployment for payments with 3 replicas.
  2. Create a ClusterIP Service named payments on port 8080.
  3. Clients call http://payments:8080 inside the same namespace. Kubernetes DNS resolves payments to the Service's cluster IP, and kube-proxy load-balances connections across the pods that currently pass their readiness probe.
Sample manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments
    spec:
      containers:
      - name: svc
        image: payments:1.0
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet: { path: /ready, port: 8080 }
---
apiVersion: v1
kind: Service
metadata:
  name: payments
spec:
  selector:
    app: payments
  ports:
  - name: http
    port: 8080
    targetPort: 8080

Example 2: Kubernetes headless Service with SRV

For stateful services (e.g., databases), use a headless Service to return pod records directly so clients can connect to specific instances.

Sample manifest
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None  # headless: DNS returns pod IPs instead of a single virtual IP
  selector:
    app: db
  ports:
  - name: tcp
    port: 5432
    targetPort: 5432

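A DNS lookup for db returns one A record per ready pod rather than a single Service IP. If the pods belong to a StatefulSet whose serviceName is db, each pod also gets a stable name (e.g., db-0.db, db-1.db). Because the port is named, DNS also publishes SRV records (in the default namespace, _tcp._tcp.db.default.svc.cluster.local) that carry the port number and the pod hostnames. Stateful clients can select primaries/replicas as needed.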
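To make the per-pod names concrete, the sketch below generates them for a StatefulSet and splits them into a write target and read targets. It assumes a StatefulSet named db governed by the headless Service db, and it assumes ordinal 0 is the primary, which is a convention your database may not follow; both function names are hypothetical.

```python
def pod_dns_names(statefulset: str, service: str, replicas: int,
                  namespace: str = "default",
                  cluster_domain: str = "cluster.local") -> list[str]:
    """Stable per-pod DNS names for a StatefulSet behind a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def split_primary_replicas(names: list[str]) -> tuple[str, list[str]]:
    # Assumption for illustration only: the pod with ordinal 0 takes writes.
    return names[0], names[1:]


names = pod_dns_names("db", "db", replicas=3)
primary, read_replicas = split_primary_replicas(names)
```

In a real deployment the primary is usually discovered from the database itself or from an operator-managed Service rather than hardcoded by ordinal.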

Example 3: Client-side discovery with a registry (Consul-like)

  1. Each service registers itself with name, address, port, tags (e.g., version=2), and a health check.
  2. Clients query the registry (or local sidecar) for healthy instances of payments.
  3. Client picks one using least-request and calls it. On failures, it retries with jitter.
Example registration payload
{
  "Name": "payments",
  "Address": "10.0.12.34",
  "Port": 8080,
  "Tags": ["version=2"],
  "Check": {"HTTP": "http://10.0.12.34:8080/health", "Interval": "5s"}
}
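Step 3 above, picking an instance by least-request, can be sketched as a small client-side picker. It tracks in-flight requests per endpoint and chooses among the least-loaded ones; the class LeastRequestPicker and its acquire/release methods are made-up names, and the endpoint strings are placeholders.

```python
import random


class LeastRequestPicker:
    """Picks the endpoint with the fewest in-flight requests."""

    def __init__(self, endpoints, rng=random):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._in_flight = {endpoint: 0 for endpoint in endpoints}
        self._rng = rng

    def acquire(self):
        # Find the least-loaded endpoints and break ties randomly.
        lowest = min(self._in_flight.values())
        candidates = [e for e, n in self._in_flight.items() if n == lowest]
        choice = self._rng.choice(candidates)
        self._in_flight[choice] += 1
        return choice

    def release(self, endpoint):
        # Call this when the request finishes, whether it succeeded or failed.
        self._in_flight[endpoint] -= 1


picker = LeastRequestPicker(["10.0.12.34:8080", "10.0.12.35:8080"])
first = picker.acquire()
second = picker.acquire()  # goes to the other endpoint, which is idle
```

The endpoint list would come from the registry query in step 2, and a failed call would be released and retried against a fresh acquire().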

Hands-on exercises

Exercise 1: Design a safe discovery plan in Kubernetes

Draft a plan for three services (api, payments, notifications) with suitable Service types, selectors, and health probes. Choose DNS names and explain how clients resolve them.

Exercise 2: Implement client-side choice logic

Given a small in-memory registry, decide which instance to call next, considering health and weights. Show your selection algorithm.

Self-check checklist
  • You used readiness probes to keep failing pods out of rotation.
  • You picked TTLs/refresh intervals that balance freshness and stability.
  • You included retry with jitter and a sensible timeout.
  • You considered version-aware routing for canaries.
  • You planned for partial registry staleness.

Common mistakes and how to self-check

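  • Using liveness probes instead of readiness probes for traffic gating. Self-check: Are pods that are still starting up, or temporarily unable to serve, receiving traffic? Liveness probes only restart containers; readiness probes decide whether a pod receives Service traffic.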
  • Ignoring TTLs. Self-check: Do clients cache DNS for too long and miss new pods? Reduce TTL or increase refresh frequency.
  • No timeouts/retries. Self-check: Do rare blips cause long client hangs? Add short timeouts and retry with jitter.
  • Hardcoding IPs. Self-check: Any configuration with literal pod IPs? Replace with names and selectors.
  • Single-zone thinking. Self-check: Can traffic shift if one zone fails? Use topology-aware routing or cross-zone endpoints.

Practical projects

  1. Canary release via labels:
    • Run payments v1 and v2 behind a single Service.
    • Use labels (version=v1/v2) and a mesh or ingress rule to shift 10% to v2.
    • Verify discovery updates as replicas scale.
  2. Headless DB with client pinning:
    • Deploy a 3-replica StatefulSet database.
    • Use headless Service and teach the app to pin to db-0 for writes, others for reads.
    • Simulate a pod restart and verify failover.
  3. Consul-style local registry cache:
    • Write a small sidecar that polls a registry endpoint and exposes /endpoints locally.
    • Clients query localhost for discovery to reduce central load.
    • Add ETag or version to handle incremental updates.

Learning path

  1. Review DNS basics (A, AAAA, SRV, TTL).
  2. Learn Kubernetes Services, Endpoints/EndpointSlice, and readiness probes.
  3. Study client-side vs server-side discovery; add retries and timeouts.
  4. Introduce version-aware routing (blue/green, canary).
  5. Explore service mesh patterns (sidecars, mTLS, traffic policies).

Next steps

  • Complete the quick test to validate understanding.
  • Implement one practical project in a dev cluster.
  • Document your team’s standard for discovery (naming, TTL, probes, retries).

Mini challenge

Your app calls search and profile services. During a partial outage, 30% of search pods fail readiness. Outline, in 5–7 bullet points, how your client should behave (timeouts, retries, fallback) and how the registry/DNS should reflect the change.

Practice Exercises

Instructions

You run three services: api, payments, and notifications. All are HTTP on port 8080. The api calls both payments and notifications. Design Service objects and readiness probes to ensure only healthy pods receive traffic. Explain DNS names clients will use and how scale-up/rollout is handled without downtime.

  1. Propose labels/selectors for each Service.
  2. Include readiness probe endpoints.
  3. Choose Service types (ClusterIP vs headless) and justify.
  4. Describe how api discovers payments and notifications.
Expected Output
A brief plan with 3 Service specs, readiness probe paths, and an explanation of the runtime DNS names and rollout behavior.

Service Discovery Concepts — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.
