Who this is for
API Engineers and backend developers who build distributed systems, microservices, or event-driven integrations where data must converge over time rather than instantly.
Prerequisites
- Comfort with HTTP APIs and status codes.
- Basic knowledge of databases (transactions, indexes) and message brokers.
- Familiarity with asynchronous processing concepts (queues, workers).
Learning path
- Understand eventual consistency vs strong consistency.
- Learn core patterns: Outbox, Sagas, Idempotency, Retries with Backoff, Read Models/CQRS, Versioning & Conflict Resolution.
- Practice API designs that expose asynchronous workflows cleanly.
- Build a small project and add observability to verify convergence.
Why this matters
In real systems, payments, inventory, notifications, and analytics rarely update in a single transaction. As an API Engineer, you will:
- Design endpoints that accept work now and finish later (202 Accepted + status polling).
- Coordinate multi-service workflows without global transactions.
- Guarantee no double-charges or duplicate orders under retries.
- Recover safely from partial failures and transient outages.
Concept explained simply
Eventual consistency means: not all parts of the system will be up-to-date immediately, but if no new changes happen, all parts will eventually agree. We trade instant global correctness for availability, resilience, and scale. To do this safely, we use patterns that:
- Make operations repeatable (idempotency) so retries are safe.
- Record intent reliably (outbox) so events aren't lost.
- Coordinate steps (saga) and compensate when something fails.
- Expose the right API responses so clients know what to expect.
Mental model
Think of a postal service:
- You drop a letter (request accepted), get a receipt (idempotency key), and later check delivery status (status endpoint).
- The letter moves through hubs (services) with scans (events) that ensure it doesn't get lost (outbox + retries).
- If delivery fails, it's returned to sender (saga compensation).
Core patterns (practical)
Outbox Pattern
Write domain changes and outgoing events in the same database transaction. A background publisher reads the outbox table and delivers events. This avoids the "saved DB but forgot to publish" bug.
- Tables: domain table + outbox(event_id, aggregate_id, type, payload_json, created_at, published_at).
- Publisher: scans unpublished rows (published_at is null), publishes each to the broker, then marks it published.
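The outbox write and publisher described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production publisher: the `orders` table, the function names, and the `publish` callback are assumptions made for the example; only the outbox columns come from the text.

```python
import json
import sqlite3
import uuid

def init_db(conn):
    # Outbox columns follow the table described above; "orders" is a stand-in domain table.
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            event_id TEXT PRIMARY KEY,
            aggregate_id INTEGER NOT NULL,
            type TEXT NOT NULL,
            payload_json TEXT NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            published_at TEXT
        );
    """)

def mark_order_paid(conn, order_id):
    # One transaction: the domain row and the outbox row commit together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'PAID')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, aggregate_id, type, payload_json) "
            "VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPaid", json.dumps({"order_id": order_id})),
        )

def publish_pending(conn, publish):
    # Deliver every unpublished row; a row is marked only after publish() returns.
    rows = conn.execute(
        "SELECT event_id, type, payload_json FROM outbox "
        "WHERE published_at IS NULL ORDER BY created_at, rowid"
    ).fetchall()
    sent = 0
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # if this raises, the row stays unpublished
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = CURRENT_TIMESTAMP WHERE event_id = ?",
                (event_id,),
            )
        sent += 1
    return sent
```

If the process dies after `publish()` succeeds but before the row is marked, the event is sent again on the next scan. Delivery is therefore at-least-once, which is why consumers need to be idempotent.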
Saga Pattern (orchestration or choreography)
Split a business process into steps with forward actions and compensations. Either use a central orchestrator that sends commands, or let services react to events (choreography).
- Example steps: Reserve inventory → Authorize payment → Create shipment.
- Compensations: Release inventory, void payment, cancel shipment.
Idempotency Keys
Clients send a unique key per logical request (e.g., header Idempotency-Key). The server stores the first result keyed by that value and returns the same result for retries.
- Store: key, request hash, response status/body, expiry.
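- On a retry with the same key, return the stored body without reprocessing (201 on the first call, 200 on a replay). The stored request hash lets the server reject a key that is reused with a different payload.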
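A minimal in-memory sketch of such a store follows. The class and method names are hypothetical, expiry is omitted, and a real service would keep the records in a database with a unique constraint on the key so that concurrent retries cannot both run the handler.

```python
import hashlib
import json

class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""

class IdempotencyStore:
    def __init__(self):
        self._records = {}  # key -> (request_hash, status, body)

    @staticmethod
    def _hash(body):
        # sort_keys makes the hash independent of JSON key order.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def execute(self, key, request_body, handler):
        """Run handler once per key; return (status, body, replayed)."""
        request_hash = self._hash(request_body)
        record = self._records.get(key)
        if record is not None:
            stored_hash, status, body = record
            if stored_hash != request_hash:
                raise IdempotencyConflict(key)
            return status, body, True  # replay: the handler is not called again
        status, body = handler(request_body)
        self._records[key] = (request_hash, status, body)
        return status, body, False
```

The `replayed` flag lets the HTTP layer decide whether to answer a replay with the original status or with 200, as the text suggests.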
Retries with Backoff + DLQ
Use exponential backoff and jitter for transient failures. After N attempts, move the message to a dead-letter queue (DLQ) for manual handling or automated compensation.
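As a rough sketch, the retry loop might look like the following. It uses the "full jitter" variant (a random delay between zero and the exponential bound); the function names, the `TransientError` type, and the default values are assumptions for illustration, and the DLQ is modeled as a plain list.

```python
import random
import time

class TransientError(Exception):
    """A failure worth retrying (timeout, connection reset, and so on)."""

def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)) for a 0-based attempt.
    return rng() * min(cap, base * (2 ** attempt))

def process_with_retries(message, handler, dead_letters, max_attempts=5, sleep=time.sleep):
    """Try handler up to max_attempts times, then park the message in the DLQ."""
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except TransientError:
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))  # no sleep after the final attempt
    dead_letters.append(message)
    return False
```

Only errors classified as transient are retried here; anything else propagates immediately, since retrying a validation failure just delays the inevitable.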
Read Models / CQRS
Separate write model (source of truth) from read models optimized for queries. Read models update asynchronously from events and may be briefly stale.
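To make the idea concrete, here is a small projection that builds a "latest status per order" read model from events. The event shape (`order_id`, `version`, `status`) and the class name are assumptions for this sketch; the per-order version is what lets the projection ignore duplicates and stale deliveries.

```python
class OrderSummaryProjection:
    """Read model: latest status per order, updated asynchronously from events."""

    def __init__(self):
        self.status_by_order = {}
        self.version_by_order = {}

    def apply(self, event):
        order_id, version = event["order_id"], event["version"]
        # Ignore duplicates and events older than what is already applied.
        if version <= self.version_by_order.get(order_id, 0):
            return False
        self.status_by_order[order_id] = event["status"]
        self.version_by_order[order_id] = version
        return True

    def count_by_status(self):
        # A query the write model would answer less cheaply.
        counts = {}
        for status in self.status_by_order.values():
            counts[status] = counts.get(status, 0) + 1
        return counts
```

Until an event reaches `apply`, queries against this model return the previous state; that window is the staleness the text refers to.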
Versioning & Conflict Resolution
Use optimistic version numbers, ETags, or vector clocks. Common conflict policies: last-write-wins (simple, but silently drops the losing update), merge by business rules, or reject with 409 Conflict or 412 Precondition Failed so the client can re-read and retry.
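The "reject" policy with optimistic version numbers can be sketched as below. `VersionedStore` and `VersionConflict` are hypothetical names; in an HTTP API the expected version would typically arrive as an ETag in an If-Match header, and the conflict would map to 412 Precondition Failed.

```python
class VersionConflict(Exception):
    def __init__(self, current_version):
        super().__init__(f"expected version does not match current version {current_version}")
        self.current_version = current_version

class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def get(self, key):
        # Version 0 means "does not exist yet".
        return self._rows.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Write only if the caller saw the latest version; return the new version."""
        current_version, _ = self.get(key)
        if expected_version != current_version:
            raise VersionConflict(current_version)
        self._rows[key] = (current_version + 1, value)
        return current_version + 1
```

A client that receives the conflict re-reads the resource, reapplies its change on top of the current value, and retries with the new version.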
Worked example 1: Order Saga (orchestrator)
- Client: POST /orders → server returns 202 Accepted with a status URL: /orders/{id}/status.
- Orchestrator steps:
  - Create order PENDING.
  - Reserve inventory. If it fails → mark order FAILED, done.
  - Authorize payment. If it fails → compensate: release inventory → mark FAILED.
  - Arrange shipment. If it fails → compensate: void payment, release inventory → mark FAILED.
  - On success → mark COMPLETED.
- Status endpoint returns: {state: "PENDING"|"IN_PROGRESS"|"COMPLETED"|"FAILED", reason?}
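The orchestrator steps above can be sketched as a loop that remembers which steps completed and, on failure, runs their compensations in reverse order. The function name, the `StepFailed` exception, and the `(name, action, compensation)` tuple shape are assumptions for this example; a real orchestrator would also persist its progress so it can resume after a crash.

```python
class StepFailed(Exception):
    """Raised by a step's action when the step cannot complete."""

def run_order_saga(order, steps):
    """steps: list of (name, action, compensation); compensation may be None."""
    order["state"] = "IN_PROGRESS"
    completed = []
    for name, action, compensation in steps:
        try:
            action(order)
        except StepFailed as exc:
            # Undo completed steps, most recent first (void payment, then release inventory).
            for _, undo in reversed(completed):
                if undo is not None:
                    undo(order)
            order["state"] = "FAILED"
            order["reason"] = f"{name}: {exc}"
            return order
        completed.append((name, compensation))
    order["state"] = "COMPLETED"
    return order
```

The failing step's own compensation is not run, because its forward action never took effect. Compensations should themselves be idempotent, since a crashed orchestrator may run them again on recovery.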
Worked example 2: Outbox + CDC
- Service writes Order(id=123, status='PAID') and Outbox(event='OrderPaid', aggregate_id=123, payload) in one transaction.
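- A publisher delivers outbox rows to the broker: either a poller that reads unpublished rows and marks them published, or a change-data-capture (CDC) connector that tails the database log for inserts into the outbox table.
- Downstream services build their read models from OrderPaid events. If the publisher crashes, unpublished rows remain and are retried, so delivery is at-least-once and consumers must tolerate duplicates.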
Worked example 3: Idempotent POST
- Client sends POST /payments with Idempotency-Key: abc-123 and body {order_id: 42, amount: 100}.
- Server checks store:
  - No record → process, save result keyed by abc-123 → return 201 with receipt_id R1.
  - Retry with same key → return 200 with the same body and R1, no double-charge.
API design tips for eventual consistency
- For long operations: return 202 Accepted with a status resource that reveals progress and final outcome.
- Use problem details in failure states so clients can react (e.g., insufficient_inventory).
- Use ETag or version fields with 412 Precondition Failed for conditional updates.
- Prefer 409 Conflict when business invariants would be violated; clients may retry later.
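A framework-free sketch of the 202 + status resource shape is shown below. Handlers return `(status, headers, body)` tuples; the function names, the in-memory `ORDERS` dict, the `code` extension field, and the example.com problem type URI are all hypothetical. The `type`/`title`/`status` members and the `application/problem+json` media type follow the Problem Details format (RFC 9457).

```python
import uuid

ORDERS = {}  # order_id -> {"state": ..., "reason": ...}; stands in for a real store

def post_order(body):
    # Accept the work now; a background worker would pick it up later.
    order_id = str(uuid.uuid4())
    ORDERS[order_id] = {"state": "PENDING", "reason": None}
    status_url = f"/orders/{order_id}/status"
    return 202, {"Location": status_url}, {"id": order_id, "status_url": status_url}

def get_order_status(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        problem = {"type": "about:blank", "title": "Not Found", "status": 404}
        return 404, {"Content-Type": "application/problem+json"}, problem
    body = {"state": order["state"]}
    if order["state"] == "FAILED":
        # Machine-readable failure so clients can react to the specific cause.
        body["problem"] = {
            "type": "https://example.com/problems/" + order["reason"],  # hypothetical URI
            "title": "Order could not be completed",
            "code": order["reason"],
        }
    return 200, {}, body
```

The status resource itself answers 200 even when the order has failed: the poll succeeded, and the business outcome lives in the body.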
Common mistakes and how to self-check
- Mistake: Publishing an event after a commit without an outbox → lost events on crash.
  - Self-check: Kill the process between DB commit and publish in a test; do you lose messages?
- Mistake: Non-idempotent consumers.
  - Self-check: Deliver the same message twice; does state remain correct?
- Mistake: No compensation in sagas.
  - Self-check: Force step 3 to fail; do earlier steps roll back logically?
- Mistake: Clients assume immediate consistency.
  - Self-check: Hide new writes from reads for 3 seconds (for example, by delaying read model updates); do clients handle pending states?
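The duplicate-delivery self-check can be turned into a small test. The consumer below is a sketch under assumed names (`InventoryConsumer`, an event with `event_id`, `sku`, `quantity`); it deduplicates by remembering processed event IDs.

```python
class InventoryConsumer:
    """Applies stock-reservation events at most once per event_id."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.processed_event_ids = set()

    def handle(self, event):
        if event["event_id"] in self.processed_event_ids:
            return False  # duplicate delivery: state is left untouched
        self.stock[event["sku"]] -= event["quantity"]
        self.processed_event_ids.add(event["event_id"])
        return True

# Self-check: deliver the same message twice and compare state.
consumer = InventoryConsumer({"A1": 10})
event = {"event_id": "e-1", "sku": "A1", "quantity": 3}
consumer.handle(event)
consumer.handle(event)
print(consumer.stock["A1"])  # prints 7, not 4
```

In a database-backed consumer, the processed-ID record and the state change should commit in the same transaction; otherwise a crash between the two reintroduces the duplicate.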
Practical projects
- Build a mini order system with Inventory, Payment, Shipping services using a saga. Add compensations for every step.
- Implement Outbox in one service and a lightweight publisher. Prove zero-loss by killing the publisher mid-run.
- Create an idempotent payments API with status tracking and demonstrate safe retries under network timeouts.
- Add metrics: count of retries, DLQ depth, time-to-consistency (write to read model lag).
Exercises
Exercise 1: Design a saga for order placement
- Define steps, events, and compensations.
- Provide a state transition table for Order: PENDING → RESERVED → PAID → SHIPPED → COMPLETED or FAILED.
- Describe how the orchestrator handles timeouts and retries.
Exercise 2: Idempotent async create endpoint
- Design POST /transfers that returns 202 and exposes /transfers/{id}/status.
- Show how Idempotency-Key is stored and reused to return the same response.
- Describe error handling for duplicates and eventual failure.
- [ ] I can explain outbox, saga, idempotency, and retries with backoff.
- [ ] I can design a 202 + status endpoint and document states.
- [ ] I added compensations for every forward step in a workflow.
- [ ] I validated consumers are idempotent under duplicate deliveries.
Mini challenge
You must propagate a user profile change from the Identity service to Billing and Analytics. Design the event schema, outbox row, and a retry policy. Show how each consumer ensures idempotency and what happens if Analytics is down for 10 minutes.
Next steps
- Apply these patterns to one endpoint you own today. Ship a small improvement: add Idempotency-Key or a status resource.
- Instrument time-to-consistency and retry counts to observe real behavior.
- Prepare for production by rehearsing failure drills: duplicate deliveries, broker downtime, partial outages.