Why this matters
In microservices, a single user action often touches multiple services (order, payment, inventory, billing). You must keep data consistent even when networks, services, or messages fail. Distributed transactions help you avoid double-charging, ghost orders, or stuck reservations.
- Real task: Ensure an order is either fully processed (inventory reserved, payment captured) or safely rolled back.
- Real task: Prevent duplicate charges when a payment request is retried.
- Real task: Publish reliable events so downstream services never miss critical updates.
Who this is for
- Backend engineers building microservices with messaging or service-to-service calls.
- Developers migrating from monoliths and ACID database transactions to event-driven systems.
Prerequisites
- Comfort with REST/gRPC and message brokers (e.g., Kafka, RabbitMQ, SNS/SQS).
- Understanding of basic transactions and ACID in a single database.
- Familiarity with retries, timeouts, and idempotency at an API level.
Concept explained simply
A distributed transaction coordinates changes across multiple services so the system ends in a correct state even when parts fail. Instead of one big ACID transaction, you use patterns like Sagas, the Outbox, and idempotency to reach eventual consistency.
Mental model
Think of it like placing an online order while keeping receipts:
- Intent log (Outbox): you write down what you plan to tell others before telling them.
- Choreographed dance (Saga): each service reacts to events and performs its step.
- Receipts (Idempotency keys): if someone asks again, you show the receipt instead of repeating the action.
- Undo moves (Compensations): if a later step fails, perform a business action to reverse earlier steps.
Core building blocks
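- Two-Phase Commit (2PC): A coordinator asks all participants to prepare, then to commit. It ensures atomicity but couples the participants and can block: a participant that has prepared must hold its locks until the coordinator tells it the outcome. Often avoided across independent microservices.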
- Saga Pattern: A sequence of local transactions with compensating actions if a step fails. Variants:
- Choreography: Services react to events. Simple but can become hard to reason about with many services.
- Orchestration: A central orchestrator tells each service what to do next. Clear flow but adds a central component.
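- Outbox Pattern: Write the domain change and the event in the same local database transaction. A relay then publishes the event to the broker. This avoids the dual-write problem, where the state change commits but the message is never sent (or the reverse).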
- Idempotency: Repeating the same request or event results in the same effect (e.g., store and check an idempotency key).
- Exactly-once vs at-least-once: Most brokers deliver at least once. Design consumers to be idempotent.
- Correlation IDs: Include a unique ID for each business operation to trace and deduplicate.
- Timeouts and retries: Use exponential backoff with jitter, and a max retry count. Combine with idempotency.
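The Outbox building block is easier to see in code. The following is a minimal sketch, assuming a SQLite database and hypothetical names (`orders` and `outbox` tables, `create_order`, `relay_once`, and a `publish` callback standing in for a broker client); none of these names come from a specific library.

```python
import json
import sqlite3
import uuid


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one business table and one outbox table.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event_type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )


def create_order(conn: sqlite3.Connection) -> str:
    order_id = str(uuid.uuid4())
    # "with conn" commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PENDING')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )
    return order_id


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Publish pending outbox rows in order; return how many were pending."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        # If publish raises, the row stays unpublished and is retried later.
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is published again on the next run. That is at-least-once delivery, which is why the consumers still need to be idempotent.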
Worked examples
Example 1: Order + Payment + Inventory (Saga with Outbox)
- Order Service creates Order=PENDING and writes Outbox event OrderCreated.
- Publisher emits OrderCreated.
- Inventory Service reserves items. On success, emits InventoryReserved; on failure, emits InventoryFailed.
- Payment Service captures payment after InventoryReserved. On success, emits PaymentSucceeded; on failure, emits PaymentFailed.
- Order Service transitions:
- InventoryFailed → compensate: set Order=CANCELLED.
- PaymentFailed → compensate: emit ReleaseInventory, set Order=CANCELLED.
- PaymentSucceeded → set Order=CONFIRMED.
Why not 2PC here?
Independent services, independent databases, and the need for availability make 2PC risky and coupling-heavy. Saga provides non-blocking, business-level consistency.
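The Order Service transitions in Example 1 can be sketched as a small event handler. This is an illustrative sketch only: the `Order` class, the `handle_event` function, and the in-memory `emitted` list (standing in for Outbox rows) are assumptions, not part of any framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_id: str
    status: str = "PENDING"
    # Stand-in for Outbox rows that a relay would publish later.
    emitted: List[str] = field(default_factory=list)


def handle_event(order: Order, event_type: str) -> None:
    # Once the order is CANCELLED or CONFIRMED, further events are ignored,
    # so a duplicate delivery cannot trigger a second compensation.
    if order.status != "PENDING":
        return
    if event_type == "InventoryFailed":
        order.status = "CANCELLED"
    elif event_type == "PaymentFailed":
        order.emitted.append("ReleaseInventory")  # compensate the reservation
        order.status = "CANCELLED"
    elif event_type == "PaymentSucceeded":
        order.status = "CONFIRMED"
    # Other events (e.g. InventoryReserved) do not change the order state here.
```

Guarding on the current status is one simple way to make the handler safe under duplicate delivery; a real service would also persist the status change and the emitted event in one local transaction.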
Example 2: Travel booking (Flight, Hotel, Car)
- Orchestrator starts saga with Booking=PENDING.
- Book Flight → ok → Book Hotel → fails.
- Compensate: Cancel Flight; set Booking=CANCELLED.
- If all succeed, set Booking=CONFIRMED and emit BookingConfirmed.
Each step is a local transaction. Compensations are business actions, not rollbacks.
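An orchestrated saga like the travel booking can be sketched as a loop that runs steps in order and, on failure, runs the compensations of the completed steps in reverse. The `run_saga` function and the `Step` tuple shape below are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); each callable represents a call to one service.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> str:
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensation = step
        try:
            action()
        except Exception:
            # Undo the steps that already succeeded, most recent first.
            for _done_name, _done_action, compensate in reversed(completed):
                compensate()
            return "CANCELLED"
        completed.append(step)
    return "CONFIRMED"
```

This sketch keeps saga state in memory. A production orchestrator would persist progress after each step so it can resume or compensate after a crash, and the compensations themselves would need to be idempotent and retried until they succeed.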
Example 3: Idempotent payment charge with retries
POST /charges (Idempotency-Key: 7b9...)
- Server checks store for key 7b9...
- If found, return saved result
- Else, create charge, save result keyed by 7b9..., return
If the client retries due to a timeout, the server returns the same charge result without double-charging.
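The key check in Example 3 might look like the sketch below. It assumes an in-memory dictionary as the key store and a hypothetical `ChargeService` class; a real payment API would keep the keys in a durable store and make the check-and-create step atomic (for example with a unique constraint on the key).

```python
import uuid
from typing import Dict


class ChargeService:
    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # idempotency key -> saved result
        self.charges_created = 0  # counts real charges, for demonstration

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        saved = self._results.get(idempotency_key)
        if saved is not None:
            # Retry of a request we already handled: show the receipt.
            return saved
        self.charges_created += 1
        result = {
            "charge_id": str(uuid.uuid4()),
            "amount_cents": amount_cents,
            "status": "captured",
        }
        self._results[idempotency_key] = result
        return result
```

Calling `charge("key-1", 500)` twice returns the same `charge_id` both times and creates only one charge. The sketch is not safe under concurrent requests with the same key; that case is what the atomic check-and-create is for.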
Design steps you can follow
- Define the business outcome and acceptable eventual consistency window.
- Identify services and local transactions required.
- Choose style: choreography for simple flows; orchestration for complex branching.
- Plan compensations for each step that can fail after others succeed.
- Add idempotency keys and deduplication for commands and events.
- Adopt Outbox to publish reliable events alongside state changes.
- Set timeouts, retries (exponential backoff + jitter), and dead letter handling.
- Observe with correlation IDs and metrics for success/failure/latency.
Common mistakes
- Assuming exactly-once delivery. Fix: Design idempotent consumers and use dedup storage.
- Missing compensation. Fix: Write compensations at the same time as forward steps; test them.
- Not persisting intent before publish. Fix: Use Outbox to avoid lost messages.
- Overusing 2PC. Fix: Prefer sagas unless you control all participants and can tolerate blocking.
- Retry storms. Fix: Backoff with jitter, circuit breakers, and max retry limits.
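The "backoff with jitter and max retry limits" fix can be sketched as a small helper. The function name `retry_with_backoff` and its parameters are assumptions for illustration; the jitter here is the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential cap.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted; surface the last error
            # Exponential cap: base, 2*base, 4*base, ... bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries so clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

The sketch retries on any exception for brevity. In practice you would retry only on errors known to be transient (timeouts, connection resets), and only for operations that are idempotent. It does not include a circuit breaker, which would sit in front of `call`.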
Self-check
- Can you explain how your system avoids double-processing if a message is delivered twice?
- Do all forward steps have compensations? Are they idempotent?
- Can you trace a single order end-to-end using a correlation ID?
Exercises
Do these, then compare with the solutions below.
Exercise 1: Design a simple saga
Services: Order, Inventory, Payment. Design the steps, events, and compensations so that a payment failure releases inventory and cancels the order. Include how you avoid duplicates.
Exercise 2: Idempotent consumer
Write pseudocode for a message consumer handling PaymentCaptured events that might be delivered twice. Ensure the order is confirmed only once.
- Checklist:
- Includes correlation or idempotency key.
- Persists processing result before side effects.
- Handles retry and duplicate safely.
Need a hint?
- Use a processed_messages table keyed by event ID.
- Wrap state change and dedup record in a transaction.
- Emit follow-up events via Outbox.
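Once you have attempted Exercise 2, here is one possible shape to compare against, following the first two hints. It is a sketch under assumptions: SQLite as the store, a hypothetical `handle_payment_captured` function, and an `orders` table with a `status` column. Emitting follow-up events via an Outbox is left out to keep it short.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    # The primary key on event_id is what rejects a duplicate delivery.
    conn.execute("CREATE TABLE processed_messages (event_id TEXT PRIMARY KEY)")


def handle_payment_captured(
    conn: sqlite3.Connection, event_id: str, order_id: str
) -> bool:
    """Return True if the event was processed, False if it was a duplicate."""
    try:
        # Dedup record and state change commit together or not at all.
        with conn:
            conn.execute(
                "INSERT INTO processed_messages (event_id) VALUES (?)", (event_id,)
            )
            conn.execute(
                "UPDATE orders SET status = 'CONFIRMED'"
                " WHERE id = ? AND status = 'PENDING'",
                (order_id,),
            )
    except sqlite3.IntegrityError:
        # event_id already recorded: acknowledge the message without redoing work.
        return False
    return True
```

Because the dedup insert and the status update share one transaction, a crash between them cannot leave an event marked as processed without the order being confirmed.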
Practical projects
- Build an order microservice set (Order, Inventory, Payment) using an Outbox table and a background publisher.
- Add idempotency to your payment API using a request key and result cache.
- Implement both choreography and orchestration for the same flow; compare failures and logs.
Learning path
- Start: Idempotency and retries at the API level.
- Next: Outbox pattern and transactional messaging.
- Then: Saga choreography vs orchestration, with compensations.
- Advanced: Exactly-once semantics myths, dedup strategies, and backpressure.
Next steps
- Add correlation IDs across services for observability.
- Define SLAs for consistency and recovery time.
- Chaos-test your saga by failing random steps.
Mini challenge
Your shipping service is temporarily down. Describe how your saga should behave to avoid charging customers without guaranteeing shipment. What events and compensations do you expect?