Why this matters
As an API Engineer, you often need to save data and notify other services about that change. If you publish a message and then save the data (or vice versa) in separate steps, crashes can cause missed events or duplicates. The Outbox Pattern solves this by writing the event into the same database transaction as your business update, then reliably publishing it later.
- Real tasks you’ll face: syncing Order and Payment services; emitting user profile changes to a search index; propagating inventory updates to a warehouse system; sending audit events safely.
- Impact: fewer bugs in distributed systems, no silent data loss, simpler recovery after failures.
Concept explained simply
Outbox Pattern: whenever you modify business data, you also insert a record into an outbox table in the same database transaction. A separate relay process scans outbox rows, publishes them to a message bus (or HTTP webhook), and marks them as published. This ensures you never commit business data without also recording the event that must be sent.
Mental model: The postcard in the same envelope
Imagine you place your order (business data) and a postcard (event) into the same envelope (database transaction). If the envelope is mailed, both are in; if it isn’t, neither is. A mail clerk (relay) later picks up the postcard and sends it to the recipient (message bus/consumer). If the clerk drops the postcard, they can pick it up and resend—so receivers must tolerate duplicates.
Core flow
1. The API handler writes the business row and the outbox row in one transaction and commits.
2. The relay polls the outbox for NEW/FAILED rows whose retry time has arrived.
3. The relay publishes each row to the bus (or webhook), keyed by `aggregate_id`.
4. The relay marks the row PUBLISHED, or schedules a retry with backoff on failure.
5. Consumers apply each event idempotently, deduplicating by `event_id`.
Design essentials
- Outbox table fields: `event_id` (UUID), `aggregate_type`, `aggregate_id`, `event_type`, `payload` (JSON), `headers`/meta (JSON), `status` (NEW|PUBLISHED|FAILED), `attempts`, `next_attempt_at`, `created_at`, `published_at`.
- Atomicity: write business data and the outbox row in the same DB transaction.
- Delivery semantics: at-least-once. Expect duplicates; ensure idempotency on consumers.
- Indexes: on `status`, `next_attempt_at`, and optionally `created_at` for pagination.
- Retention: delete or archive published rows after a safe retention period.
- Ordering: preserve per-aggregate ordering by partitioning messages with an `aggregate_id` key on the bus.
- Schema versioning: include `schema_version` in the payload; never break consumers.
- Security/PII: store only the minimum necessary data in the payload; consider encryption for sensitive fields.
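The field list above can be sketched as DDL. The following is a minimal, runnable sketch using SQLite for portability; column names match the list, but the types are simplified assumptions (in Postgres you would likely use UUID, JSONB, and TIMESTAMPTZ instead of TEXT).

```python
import sqlite3

# Minimal outbox DDL plus the relay's scan index, following the field
# list above. SQLite types are a stand-in; adjust for your database.
DDL = """
CREATE TABLE outbox (
    event_id        TEXT PRIMARY KEY,             -- UUID; the idempotency key
    aggregate_type  TEXT NOT NULL,                -- e.g. 'Order'
    aggregate_id    TEXT NOT NULL,                -- partition key for ordering
    event_type      TEXT NOT NULL,                -- e.g. 'OrderPlaced'
    payload         TEXT NOT NULL,                -- JSON, includes schema_version
    headers         TEXT,                         -- JSON meta
    status          TEXT NOT NULL DEFAULT 'NEW',  -- NEW|PUBLISHED|FAILED
    attempts        INTEGER NOT NULL DEFAULT 0,
    next_attempt_at TEXT,
    created_at      TEXT NOT NULL,
    published_at    TEXT
);
CREATE INDEX outbox_scan_idx ON outbox (status, next_attempt_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

The `(status, next_attempt_at)` index is what lets the relay's polling query stay cheap as the table grows.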
Worked examples
Example 1: Place Order with transactional outbox
-- Within your API request handler
BEGIN;
INSERT INTO orders (order_id, customer_id, total_cents, status, created_at)
VALUES ($1, $2, $3, 'PLACED', NOW());
INSERT INTO outbox (
event_id, aggregate_type, aggregate_id, event_type, payload,
status, attempts, created_at
) VALUES (
gen_random_uuid(), 'Order', $1, 'OrderPlaced',
jsonb_build_object(
'order_id', $1,
'customer_id', $2,
'total_cents', $3,
'schema_version', 1
),
'NEW', 0, NOW()
);
COMMIT;
Outcome: either both the order and outbox are saved, or neither is.
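The same atomicity can be demonstrated end to end. The sketch below mirrors Example 1 in runnable Python with SQLite; table shapes are simplified and the helper name `place_order` is illustrative, not part of the pattern.

```python
import json
import sqlite3
import uuid

# Runnable sketch of Example 1: both inserts commit together or not at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT,
                     total_cents INTEGER, status TEXT, created_at TEXT);
CREATE TABLE outbox (event_id TEXT PRIMARY KEY, aggregate_type TEXT,
                     aggregate_id TEXT, event_type TEXT, payload TEXT,
                     status TEXT, attempts INTEGER, created_at TEXT);
""")

def place_order(order_id: str, customer_id: str, total_cents: int) -> None:
    with conn:  # one transaction for both writes
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, 'PLACED', datetime('now'))",
            (order_id, customer_id, total_cents))
        payload = json.dumps({"order_id": order_id, "customer_id": customer_id,
                              "total_cents": total_cents, "schema_version": 1})
        conn.execute(
            "INSERT INTO outbox VALUES (?, 'Order', ?, 'OrderPlaced', ?, "
            "'NEW', 0, datetime('now'))",
            (str(uuid.uuid4()), order_id, payload))

place_order("o-1", "c-1", 1999)

# A failing write rolls back both rows: the duplicate order_id aborts
# the transaction, so no orphan outbox row is left behind.
try:
    place_order("o-1", "c-1", 1999)
except sqlite3.IntegrityError:
    pass
```

After the failed second call, the database still holds exactly one order and one outbox row: neither half of the failed transaction survived.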
Example 2: Relay publishes with retries
-- Pseudocode for relay loop
while True:
    rows = select * from outbox
           where status in ('NEW','FAILED')
             and (next_attempt_at is null or next_attempt_at <= now())
           order by created_at asc
           limit 100
           for update skip locked  -- safe if several relay workers run
    for row in rows:
        try:
            publish_to_bus(topic='orders', key=row.aggregate_id, payload=row.payload)
            -- mark as published (separate DB tx; duplicates are acceptable)
            update outbox set status='PUBLISHED', published_at=now()
            where event_id=row.event_id
        except TemporaryError:
            backoff = compute_exponential_backoff(row.attempts)
            update outbox
            set status='FAILED', attempts=attempts+1, next_attempt_at=now()+backoff
            where event_id=row.event_id
        except NonRetryableError:
            -- park the message for manual review; PARKED extends the status
            -- set so the scan query above never retries it automatically
            update outbox set status='PARKED', attempts=attempts+1
            where event_id=row.event_id
    sleep(1 second)
Duplicates can occur if the process crashes after publish but before marking as published. That’s fine; consumers must be idempotent.
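The relay sketch calls `compute_exponential_backoff`; one common shape for it is below. The constants `BASE_SECONDS` and `MAX_SECONDS` are illustrative assumptions, and jitter is added so many failed rows don't retry in lockstep.

```python
import random

BASE_SECONDS = 2      # first retry after roughly 2 seconds
MAX_SECONDS = 3600    # cap at 1 hour so delays never grow unbounded

def compute_exponential_backoff(attempts: int) -> float:
    """Seconds until the next attempt: base * 2^attempts, capped, with
    up to 50% random jitter subtracted to spread retries out."""
    delay = min(BASE_SECONDS * (2 ** attempts), MAX_SECONDS)
    return delay * random.uniform(0.5, 1.0)
```

`next_attempt_at` is then simply `now() + compute_exponential_backoff(attempts)` at the moment the failure is recorded.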
Example 3: Idempotent consumer
-- Pseudocode with dedup table
BEGIN;
-- idempotency key = event_id from the message
if exists(select 1 from processed_messages where event_id = :event_id):
ROLLBACK; return; -- already applied
-- apply the side effect (e.g., update read model)
update inventory set reserved = reserved + :delta
where sku = :sku;
insert into processed_messages(event_id, processed_at)
values (:event_id, now());
COMMIT;
Even if the consumer receives the message again, the operation won’t be applied twice.
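The dedup transaction above can be exercised directly. This is a runnable sketch using SQLite (your production database would take its place); table names match the example, and `handle_message` is an illustrative helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (sku TEXT PRIMARY KEY, reserved INTEGER NOT NULL);
CREATE TABLE processed_messages (event_id TEXT PRIMARY KEY, processed_at TEXT);
INSERT INTO inventory VALUES ('SKU-1', 0);
""")

def handle_message(event_id: str, sku: str, delta: int) -> None:
    # One transaction: dedup check, side effect, and dedup insert together.
    with conn:
        already = conn.execute(
            "SELECT 1 FROM processed_messages WHERE event_id = ?", (event_id,)
        ).fetchone()
        if already:
            return  # duplicate delivery: skip the side effect
        conn.execute(
            "UPDATE inventory SET reserved = reserved + ? WHERE sku = ?",
            (delta, sku))
        conn.execute(
            "INSERT INTO processed_messages VALUES (?, datetime('now'))",
            (event_id,))

handle_message("evt-1", "SKU-1", 5)
handle_message("evt-1", "SKU-1", 5)  # redelivery: no second reservation
reserved = conn.execute(
    "SELECT reserved FROM inventory WHERE sku = 'SKU-1'").fetchone()[0]
```

Because the dedup insert and the inventory update share one transaction, a crash between them can never leave the event half-applied.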
Alternative: CDC vs polling
- Polling outbox: simple and explicit. Application writes outbox rows; relay polls.
- CDC (Change Data Capture): a log-based tool (such as Debezium) reads DB changes and forwards outbox rows to a bus. Used on the outbox table, CDC often gives lower latency and less custom code than polling.
Who this is for
- API Engineers and Backend Developers building event-driven or microservice systems.
- Teams integrating with message brokers (Kafka, RabbitMQ, SQS) or webhooks.
Prerequisites
- Comfort with transactions and SQL (INSERT/UPDATE, indexes).
- Basic knowledge of message brokers and retry patterns.
- Familiarity with JSON payloads and versioning.
Learning path
- Understand at-least-once delivery and idempotency.
- Design an outbox schema that matches your domain aggregates.
- Implement transactional writes and a relay worker with retries.
- Make consumers idempotent and verify ordering where it matters.
- Add monitoring, retention, and backpressure handling.
Exercises
Do these now; they mirror the real tasks listed at the top of this lesson. You can compare your work with the provided solutions.
Exercise 1 — Design the outbox and write the transactional SQL
Goal: on order placement, insert both the order and an OrderPlaced outbox event atomically. Include event_id, aggregate_id, event_type, payload, timestamps, and status.
- Write the `CREATE TABLE outbox (...)` DDL (simplified).
- Write the `BEGIN ... COMMIT` block that inserts the order and the outbox row.
- Add the index you need for the relay to scan efficiently.
Exercise 2 — Relay retries and idempotent consumer
Goal: sketch the relay logic with exponential backoff and the consumer logic that deduplicates by event_id.
- Show how you compute `next_attempt_at` and cap the maximum backoff.
- Show the consumer transaction that checks a `processed_messages` table.
- Explain how you would guarantee per-order message ordering.
Checklist before submitting
- Business write and outbox insert happen in the same DB transaction.
- Outbox rows have a stable idempotency key.
- Relay marks published and retries failures with backoff.
- Consumer ignores duplicates safely.
- Retention/monitoring plan noted.
Common mistakes and how to self-check
- Publishing inside the same request before committing the DB transaction. Fix: only publish from relay after commit.
- No idempotency at the consumer. Fix: use `event_id` or a natural key and a `processed_messages` table.
- No backoff and unlimited retries. Fix: implement exponential backoff and park poison messages.
- Missing indexes. Fix: index `(status, next_attempt_at)` for scanning.
- Breaking the event schema. Fix: add `schema_version`; make only additive changes.
- Assuming global total ordering. Fix: guarantee ordering per aggregate via partition key = `aggregate_id`.
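Why a partition key of `aggregate_id` preserves per-aggregate order: every event for the same aggregate hashes to the same partition, and each partition is consumed in order. The sketch below makes that mapping explicit; the hash choice and partition count are illustrative, since a real broker client (for example a Kafka producer given `key=aggregate_id`) does the equivalent internally.

```python
import hashlib

NUM_PARTITIONS = 8  # illustrative; set by your topic configuration

def partition_for(aggregate_id: str) -> int:
    """Map an aggregate_id deterministically onto one partition."""
    digest = hashlib.md5(aggregate_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every OrderPlaced/OrderPaid/etc. event for order-42 lands on the same
# partition, so their relative order is preserved for that order.
p1 = partition_for("order-42")
p2 = partition_for("order-42")
```

Events for different orders may interleave across partitions, which is exactly the trade-off: per-aggregate order, not global order.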
Self-check prompts
- If your relay crashes after publishing but before marking published, what happens? Answer: The message may be republished; consumers must be idempotent.
- How do you prove atomicity? Answer: Show a single DB transaction containing both business and outbox inserts.
- Where is your retention policy documented? Answer: State the TTL or archival process for published rows.
Practical projects
- Order Service → Payment and Email: implement outbox + relay, publish two topics, add idempotent consumers.
- User Profile updates → Search index: outbox events feed a denormalized read model; include schema versioning and reindex tooling.
- Inventory reservations: simulate concurrent orders, verify per-sku ordering by partition key and consumer dedupe.
Check your understanding
Try the quick test below.
Mini challenge
Your system emits OrderPlaced v1 and later v2 (adds delivery_window). Design a rollout plan so existing consumers keep working, new consumers get v2, and ordering for a given order_id is preserved. Write the event headers and partitioning choice. Keep it idempotent.
Next steps
- Add metrics: relay lag, publish failures, oldest NEW event age.
- Introduce dead-letter/parking for non-retryable failures with a manual replay tool.
- Explore CDC-based relays when latency or throughput grows.