Why this matters
Backend Engineers regularly deal with traffic spikes, slow or flaky integrations, and long-running work. Message queues help you:
- Decouple services so one service can continue even if another is slow or temporarily down.
- Absorb spikes and smooth load with buffering and asynchronous processing.
- Increase reliability via retries, dead-letter queues (DLQs), and backoff policies.
- Control ordering and concurrency for data consistency and predictable flows.
Real tasks you will face:
- Sending thousands of emails/SMS during a campaign without overloading providers.
- Processing images, invoices, or reports without blocking API requests.
- Handling webhooks and ensuring idempotency to avoid double-charging or duplicate records.
- Scaling consumers when queue lag grows, then scaling back down after the surge.
Concept explained simply
A message queue is a buffer between producers (who send messages) and consumers (who process them).
- Producer: sends a message (work item).
- Queue/Topic: stores messages until consumed.
- Consumer: pulls or receives messages and processes them.
- Acknowledgment (ack): consumer confirms processing so the queue can remove the message.
- Retry & DLQ: failed messages can be retried; persistent failures go to a separate dead-letter queue.
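To make these roles concrete, here is a minimal in-process sketch that uses Python's standard-library queue as a stand-in for a real broker; `handle` is a placeholder for your actual work, and the requeue-on-failure mirrors what a broker does when an ack never arrives:

```python
import queue
import threading

q = queue.Queue()  # stand-in for the broker's queue

def producer():
    for i in range(5):
        q.put({"id": i, "body": f"work item {i}"})  # publish a message

def handle(msg):
    print("processed", msg["id"])  # real work: call an API, write a row, etc.

def consumer():
    while True:
        msg = q.get()      # receive the next message
        try:
            handle(msg)
        except Exception:
            q.put(msg)     # failed: requeue for retry (a broker does this on a missed ack)
        finally:
            q.task_done()  # settle this delivery; the requeue above keeps failed work alive

threading.Thread(target=consumer, daemon=True).start()
producer()
q.join()  # block until every message has been settled
```

A real broker adds persistence, visibility timeouts, and delivery across processes, but the producer/queue/consumer/ack shape is exactly this.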
Mental model
Imagine a post office:
- Mail slot (producer) drops letters (messages).
- Sorting center (queue) holds letters until sorters (consumers) pick them up.
- Each sorter marks processed letters (ack). If a sorter drops a letter, it is returned to sorting (retry). Letters that keep failing go to a special bin (DLQ) for investigation.
Delivery semantics
- At-most-once: no retries; messages may be lost but are never duplicated.
- At-least-once: retries happen; duplicates are possible; most common for reliability.
- Exactly-once (effectively-once): achieved by combining at-least-once delivery with idempotent consumers and deduplication.
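A short sketch of that effectively-once recipe: under at-least-once delivery, the consumer checks a dedup store keyed on the message id before running the side effect. The in-memory set is purely illustrative; the payment-webhook example below shows a durable version.

```python
seen_ids = set()  # in production: a table with a unique key, or Redis with a TTL

def handle_once(msg, side_effect):
    if msg["id"] in seen_ids:
        return  # a redelivered duplicate: the side effect already ran, skip it
    side_effect(msg)
    seen_ids.add(msg["id"])
    # for real safety, commit the side effect and the id record in one
    # transaction, or make the side effect itself idempotent at the provider

msg = {"id": "m-1", "body": "hello"}
handle_once(msg, lambda m: print("effect for", m["id"]))
handle_once(msg, lambda m: print("effect for", m["id"]))  # redelivery: no-op
```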
Ordering
- FIFO queue: strict order but limited throughput.
- Partitioned topics: ordering per partition key (e.g., user_id) while allowing high throughput.
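Per-key ordering works because a stable hash sends every message with the same key to the same partition, and each partition is consumed in order. A minimal sketch, assuming a partition count of 8 (the real count comes from the topic's configuration):

```python
import hashlib

NUM_PARTITIONS = 8  # assumption for this sketch; set by the topic in practice

def partition_for(key: str) -> int:
    # hashlib gives a stable hash across processes (Python's built-in hash()
    # is salted per process), so the same user_id always maps to the same
    # partition while different keys spread across all partitions
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

print(partition_for("user_42"))  # same partition every run -> per-user ordering
```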
Key building blocks
- Queue vs Topic: A queue delivers each message to exactly one consumer. A topic broadcasts to multiple consumer groups; within each group the message is processed once.
- Push vs Pull: Push delivers to consumers; Pull lets consumers fetch at their pace.
- Visibility timeout: time during which a message is hidden after delivery; if not acked, it becomes visible again.
- Prefetch/Max in-flight: how many messages a consumer receives ahead of time to keep workers busy.
- Batching: ack or process messages in groups to improve throughput.
- Idempotency keys: detect duplicates so reprocessing is safe.
- Backoff with jitter: retry after increasing delays plus randomness to avoid thundering herds (see the sketch after this list).
- Dead-letter queue (DLQ): isolate poison messages after a retry limit.
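For the backoff item above, here is a sketch of the widely used "full jitter" variant; the base delay, cap, and attempt limit are illustrative knobs, not prescribed values:

```python
import random

MAX_ATTEMPTS = 5  # after this many failures, route the message to the DLQ

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    # exponential growth (1s, 2s, 4s, ...) capped at `cap`, then a uniform
    # random draw so retrying consumers spread out instead of stampeding
    return random.uniform(0, min(cap, base * (2 ** attempt)))

for attempt in range(MAX_ATTEMPTS):
    print(f"attempt {attempt}: wait {backoff_delay(attempt):.1f}s before retrying")
```

Capping the exponent bounds the worst-case wait, and drawing a random delay in [0, cap] keeps retrying consumers from synchronizing into a new spike.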
Worked examples
1) Email sending without overloading provider
- API receives a request to send N emails. It enqueues N messages.
- Consumers process messages and call the email provider respecting rate limits (e.g., 50 req/s).
- On success: ack. On transient failure: retry with backoff. On repeated failure: move to DLQ.
Notes: Use at-least-once delivery. Make the consumer idempotent using a message id to avoid duplicate emails.
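A sketch of the consumer's rate limiting with a token bucket; send_email and ack are hypothetical stand-ins for your provider SDK and broker client, and the 50 req/s figure matches the example above:

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/second up to `capacity`; acquire() blocks until a token is free."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def acquire(self):
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # wait for the next token

def send_email(msg):  # hypothetical provider call
    print("sent to", msg["to"])

def ack(msg):         # hypothetical broker acknowledgment
    pass

bucket = TokenBucket(rate=50, capacity=50)  # provider limit: 50 req/s

def consume(msg):
    bucket.acquire()  # never exceed the provider's rate, however deep the queue is
    send_email(msg)
    ack(msg)          # ack only after the provider call succeeds

consume({"to": "user@example.com"})
```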
2) Image processing pipeline
- Upload triggers a message with file location and desired transformations.
- Consumers download, transform, and upload results, then ack.
- Partition key by user_id keeps each user's images in order if needed.
Notes: For large files, pass references not blobs; use retries for network hiccups; DLQ for corrupted files.
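A sketch of the message shape and consumer flow for this pipeline; the storage paths and the download/apply_transforms/upload helpers are hypothetical stand-ins for your storage client and image library:

```python
# the message carries *references* to the file, never the bytes themselves
job = {
    "message_id": "img-7f3a9c",                    # used for idempotency/dedup
    "partition_key": "user_123",                   # keeps this user's images ordered
    "source": "s3://uploads/user_123/photo.jpg",   # hypothetical bucket layout
    "output": "s3://thumbs/user_123/photo.jpg",
    "transformations": [{"op": "resize", "width": 256}],
}

def download(path):               # stand-in for a storage client
    return b"image-bytes"

def apply_transforms(data, ops):  # stand-in for an image library
    return data

def upload(path, data):           # stand-in for a storage client
    pass

def process(job):
    data = download(job["source"])
    result = apply_transforms(data, job["transformations"])
    upload(job["output"], result)
    # ack only after the upload succeeds, so a crash mid-way simply retries

process(job)
```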
3) Payment webhooks
- Webhook receiver enqueues an event quickly and returns 200 to the provider.
- Consumer applies business logic using an idempotency key (event_id) and a processed-events table.
- Duplicates are safely ignored; failures retry; suspicious events go to DLQ.
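A sketch of the processed-events table, with SQLite standing in for your database and apply_business_logic as a hypothetical placeholder. Because the event_id insert and the business effects commit in the same transaction, a redelivered event is detected and skipped:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")

def apply_business_logic(event):
    pass  # hypothetical: mark the invoice paid, write a ledger entry, etc.

def handle_webhook_event(event):
    cur = db.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event["event_id"],),
    )
    if cur.rowcount == 0:
        return  # event_id already recorded: a duplicate delivery, skip it
    apply_business_logic(event)
    db.commit()  # the id and the effects become durable together

handle_webhook_event({"event_id": "evt_001"})
handle_webhook_event({"event_id": "evt_001"})  # duplicate: safely ignored
```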
4) Log/metrics ingestion
- Apps produce events with service_id as the partition key.
- Consumers aggregate and batch-write to storage.
- Lag-based autoscaling adds consumers when lag or processing time grows.
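One illustrative scaling policy: size the consumer group so the current lag drains within a target window, clamped to sane bounds. The window and bounds below are assumptions, not recommendations:

```python
import math

def desired_consumers(lag_msgs: int, per_consumer_rate: float,
                      drain_target_s: float = 60.0, lo: int = 1, hi: int = 50) -> int:
    # per_consumer_rate: measured messages/second one consumer sustains
    needed = math.ceil(lag_msgs / (per_consumer_rate * drain_target_s))
    return max(lo, min(hi, needed))

# 12,000 messages of lag at 20 msg/s per consumer -> 10 consumers to drain in 60s
print(desired_consumers(lag_msgs=12_000, per_consumer_rate=20))
```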
Design decisions checklist
- [ ] What delivery semantics do you need? (at-most, at-least, effectively-once)
- [ ] Is ordering required globally or per key?
- [ ] How will you ensure idempotency?
- [ ] What are your rate limits and throughput targets?
- [ ] Visibility timeout sized for longest processing path?
- [ ] Retry policy (count, backoff, jitter) and DLQ threshold defined?
- [ ] Autoscaling signals (lag, processing time, error rate)?
- [ ] Observability: metrics, logs, DLQ inspection workflow?
Exercises
Complete these exercises, then check your answers below. You can take the quick test afterward.
Exercise 1: Design a resilient email queue
Goal: Handle campaign spikes without exceeding 50 req/s provider limit, ensure no message loss, and avoid duplicate emails.
- Choose delivery semantics.
- Pick partitioning/ordering strategy.
- Define retry + backoff + DLQ.
- Explain idempotency handling.
Hints
- Think at-least-once + idempotency.
- Partition by recipient domain or user_id for balanced throughput.
- Rate limit inside consumers with a token bucket or sleep between calls.
Exercise 2: Throughput math and backpressure
Given:
- Incoming: 200 messages/second.
- Processing time: 250 ms per message.
- Each consumer instance has 4 workers (one message per worker at a time).
- Ack in batches of 10.
- Target 30% headroom (provision for 1.3x peak).
Compute:
- Minimum consumer instances.
- Reasonable prefetch per instance.
- Retry backoff sequence and DLQ threshold.
Hints
- Throughput per worker is 1 / 0.25 s = 4 messages/second.
- Multiply by workers and headroom.
- Prefetch should keep all workers busy for at least ~1 second.
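If you want to sanity-check your approach (not your answer), here is the shape of the sizing calculation in code, demonstrated with numbers deliberately different from the exercise's:

```python
import math

def min_instances(incoming_rate: float, proc_time_s: float,
                  workers_per_instance: int, headroom: float) -> int:
    per_worker = 1 / proc_time_s                      # messages/second per worker
    per_instance = per_worker * workers_per_instance  # messages/second per instance
    return math.ceil(incoming_rate * headroom / per_instance)

# demo: 100 msg/s incoming, 500 ms each, 2 workers/instance, 20% headroom
print(min_instances(100, 0.5, 2, 1.2))  # per instance 4 msg/s -> ceil(120/4) = 30
```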
Common mistakes and self-check
- Mistake: Assuming exactly-once from the queue. Fix: Design idempotent consumers with dedup keys.
- Mistake: Global ordering when not needed. Fix: Order per key (e.g., per user) to scale.
- Mistake: Too-short visibility timeout. Fix: Set to worst-case processing time plus buffer.
- Mistake: Infinite retries. Fix: Use capped retries and DLQ with alerts.
- Mistake: No rate limiting. Fix: Enforce provider limits inside consumers.
- Mistake: Ignoring observability. Fix: Track lag, retry counts, DLQ size, processing latency.
Self-check prompts
- Can I restart consumers without losing or duplicating side effects?
- Do I know how to reprocess DLQ messages safely?
- What metrics will trigger scaling up or down?
Practical projects
- Build a job queue for sending notifications with retry, DLQ, and idempotency checks.
- Create an image resize pipeline with partitioned ordering by user and batch acknowledgments.
- Implement a webhook handler that stores event_ids and safely ignores duplicates.
Mini challenge
Design a queueing plan for ride requests during surge hours. Requirements: keep per-rider request order, support 10x spikes, and ensure retries don't assign two drivers to the same ride. Write down your partition key choice, consumer scaling trigger, and idempotency strategy.
Who this is for
- Backend Engineers who need reliable async processing.
- Engineers preparing for system design interviews.
Prerequisites
- Comfort with HTTP APIs and REST basics.
- Basic understanding of concurrency and processes/threads.
- Familiarity with persistence concepts (databases or key-value stores).
Learning path
- Before: Scalability and reliability fundamentals.
- Now: Message Queues Basics (this lesson).
- Next: Stream processing, event-driven patterns, and autoscaling strategies.
Next steps
- Do the exercises and review the solutions below.
- Take the quick test to confirm understanding.
- Pick one practical project and implement it end-to-end.