
Exactly-Once and At-Least-Once Semantics

Learn exactly-once and at-least-once delivery semantics for free, with explanations, exercises, and a quick test (for Data Platform Engineers).

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

As a Data Platform Engineer, you will choose and implement delivery semantics for streaming pipelines. This impacts money movement, counters, search indexes, inventory, recommendations, and alerts. You will:

  • Decide where at-least-once is fine and where exactly-once is mandatory.
  • Configure Kafka producers/consumers and streaming jobs to avoid duplicates and data loss.
  • Design idempotent writes and deduplication at sinks like Kafka, object storage, databases, and Elasticsearch.
  • Operate pipelines safely during restarts, redeploys, and failures.


Concept explained simply

Delivery semantics describe how many times an event's effect happens at the destination:

  • At-most-once: deliver 0 or 1 time. Fast, but can lose messages.
  • At-least-once: deliver 1 or more times. No loss, duplicates possible.
  • Exactly-once (effectively-once): each event's effect happens once. No loss, no duplicates at the sink.

Important: "Exactly-once" in streaming is about the effect, not that the event was only delivered once over the network. Retries may happen; idempotency and transactions ensure the final state is as if each event applied once.

Mental model

Imagine dropping numbered coins into a box:

  • At-least-once: you may drop coin #42 twice; the box holds two coins unless the box rejects duplicates.
  • Exactly-once: the box accepts coin #42 only once, even if you try twice.
  • The "box" is your sink (a Kafka topic, DB table, or object store). The key to exactly-once is making the sink idempotent or transactional; a tiny sketch of an idempotent "box" follows this list.
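
A minimal sketch of this mental model in plain Java (no framework; the class name and coin ids are made up for illustration): the sink remembers which ids it has already accepted, so a retried drop has no extra effect.

```java
import java.util.HashSet;
import java.util.Set;

// Toy idempotent sink: dropping the same coin id twice has only one effect.
class CoinBox {
    private final Set<Integer> seenIds = new HashSet<>();
    private int total = 0;

    // Returns true only the first time a given coin id is accepted.
    boolean drop(int coinId) {
        if (!seenIds.add(coinId)) {
            return false; // duplicate delivery: ignore the effect
        }
        total++;
        return true;
    }

    int total() { return total; }

    public static void main(String[] args) {
        CoinBox box = new CoinBox();
        box.drop(42);
        box.drop(42); // retry of the same coin is ignored
        System.out.println(box.total()); // prints 1
    }
}
```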

Implementation patterns

Kafka exactly-once overview (producer and processors)
  • Idempotent producer: enable.idempotence=true, acks=all, retries>0; keep max.in.flight.requests.per.connection<=5 so ordering is preserved across retries.
  • Transactions: the producer sets transactional.id; beginTransaction() → send() → sendOffsetsToTransaction() → commitTransaction(). This commits produced records and consumed offsets atomically, so no effects are lost or duplicated between a Kafka source and a Kafka sink (see the producer sketch after this list).
  • Stream processors (e.g., Flink) use checkpointing plus transactional sinks to align state and committed outputs.
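
A minimal consume-process-produce sketch with the Kafka Java client (kafka-clients); the broker address, topic names, and transactional.id are placeholders, and this is an illustrative loop rather than a production job.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class TransactionalCopy {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "broker:9092");           // placeholder
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("enable.idempotence", "true");
        producerProps.put("acks", "all");
        producerProps.put("transactional.id", "payments-copier-1");      // stable per producer instance

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "broker:9092");
        consumerProps.put("group.id", "payments-copier");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("enable.auto.commit", "false");                 // offsets committed inside the transaction
        consumerProps.put("isolation.level", "read_committed");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            producer.initTransactions();
            consumer.subscribe(List.of("payments"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> rec : records) {
                        producer.send(new ProducerRecord<>("payments-out", rec.key(), rec.value()));
                        offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                    new OffsetAndMetadata(rec.offset() + 1));
                    }
                    // Commit produced records and consumed offsets atomically.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    producer.abortTransaction();
                    // A real job would rewind the consumer to its last committed
                    // offsets (or restart) so the aborted batch is reprocessed.
                    throw new RuntimeException(e);
                }
            }
        }
    }
}
```

Downstream consumers of the output topic should read with isolation.level=read_committed so they never see records from aborted transactions.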
Flink exactly-once with Kafka
  • Enable checkpointing in exactly-once mode.
  • Use a Kafka source that commits offsets as part of checkpoints.
  • Use a Kafka sink with two-phase commit; Flink commits the Kafka transaction only after the checkpoint succeeds.
  • End-to-end exactly-once then holds for Kafka→Flink→Kafka. For external sinks, the sink itself must provide idempotency or two-phase commit (a job sketch follows this list).
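
A sketch of a Kafka→Flink→Kafka job using the Flink DataStream API with the flink-connector-kafka KafkaSource/KafkaSink (API details vary slightly by Flink version; the broker address, topics, and group id are placeholders).

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

public class ExactlyOnceKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoints drive when the sink's Kafka transactions are committed.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")                       // placeholder
                .setTopics("payments")
                .setGroupId("ledger-job")
                .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("broker:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("ledger-entries")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // Two-phase commit: the transaction commits only after checkpoint success.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("ledger-sink")
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "payments")
           .sinkTo(sink);

        env.execute("exactly-once-kafka-to-kafka");
    }
}
```

With this setup the sink's Kafka transaction timeout must comfortably exceed the checkpoint interval, and consumers of the output topic should read with isolation.level=read_committed.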
Idempotent sinks and dedup
  • Databases: upserts, primary key constraints, INSERT ... ON CONFLICT DO NOTHING/UPDATE.
  • Object storage: write files with deterministic names (event_id/date) and commit manifests once.
  • Search: upsert by document id instead of applying increments.
  • Stateful dedup: keep a set of seen event_ids with a TTL; beware memory growth and late arrivals (a dedup sketch follows this list).
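
A sketch of the stateful dedup bullet as a Flink KeyedProcessFunction with TTL-bounded keyed state; for simplicity it assumes events are carried as strings and the stream has already been keyed by event_id.

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits the first occurrence of each key (the event_id) and drops later duplicates.
// The TTL bounds state size; duplicates replayed after the TTL can slip through.
public class DedupByEventId extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        ValueStateDescriptor<Boolean> desc = new ValueStateDescriptor<>("seen", Types.BOOLEAN);
        desc.enableTimeToLive(StateTtlConfig.newBuilder(Time.hours(24)).build());
        seen = getRuntimeContext().getState(desc);
    }

    @Override
    public void processElement(String event, Context ctx, Collector<String> out) throws Exception {
        if (seen.value() == null) {   // first time this event_id is observed
            seen.update(true);
            out.collect(event);
        }
        // else: duplicate within the TTL window, drop it
    }
}
```

Usage would look like events.keyBy(e -> extractEventId(e)).process(new DedupByEventId()), where extractEventId is a hypothetical helper that pulls the stable event_id out of the payload.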

Choosing semantics by use case

  • Billing, inventory, payouts: exactly-once or idempotent writes required.
  • Search index, cache rebuilds, downstream read models: at-least-once with idempotent upsert is typically fine.
  • Metrics counters and aggregates: at-least-once plus dedup, or design for idempotent aggregation (e.g., recompute per window, not running increments).
  • Alerts/notifications: at-least-once with suppression (dedup by alert key and time window).

Worked examples

Example 1: Payments to ledger DB

Goal: each payment event should affect the ledger exactly once.

  • Source: Kafka topic payments.
  • Process: Flink job with checkpointing.
  • Sink: Ledger DB table with primary key payment_id.
  • Approach: at-least-once transport plus an idempotent sink (UPSERT by payment_id). If an intermediate Kafka sink is used, wrap it in Kafka transactions; the final DB write stays idempotent (see the upsert sketch below).
  • Result: even if a payment event is retried, the ledger row for payment_id is written only once.
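
A minimal sketch of the idempotent ledger write via JDBC with a Postgres-style INSERT ... ON CONFLICT; the ledger table and its columns are assumed for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class LedgerUpsert {
    // Upsert keyed on payment_id: a retried event rewrites the same row.
    private static final String UPSERT_SQL =
            "INSERT INTO ledger (payment_id, account_id, amount_cents, ts) " +
            "VALUES (?, ?, ?, ?) " +
            "ON CONFLICT (payment_id) DO UPDATE SET " +
            "  account_id = EXCLUDED.account_id, " +
            "  amount_cents = EXCLUDED.amount_cents, " +
            "  ts = EXCLUDED.ts";

    public static void write(Connection conn, String paymentId, String accountId,
                             long amountCents, Timestamp ts) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(UPSERT_SQL)) {
            stmt.setString(1, paymentId);
            stmt.setString(2, accountId);
            stmt.setLong(3, amountCents);
            stmt.setTimestamp(4, ts);
            stmt.executeUpdate();   // applying the same payment twice leaves one row
        }
    }
}
```

Because the statement is keyed on payment_id, replaying the same payment event rewrites the same row instead of adding a second one.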

Example 2: Page view counts to Redis

Naive increments (INCR) are not idempotent. To get effectively-once:

  • Generate stable event_id per page view.
  • Dedup before increment: keep a short-TTL set of seen event_ids per user/session or per page. If not seen → INCR; else skip (see the sketch below).
  • Alternative: aggregate counts in Flink per window and write final window totals idempotently.
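
A sketch of the dedup-then-increment pattern using the Jedis client; the key layout and the one-hour TTL are assumptions. Note the two Redis calls are not atomic, so a production version would wrap them in a Lua script or accept the small race.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class PageViewCounter {
    private final Jedis jedis;

    public PageViewCounter(Jedis jedis) {
        this.jedis = jedis;
    }

    // Increment the page counter only if this event_id has not been seen recently.
    public void record(String eventId, String pageId) {
        String dedupKey = "seen:" + eventId;                         // assumed key layout
        // SET ... NX EX: succeeds only if the key does not exist; expires after 1 hour.
        String reply = jedis.set(dedupKey, "1", SetParams.setParams().nx().ex(3600));
        if ("OK".equals(reply)) {
            jedis.incr("pageviews:" + pageId);                       // first sighting: count it
        }
        // else: duplicate within the TTL window, skip the increment
    }
}
```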

Example 3: CDC to Elasticsearch

Replicate DB changes to search:

  • Process CDC events with Kafka→Flink→Elasticsearch.
  • Use document id = primary key. Upsert documents (replace) rather than doing arithmetic updates.
  • Duplicates overwrite the same document, so the index converges to an effectively-once state (see the sketch below).
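
A sketch of the upsert-by-document-id write using Elasticsearch's document index API over plain HTTP (java.net.http); the index name and base URL are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProductIndexer {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;   // e.g. "http://localhost:9200" (placeholder)

    public ProductIndexer(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // PUT /products/_doc/{id} replaces the whole document, so replaying the same
    // CDC event overwrites the same doc and the index keeps one state per key.
    public int upsert(String productId, String documentJson) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/products/_doc/" + productId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(documentJson))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();  // 201 on first create, 200 on overwrite
    }
}
```

Replaying the same CDC event issues the same PUT to the same id, so the index converges to a single document per primary key.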

Configuration cheat-sheet

  • Kafka producer (idempotent): enable.idempotence=true, acks=all, retries>0, max.in.flight.requests.per.connection<=5
  • Kafka producer (transactions): transactional.id set; beginTransaction → send → sendOffsetsToTransaction → commitTransaction
  • Flink: enable checkpointing (exactly-once), use KafkaSource with committed offsets on checkpoints, KafkaSink with two-phase commit
  • DB sink: use primary keys, UPSERT/merge, or INSERT ON CONFLICT DO NOTHING/UPDATE

Observability tips

  • Track duplicate rate: ratio of events with already-seen event_id at sink.
  • Missing rate proxy: compare counts between source and sink per time window.
  • Monitor transactional aborts, commit latency, checkpoint duration, and consumer group lag.

Common mistakes and self-check

  • Assuming exactly-once without an idempotent or transactional sink. Self-check: can your sink accept the same event twice without double effect?
  • Forgetting sendOffsetsToTransaction in Kafka processors. Self-check: after crash, could messages be reprocessed without corresponding output?
  • Using non-deterministic operations (e.g., random values) inside stateful processing that will replay differently. Self-check: are results deterministic on replay?
  • No stable event_id. Self-check: does each event carry an immutable key that uniquely identifies it?
  • Unbounded dedup state. Self-check: do you enforce TTLs/windows, and handle late events explicitly?

Exercises

Do these before the quick test. They mirror the tasks below; check your answers against the guidance notes.

Exercise 1: Pick semantics per pipeline

For three pipelines, choose at-least-once or exactly-once and justify:

  1. Orders topic → Flink → Payments topic → Ledger DB
  2. Clickstream topic → Flink → Redis counters for dashboard
  3. Products CDC topic → Flink → Search index
Guidance
  • Think about monetary impact, idempotency of the sink, and user-facing tolerance to duplicates.

Exercise 2: Design dedup strategy

You process events with fields: event_id, user_id, ts, action. Design dedup so each event updates analytics once when writing to a DB sink.

  • Describe the key, state/TTL, and DB write pattern.
Guidance
  • Prefer a stable event_id. Consider windowed state vs a long TTL, and DB upsert vs insert-ignore.

Checklist
  • I identified which pipelines need exactly-once effects
  • I specified an event_id or composite key
  • I chose idempotent write semantics at the sink
  • I considered state TTL and late events

Mini challenge

Pick one of your existing or hypothetical pipelines. In 5 sentences, define: the required semantics, the idempotent key, the sink write mode, how retries are safe, and how you will measure duplicate rate.

Who this is for

  • Aspiring or current Data Platform Engineers building Kafka/Flink/Spark Streaming pipelines.
  • Data Engineers who need reliable real-time outputs.

Prerequisites

  • Basic understanding of Kafka topics, partitions, and consumer groups.
  • Familiarity with at-least-once delivery and checkpoints in a stream processor.
  • Ability to design primary keys and upserts in a database.

Learning path

  1. Grasp semantics and idempotency (this lesson).
  2. Practice Kafka idempotent producer and transactions in a sandbox.
  3. Build a small Flink job with exactly-once Kafka sink and verify on restart.
  4. Add an idempotent DB sink and measure duplicates and missing rates.

Practical projects

  1. Kafka→Flink→Kafka pipeline with exactly-once semantics (EOS): simulate failures and prove there are no duplicates in the sink topic.
  2. Clickstream dedup: implement event_id dedup with TTL in Flink and write upserts to a DB.
  3. CDC to search: upsert documents by primary key and validate the effectively-once behavior by replaying duplicates.

Next steps

  • Run the Quick Test below to check your understanding. Everyone can take it; sign in to save your progress.
  • Apply one project idea this week; document your decisions and metrics.

Practice Exercises

2 exercises to complete

Instructions

For each pipeline, pick at-least-once or exactly-once and explain why in 2–3 sentences:

  1. Orders → Flink → Payments topic → Ledger DB
  2. Clickstream → Flink → Redis counters
  3. Products CDC → Flink → Search index
Expected Output
A short rationale per pipeline with the chosen semantics and a one-line sink strategy (upsert/transaction/dedup).

Exactly-Once and At-Least-Once Semantics — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

