Why this matters
As a Data Platform Engineer, you will choose and implement delivery semantics for streaming pipelines. This impacts money movement, counters, search indexes, inventory, recommendations, and alerts. You will:
- Decide where at-least-once is fine and where exactly-once is mandatory.
- Configure Kafka producers/consumers and streaming jobs to avoid duplicates and data loss.
- Design idempotent writes and deduplication at sinks like Kafka, object storage, databases, and Elasticsearch.
- Operate pipelines safely during restarts, redeploys, and failures.
Concept explained simply
Delivery semantics describe how many times an event's effect happens at the destination:
- At-most-once: deliver 0 or 1 time. Fast, but can lose messages.
- At-least-once: deliver 1 or more times. No loss, duplicates possible.
- Exactly-once (effectively-once): each event's effect happens once. No loss, no duplicates at the sink.
Important: "Exactly-once" in streaming is about the effect, not that the event was only delivered once over the network. Retries may happen; idempotency and transactions ensure the final state is as if each event applied once.
Mental model
Imagine dropping numbered coins into a box:
- At-least-once: you may drop coin #42 twice; the box holds two coins unless the box rejects duplicates.
- Exactly-once: the box accepts coin #42 only once, even if you try twice.
- The "box" is your sink (Kafka topic, DB table, object store). The key to exactly-once is making the sink idempotent or transactional.
Implementation patterns
Kafka exactly-once overview (producer and processors)
- Idempotent producer: enable.idempotence=true, acks=all, retries>0, and max.in.flight.requests.per.connection<=5 so ordering is preserved even when retries happen.
- Transactions: the producer sets transactional.id; beginTransaction() → send() → sendOffsetsToTransaction() → commitTransaction(). This commits produced records and consumed offsets atomically, so there are no lost or duplicated effects between a Kafka source and a Kafka sink (see the sketch after this list).
- Processors (e.g., Flink) use checkpointing + transactional sinks to align state and committed outputs.
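For illustration, here is a minimal Java sketch of a transactional consume-transform-produce loop, assuming hypothetical topics payments-in and payments-out and a local broker; the configuration values are placeholders, not production settings.

```java
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.*;

public class TransactionalCopy {
    public static void main(String[] args) {
        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pp.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");        // idempotent producer
        pp.put(ProducerConfig.ACKS_CONFIG, "all");
        pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "copy-job-1");    // stable id per producer instance
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        Properties cp = new Properties();
        cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cp.put(ConsumerConfig.GROUP_ID_CONFIG, "copy-job");
        cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // only read committed transactions
        cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");       // offsets go through the transaction
        cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            producer.initTransactions();
            consumer.subscribe(List.of("payments-in"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("payments-out", r.key(), r.value()));
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                }
                // Commit consumed offsets and produced records atomically.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            }
        }
    }
}
```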
Flink exactly-once with Kafka
- Enable checkpointing with exactly-once mode.
- Use the Kafka source; offsets are committed back to Kafka as part of each successful checkpoint.
- Use Kafka sink with two-phase commit; Flink commits the transaction only after checkpoint success.
- End-to-end exactly-once holds for Kafka→Flink→Kafka. For external sinks, the sink must provide idempotency or two-phase commit.
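A sketch of a Kafka→Flink→Kafka job with these settings, assuming the KafkaSource/KafkaSink connector API; the topic names (payments, ledger-entries), checkpoint interval, and placeholder transformation are illustrative.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

public class ExactlyOnceKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Align state snapshots and output commits on each checkpoint.
        env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("payments")
                .setGroupId("ledger-job")
                .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("ledger-entries")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // Two-phase commit: the Kafka transaction commits only after the checkpoint succeeds.
                // In practice, also raise the producer transaction.timeout.ms above the checkpoint interval.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("ledger-job")
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "payments-source")
           .map(String::toUpperCase) // placeholder transformation
           .sinkTo(sink);

        env.execute("exactly-once kafka to kafka");
    }
}
```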
Idempotent sinks and dedup
- Databases: upserts, primary key constraints, INSERT ... ON CONFLICT DO NOTHING/UPDATE semantics.
- Object storage: write files with deterministic names (e.g., derived from event_id or date partition) and commit manifests exactly once.
- Search: upserts by document id instead of increments.
- Stateful dedup: keep a set of seen event_ids with TTL; beware memory growth and late arrivals.
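As a sketch of the stateful-dedup idea in Flink, assuming a hypothetical Event POJO keyed by its event_id and a 24-hour TTL; tune the TTL to your duplicate window and late-arrival tolerance.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Hypothetical event POJO; the only requirement is a stable, unique event id.
class Event {
    public String eventId;
    public String payload;
}

// Keyed dedup: emits each event_id at most once while its state entry is alive.
// State expires after the TTL, so duplicates arriving later than the TTL can slip through.
public class DedupByEventId extends RichFlatMapFunction<Event, Event> {
    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        ValueStateDescriptor<Boolean> desc =
                new ValueStateDescriptor<>("seen-event-id", Boolean.class);
        // Bound the dedup state: entries expire 24 hours after they are written.
        desc.enableTimeToLive(StateTtlConfig.newBuilder(Time.hours(24)).build());
        seen = getRuntimeContext().getState(desc);
    }

    @Override
    public void flatMap(Event e, Collector<Event> out) throws Exception {
        if (seen.value() == null) {   // first time this key (event_id) is observed
            seen.update(true);
            out.collect(e);
        }
        // else: duplicate within the TTL window, drop it
    }
}

// Usage: events.keyBy(ev -> ev.eventId).flatMap(new DedupByEventId())
```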
Choosing semantics by use case
- Billing, inventory, payouts: exactly-once or idempotent writes required.
- Search index, cache rebuilds, downstream read models: at-least-once with idempotent upsert is typically fine.
- Metrics counters and aggregates: at-least-once plus dedup, or design for idempotent aggregation (e.g., recompute per window, not running increments).
- Alerts/notifications: at-least-once with suppression (dedup by alert key and time window).
Worked examples
Example 1: Payments to ledger DB
Goal: each payment event should affect the ledger exactly once.
- Source: Kafka topic payments.
- Process: Flink job with checkpointing.
- Sink: Ledger DB table with primary key payment_id.
- Approach: use at-least-once transport plus an idempotent sink (UPSERT by payment_id). If an intermediate Kafka sink is involved, use Kafka transactions; the final DB write remains idempotent (see the sketch below).
- Result: even if a payment is retried, the DB row for payment_id is updated once.
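A minimal sketch of the idempotent ledger write, assuming a PostgreSQL-style ON CONFLICT upsert and a hypothetical ledger table keyed by payment_id; connection details and column names are illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LedgerUpsert {
    // Idempotent write: replaying the same payment_id leaves exactly one row.
    private static final String UPSERT_SQL =
            "INSERT INTO ledger (payment_id, account_id, amount_cents, updated_at) " +
            "VALUES (?, ?, ?, now()) " +
            "ON CONFLICT (payment_id) DO UPDATE SET " +
            "  account_id = EXCLUDED.account_id, " +
            "  amount_cents = EXCLUDED.amount_cents, " +
            "  updated_at = now()";

    public static void upsert(Connection conn, String paymentId, String accountId, long amountCents)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, paymentId);
            ps.setString(2, accountId);
            ps.setLong(3, amountCents);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/ledger", "ledger", "secret")) {
            // Applying the same event twice has the same effect as applying it once.
            upsert(conn, "pay-42", "acct-7", 1999);
            upsert(conn, "pay-42", "acct-7", 1999);
        }
    }
}
```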
Example 2: Page view counts to Redis
Naive increments (INCR) are not idempotent. To get effectively-once:
- Generate stable event_id per page view.
- Dedup before increment: keep a short-TTL set of seen event_ids per user/session or per page. If the id has not been seen, INCR; otherwise skip (see the sketch after this list).
- Alternative: aggregate counts in Flink per window and write final window totals idempotently.
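A sketch of the dedup-before-increment pattern, assuming the Jedis Redis client and hypothetical key names; note that the marker write and the increment are two separate commands, so a crash between them can still drop a count.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class PageViewCounter {
    // Increments the page counter only the first time an event_id is seen.
    // SET NX with a TTL acts as a short-lived dedup marker: duplicates within
    // the TTL are skipped, duplicates arriving after the TTL would count again.
    public static void countOnce(Jedis redis, String eventId, String pageId) {
        String marker = "seen:" + eventId;
        String result = redis.set(marker, "1", SetParams.setParams().nx().ex(3600));
        if ("OK".equals(result)) {          // marker did not exist yet
            redis.incr("views:" + pageId);  // safe to count this event
        }
        // null result means the marker already existed: duplicate, skip INCR
    }

    public static void main(String[] args) {
        try (Jedis redis = new Jedis("localhost", 6379)) {
            countOnce(redis, "evt-123", "home");
            countOnce(redis, "evt-123", "home"); // duplicate delivery, counter unchanged
        }
    }
}
```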
Example 3: CDC to Elasticsearch
Replicate DB changes to search:
- Process CDC events with Kafka→Flink→Elasticsearch.
- Use document id = primary key. Upsert documents (replace) rather than doing arithmetic updates.
- Duplicates overwrite the same doc → effectively-once effect in the index.
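A sketch of the upsert-by-id write using Elasticsearch's document index endpoint (PUT /{index}/_doc/{id}) over plain HTTP; the index name, document id, and fields are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProductIndexer {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Index (create or fully replace) the document whose _id is the table's
    // primary key. Replaying the same CDC event rewrites the same document,
    // so the effect in the index is the same as applying it once.
    public static void upsertProduct(String productId, String jsonDoc) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/products/_doc/" + productId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(jsonDoc))
                .build();
        HttpResponse<String> response = HTTP.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 300) {
            throw new IllegalStateException("Index request failed: " + response.body());
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "{\"name\":\"Blue kettle\",\"price_cents\":2599}";
        upsertProduct("prod-17", doc); // first delivery creates the document
        upsertProduct("prod-17", doc); // duplicate delivery replaces it with the same content
    }
}
```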
Configuration cheat-sheet
- Kafka producer (idempotent): enable.idempotence=true, acks=all, retries>0, max.in.flight.requests.per.connection<=5
- Kafka producer (transactions): transactional.id set; beginTransaction → send → sendOffsetsToTransaction → commitTransaction
- Flink: enable checkpointing (exactly-once), use KafkaSource with committed offsets on checkpoints, KafkaSink with two-phase commit
- DB sink: use primary keys, UPSERT/merge, or INSERT ON CONFLICT DO NOTHING/UPDATE
Observability tips
- Track duplicate rate: the ratio of events arriving at the sink with an already-seen event_id.
- Missing rate proxy: compare counts between source and sink per time window.
- Monitor transactional aborts, commit latency, checkpoint duration, and consumer group lag.
Common mistakes and self-check
- Assuming exactly-once without an idempotent or transactional sink. Self-check: can your sink accept the same event twice without double effect?
- Forgetting sendOffsetsToTransaction in Kafka processors. Self-check: after crash, could messages be reprocessed without corresponding output?
- Using non-deterministic operations (e.g., random values) inside stateful processing that will replay differently. Self-check: are results deterministic on replay?
- No stable event_id. Self-check: does each event carry an immutable key that uniquely identifies it?
- Unbounded dedup state. Self-check: do you enforce TTLs/windows, and handle late events explicitly?
Exercises
Do these before the quick test; they mirror the hands-on tasks later in this lesson. Guidance follows each exercise.
Exercise 1: Pick semantics per pipeline
For three pipelines, choose at-least-once or exactly-once and justify:
- Orders topic → Flink → Payments topic → Ledger DB
- Clickstream topic → Flink → Redis counters for dashboard
- Products CDC topic → Flink → Search index
Guidance:
- Think about monetary impact, idempotency of the sink, and user-facing tolerance to duplicates.
Exercise 2: Design dedup strategy
You process events with fields: event_id, user_id, ts, action. Design dedup so each event updates analytics once when writing to a DB sink.
- Describe the key, state/TTL, and DB write pattern.
Guidance:
- Prefer a stable event_id. Consider windowed state vs long TTL. DB upsert vs insert-ignore.
Self-check checklist:
- I identified which pipelines need exactly-once effects.
- I specified an event_id or composite key.
- I chose idempotent write semantics at the sink.
- I considered state TTL and late events.
Mini challenge
Pick one of your existing or hypothetical pipelines. In 5 sentences, define: the required semantics, the idempotent key, the sink write mode, how retries are safe, and how you will measure duplicate rate.
Who this is for
- Aspiring or current Data Platform Engineers building Kafka/Flink/Spark Streaming pipelines.
- Data Engineers who need reliable real-time outputs.
Prerequisites
- Basic understanding of Kafka topics, partitions, and consumer groups.
- Familiarity with at-least-once delivery and checkpoints in a stream processor.
- Ability to design primary keys and upserts in a database.
Learning path
- Grasp semantics and idempotency (this lesson).
- Practice Kafka idempotent producer and transactions in a sandbox.
- Build a small Flink job with exactly-once Kafka sink and verify on restart.
- Add an idempotent DB sink and measure duplicates and missing rates.
Practical projects
- Kafka→Flink→Kafka pipeline with EOS (exactly-once semantics): simulate failures and verify there are no duplicates in the sink topic.
- Clickstream dedup: implement event_id dedup with TTL in Flink and write upserts to a DB.
- CDC to search: upsert documents by primary key and validate effect-only-once with duplicate replays.
Next steps
- Run the Quick Test below to check your understanding. Everyone can take it; sign in to save your progress.
- Apply one project idea this week; document your decisions and metrics.