Why this matters
As a Data Platform Engineer, you often design topics, choose partition counts, and set replication and retention so teams can reliably stream data at scale. Getting topics and partitions right impacts throughput, ordering guarantees, cost, and the ability for multiple consumers to process data in parallel.
- Real task: Size a topic for clickstream traffic with peak bursts.
- Real task: Preserve per-customer ordering while scaling consumers.
- Real task: Tune replication and ISR to withstand broker failures without data loss.
Who this is for
- Aspiring and current Data Platform Engineers
- Data Engineers building real-time pipelines
- Backend Engineers integrating services with Kafka
Prerequisites
- Basic understanding of distributed systems (brokers, replication)
- Familiarity with producers/consumers concepts
- Comfort with throughput and storage units (KB/MB/GB)
Concept explained simply
Topic: a named data stream (like a file category) where producers write events and consumers read them.
Partition: a topic is split into partitions. Each partition is an append-only log with strictly increasing offsets. Ordering is guaranteed within a partition, not across partitions.
Key: used by the partitioner to route messages. With the default partitioner, all messages with the same key go to the same partition (important for per-key ordering).
Replication factor (RF): the number of copies of each partition across brokers. One replica is the leader; the others are followers. By default, producers write to and consumers read from the leader.
Consumer group: multiple consumers sharing a group read a topic's partitions in parallel. Each partition is consumed by at most one consumer in the same group at a time.
Retention: how long Kafka keeps data. Kafka stores data for a configured time/size, independent of whether consumers read it.
Compaction (optional): keeps the latest record per key and discards older ones, useful for changelog-style topics.
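These pieces map directly onto topic configuration. Below is a minimal sketch using the confluent-kafka Python client (an assumption; the broker address, topic name, and numbers are placeholders, not recommendations):

```python
# Minimal sketch: create a topic with explicit partitions, RF, ISR, and retention.
# Broker address, topic name, and values are illustrative placeholders.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

clickstream = NewTopic(
    "clickstream-events",            # topic: a named stream
    num_partitions=12,               # partitions: parallelism; ordering holds within each one
    replication_factor=3,            # RF: copies of each partition across brokers
    config={
        "min.insync.replicas": "2",    # durability: acknowledged writes exist on >= 2 replicas
        "retention.ms": "604800000",   # retention: keep data for 7 days regardless of consumption
        # "cleanup.policy": "compact", # compaction: keep only the latest record per key instead
    },
)

# create_topics is asynchronous and returns one future per topic
for topic, future in admin.create_topics([clickstream]).items():
    try:
        future.result()
        print(f"created {topic}")
    except Exception as exc:
        print(f"failed to create {topic}: {exc}")
```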
Mental model
Picture a topic as a highway made of multiple lanes (partitions). Each lane has a strict order of cars (messages). Cars with the same plate (key) always use the same lane, so their order is preserved. More lanes mean higher throughput and more cars traveling in parallel, but order is only preserved within a single lane. Replication is like running parallel highways in different locations: one is open to traffic (the leader); the others mirror it and take over if needed.
Quick self-check: Do I need more partitions?
- Need more parallel consumers? Add partitions.
- Need higher write/read throughput? Add partitions (up to broker and network limits).
- Need strict global order across all messages? Keep one partition (and accept limited throughput).
Worked examples
Example 1: Clickstream topic
Requirement: Peak 300k events/sec, ~1 KB each. Want parallel processing with 10–20 consumer instances and room to grow. Target per-partition throughput around 5 MB/s.
- Peak write: 300k KB/s ≈ 300 MB/s
- Partitions for throughput: 300 / 5 = 60 → round up to 64 partitions
- RF = 3 for resilience; min.insync.replicas = 2 (acknowledged writes must reach at least 2 in-sync replicas)
- Key = user_id to preserve per-user order
Outcome: 64 partitions allow parallel consumers and handle bursts while preserving per-user order.
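On the producer side, the key choice and durability settings from this example look roughly like the sketch below (confluent-kafka Python client assumed; the topic name and sample event are illustrative):

```python
# Hedged sketch for Example 1: key by user_id so each user's events land in one
# partition (per-user ordering), and request acks from all in-sync replicas to
# pair with RF=3 / min.insync.replicas=2. Values are illustrative placeholders.
import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "all",                               # wait for the in-sync replicas, not just the leader
})

event = {"user_id": "user-42", "page": "/checkout", "ts": 1700000000}

producer.produce(
    "clickstream-events",
    key=event["user_id"].encode("utf-8"),        # same key -> same partition -> per-user order
    value=json.dumps(event).encode("utf-8"),
)
producer.flush()
```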
Example 2: IoT devices
Requirement: 50k devices, ~1 msg/sec each, 500 B per message. Need per-device ordering. Team runs 25 consumers.
- Throughput: 50k * 0.5 KB/s = 25 MB/s
- Parallelism: at least 25 partitions for 25 consumers
- At a 3 MB/s per-partition throughput target: 25 / 3 ≈ 9, so parallelism, not throughput, is the binding constraint
- Choose 36 partitions to cover 25 consumers plus growth
- Key = device_id to keep ordering per device
Outcome: 36 partitions balance parallelism, ordering, and headroom.
Example 3: Payments with ordering constraints
Requirement: Strict per-account ordering (credits/debits). Traffic: 15 MB/s peak. Only 4 consumers allowed by downstream service limits.
- At least 4 partitions (one per consumer)
- Throughput per partition: 15 / 4 = 3.75 MB/s (acceptable)
- Key by account_id to preserve ordering
- Avoid increasing partitions later unless you can handle potential key-to-partition remapping impacts
Outcome: 4 partitions with account_id key ensures per-account ordering under consumer constraints.
Sizing partitions: quick method
- Compute peak write rate in MB/s.
- Pick a safe per-partition throughput target (commonly 2–10 MB/s; depends on hardware).
- Partitions for throughput = peak MB/s / per-partition target.
- Partitions for parallelism = max concurrent consumers in a group.
- Choose the larger of the two, then round up (often to a convenient number like 24, 36, 48, 64).
- Validate with a small load test; adjust.
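This method is easy to script. A minimal sketch (the per-partition target and the 1000 KB ≈ 1 MB rounding follow the worked examples above; adjust both for your hardware and round up for headroom):

```python
# Minimal sketch of the quick sizing method. Uses the same 1000 KB ~= 1 MB
# rounding as the worked examples; round the final suggestion up to a
# convenient number (24, 36, 48, 64, ...) if you want headroom.
import math

def partitions_needed(peak_eps: int, msg_kb: float,
                      per_partition_mbps: float, max_consumers: int):
    peak_mbps = peak_eps * msg_kb / 1000                        # peak write rate in MB/s
    for_throughput = math.ceil(peak_mbps / per_partition_mbps)  # throughput driver
    return for_throughput, max_consumers, max(for_throughput, max_consumers)

# Worked examples from this section:
print(partitions_needed(300_000, 1.0, 5, 20))   # clickstream: (60, 20, 60) -> round to 64
print(partitions_needed(50_000, 0.5, 3, 25))    # IoT:         (9, 25, 25)  -> round to 36
```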
Tip: Replication and capacity
With RF=3, each partition’s data is stored three times, so size storage and inter-broker network for this write amplification. Set min.insync.replicas to at least 2, and have producers send with acks=all, so a single broker failure cannot lose acknowledged writes.
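A back-of-envelope sketch of that storage math (the write rate and retention below are illustrative assumptions; storage sizing should use the sustained average rate rather than the peak):

```python
# Back-of-envelope storage for RF=3: every byte written is kept on three
# brokers until retention expires. Rate and retention here are illustrative.
def retained_storage_gb(avg_write_mbps: float, retention_days: float, rf: int = 3) -> float:
    seconds = retention_days * 24 * 3600
    return avg_write_mbps * seconds * rf / 1024   # MB -> GB

# e.g. the IoT example: 25 MB/s sustained, 7-day retention, RF=3
print(f"{retained_storage_gb(25, 7):,.0f} GB of disk across the cluster")
```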
Ordering and keys
- Ordering is guaranteed only within a partition.
- Using a key routes same-key messages to the same partition (default partitioner), preserving per-key order.
- Changing the number of partitions can change key-to-partition mapping for future messages. Be cautious when you rely on stable per-key ordering across time.
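To see the remapping effect concretely, here is an illustration-only sketch. It uses Python's built-in CRC32 as a stand-in hash (the Java client's default partitioner actually uses murmur2, so the concrete partition numbers differ), but the point, that future messages for the same key can land on a different partition once the count changes, holds either way:

```python
# Illustration only: why adding partitions changes where a key maps for future
# messages. CRC32 is a stand-in hash; Kafka's Java default partitioner uses
# murmur2, so actual partition numbers differ, but the remapping effect is the same.
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions

for key in ["user-1", "user-2", "user-3"]:
    before = partition_for(key, 12)
    after = partition_for(key, 16)
    moved = "moved" if before != after else "stayed"
    print(f"{key}: partition {before} -> {after} ({moved})")
```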
How to choose a key
- Need per-customer ordering? Key = customer_id.
- Need even load and no ordering constraints? Send with a null key (the producer then spreads messages across partitions) or a randomized key, but expect no cross-message order.
- Hot keys (very popular IDs) can overload a single partition; consider composite keys or sharding schemes.
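For the hot-key case, one common mitigation is to append a small shard suffix to the key. A hedged sketch (the shard count and separator are arbitrary choices for illustration), with the trade-off that per-entity ordering then only holds within each shard:

```python
# Hedged sketch of sharding a hot key: spreading one very popular customer_id
# across a few sub-keys trades strict per-customer ordering for load spread.
# Ordering is then only guaranteed within each shard, not across the customer.
import random

def sharded_key(customer_id: str, shards: int = 4) -> str:
    # Pick a shard at random (or derive it from a secondary attribute such as session_id)
    return f"{customer_id}#{random.randrange(shards)}"

print(sharded_key("big-customer-001"))   # e.g. "big-customer-001#2"
```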
Common mistakes and self-check
- Mistake: Too few partitions → consumers bottlenecked, long catch-up times.
  Self-check: Are consumers regularly at 100% CPU, or is lag growing at peaks?
- Mistake: Too many partitions → high metadata overhead, slow rebalances.
  Self-check: Are broker/controller metrics and rebalance times spiking after increases?
- Mistake: No key when per-entity order is required.
  Self-check: Do you observe out-of-order processing for the same entity?
- Mistake: RF=1 in production.
  Self-check: Can a single broker failure lose data? If yes, raise RF and set min.insync.replicas accordingly.
- Mistake: Expanding partitions without considering key remapping.
  Self-check: Do consumers rely on historical per-key order across partition changes?
Practical projects
- Design a topic plan for three workloads: clickstream, IoT sensors, and payments. Document keys, partition counts, RF, retention, and expected consumer groups.
- Create a partition sizing spreadsheet that takes peak EPS and message size to propose partition counts.
- Develop a small producer + consumer demo that verifies per-key ordering and shows how increasing partitions changes key-to-partition mapping.
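For the third project, a minimal consumer sketch (again assuming the confluent-kafka Python client; broker address, topic, and group id are placeholders) that flags a key arriving from a different partition than before, which is what you would expect to see after increasing the partition count:

```python
# Minimal sketch for the per-key ordering demo: remember which partition and
# offset each key was last seen at, and report keys that switch partitions.
# Broker address, topic, and group id are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "ordering-demo",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["clickstream-events"])

last_seen = {}  # key -> (partition, offset)

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        key = msg.key().decode("utf-8") if msg.key() else None
        prev = last_seen.get(key)
        if prev and prev[0] != msg.partition():
            print(f"key {key} moved from partition {prev[0]} to {msg.partition()}")
        last_seen[key] = (msg.partition(), msg.offset())
finally:
    consumer.close()
```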
Exercises
Work through the exercises below, then take the Quick Test to check your understanding.
Exercise 1: Clickstream sizing
Traffic: average 120k events/sec, peak 300k events/sec for 10 minutes. Each event is ~1 KB. Target per-partition throughput 5 MB/s. You expect up to 12 consumers in one group and need room to grow. Propose:
- Partition count
- Replication factor and min.insync.replicas
- Key choice and reason
Expected output: A short configuration proposal with concrete numbers.
Solution
Peak write ≈ 300 MB/s. Partitions for throughput = 300 / 5 = 60 → round up to 64 for headroom. RF=3; min.insync.replicas=2. Key by user_id to preserve per-user ordering. 64 partitions support 12 consumers comfortably and allow growth.
Exercise 2: IoT ordering and parallelism
You manage 50k devices sending ~1 msg/sec at 500 B. You need per-device ordering. Team runs 25 consumers now; plan for growth. Decide:
- Key
- Partition count
- Any notes about future expansion and ordering
Expected output: A brief plan with rationale.
Solution
Key=device_id to preserve order. Throughput ≈ 25 MB/s; parallelism requires ≥25 partitions. Choose ~36 partitions to allow growth and smoother assignment. Note: Increasing partitions later can change key-to-partition mapping for future messages; communicate to downstreams if they rely on historical per-key ordering.
Exercise completion checklist
- I calculated peak MB/s and mapped it to partitions.
- I justified replication factor and ISR for durability.
- I chose keys based on ordering needs and hot-key risk.
- I considered consumer parallelism and future growth.
Mini challenge
Design a topic for order events: 40 MB/s peak, strict per-order-id ordering, 20 consumers today, potential to double next quarter. List your partition count, key, RF, and a note on what happens if you expand partitions later.
Hint
Balance parallelism (≥ consumers) with per-partition throughput. Key by order_id. Consider impact of partition expansion on key mapping.
Learning path
- Start: Kafka basics (brokers, producers/consumers)
- This subskill: Topics, partitions, keys, ordering, and sizing
- Next: Delivery semantics, consumer groups tuning, and rebalancing
- Later: Retention, compaction, and schema evolution
Next steps
- Complete the Quick Test below to check your understanding.
- Apply the sizing method to one of your real workloads.
- Document topic conventions (naming, defaults for RF/ISR, retention) for your team.