Apache Kafka for Developers: Event Streaming Fundamentals
Kafka processes millions of events per second across thousands of companies. It's used for real-time data pipelines, event sourcing, log aggregation, and inter-service communication at scale. Understanding Kafka's model is valuable even if you're not running it at Google scale — the concepts apply at any size, and Kafka works well at moderate scale too.
Core Concepts
Event (Message): A record of something that happened. Has a key, value, timestamp, and optional headers.
Topic: A logical channel for events. Similar to a database table name but for event streams. Topics are partitioned for scalability.
Partition: A topic is divided into N partitions. Messages within a partition are ordered. A topic with 4 partitions can be consumed by up to 4 consumers in parallel (within a consumer group).
Producer: Application that writes events to a topic.
Consumer: Application that reads events from a topic. Maintains its position (offset) in the stream.
Consumer Group: Multiple consumers that collectively process a topic. Each partition is assigned to exactly one consumer in a group. Scale out by adding consumers (up to the number of partitions).
Offset: The position of a message within a partition. Consumers commit offsets to track their progress. On restart, they resume from the last committed offset.
Retention: Kafka retains messages for a configurable period (default 7 days) regardless of whether they've been consumed. Old messages are available to replay.
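The mechanics above — ordered partitions, committed offsets, resume-on-restart — can be sketched as a minimal in-memory model. This is illustrative only (the class and method names are invented for this sketch); real Kafka stores the log and committed offsets on the brokers:

```typescript
// Minimal in-memory model of a partitioned log plus consumer offsets.
// Illustrative only — names invented; real Kafka persists this on brokers.
type Message = { key: string; value: string };

class PartitionLog {
  private messages: Message[] = [];
  append(msg: Message): number {
    this.messages.push(msg);
    return this.messages.length - 1; // offset of the appended message
  }
  read(offset: number): Message | undefined {
    return this.messages[offset]; // old messages stay readable (retention)
  }
}

class ConsumerPosition {
  private committed = new Map<number, number>(); // partition -> next offset
  commit(partition: number, nextOffset: number): void {
    this.committed.set(partition, nextOffset);
  }
  resumeFrom(partition: number): number {
    return this.committed.get(partition) ?? 0; // restart resumes here
  }
}

const log = new PartitionLog();
log.append({ key: "user-123", value: "registered" }); // offset 0
log.append({ key: "user-123", value: "verified" });   // offset 1

const pos = new ConsumerPosition();
pos.commit(0, 2); // processed offsets 0 and 1; next to read is 2
console.log(pos.resumeFrom(0)); // 2
```

Note that committing an offset and processing a message are separate steps — the gap between them is exactly where the delivery-semantics questions (covered later) come from.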
Local Development Setup
Docker Compose for local Kafka:
services:
  kafka:
    image: confluentinc/cp-kafka:latest
    hostname: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:29093'
      KAFKA_LISTENERS: 'PLAINTEXT://kafka:29092,CONTROLLER://kafka:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
    depends_on:
      - kafka
This runs a single-node Kafka broker in KRaft mode (no ZooKeeper dependency), plus the Kafka UI dashboard at http://localhost:8080.
Producing Messages (Node.js / TypeScript)
Using kafkajs:
npm install kafkajs
import { Kafka, Partitioners } from "kafkajs";
const kafka = new Kafka({
clientId: "my-producer",
brokers: ["localhost:9092"],
});
const producer = kafka.producer({
  // LegacyPartitioner keeps the pre-v2, Java-client-compatible key hashing
  // and silences the kafkajs v2 default-partitioner warning.
  createPartitioner: Partitioners.LegacyPartitioner,
});
await producer.connect();
// Send a single message
await producer.send({
topic: "user-events",
messages: [
{
key: "user-123", // Used for partitioning
value: JSON.stringify({
event: "user.registered",
userId: "user-123",
email: "[email protected]",
timestamp: new Date().toISOString(),
}),
headers: {
"content-type": "application/json",
"source": "user-service",
},
},
],
});
// Batch sending (more efficient)
await producer.send({
topic: "order-events",
messages: events.map(event => ({
key: event.orderId,
value: JSON.stringify(event),
})),
});
await producer.disconnect();
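Transient broker errors (leader elections, brief network blips) are normal, so sends should be retried. kafkajs has built-in retries (the `retry` option on the `Kafka` constructor), but the pattern is worth seeing on its own. This hypothetical `withRetry` helper is a generic sketch, not part of kafkajs:

```typescript
// Hypothetical retry wrapper with exponential backoff. kafkajs retries
// internally too; this just makes the pattern explicit.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: 100ms, 200ms, 400ms, ...
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}

// Usage sketch: await withRetry(() => producer.send({ topic, messages }));
```

Retrying a send can produce duplicates if the first attempt actually reached the broker — one more reason consumers should be idempotent, as discussed below.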
Consuming Messages
import { Kafka, Consumer, EachMessagePayload } from "kafkajs";
const kafka = new Kafka({
clientId: "my-consumer",
brokers: ["localhost:9092"],
});
const consumer = kafka.consumer({
groupId: "order-processing-service", // Consumer group ID
});
await consumer.connect();
await consumer.subscribe({
topic: "order-events",
fromBeginning: false, // true = start from the earliest offset when the group has no committed offset
});
await consumer.run({
autoCommit: true, // Offsets are resolved after each handler and committed periodically
eachMessage: async ({ topic, partition, message }: EachMessagePayload) => {
const key = message.key?.toString();
const value = message.value?.toString();
if (!value) return;
try {
  // Parse inside the try — a malformed payload shouldn't crash the consumer
  const event = JSON.parse(value);
  console.log(`Processing ${event.event} from partition ${partition}`);
  await processEvent(event);
} catch (error) {
  // Handle parse/processing errors — log, send to a dead-letter topic, etc.
  console.error("Processing failed:", error);
}
},
});
Consumer Groups and Scaling
Multiple instances of a service in the same consumer group split partitions:
Topic: order-events (4 partitions)
Consumer Group: order-service
Instance 1: handles partitions 0, 1
Instance 2: handles partitions 2, 3
Scale to 4 instances → each handles 1 partition. Scale to 5+ instances → extra instances are idle (no partition to consume).
Implication: Topic partition count limits consumer group parallelism. Size partitions based on expected max throughput.
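The assignment shown above can be sketched as a range-style distribution: consecutive partitions are handed out in contiguous chunks, and any consumer beyond the partition count gets nothing. This is an illustration of the idea, not Kafka's actual assignor implementations (range, round-robin, cooperative-sticky):

```typescript
// Illustrative range-style partition assignment within a consumer group.
// Consumers beyond the partition count receive an empty assignment (idle).
function assignPartitions(
  partitionCount: number,
  consumers: string[],
): Map<string, number[]> {
  const assignment = new Map(consumers.map(c => [c, [] as number[]]));
  const perConsumer = Math.floor(partitionCount / consumers.length);
  const remainder = partitionCount % consumers.length;
  let partition = 0;
  consumers.forEach((consumer, i) => {
    // First `remainder` consumers take one extra partition
    const count = perConsumer + (i < remainder ? 1 : 0);
    for (let j = 0; j < count; j++) {
      assignment.get(consumer)!.push(partition++);
    }
  });
  return assignment;
}

console.log(assignPartitions(4, ["instance-1", "instance-2"]));
// instance-1 -> [0, 1], instance-2 -> [2, 3]
console.log(assignPartitions(4, ["a", "b", "c", "d", "e"]).get("e"));
// [] — the fifth instance is idle
```

In a real cluster this assignment is recomputed during a rebalance, which is triggered whenever a consumer joins or leaves the group.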
Key Selection and Partitioning
Message keys determine which partition receives the message:
- Same key → same partition → ordering guaranteed for that key
- Different keys → potentially different partitions → no cross-key ordering
For event sourcing: use the entity ID as the key (e.g., userId, orderId). All events for the same entity land on the same partition in order.
// All events for user-123 go to the same partition
await producer.send({
topic: "user-events",
messages: [{ key: "user-123", value: JSON.stringify(event) }],
});
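Under the hood, the partition is derived from a hash of the key modulo the partition count. The sketch below uses a toy hash for illustration — the Java client (and kafkajs's LegacyPartitioner) use murmur2, so actual partition numbers will differ, but the principle is the same:

```typescript
// Toy key-to-partition mapping. Real clients hash with murmur2; the
// guarantee — same key always maps to the same partition — is identical.
function toyHash(key: string): number {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep it in unsigned 32-bit range
  }
  return h;
}

function partitionFor(key: string, numPartitions: number): number {
  return toyHash(key) % numPartitions;
}

// Deterministic: every send with the same key lands on the same partition.
const p1 = partitionFor("user-123", 4);
const p2 = partitionFor("user-123", 4);
console.log(p1 === p2); // true
```

One consequence worth noting: because the mapping depends on the partition count, increasing partitions later reshuffles which partition each key maps to, breaking per-key ordering across the resize — another reason to size partitions up front.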
Exactly-Once vs At-Least-Once Delivery
At-least-once (default): Message is processed at least once, possibly multiple times if consumer crashes after processing but before committing the offset. Make your consumers idempotent.
Exactly-once: Kafka supports exactly-once semantics for read-process-write pipelines that stay within Kafka, using idempotent producers and transactions. It's more complex to implement and does not extend to side effects in external systems — worth the effort mainly for financial or similarly critical flows.
For most use cases, at-least-once with idempotent consumers is the right approach.
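Idempotency is usually achieved by tracking processed event IDs. A minimal sketch, assuming each event carries a unique `eventId` (an assumption of this example — in production the seen-set would live in a database unique constraint or cache, not process memory):

```typescript
// Minimal idempotent handler: redelivering the same event is a no-op.
// In production the seen-set must be durable (DB constraint, Redis, etc.).
type OrderEvent = { eventId: string; orderId: string; amount: number };

const seen = new Set<string>();
let totalProcessed = 0;

function handleEvent(event: OrderEvent): boolean {
  if (seen.has(event.eventId)) {
    return false; // duplicate delivery — skip the side effect
  }
  seen.add(event.eventId);
  totalProcessed += event.amount; // the actual side effect
  return true;
}

const event = { eventId: "evt-1", orderId: "order-9", amount: 42 };
handleEvent(event); // processed
handleEvent(event); // redelivered after a crash — skipped
console.log(totalProcessed); // 42, not 84
```

With a handler like this, the "crash after processing, before committing the offset" failure mode simply results in a skipped duplicate rather than a double-applied side effect.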
When to Use Kafka
Use Kafka when:
- High throughput (millions of events/day or more)
- Multiple consumers of the same events (fan-out)
- Event replay needed (audit log, event sourcing)
- Message retention for days/weeks important
- Ordered processing per entity (using message keys)
Use a simpler queue (RabbitMQ, BullMQ, SQS) when:
- Work queue pattern (one consumer per job)
- Simple retry logic with dead letter queues
- Lower throughput requirements
- Simpler operational overhead matters
- Messages should be deleted after consumption
Kafka's overhead (partitions, consumer groups, replication) is worth it at scale. For "process this job and acknowledge it", a simpler queue is often better.
Kafka UI Tools
- Kafka UI (docker: provectuslabs/kafka-ui): Web UI for browsing topics, messages, consumer groups
- Conduktor Desktop: Free GUI client for local Kafka
- Redpanda Console (formerly Kowl): alternative web interface
- kcat (formerly kafkacat): CLI tool for producing/consuming
# CLI: list topics
kcat -b localhost:9092 -L
# CLI: consume from topic
kcat -b localhost:9092 -t my-topic -C -o beginning