Apache Kafka for Developers: Event Streaming Fundamentals
Kafka processes millions of events per second across thousands of companies. It's used for real-time data pipelines, event sourcing, log aggregation, and inter-service communication at scale. Understanding Kafka's model is valuable even if you're not running it at Google scale — the concepts apply at any size, and Kafka works well at moderate scale too.
Core Concepts
Event (Message): A record of something that happened. Has a key, value, timestamp, and optional headers.
Topic: A logical channel for events. Similar to a database table name but for event streams. Topics are partitioned for scalability.
Partition: A topic is divided into N partitions. Messages within a partition are ordered. A topic with 4 partitions can be consumed by up to 4 consumers in parallel (within a consumer group).
Producer: Application that writes events to a topic.
Consumer: Application that reads events from a topic. Maintains its position (offset) in the stream.
Consumer Group: Multiple consumers that collectively process a topic. Each partition is assigned to exactly one consumer in a group. Scale out by adding consumers (up to the number of partitions).
Offset: The position of a message within a partition. Consumers commit offsets to track their progress. On restart, they resume from the last committed offset.
Retention: Kafka retains messages for a configurable period (default 7 days) regardless of whether they've been consumed. Old messages are available to replay.
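The mechanics above — ordered partitions, committed offsets, resume-on-restart — can be sketched as a minimal in-memory model. This is illustrative only (the class and method names are invented for this sketch); real Kafka stores the log and committed offsets on the brokers:

```typescript
// Minimal in-memory model of a partitioned log plus consumer offsets.
// Illustrative only — names invented; real Kafka persists this on brokers.
type Message = { key: string; value: string };

class PartitionLog {
  private messages: Message[] = [];
  append(msg: Message): number {
    this.messages.push(msg);
    return this.messages.length - 1; // offset of the appended message
  }
  read(offset: number): Message | undefined {
    return this.messages[offset]; // old messages stay readable (retention)
  }
}

class ConsumerPosition {
  private committed = new Map<number, number>(); // partition -> next offset
  commit(partition: number, nextOffset: number): void {
    this.committed.set(partition, nextOffset);
  }
  resumeFrom(partition: number): number {
    return this.committed.get(partition) ?? 0; // restart resumes here
  }
}

const log = new PartitionLog();
log.append({ key: "user-123", value: "registered" }); // offset 0
log.append({ key: "user-123", value: "verified" });   // offset 1

const pos = new ConsumerPosition();
pos.commit(0, 2); // processed offsets 0 and 1; next to read is 2
console.log(pos.resumeFrom(0)); // 2
```

Note that committing an offset and processing a message are separate steps — the gap between them is exactly where the delivery-semantics questions (covered later) come from.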
Local Development Setup
Docker Compose for local Kafka:
services:
  kafka:
    image: confluentinc/cp-kafka:latest
    hostname: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:29093'
      KAFKA_LISTENERS: 'PLAINTEXT://kafka:29092,CONTROLLER://kafka:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
    depends_on:
      - kafka
This runs a single-node Kafka broker in KRaft mode (no ZooKeeper dependency), plus the Kafka UI dashboard at http://localhost:8080.
Producing Messages (Node.js / TypeScript)
Using kafkajs:
npm install kafkajs
import { Kafka, Partitioners } from "kafkajs";
const kafka = new Kafka({
clientId: "my-producer",
brokers: ["localhost:9092"],
});
const producer = kafka.producer({
  // LegacyPartitioner keeps the pre-v2, Java-client-compatible key hashing
  // and silences the kafkajs v2 default-partitioner warning.
  createPartitioner: Partitioners.LegacyPartitioner,
});
await producer.connect();
// Send a single message
await producer.send({
topic: "user-events",
messages: [
{
key: "user-123", // Used for partitioning
value: JSON.stringify({
event: "user.registered",
userId: "user-123",
email: "[email protected]",
timestamp: new Date().toISOString(),
}),
headers: {
"content-type": "application/json",
"source": "user-service",
},
},
],
});
// Batch sending (more efficient)
await producer.send({
topic: "order-events",
messages: events.map(event => ({
key: event.orderId,
value: JSON.stringify(event),
})),
});
await producer.disconnect();
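Transient broker errors (leader elections, brief network blips) are normal, so sends should be retried. kafkajs has built-in retries (the `retry` option on the `Kafka` constructor), but the pattern is worth seeing on its own. This hypothetical `withRetry` helper is a generic sketch, not part of kafkajs:

```typescript
// Hypothetical retry wrapper with exponential backoff. kafkajs retries
// internally too; this just makes the pattern explicit.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: 100ms, 200ms, 400ms, ...
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}

// Usage sketch: await withRetry(() => producer.send({ topic, messages }));
```

Retrying a send can produce duplicates if the first attempt actually reached the broker — one more reason consumers should be idempotent, as discussed below.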
Consuming Messages
import { Kafka, Consumer, EachMessagePayload } from "kafkajs";
const kafka = new Kafka({
clientId: "my-consumer",
brokers: ["localhost:9092"],
});
const consumer = kafka.consumer({
groupId: "order-processing-service", // Consumer group ID
});
await consumer.connect();
await consumer.subscribe({
topic: "order-events",
fromBeginning: false, // true = start from the earliest offset when the group has no committed offset
});
await consumer.run({
autoCommit: true, // Offsets are resolved after each handler and committed periodically
eachMessage: async ({ topic, partition, message }: EachMessagePayload) => {
const key = message.key?.toString();
const value = message.value?.toString();
if (!value) return;
try {
  // Parse inside the try — a malformed payload shouldn't crash the consumer
  const event = JSON.parse(value);
  console.log(`Processing ${event.event} from partition ${partition}`);
  await processEvent(event);
} catch (error) {
  // Handle parse/processing errors — log, send to a dead-letter topic, etc.
  console.error("Processing failed:", error);
}
},
});
Consumer Groups and Scaling
Multiple instances of a service in the same consumer group split partitions:
Topic: order-events (4 partitions)
Consumer Group: order-service
Instance 1: handles partitions 0, 1
Instance 2: handles partitions 2, 3
Scale to 4 instances → each handles 1 partition. Scale to 5+ instances → extra instances are idle (no partition to consume).
Implication: Topic partition count limits consumer group parallelism. Size partitions based on expected max throughput.
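The assignment shown above can be sketched as a range-style distribution: consecutive partitions are handed out in contiguous chunks, and any consumer beyond the partition count gets nothing. This is an illustration of the idea, not Kafka's actual assignor implementations (range, round-robin, cooperative-sticky):

```typescript
// Illustrative range-style partition assignment within a consumer group.
// Consumers beyond the partition count receive an empty assignment (idle).
function assignPartitions(
  partitionCount: number,
  consumers: string[],
): Map<string, number[]> {
  const assignment = new Map(consumers.map(c => [c, [] as number[]]));
  const perConsumer = Math.floor(partitionCount / consumers.length);
  const remainder = partitionCount % consumers.length;
  let partition = 0;
  consumers.forEach((consumer, i) => {
    // First `remainder` consumers take one extra partition
    const count = perConsumer + (i < remainder ? 1 : 0);
    for (let j = 0; j < count; j++) {
      assignment.get(consumer)!.push(partition++);
    }
  });
  return assignment;
}

console.log(assignPartitions(4, ["instance-1", "instance-2"]));
// instance-1 -> [0, 1], instance-2 -> [2, 3]
console.log(assignPartitions(4, ["a", "b", "c", "d", "e"]).get("e"));
// [] — the fifth instance is idle
```

In a real cluster this assignment is recomputed during a rebalance, which is triggered whenever a consumer joins or leaves the group.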
Key Selection and Partitioning
Message keys determine which partition receives the message:
- Same key → same partition → ordering guaranteed for that key
- Different keys → potentially different partitions → no cross-key ordering
For event sourcing: use the entity ID as the key (e.g., userId, orderId). All events for the same entity land on the same partition in order.
// All events for user-123 go to the same partition
await producer.send({
topic: "user-events",
messages: [{ key: "user-123", value: JSON.stringify(event) }],
});
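Under the hood, the partition is derived from a hash of the key modulo the partition count. The sketch below uses a toy hash for illustration — the Java client (and kafkajs's LegacyPartitioner) use murmur2, so actual partition numbers will differ, but the principle is the same:

```typescript
// Toy key-to-partition mapping. Real clients hash with murmur2; the
// guarantee — same key always maps to the same partition — is identical.
function toyHash(key: string): number {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep it in unsigned 32-bit range
  }
  return h;
}

function partitionFor(key: string, numPartitions: number): number {
  return toyHash(key) % numPartitions;
}

// Deterministic: every send with the same key lands on the same partition.
const p1 = partitionFor("user-123", 4);
const p2 = partitionFor("user-123", 4);
console.log(p1 === p2); // true
```

One consequence worth noting: because the mapping depends on the partition count, increasing partitions later reshuffles which partition each key maps to, breaking per-key ordering across the resize — another reason to size partitions up front.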
Exactly-Once vs At-Least-Once Delivery
At-least-once (default): Message is processed at least once, possibly multiple times if consumer crashes after processing but before committing the offset. Make your consumers idempotent.
Exactly-once: Kafka supports exactly-once semantics for read-process-write pipelines that stay within Kafka, using idempotent producers and transactions. It's more complex to implement and does not extend to side effects in external systems — worth the effort mainly for financial or similarly critical flows.
For most use cases, at-least-once with idempotent consumers is the right approach.
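Idempotency is usually achieved by tracking processed event IDs. A minimal sketch, assuming each event carries a unique `eventId` (an assumption of this example — in production the seen-set would live in a database unique constraint or cache, not process memory):

```typescript
// Minimal idempotent handler: redelivering the same event is a no-op.
// In production the seen-set must be durable (DB constraint, Redis, etc.).
type OrderEvent = { eventId: string; orderId: string; amount: number };

const seen = new Set<string>();
let totalProcessed = 0;

function handleEvent(event: OrderEvent): boolean {
  if (seen.has(event.eventId)) {
    return false; // duplicate delivery — skip the side effect
  }
  seen.add(event.eventId);
  totalProcessed += event.amount; // the actual side effect
  return true;
}

const event = { eventId: "evt-1", orderId: "order-9", amount: 42 };
handleEvent(event); // processed
handleEvent(event); // redelivered after a crash — skipped
console.log(totalProcessed); // 42, not 84
```

With a handler like this, the "crash after processing, before committing the offset" failure mode simply results in a skipped duplicate rather than a double-applied side effect.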
When to Use Kafka
Use Kafka when:
- High throughput (millions of events/day or more)
- Multiple consumers of the same events (fan-out)
- Event replay needed (audit log, event sourcing)
- Message retention for days/weeks important
- Ordered processing per entity (using message keys)
Use a simpler queue (RabbitMQ, BullMQ, SQS) when:
- Work queue pattern (one consumer per job)
- Simple retry logic with dead letter queues
- Lower throughput requirements
- Simpler operational overhead matters
- Messages should be deleted after consumption
Kafka's overhead (partitions, consumer groups, replication) is worth it at scale. For "process this job and acknowledge it", a simpler queue is often better.
Kafka UI Tools
- Kafka UI (docker: provectuslabs/kafka-ui): Web UI for browsing topics, messages, consumer groups
- Conduktor Desktop: Free GUI client for local Kafka
- Redpanda Console (formerly Kowl): alternative web interface
- kcat (formerly kafkacat): CLI tool for producing/consuming
# CLI: list topics
kcat -b localhost:9092 -L
# CLI: consume from topic
kcat -b localhost:9092 -t my-topic -C -o beginning