Traditional queues were designed around the idea of “consume and remove,” meaning messages disappear once acknowledged. But modern event-driven systems needed immutable event logs where data could be retained and replayed multiple times.
This led to the rise of distributed streaming platforms like Kafka. Instead of treating messages as temporary tasks inside a queue, Kafka treats them as persistent ordered event logs stored across distributed partitions. Consumers maintain their own offsets and decide how fast or from where they want to read the stream.
Kafka Architecture
Kafka fundamentally changed the architecture.
Kafka chose logs instead of traditional queues because queues fundamentally become a bottleneck for scalability, replayability, durability, and independent consumption at large scale.
The core problem with queues is that they are designed around the idea of temporary work distribution. This works well for task processing systems where a message is processed once and then discarded. However, modern distributed systems increasingly treat events as long-lived facts that multiple systems may need to process independently, replay later, or analyze historically.
PRODUCERS
↓
+----------------+
| TOPIC |
+----------------+
/ | \
/ | \
↓ ↓ ↓
Partition-1 Partition-2 Partition-3
↓ ↓ ↓
Broker-A Broker-B Broker-C
↓ ↓ ↓
CONSUMER GROUPS
Topic
Kafka's most fundamental unit of organization is the topic, which is something like a table in a relational database.
A topic is a log of events. Traditional enterprise messaging systems have topics and queues, which store messages temporarily to buffer them between the source and destination.
Since Kafka topics are logs, there is nothing inherently temporary about the data in them. The logs that underlie Kafka topics are files stored on disk. When you write an event to a topic, it is as durable as it would be if it were stored in a database.
The simplicity of logs and the immutability of the contents in it are key to Kafka's success as a critical component in modern distrbuted systems.
NOTE: Topics themselves do not store data directly. Instead topics are divided into partitions.
Partition
Kafka gives us the ability to partition topics. Partitioning takes a single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster.
Each partition is append-only, ordered, and immutable log.
class Partition {
List<Message> log;
}
Having broken a topic up into partitions, we need a way of deciding which messages to write to which partitions.
-
Typically, if a message has no key, subsequent messages will be distributed round-robin among all the topic's partitions. In this case, all partitions get an even share of the data, but we don't preserve any kind of ordering of the input messages.
-
If the message does have a key, then the destination partition will be computed from a hash of the key. This allows Kafka to guarantee that messages having the same key always land in the same partition, and therefore are always in order.
For example, if you are producing events that are all associated with the same customer, using the customer ID as the key guarantees that all of the events from a given customer will always arrive in order. This creates the possibility that a very active key will create a larger and more active partition, but this risk is small in practice and is manageable when it presents itself.
Offset
Traditional brokers track ACK state. Kafka instead lets consumers track offsets themselves.
Broker
From a physical infrastructure standpoint, Kafka is composed of a network of machines called brokers.
In a contemporary deployment, these may not be separate physical servers but containers running on pods running on virtualized servers running on actual processors in a physical data center somewhere.
Each broker hosts some sets of partitions and handles requests to write new events to those partitions or read events from them.
class Broker {
Map<TopicPartition, Partition> partitions;
}
NOTE: Brokers also handle replication of partitions between each other. However, this is not usually a process you have to think about as a developer building systems on Kafka. All you really need to know as a developer is that your data is safe, and that if one node in the cluster dies, another will take over its role.
Producer
A producer publishes events into Kafka topics. It decides which partition receives message.
interface Producer {
void send(String topic,Message message);
}
To help the producer do this all Kafka nodes can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time to allow the producer to appropriately direct its requests.
In Java, there is a class called KafkaProducer that you use to connect to the
cluster. You give this class a map of configuration parameters, including the address of some
brokers in the cluster, any appropriate security configuration, and other settings that
determine the network behavior of the producer.
Under the covers, the library is managing connection pools, network buffering, waiting for brokers to acknowledge messages, retransmitting messages when necessary, and a host of other details which no application developer needs to worry about.
Consumer
Consumers read events from partitions.
interface Consumer {
void poll();
}
In Java, there is a class called KafkaConsumer that you use to connect to the cluster. Then
use that connection to subscribe to one or more topics.
Also, consumers need to be able to handle the scenario in which the rate of message consumption from a topic combined with the computational cost of processing a single message are together too high for a single instance of the application to keep up. That is, consumers need to scale. In Kafka, scaling consumer groups is more or less automatic.
Distributed Streaming Platforms
If the consumer exposes a service on the network, producers can make a direct HTTP or RPC request to push messages to the consumer. This is the idea behind webhooks, a pattern in which a callback URL of one service is registered with another service, and it makes a request to that URL whenever an event occurs.
Publishers A and B are producing messages to Topic X. Subscribers C and D are subscribed to Topic X.
Publishers A, B, and C are producing messages concurrently to the same topic. The system should ensure that there are no race conditions, and each message is safely enqueued in the correct order.
Design a thread-safe, topic-based message broker that supports publish-subscribe semantics. The broker should allow multiple publishers and subscribers to operate concurrently and enable subscribers to replay messages by resetting their offset.
Maintain a separate queue for each topic. Publishers can publish a message to a specific topic. Subscribers can subscribe to one or more topics and should receive every message published to those topics after they subscribe. When a message is published to a topic, all active subscribers of that topic must receive the message. Each subscriber tracks its own offset per topic; the broker must provide an operation to reset a subscriber’s offset to any previous position so the subscriber can re‑read messages. The system must handle concurrent publish and subscribe operations safely, ensuring no race conditions or data corruption. Subscribers should be able to consume messages in parallel without blocking each other. Assume messages are simple strings; persistence, replication, and fault tolerance are out of scope.
Kafka Architecture
-
Kafka runs on the Java Virtual Machine (JVM).
-
Kafka servers are referred to as brokers.
-
All of the brokers that work together are referred to as a cluster. Clusters may consist of just one broker, or thousands of brokers
-
Apache Zookeeper is used by Kafka brokers to determine which broker is the leader of a given partition and topic. Zookeeper keeps track of which brokers are part of the Kafka cluster.
-
Zookeeper stores configuration for topics and permissions (Access Control Lists - ACLs) ACLs are Permissions associated with an object. In Kafka, this typically refers to a user's permissions with respect to production and consumption, and/or the topics themselves.
-
Kafka nodes may gracefully join and leave the cluster.
How Kafka Stores Data?
It has a data directory on a disk where it stores logs of data and text files.
/var/lib/kafka/data is a directory where Kafta sorts data in your workspace and in many
Kafka productions systems.
Each topic receives its own sub-directory with the associated name of the topic. Kafka may store more than one log file for a given topic.
Data Partitioning
All Kafka topics consist of one or more partitions. Every partition has a single leader broker.
Partitions are the basic unit of parallelism in Kafka. The “right” number of partitions is highly dependent on the scenario.
The most important number to understand is desired throughput. How many MB/s do you need to achieve to hit your goal?
Determine the number of partitions you need by dividing the overall throughput you want by the throughput per single consumer partition or the throughput per single producer partition. Pick the larger of these two numbers to determine the needed number of partitions.
Partitions = Max(Overall Throughput/Producer Throughput, Overall Throughput/Consumer Throughput)
NOTE: Message ordering is only guaranteed within a partition in Kafka. If your topic has more than one partition, Kafka provides no guarantees that the messages will be consumed in the order they were produced.