Kafka is an open-source distributed event streaming platform. It enables streams of information to be processed into more refined output and opens up monitoring possibilities. Scalability is an important feature of Kafka, making it a powerful tool for large companies such as LinkedIn (where it originated), Airbnb, and many more.

In this blog we take a look at the development of Kafka and its infrastructure, focusing on the changes that have happened from 2017 until now, based on an earlier report from DESOSA.

The main advantages of Kafka are:

  • High scalability, thanks to its distributed architecture with capabilities such as replication and partitioning.
  • High durability, because Kafka persists messages on disk.
  • High concurrency: Kafka can handle thousands of messages per second, even under low-latency conditions.
  • High reliability: Kafka supports multiple subscribers and automatically rebalances consumers in the event of failure.
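The partitioning mentioned above is the mechanism behind Kafka's scalability: each topic is split into partitions, and messages with the same key always land in the same partition, so per-key ordering is preserved while load spreads across brokers. A minimal sketch of that idea (illustrative only — Kafka's real default partitioner uses a murmur2 hash, not the stand-in below):

```python
# Illustrative sketch of key-based partition assignment.
# Kafka's actual partitioner uses murmur2; we use a deterministic
# MD5-based stand-in purely to demonstrate the idea.
import hashlib

def assign_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, so all events
# for one entity stay ordered, while different keys spread out.
p1 = assign_partition("user-42", 6)
p2 = assign_partition("user-42", 6)
assert p1 == p2 and 0 <= p1 < 6
```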


Julian van Dijk

An enthusiastic master student with full-stack web developer experience

Nick Dekker

A developer mostly focused on back-end frameworks in web development and tooling for ease of use for end users. Has a passion for architecture and design patterns.

Nick Tehrany

I’m currently an MSc Computer Science student at the TUDelft with my main areas of interest in distributed systems, storage tech, and operating systems.

Asror Wali

Full-stack developer who loves solving puzzles

Apache Kafka - Distribution Analysis

Apache Kafka is an event streaming service that works as a distributed log. Before diving into how Kafka is distributed, we will look at a high-level overview of Kafka’s workflow and its architecture.

Distributed Components of Kafka

Kafka offers a way for systems to read and write data in real time. It has a few important components you should be aware of. Topics are used to transfer events as information in real time, without forcing a consumer to read all data.
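That core idea — consumers reading from a topic at their own pace instead of being forced to consume everything at once — can be sketched as an append-only log with a per-consumer offset. This is an illustrative toy model, not Kafka's actual API:

```python
# Toy model of a Kafka-style topic: an append-only log where each
# consumer tracks its own read position (offset). Names here are
# illustrative, not part of any real Kafka client library.
class Topic:
    def __init__(self):
        self.log = []        # append-only list of records
        self.offsets = {}    # consumer id -> next offset to read

    def produce(self, record):
        self.log.append(record)

    def consume(self, consumer_id, max_records=10):
        start = self.offsets.get(consumer_id, 0)
        records = self.log[start:start + max_records]
        self.offsets[consumer_id] = start + len(records)
        return records

topic = Topic()
for event in ["click", "view", "purchase"]:
    topic.produce(event)

# Two consumers read independently, at their own pace.
assert topic.consume("a", max_records=2) == ["click", "view"]
assert topic.consume("b") == ["click", "view", "purchase"]
assert topic.consume("a") == ["purchase"]
```

Because the log is never mutated in place, a new subscriber can always start from offset 0 and replay history — one reason Kafka supports multiple independent subscribers so cheaply.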
March 28, 2021

Apache Kafka - Quality and evolution

Kafka is a large open-source application which is trusted by 80% of all Fortune 100 companies1. The software quality processes of Kafka play a large role in generating this trust.

Distributed Systems Test

Testing a distributed data system with only unit tests and integration tests is hard. The issues that plague the application in production cannot be simulated well on this level of testing. This is why the Kafka community created a tool called ducktape2.
March 21, 2021

Apache Kafka - From Vision to Architecture

In this essay we explore Kafka’s architectural elements and relationships, and the principles of its design and evolution.

Streaming as an Architecture

Before we talk about Kafka’s architecture, we would first like to define streaming systems. Streaming systems are processing engines designed to process data that is generated continuously by one or more data sources. These generated data records could be a variety of things: changes in the stock market, the geo-location of a user, sensor outputs, user activity on a website, and so on.
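The defining trait of such a processing engine is that it consumes records as they arrive rather than waiting for a finite batch. A minimal sketch of that shape, using a Python generator as a stand-in for a continuous data source (the source and threshold are invented for illustration):

```python
# Illustrative sketch of a streaming pipeline: records flow through
# the processor one at a time, and results are emitted immediately,
# without waiting for the source to finish.
def sensor_readings():
    """Stand-in for a continuous data source (e.g. sensor outputs)."""
    for value in [21.5, 22.0, 35.9, 21.8]:
        yield value

def detect_anomalies(stream, threshold=30.0):
    """Process each record as it arrives, emitting alerts right away."""
    for reading in stream:
        if reading > threshold:
            yield f"anomaly: {reading}"

alerts = list(detect_anomalies(sensor_readings()))
assert alerts == ["anomaly: 35.9"]
```

In a real deployment the source would be unbounded (a Kafka topic, a socket, a sensor feed), but the processing shape is the same: a pipeline over an endless iterator.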
March 15, 2021

Apache Kafka - Product Vision and Problem Analysis

The arrival of social media platforms, video streaming, and large-scale system monitoring services introduced new requirements for data processing, pushing the boundaries of the software available at the time. Large amounts of data were continuously uploaded to and requested from servers1. Everyone rushed towards possible solutions: GraphQL, caching, localized servers, and more were all brought to the table. The search for optimization had started. LinkedIn is one of the platforms that faced this problem.