Can I have 100s of thousands of topics in a Kafka Cluster?

Update March 2021: With Kafka’s new KRaft mode*, which entirely removes ZooKeeper from Kafka’s architecture, a Kafka cluster can handle millions of topics/partitions. See https://www.confluent.io/blog/kafka-without-zookeeper-a-sneak-peek/ for details.

*short for “Kafka Raft Metadata mode”; in Early Access as of Kafka v2.8


Update September 2018: As of Kafka v2.0, a Kafka cluster can have hundreds of thousands of topics. See https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions.


Initial answer below for posterity:

The rule of thumb is that the number of Kafka topics can be in the thousands.

Jun Rao (Kafka committer; now at Confluent, formerly on LinkedIn’s Kafka team) wrote:

At LinkedIn, our largest cluster has more than 2K topics. 5K topics should
be fine.
[…]

With more topics, you may hit one of those limits: (1) # dirs allowed in a
FS; (2) open file handlers (we keep all log segments open in the broker);
(3) ZK nodes.
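
To get a feel for limits (1) and (2) in Jun's list, here is a back-of-the-envelope sketch in Python. It is not taken from Kafka itself: the `ClusterShape` class, its fields, and the figure of three files per log segment are my own assumptions for illustration, and it assumes partition replicas are spread evenly across brokers.

```python
from dataclasses import dataclass


@dataclass
class ClusterShape:
    """Hypothetical description of a cluster; every field is an assumption."""
    topics: int
    partitions_per_topic: int
    replication_factor: int
    brokers: int
    segments_per_partition: int  # assumed average number of log segments on disk
    files_per_segment: int = 3   # assumed: the log file plus its index files


def replicas_per_broker(shape: ClusterShape) -> int:
    """Partition replicas (one directory each) per broker, assuming an even spread."""
    total = shape.topics * shape.partitions_per_topic * shape.replication_factor
    return -(-total // shape.brokers)  # ceiling division


def open_files_per_broker(shape: ClusterShape) -> int:
    """Rough count of file handles if every segment file stays open."""
    return (replicas_per_broker(shape)
            * shape.segments_per_partition
            * shape.files_per_segment)


shape = ClusterShape(topics=5000, partitions_per_topic=4, replication_factor=3,
                     brokers=10, segments_per_partition=10)
print(replicas_per_broker(shape))    # 6000
print(open_files_per_broker(shape))  # 180000
```

With these made-up numbers, each broker would hold about 6,000 partition directories and on the order of 180,000 open files, which is the kind of figure worth comparing against your filesystem and `ulimit -n` settings.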

The Kafka FAQ gives the following abstract guideline:

Kafka FAQ: How many topics can I have?

Unlike many messaging systems Kafka topics are meant to scale up arbitrarily. Hence we encourage fewer large topics rather than many small topics. So for example if we were storing notifications for users we would encourage a design with a single notifications topic partitioned by user id rather than a separate topic per user.

The actual scalability is for the most part determined by the number of total partitions across all topics not the number of topics itself (see the question below for details).
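
The "single notifications topic partitioned by user id" idea from the FAQ can be sketched roughly as follows. This is only an illustration of key-based partitioning, not Kafka client code: `partition_for` and `NUM_PARTITIONS` are hypothetical names, and `zlib.crc32` is a stand-in for whatever hash a real Kafka client applies to the message key.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 12  # assumed partition count of the single "notifications" topic


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user id to a partition by hashing the key.

    crc32 is only a stand-in here; real Kafka clients use their own hash.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


# 10,000 hypothetical users share one topic instead of getting 10,000 topics.
load = Counter(partition_for(f"user-{i}") for i in range(10_000))
print(len(load), "partitions in use for", sum(load.values()), "users")
```

The point of the design is that the number of partitions stays fixed no matter how many users you add, while all messages for a given user still land in the same partition and therefore keep their relative order.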

The article http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ (written by the aforementioned Jun Rao) adds further details, and particularly focuses on the impact of the number of partitions.

IMHO your use case / model is a bit of a stretch for a single Kafka cluster, though not necessarily for Kafka in general. With the little information you shared (I understand that a public forum is not the best place for sensitive discussions :-P), the only off-the-cuff comment I can offer is to consider using more than one Kafka cluster, since you mentioned that customer data must be strictly isolated anyway (including the processing steps).

I hope this helps a bit!
