What does “Rebalancing” mean in Apache Kafka context?

When a new consumer joins a consumer group, the set of consumers attempts to “rebalance” the load by assigning partitions to each consumer. If the set of consumers changes while this assignment is taking place, the rebalance will fail and retry. This setting controls the maximum number of attempts before giving up. The command for …
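
To make the idea of redistributing partitions concrete, here is a minimal sketch of a round-robin style assignment, recomputed when a third consumer joins. The function name `assign_round_robin` and the consumer names are hypothetical, and the logic is a simplification for illustration; it is not Kafka's actual group coordination protocol or any of its built-in assignors.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over consumers in round-robin order (simplified sketch)."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, partition in enumerate(sorted(partitions)):
        # Partition i goes to consumer i modulo the group size.
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))  # a topic with 6 partitions (assumed for the example)

# Two consumers share the partitions.
before = assign_round_robin(partitions, ["consumer-a", "consumer-b"])

# A third consumer joins, so the assignment is recomputed ("rebalanced").
after = assign_round_robin(partitions, ["consumer-a", "consumer-b", "consumer-c"])

print(before)
print(after)
```

Every partition is still owned by exactly one consumer after the change; only the ownership is redistributed.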

Difference between session.timeout.ms and max.poll.interval.ms for Kafka >= 0.10.1

Before KIP-62 (i.e., Kafka 0.10.0 and earlier), there was only session.timeout.ms. max.poll.interval.ms was introduced via KIP-62 (part of Kafka 0.10.1). KIP-62 decouples heartbeats from calls to poll() via a background heartbeat thread, allowing for a longer processing time (i.e., the time between two consecutive poll() calls) than the heartbeat interval. Assume processing a message takes 1 minute. If …
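
The following sketch shows how the two settings relate to processing time. The property names are the real consumer configuration keys, but the values are illustrative assumptions rather than recommendations, and the two helper functions are hypothetical models of the timeout checks, not part of any Kafka client.

```python
# Illustrative values only (assumptions for this example).
consumer_config = {
    "session.timeout.ms": 10_000,     # liveness detected via heartbeats
    "heartbeat.interval.ms": 3_000,   # how often the background thread heartbeats
    "max.poll.interval.ms": 300_000,  # maximum allowed gap between poll() calls
}


def times_out_before_kip62(processing_ms, config):
    """Before KIP-62, heartbeats were sent from poll(), so processing that
    took longer than session.timeout.ms looked like a dead consumer."""
    return processing_ms > config["session.timeout.ms"]


def times_out_after_kip62(processing_ms, config):
    """After KIP-62, heartbeats come from a background thread, so only the
    gap between poll() calls is checked against max.poll.interval.ms."""
    return processing_ms > config["max.poll.interval.ms"]


processing_ms = 60_000  # processing one message takes 1 minute

print(times_out_before_kip62(processing_ms, consumer_config))
print(times_out_after_kip62(processing_ms, consumer_config))
```

With these assumed values, one minute of processing exceeds the session timeout in the old model but stays well within the poll interval in the new one.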

Kafka: Consumer API vs Streams API

Update January 2021: I wrote a four-part blog series on Kafka fundamentals that I’d recommend reading for questions like these. For this question in particular, take a look at part 3 on processing fundamentals. Update April 2018: Nowadays you can also use ksqlDB, the event streaming database for Kafka, to process your data in …

Is key required as part of sending messages to Kafka?

Keys are mostly useful/necessary if you require strong ordering for a key and are developing something like a state machine. If you require that messages with the same key (for instance, a unique ID) are always seen in the correct order, attaching a key to messages will ensure messages with the same key always go …
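
As a rough sketch of why keys help with ordering: a keyed message is routed by hashing its key, so equal keys land in the same place and keep their relative order. The helper `pick_partition` is hypothetical, and `zlib.crc32` is a stand-in chosen so the example runs with the standard library alone; Kafka's default partitioner uses a murmur2 hash, so real partition numbers will differ.

```python
import zlib
from collections import defaultdict


def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition deterministically (crc32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


num_partitions = 6  # assumed partition count for the example
messages = [
    (b"order-42", "created"),
    (b"order-7", "created"),
    (b"order-42", "paid"),
    (b"order-42", "shipped"),
]

# Append each message to the log of the partition chosen by its key.
logs = defaultdict(list)
for key, value in messages:
    logs[pick_partition(key, num_partitions)].append((key, value))

# All events for one key sit in a single partition, in the order they were sent.
order_42_events = [v for k, v in logs[pick_partition(b"order-42", num_partitions)]
                   if k == b"order-42"]
print(order_42_events)
```

Messages without a key carry no such guarantee, because nothing ties them to one partition.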