How to get data from an old offset in Kafka?

Consumers always belong to a group, and for each partition ZooKeeper keeps track of that consumer group's progress (its committed offset) in the partition.
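A minimal sketch of where that progress lives, assuming the ZooKeeper-based consumer this answer targets: each partition's committed offset is stored as a decimal string in the znode `/consumers/<group>/offsets/<topic>/<partition>`. The object and method names below are hypothetical helpers, not Kafka's API; Kafka's own `ZKGroupTopicDirs` (the `topicDirs` used further down) builds the same paths.

```scala
// Hypothetical helper mirroring the ZooKeeper layout used by the old
// ZooKeeper-based consumer: one znode per partition whose data is the
// committed offset, written as a decimal string.
object ConsumerOffsetPaths {
  // Root of everything stored for one consumer group.
  def groupDir(group: String): String = s"/consumers/$group"

  // Same shape as ZKGroupTopicDirs.consumerOffsetDir.
  def offsetDir(group: String, topic: String): String =
    s"${groupDir(group)}/offsets/$topic"

  // The znode holding the committed offset of a single partition.
  def partitionOffsetPath(group: String, topic: String, partition: Int): String =
    s"${offsetDir(group, topic)}/$partition"

  def main(args: Array[String]): Unit =
    println(partitionOffsetPath("my-group", "topicX", 2))
}
```

Deleting `groupDir(group)` therefore removes every committed offset of the group, while writing to `partitionOffsetPath(...)` moves a single partition.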

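To fetch from the beginning, you can delete all the data associated with the group's progress, as Hussain mentioned (with `auto.offset.reset=smallest`, so that a consumer with no committed offset starts from the earliest available one):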

ZkUtils.maybeDeletePath("${zkhost:zkport}", "/consumers/${group.id}");

You can also set the offset of a specific partition directly, as done in core/src/main/scala/kafka/tools/UpdateOffsetsInZK.scala:

ZkUtils.updatePersistentPath(zkClient, topicDirs.consumerOffsetDir + "/" + partition, offset.toString)

However, offsets are not indexed by time; all you know is that within each partition they form an increasing sequence.

If your messages contain a timestamp (and beware that this timestamp has nothing to do with the moment Kafka received the message), you can build an indexer that reads one message every N offsets and stores a tuple such as (topic X, partition 2, offset 100, timestamp) somewhere.
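A rough sketch of such an indexer, assuming you can read the message at a given offset and parse a timestamp out of its payload. That read is abstracted as the hypothetical callback `readTimestamp`; `IndexEntry`, `RoughIndexer`, and the fake partition in `main` are illustrative only, not part of Kafka.

```scala
// One sampled entry: where a message sits and the timestamp found in its payload.
final case class IndexEntry(topic: String, partition: Int, offset: Long, timestamp: Long)

object RoughIndexer {
  /** Samples one message every `step` offsets in [startOffset, endOffset).
    * `readTimestamp` is a hypothetical callback standing in for "fetch the
    * message at this offset and parse the timestamp from its payload";
    * it returns None when nothing usable is found at that offset. */
  def buildIndex(
      topic: String,
      partition: Int,
      startOffset: Long,
      endOffset: Long,
      step: Long,
      readTimestamp: Long => Option[Long]
  ): Vector[IndexEntry] = {
    require(step > 0, "step must be positive")
    // Offsets within one partition are a sequence, so we can simply stride through them.
    (startOffset until endOffset by step).toVector.flatMap { off =>
      readTimestamp(off).map(ts => IndexEntry(topic, partition, off, ts))
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up partition for illustration: offsets 0..99, payload timestamp 1000 + 10 * offset.
    val fakeRead: Long => Option[Long] =
      off => if (off >= 0 && off < 100) Some(1000L + 10L * off) else None
    buildIndex("topicX", 2, 0L, 100L, 25L, fakeRead).foreach(println)
  }
}
```

A larger `step` keeps the index small at the cost of a longer forward scan once you jump to a sampled offset.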

When you want to retrieve entries from a specific point in time, you can binary-search this rough index for the last sampled entry at or before that time, and fetch from its offset.
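A minimal sketch of that lookup, assuming the index is sorted by offset and that payload timestamps are non-decreasing along the partition (Kafka does not guarantee this, since the timestamp comes from your message). `IndexEntry` and `RoughIndexSearch` are hypothetical names for illustration.

```scala
// One sampled entry: where a message sits and the timestamp found in its payload.
final case class IndexEntry(topic: String, partition: Int, offset: Long, timestamp: Long)

object RoughIndexSearch {
  /** Binary search over a rough index sorted by offset.
    * Returns the offset of the last sampled entry whose timestamp is <= target,
    * a safe point to start fetching and then scan forward message by message.
    * If the target is earlier than every sample, falls back to the first sampled offset.
    * Assumes payload timestamps are non-decreasing along the partition. */
  def startOffsetFor(index: IndexedSeq[IndexEntry], target: Long): Option[Long] =
    if (index.isEmpty) None
    else {
      var lo = 0
      var hi = index.length - 1
      var best = -1 // position of the last entry seen with timestamp <= target
      while (lo <= hi) {
        val mid = lo + (hi - lo) / 2
        if (index(mid).timestamp <= target) { best = mid; lo = mid + 1 }
        else hi = mid - 1
      }
      Some(index(math.max(best, 0)).offset)
    }

  def main(args: Array[String]): Unit = {
    val index = Vector(
      IndexEntry("topicX", 2, 0L, 1000L),
      IndexEntry("topicX", 2, 25L, 1250L),
      IndexEntry("topicX", 2, 50L, 1500L),
      IndexEntry("topicX", 2, 75L, 1750L)
    )
    println(startOffsetFor(index, 1300L)) // Some(25)
  }
}
```

Because the index is sampled, the returned offset is only a starting point: the consumer still reads forward from it until it reaches messages whose timestamp is at or after the target.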
