How to send final kafka-streams aggregation result of a time windowed KTable?

Update 2

With KIP-825 (Apache Kafka 3.3), you can specify a “emit strategy” via windowedBy(...). Default is EMIT_EAGER but you can also specify EMIT_FINAL to only get a single result per key and window when a window closes (ie, at point window-end + grace-period.

Update 1

With KIP-328 (Apache Kafka 2.1), a KTable#suppress() operator is added, that will allow to suppress consecutive updates in a strict manner and to emit a single result record per window; the tradeoff is an increase latency.

Original Answer

In Kafka Streams there is no such thing as a “final aggregation”. Windows are kept open all the time to handle out-of-order records that arrive after the window end-time passed. However, windows are not kept forever. They get discarded once their retention time expires. There is no special action as to when a window gets discarded.

See Confluent documentation for more details: http://docs.confluent.io/current/streams/

Thus, for each update to an aggregation, a result record is produced (because Kafka Streams also update the aggregation result on out-of-order records). Your “final result” would be the latest result record (before a window gets discarded). Depending on your use case, manual de-duplication would be a way to resolve the issue (using lower lever API, transform() or process())

This blog post might help, too: https://timothyrenner.github.io/engineering/2016/08/11/kafka-streams-not-looking-at-facebook.html

Another blog post addressing this issue without using punctuations: http://blog.inovatrend.com/2018/03/making-of-message-gateway-with-kafka.html

Leave a Comment