Kafka-node: Manually commiting messages feature - question

Created on 30 Jul 2018  路  6Comments  路  Source: SOHU-Co/kafka-node

Hello, i have a question about library implementation - is there an important reason why there is no possibility to commit message offset manually just after consuming ? We are waiting for done event, which is emitted when no messages are ready to read. I suppose, that main reason is performance (no need for shifting commit offset by one, when we can shift by many after read all), but why it is not even possible?

Most helpful comment

@twawszczak yeah this clients ConsumerGroup overall is a flawed design.
I personally use ConsumerGroupStream with auto commit false, as its .commit() method is more accurate (though it has issues too, see: https://github.com/SOHU-Co/kafka-node/pull/896 )

All 6 comments

it is. the consumers have an option for autoCommit: false. You then commit manually.

@aikar Hmm, i have analyzed repository and if i am right, it is not exactly true - commit() method is just alias for autoCommit() method. click
And autoCommit works like that - checking offsets of current payloads and committing them. BUT offsets of current payloads are updated only when there is no more new messages in fetch loop. click (done event updating offsets)
So, when i committing during reading, it gives no effect. (offset of payload is not updated yet) Correct me if i am wrong.

@twawszczak yeah this clients ConsumerGroup overall is a flawed design.
I personally use ConsumerGroupStream with auto commit false, as its .commit() method is more accurate (though it has issues too, see: https://github.com/SOHU-Co/kafka-node/pull/896 )

We've had to get pretty creative with kafka-node to work around similar issues. The best approach so far has been to effectively subclass of ConsumerGroup such that the commit/autoCommit methods are no-ops, with a small change to handleSync to capture assigned partitions during rebalances. Then in our own wrapper class we handle the rebalancing in a similar way to ConsumerGroupStream and deal with control flow ourselves as best we can.

ConsumerGroupStream seems better in some ways, but as @aikar says it has its own quirks -- and IMO it's also critical to be able to manage control flow per partition (example: "do not process the next message on this partition until I commit the current message" or pipelining: "process up to N messages for this partition before we require a successful commit"). Then there's stuff like waiting for in-flight messages to process before allowing the consumer to commit offsets one last time & close, backpressure to manage if message processing slows down or whatever. I'd love to roll a lot of this stuff into kafka-node itself, but it's easier said than done.

At least some of this is discussed in other tickets floating around in the issue tracker -- @aikar I think I owe you some responses actually. Soon! @hyperlink do we have any concrete designs for v3?

@aikar one person's flawed design is another person's unsupported use case 馃槈

@thomaslee we should use v3 as an opportunity to break the existing API and fix these "quirks" user's complained about. Though we should set a target to publish v3 soon there are some good changes in there.

btw I love the discussions & PRs currently being driven by you and @aikar and I will contribute whenever I can.

I love the discussions & PRs

Cheers @hyperlink -- likewise, appreciate you taking the time to read my whining & reviewing my half-baked PRs. 馃槈

Was this page helpful?
0 / 5 - 0 ratings

Related issues

cheungwsj picture cheungwsj  路  5Comments

juhanishen picture juhanishen  路  7Comments

AnnisaNurika picture AnnisaNurika  路  5Comments

kameshwari-suresh picture kameshwari-suresh  路  3Comments

sergeyjsg picture sergeyjsg  路  4Comments