Kafka-node: Too may "'ignoring message due to it being from an old group" error

Created on 25 Feb 2019  路  16Comments  路  Source: SOHU-Co/kafka-node

Bug Report

I have a client setup using ConsumerGroupStream. It is getting too many errors with message starting with "'ignoring message due to it being from an old group".
But the consumer seems to be working fine, but this error is polluting my logs and alerts.

There is "Heartbeat error" happens quite frequently just before this error. Based on an issue already reported I understand that it is a debug log.

But the error which with starts "ignoring message due to being an old group - " is logged as "ERROR".

Based on the code it appears that this can happen every time the client rebalances.
Shouldn't it be logged as DEBUG if this error is expected?
https://github.com/SOHU-Co/kafka-node/blob/949c594bbe268f75fd8c56f55394f661007f6297/lib/kafkaClient.js#L1221

Environment

  • Node version: 10.15.0
  • Kafka-node version: 4.0.2
  • Kafka version: 1.1

For specific cases also provide

  • Number of Brokers: 1
  • Number partitions for topic: 4

Include output with Debug turned on

timestamp=2019-02-25T21:15:46.824Z,message="... Heartbeat error:",errorCode=27,logger="kafka-node:ConsumerGroup",meta={"message":"Kafka Error Code: 27"},level="WARN"
timestamp=2019-02-25T21:15:46.824Z,message="tryToRecoverFrom heartbeat",errorCode=27,logger="kafka-node:ConsumerGroupRecovery",meta={"message":"Kafka Error Code: 27"},level="DEBUG"
timestamp=2019-02-25T21:15:46.824Z,message="RECOVERY from heartbeat: ... retrying in 1000 ms",errorCode=27,logger="kafka-node:ConsumerGroupRecovery",meta={"message":"Kafka Error Code: 27"},level="DEBUG"
timestamp=2019-02-25T21:15:46.838Z,logger="kafka-node:KafkaClient",level="ERROR",message="ignoring message due to it being from an old group - memberId: 6ef2fee1-0c44-4537-89e1-ce1bb439013b !=6ef2fee1-0c44-4537-89e1-ce1bb439013b - generationId: 55373!=55373"
timestamp=2019-02-25T21:15:46.838Z,logger="kafka-node:KafkaClient",level="ERROR",message="ignoring message due to it being from an old group - memberId: 6ef2fee1-0c44-4537-89e1-ce1bb439013b !=6ef2fee1-0c44-4537-89e1-ce1bb439013b - generationId: 55373!=55373"

Appreciate your help!

Most helpful comment

Though I will agree this should be debug and not error.

All 16 comments

Looks like those messages were received while the consumer is rebalancing. Does your consumer do this often?

Yes, my consumer does this quite often every 10s.

Yes, my consumer does this quite often every 10s.

That does not seem normal. Have you looked at the group describe periodically to see if the partition ownership is really changing that often?

That does not seem normal. Have you looked at the group describe periodically to see if the partition ownership is really changing that often?

Yes, I checked at the group describe and the rebalancing happens quite often.

Got "Warning: Consumer group '' is rebalancing." about every 10s when consumer group is described.

Are new ConsumerGroups instantiated based on requests ?

I didn't fully understand the question.
new ConsumerGroup are not instantiated based on requests to the web server if that is what you meant. I am actually using ConsumerGroupStream and there is only one instance of it listening to multiple topics.

Hapenning here the same with ConsumerGroup

We're experiencing a similar issue. Sometimes (quite often) after rebalancing (not after each rebalancing, but after some) there's no consumers left assigned for a partition. We consume from AWS Lambda, so consumers being recreated quite often - every 5m. The interesting thing is that this issue happens more often for partition#0, less often for the 1st one, and almost never for the 2nd. Can protocol: ['range'] be the cause of such uneven split of failures?

Screen Shot 2019-04-09 at 14 48 02

Also if there're any ideas why the issue happens after rebalancing I'm glad to investigate further or try other approaches. Any help is appreciated.

The message is correct.

We don't want to emit messages in the middle of a rebalance as when rebalances occur, group partition assignment is changing, the generation ID is changing, and your consumer will not be allowed to even COMMIT that message back to the group.

If you receive a message on generationID 1, and rebalance occurs, and you had an inflight fetch request that came back with a message from gen1 and a rebalance is now in progress, if you try to commit that message to the group while the generation is now 2, it will be an error.

The member may not even be assigned that partition anymore!

You really need to solve this yourself and stop so many rebalances from occurring.

Though I will agree this should be debug and not error.

:+1: for debug

For my understanding, this error is generated by the library while doing internal operations and there's no specific action being done by the end user (committing, producing etc.) that would specifically trigger it? Ie. it will just happen sometimes whenever a rebalance occurs?

@aikar @hyperlink how to resolve this issue? is known issue? it's critical bug.....

@kgneng2 just noticed the ping, sorry no idea, I gave up on this library and kafka.

However, the logical idea of this issue means you just need to build support into your app for it, that when you get this error, to drop your processing and abort processing the message.

@aikar thanks for your comment. I also gave up on this library 馃お

we encountered the same, remain unresolved with 2.5.1 kafka

I didn't fully understand the question.
new ConsumerGroup are not instantiated based on requests to the web server if that is what you meant. I am actually using ConsumerGroupStream and there is only one instance of it listening to multiple topics.

same setup here

Was this page helpful?
0 / 5 - 0 ratings

Related issues

juhanishen picture juhanishen  路  7Comments

sergeyjsg picture sergeyjsg  路  4Comments

ashishnetworks picture ashishnetworks  路  4Comments

Sonivaibhav26 picture Sonivaibhav26  路  5Comments

chetandev picture chetandev  路  5Comments