Kafka-node: HighLevelConsumer leaving partitions without owner on multiple consumers startup

Created on 23 Dec 2016 · 3 Comments · Source: SOHU-Co/kafka-node

Hi,

When I start up several consumers, my queue usually ends up with several partitions without an owner (checked with kafka-consumer-offset-checker.sh).

In my scenario, I have more than twice as many consumers as partitions, for example 38 consumers for 16 partitions on my queue. Whenever I start (or restart) all my consumers, I always end up with some partitions left without an owner.

All 3 comments

Could you go into more detail on how your issue was fixed by exposing the retry configs?

Absolutely. After investigating the startup of my multiple servers, I strongly suspect that the problem is caused by contention between servers trying to take ownership of partitions at the same time.

I also noticed that with the default configuration the retry period is always the same, so I focused on giving each consumer a different retry period, chosen when it starts (this means the period stays constant for a given consumer, but should differ from one consumer to the next). I was able to achieve this with the following config:

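rebalanceRetry: {
  retries:    20,                     // up to 20 rebalance retry attempts
  factor:     1,                      // no exponential growth, so the delay stays constant
  minTimeout: Math.random() * 20000,  // per-consumer delay in ms, picked once at startup (0 to 20 s)
  maxTimeout: 20000 + 1,              // upper bound, just above the largest possible minTimeout
  randomize:  false                   // no additional per-attempt jitter
}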

And the results so far are really good.
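
To illustrate why this setting spreads the consumers out, here is a minimal sketch. Both `buildRebalanceRetry` and `retryDelays` are hypothetical helpers written for this example and are not part of kafka-node. The delay formula is a simplified model of a typical exponential backoff and only an assumption about how these options are interpreted. The sketch also floors the random delay to whole milliseconds, which the config above does not do.

```javascript
// Hypothetical helper (not part of kafka-node): build a rebalanceRetry object
// whose delay is fixed for this process but differs between processes.
// `rng` is injectable so the example is reproducible.
function buildRebalanceRetry(maxDelayMs, rng = Math.random) {
  return {
    retries: 20,
    factor: 1,                                  // constant delay between attempts
    minTimeout: Math.floor(rng() * maxDelayMs), // picked once, at startup
    maxTimeout: maxDelayMs + 1,
    randomize: false
  };
}

// Simplified backoff model (an assumption, not code taken from the library):
// delay = min(jitter * minTimeout * factor^attempt, maxTimeout)
function retryDelays(opts) {
  const delays = [];
  for (let attempt = 0; attempt < opts.retries; attempt++) {
    const jitter = opts.randomize ? Math.random() + 1 : 1;
    const delay = Math.round(jitter * opts.minTimeout * Math.pow(opts.factor, attempt));
    delays.push(Math.min(delay, opts.maxTimeout));
  }
  return delays;
}

// Two simulated consumers with different (fixed) random draws.
const a = retryDelays(buildRebalanceRetry(20000, () => 0.25));
const b = retryDelays(buildRebalanceRetry(20000, () => 0.8));
console.log(a[0], a[19]); // 5000 5000
console.log(b[0], b[19]); // 16000 16000
```

Under this model each consumer retries at its own constant interval, so two consumers that collide on one attempt are unlikely to collide again on the next. A random draw close to zero would still produce very short retry intervals, so a lower bound on `minTimeout` may be worth considering.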

@hyperlink Should I close the issue, or would you prefer to leave it open for more feedback?
