Hi,
When I start up several consumers, my queue normally ends up with several partitions without an owner (checked with kafka-consumer-offset-checker.sh).
In my scenario, I have more than twice as many consumers as partitions, for example 38 consumers for 16 partitions on my queue. Whenever I start (or restart) all my consumers, I always end up with some partitions left with no owner.
Could you go into more detail on how your issue was fixed by exposing the retry configs?
Absolutely. After investigating the startup of my multiple servers, I strongly suspect that the problem is caused by contention: the servers all try to take ownership of partitions at the same time.
I also noticed that with the default configuration the retry period is always the same, so I focused on having each consumer use a different retry period, chosen when it starts (this means the period stays the same for a given consumer, but should differ from one consumer to the next). I was able to get this with this config:
rebalanceRetry: {
  retries: 20,
  factor: 1,
  minTimeout: Math.random() * 20000,
  maxTimeout: 20000 + 1,
  randomize: false
}
And the results so far are really good.
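To illustrate why the config above spreads the consumers out, here is a minimal sketch of the retry schedule it should produce. It assumes the options are interpreted with the common backoff formula min(minTimeout * factor^attempt, maxTimeout); that formula is an assumption for illustration, and `retrySchedule` and `makeRebalanceRetry` are hypothetical helper names, not part of any library.

```javascript
// Sketch only: assumes delay = min(jitter * minTimeout * factor^attempt, maxTimeout).
// retrySchedule is a hypothetical helper, not a library function.
function retrySchedule(opts) {
  const delays = [];
  for (let attempt = 0; attempt < opts.retries; attempt++) {
    // With randomize: false there is no per-attempt jitter.
    const jitter = opts.randomize ? Math.random() + 1 : 1;
    const delay = Math.round(
      jitter * Math.max(opts.minTimeout, 1) * Math.pow(opts.factor, attempt)
    );
    delays.push(Math.min(delay, opts.maxTimeout));
  }
  return delays;
}

// Hypothetical helper that builds the options from the config above.
// The random source is injectable so the result can be reproduced.
function makeRebalanceRetry(random = Math.random) {
  return {
    retries: 20,
    factor: 1, // no exponential growth: every retry waits the same time
    minTimeout: random() * 20000, // picked once, when the consumer starts
    maxTimeout: 20000 + 1, // kept above any possible minTimeout
    randomize: false
  };
}

// Two consumers started independently get different, but constant, periods.
const consumerA = retrySchedule(makeRebalanceRetry());
const consumerB = retrySchedule(makeRebalanceRetry());
console.log('consumer A retries every', consumerA[0], 'ms');
console.log('consumer B retries every', consumerB[0], 'ms');
```

Under that assumption, each consumer retries at its own fixed interval somewhere between 0 and 20 seconds, so two consumers are unlikely to keep colliding on every attempt.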
@hyperlink Should I close the issue, or would you prefer to leave it open for more feedback?