Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.
| Sarama | Kafka | Go |
|--------|-------|----|
| | | |
What configuration values are you using for Sarama and Kafka?
When filing an issue please provide logs from Sarama and Kafka if at all
possible. You can set sarama.Logger to a log.Logger to capture Sarama debug
output.
logs: (collapsed details block)
Hey @meiyang1990
What are the arguments that you're using in order to test the consumer?
What is your topic configuration? How many partitions does the topic have?
I can confirm this is also happening.
TL;DR :: sarama "rebalancing" is not working correctly (if it ever did)
specs:: kafka 2.2
sarama:: 1.24.1
docker-compose based network/install (one kafka container, not multiple, but also confirmed on a real production system with 7 kafka nodes)
config :: all the defaults sarama sets except these
(the below also fails on all versions from 1.0.0 -> 2.3.0)
conf.Metadata.RefreshFrequency = 120 * time.Second
conf.Metadata.Full = true
conf.Consumer.Fetch.Min = 1
conf.Consumer.Fetch.Max = 1000
conf.ClientID = "some string"
conf.Consumer.Group.Rebalance.Strategy = sarama.BalanceStrategyRange
conf.Consumer.Offsets.CommitInterval = time.Second
conf.Version = sarama.V1_0_0_0 // Kafka 1.0.0
Sent 5200 messages so far to topic test ...
Sent 5220 messages so far to topic test ...
Sent 5240 messages so far to topic test ...
2019-11-13T03:35:41.120Z INFO example log/with.go:13 claimed kafka topics {"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "claims": {"test":[0,1,2,3,4,5,6,7]}}
2019-11-13T03:35:41.120Z INFO example log/with.go:13 starting message consumer loop {"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "topic": "test", "partition": 7}
watch -t -d kafka-consumer-groups.sh --bootstrap-server=localhost:9092 --describe --group group-example
TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                                  HOST         CLIENT-ID
test   6          732             732             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
test   0          716             716             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
test   7          705             707             2    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
test   5          707             708             1    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
test   1          704             704             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
test   4          695             695             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
test   3          691             691             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
test   2          736             736             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b  /172.22.0.6  example-group-consumer
Fire up another consumer. Given the default "RANGE" strategy, one expects the partitions to split evenly across the same group: partitions 0-3 on one consumer, 4-7 on the other.
What really happens:
5a. Kafka goes into rebalance mode
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test 6 760 760 0 - - -
test 0 742 745 3 - - -
test 7 725 727 2 - - -
test 5 741 745 4 - - -
test 1 720 722 2 - - -
test 4 726 727 1 - - -
test 3 713 714 1 - - -
test 2 768 772 4 - - -
Then, moments later, still only ONE consumer is consuming (in this case the new one).
5b. Consumer 2 claims "everything"
2019-11-13T03:38:49.601Z INFO example log/with.go:13 claimed kafka topics {"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "claims": {"test":[0,1,2,3,4,5,6,7]}}
5c. The watcher on the Kafka node itself confirms that the new consumer claimed everything:
TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                                  HOST         CLIENT-ID
test   6          780             781             1    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
test   0          759             760             1    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
test   7          744             744             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
test   5          763             763             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
test   1          736             736             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
test   4          738             738             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
test   3          734             734             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
test   2          784             784             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b  /172.22.0.6  example-group-consumer
2019-11-13T03:41:39.495Z INFO example log/with.go:13 consumer cleanup {"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"]}
Consumer stopped:
Messages consumed: 361
Errors received: 0
Consumer number 1 never "takes over" to fill in the gaps; it just sits dead.
Consumer group 'group-example' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test 6 811 847 36 - - -
test 0 793 828 35 - - -
test 7 772 819 47 - - -
test 5 792 820 28 - - -
test 1 767 809 42 - - -
test 4 763 794 31 - - -
test 3 752 775 23 - - -
test 2 806 838 32 - - -
Same issue. @d1egoaz @meiyang1990
Same with me.
The above is a symptom of not looping on Consume(). Consume() will exit without error when a rebalancing occurs and it is up to the user to call it again when this occurs.
Under the hood it seems like when a re-balance occurs all sessions are torn down completely (briefly no members exist and therefore no partitions are handled by anyone) and when you re-call Consume() a new session is brought up which should get its share of the partitions.
Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.
A few weeks ago we added some comments documenting what @pavius described above.
It was confusing, I had the same issue once.
https://github.com/Shopify/sarama/pull/1602
@meiyang1990 @wyndhblb @vikrampunchh @gauravds
could you please confirm that this was the issue?