Sarama: why does the consumergroup example not work correctly? Only one client consumes messages; the other clients do nothing

Created on 23 Oct 2019 · 7 comments · Source: Shopify/sarama

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

| Sarama | Kafka | Go |
|--------|-------|----|
| | | |

Configuration

What configuration values are you using for Sarama and Kafka?


Logs

When filing an issue please provide logs from Sarama and Kafka if at all
possible. You can set sarama.Logger to a log.Logger to capture Sarama debug
output.



Problem Description
Labels: needs-more-info, question

All 7 comments

Hey @meiyang1990
What are the arguments that you're using in order to test the consumer?

what is your topic configuration? how many partitions does the topic have?

I can confirm this is also happening.

TL;DR :: sarama "rebalancing" is not working correctly (if it ever did)

specs:
- Kafka 2.2
- Sarama 1.24.1
- docker-compose based network/install (one Kafka container, not multiple, but also confirmed on a real production system with 7 Kafka nodes)

config: all the defaults Sarama sets except the following (this also fails with every `conf.Version` from 1.0.0 through 2.3.0):

```go
conf.Metadata.RefreshFrequency = 120 * time.Second
conf.Metadata.Full = true
conf.Consumer.Fetch.Min = 1
conf.Consumer.Fetch.Max = 1000
conf.ClientID = "some string"
conf.Consumer.Group.Rebalance.Strategy = sarama.BalanceStrategyRange
conf.Consumer.Offsets.CommitInterval = time.Second
conf.Version = sarama.V1_0_0_0 // i.e. Kafka 1.0.0
```
1. Start a test producer:

Sent 5200  messages so far to topic test ...
Sent 5220  messages so far to topic test ...
Sent 5240  messages so far to topic test ...
2. Fire up a consumer:

2019-11-13T03:35:41.120Z    INFO    example log/with.go:13  claimed kafka topics{"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "claims": {"test":[0,1,2,3,4,5,6,7]}}
2019-11-13T03:35:41.120Z    INFO    example log/with.go:13  starting message consumer loop  {"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "topic": "test", "partition": 7}
3. Watch the Kafka consumer group:

watch -t -d kafka-consumer-groups.sh --bootstrap-server=localhost:9092 --describe --group group-example

TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                                HOST         CLIENT-ID
test   6          732             732             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
test   0          716             716             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
test   7          705             707             2    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
test   5          707             708             1    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
test   1          704             704             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
test   4          695             695             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
test   3          691             691             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
test   2          736             736             0    example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b /172.22.0.6  example-group-consumer
4. Fire up another consumer (given the default RANGE strategy, one expects the partitions to split evenly across the same group: 0-3 on one consumer, 4-7 on the other).

5. What really happens:

5a. Kafka goes into rebalance mode:


TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
test            6          760             760             0               -               -               -
test            0          742             745             3               -               -               -
test            7          725             727             2               -               -               -
test            5          741             745             4               -               -               -
test            1          720             722             2               -               -               -
test            4          726             727             1               -               -               -
test            3          713             714             1               -               -               -
test            2          768             772             4               -               -               -

then moments later still only ONE consumer is consuming (in this case the new one)

5b. Consumer 2 claims "everything"

2019-11-13T03:38:49.601Z    INFO    example log/with.go:13  claimed kafka topics{"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "claims": {"test":[0,1,2,3,4,5,6,7]}}

5c. The watcher on the Kafka node itself confirms that the new consumer claimed everything:

TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                                HOST         CLIENT-ID
test   6          780             781             1    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
test   0          759             760             1    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
test   7          744             744             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
test   5          763             763             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
test   1          736             736             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
test   4          738             738             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
test   3          734             734             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
test   2          784             784             0    example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b /172.22.0.6  example-group-consumer
6. Kill (stop) consumer number 2:
2019-11-13T03:41:39.495Z    INFO    example log/with.go:13  consumer cleanup    {"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"]}
Consumer stopped:
 Messages consumed:  361
 Errors received:  0
7. Consumer number 1 never "takes over" and fills in the gaps; it just sits dead.

8. The Kafka watcher also shows no active consumers:
Consumer group 'group-example' has no active members.

TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
test            6          811             847             36              -               -               -
test            0          793             828             35              -               -               -
test            7          772             819             47              -               -               -
test            5          792             820             28              -               -               -
test            1          767             809             42              -               -               -
test            4          763             794             31              -               -               -
test            3          752             775             23              -               -               -
test            2          806             838             32              -               -               -

Same issue. @d1egoaz @meiyang1990

Same with me.

The above is a symptom of not looping on Consume(). Consume() exits without error when a rebalance occurs, and it is up to the user to call it again when that happens.

Under the hood, when a rebalance occurs all sessions are torn down completely (briefly no members exist, so no partitions are handled by anyone); when you call Consume() again, a new session is brought up which should get its share of the partitions.

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

A few weeks ago we added some comments covering what @pavius described above.
It was confusing; I ran into the same issue once myself.

https://github.com/Shopify/sarama/pull/1602

@meiyang1990 @wyndhblb @vikrampunchh @gauravds
could you please confirm that this was the issue?
