Sarama: Idempotent producer broken on broker reconnect

Created on 18 Jul 2019 · 4 Comments · Source: Shopify/sarama

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.
Sarama Version: v1.23.0
Kafka Version: sarama.V2_1_0_0
Go Version:

Configuration

conf.Producer.Idempotent = true
conf.Net.MaxOpenRequests = 1
conf.Producer.Return.Errors = true
conf.Producer.Return.Successes = true
conf.Producer.RequiredAcks = sarama.WaitForAll

What configuration values are you using for Sarama and Kafka?
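
For context, here is a minimal sketch of how the settings above might be wired into a sync producer. The broker address `kafka:9092` and the topic `json_events` are taken from the logs below; the message payload and the surrounding `main` function are illustrative assumptions, not the reporter's actual code.

```go
package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	conf := sarama.NewConfig()
	// Kafka version from the "Versions" section of this report.
	conf.Version = sarama.V2_1_0_0

	// Settings exactly as listed in the "Configuration" section.
	conf.Producer.Idempotent = true
	conf.Net.MaxOpenRequests = 1
	conf.Producer.Return.Errors = true
	conf.Producer.Return.Successes = true
	conf.Producer.RequiredAcks = sarama.WaitForAll

	// Broker address as it appears in the logs.
	producer, err := sarama.NewSyncProducer([]string{"kafka:9092"}, conf)
	if err != nil {
		log.Fatalf("creating producer: %v", err)
	}
	defer producer.Close()

	// Illustrative payload (assumption); the topic name comes from the logs.
	msg := &sarama.ProducerMessage{
		Topic: "json_events",
		Value: sarama.StringEncoder(`{"example":true}`),
	}
	if _, _, err := producer.SendMessage(msg); err != nil {
		log.Printf("Error producing message to kafka: %v", err)
	}
}
```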

Logs

At start-up time:

{"level":"info","msg":"Initializing new client","time":"2019-07-18T14:50:32Z"}
{"level":"info","msg":"ClientID is the default of 'sarama', you should consider setting it to something application-specific.","time":"2019-07-18T14:50:32Z"}
{"level":"info","msg":"ClientID is the default of 'sarama', you should consider setting it to something application-specific.","time":"2019-07-18T14:50:32Z"}
{"level":"info","msg":"client/metadata fetching metadata for all topics from broker kafka:9092\n","time":"2019-07-18T14:50:32Z"}
{"level":"info","msg":"Connected to broker at kafka:9092 (unregistered)\n","time":"2019-07-18T14:50:32Z"}
{"level":"info","msg":"client/brokers registered new broker #1001 at kafka:9092","time":"2019-07-18T14:50:32Z"}
{"level":"info","msg":"Successfully initialized new client","time":"2019-07-18T14:50:32Z"}
{"level":"info","msg":"Obtained a ProducerId: 5000 and ProducerEpoch: 0\n","time":"2019-07-18T14:50:32Z"}

After taking down the broker, we get the expected error:

{"error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","level":"error","msg":"Error producing message to kafka","time":"2019-07-18T14:53:29Z"}

After bringing the broker back up, we get an unexpected error:

{"level":"info","msg":"Connected to broker at kafka:9092 (unregistered)\n","time":"2019-07-18T14:54:02Z"}
{"level":"info","msg":"client/brokers registered new broker #1001 at kafka:9092","time":"2019-07-18T14:54:02Z"}
{"level":"info","msg":"ClientID is the default of 'sarama', you should consider setting it to something application-specific.","time":"2019-07-18T14:54:02Z"}
{"level":"info","msg":"producer/broker/1001 starting up\n","time":"2019-07-18T14:54:02Z"}
{"level":"info","msg":"producer/broker/1001 state change to [open] on json_events/0\n","time":"2019-07-18T14:54:02Z"}
{"level":"info","msg":"producer/leader/json_events/0 selected broker 1001\n","time":"2019-07-18T14:54:02Z"}
{"level":"info","msg":"Connected to broker at kafka:9092 (registered as #1001)\n","time":"2019-07-18T14:54:02Z"}
{"error":"kafka server: The broker received an out of order sequence number.","level":"error","msg":"Error producing message to kafka","time":"2019-07-18T14:54:02Z"}

Problem Description

When using an idempotent sync producer, if the connection to the broker is lost and then regained, the producer is left in an unusable state: every call to SendMessage returns ErrOutOfOrderSequenceNumber.

From skimming the code, I suspect that reconnection should call newTransactionManager; however, I have not done any extensive debugging.
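
Until the library itself recovers, one possible application-level workaround is to discard the stuck producer and build a fresh one, since a new producer obtains its own producer ID and starts with clean sequence numbers. The sketch below illustrates that idea only; it is not what Sarama does internally. To stay self-contained it uses hypothetical stand-ins: the `sender` interface plays the role of `sarama.SyncProducer`, `errOutOfOrder` plays the role of `sarama.ErrOutOfOrderSequenceNumber`, and `stuckSender` simulates the broken state described above.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfOrder is a local stand-in (assumption) for
// sarama.ErrOutOfOrderSequenceNumber.
var errOutOfOrder = errors.New("out of order sequence number")

// sender is a hypothetical, minimal view of a sync producer.
type sender interface {
	Send(value string) error
	Close() error
}

// rebuildingSender replaces its underlying sender whenever that sender
// reports the out-of-order sequence error.
type rebuildingSender struct {
	factory func() (sender, error)
	current sender
}

func newRebuildingSender(factory func() (sender, error)) (*rebuildingSender, error) {
	s, err := factory()
	if err != nil {
		return nil, err
	}
	return &rebuildingSender{factory: factory, current: s}, nil
}

// Send forwards the value; on errOutOfOrder it closes the current sender,
// builds a new one, and retries exactly once.
func (r *rebuildingSender) Send(value string) error {
	err := r.current.Send(value)
	if !errors.Is(err, errOutOfOrder) {
		return err // success, or an unrelated error left to the caller
	}
	_ = r.current.Close() // the old sender is unusable; ignore close errors
	fresh, ferr := r.factory()
	if ferr != nil {
		return ferr
	}
	r.current = fresh
	return r.current.Send(value)
}

// stuckSender simulates the reported behaviour: once broken, every send fails.
type stuckSender struct{ broken bool }

func (s *stuckSender) Send(string) error {
	if s.broken {
		return errOutOfOrder
	}
	return nil
}

func (s *stuckSender) Close() error { return nil }

func main() {
	built := 0
	r, err := newRebuildingSender(func() (sender, error) {
		built++
		// Only the first sender is stuck; replacements work.
		return &stuckSender{broken: built == 1}, nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("send error:", r.Send("event"), "senders built:", built)
}
```

Note the trade-off: because the replacement producer has a new producer ID, the broker cannot deduplicate the retried message against anything the old producer sent, so idempotence is not guaranteed across the rebuild.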

To be reproduced stalexempt

All 4 comments

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

Would still like this to be reviewed

For what it's worth, I think we ran into exactly the issue described here. For posterity, we were on Sarama v1.23.1.

@nicklipple @NickCiao I took a crack at reproducing and fixing this issue in https://github.com/Shopify/sarama/pull/1661 - if this is still a problem for you and you want to have a crack using my branch, would appreciate the feedback!
