confluent-kafka-dotnet: Silent failure when producing during broker communication issues

Created on 4 Oct 2018 · 3 comments · Source: confluentinc/confluent-kafka-dotnet

Description

Certain broker communication failures are not surfaced as exceptions on the producer side when producing via ProduceAsync(). As a result, a failure to produce to a topic goes unreported, which can lead to data loss.

How to reproduce

Unfortunately, I do not have clear reproduction steps. We stumbled upon the issue after a few days of normal Kafka operation and have no idea what caused it. Restarting the machine running our Kafka brokers fixed the communication problem.
After we realized that no messages were getting through, the only indication of a problem came from adding the debug setting to the producer config ({ "debug", "broker,topic,msg" }):

BROKERFAIL| [thrd:xxx.xxx.xxx.xxx:9092/bootstrap]: xxxx.xxx.xxx.xxx:9092/0: failed: err: Local: Broker transport failure: (errno: No error)

This is cryptic and not very useful beyond signaling that a problem exists. It is also buried in a flood of other debug messages.
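Because the one useful line was buried in debug output, a small filter can help surface it. This is a minimal sketch in Python, assuming log lines shaped like the excerpt above (facility name, a `|`, then the message); `broker_failures` is a hypothetical helper, and the actual format depends on how the client's log output is captured.

```python
from typing import Iterable, List, Tuple


def broker_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (broker, reason) pairs for BROKERFAIL debug lines.

    Assumes the hypothetical line shape:
    "BROKERFAIL| [thrd:...]: <broker>: failed: err: <reason>"
    """
    found: List[Tuple[str, str]] = []
    for line in lines:
        # Split off the facility name that precedes the first '|'.
        facility, sep, rest = line.partition("|")
        if not sep or facility.strip() != "BROKERFAIL":
            continue
        # Separate the "[thrd:...]: <broker>" prefix from the error reason.
        head, sep, reason = rest.partition(": failed: err: ")
        if not sep:
            continue
        # The broker name follows the last "]: " of the thread prefix.
        broker = head.rsplit("]: ", 1)[-1].strip()
        found.append((broker, reason.strip()))
    return found
```

Feeding it the producer's captured debug output returns only the broker failures, so they no longer have to be spotted by eye among the topic and msg traces.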

Checklist

Please provide the following information:

  • [x] Confluent.Kafka NuGet version: 0.11.5
  • [ ] Apache Kafka version:
  • [ ] Client configuration:
  • [ ] Operating system:
  • [ ] Provide logs (with "debug" : "..." as necessary in configuration)
  • [ ] Provide broker log excerpts
  • [ ] Critical issue

All 3 comments

Are you checking the Error property on the Message returned from ProduceAsync? In 0.11.x, produce errors do not result in exceptions: the task completes successfully, with the Error property set to something other than NoError. In 1.0-beta, this has been changed to behave as you expect.
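To make the 0.11.x behavior concrete, here is a minimal, language-neutral sketch in Python (not the Confluent.Kafka API) of wrapping a produce call so that an error carried on the returned result is turned into an exception. `DeliveryReport`, `ErrorCode`, `ProduceFailed`, `produce_checked`, `ok_produce` and `flaky_produce` are hypothetical names; in .NET the equivalent check would be on the Error property of the Message returned from ProduceAsync.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable


class ErrorCode(Enum):
    # Hypothetical stand-ins for the client's error codes.
    NO_ERROR = 0
    TRANSPORT = 1


@dataclass
class DeliveryReport:
    # Hypothetical model of the message a 0.11.x-style ProduceAsync returns:
    # the task completes normally even when delivery failed.
    topic: str
    value: bytes
    error: ErrorCode = ErrorCode.NO_ERROR


class ProduceFailed(Exception):
    """Raised by produce_checked when a delivery report carries an error."""

    def __init__(self, report: DeliveryReport) -> None:
        super().__init__(f"delivery to {report.topic!r} failed: {report.error.name}")
        self.report = report


async def produce_checked(
    produce: Callable[[str, bytes], Awaitable[DeliveryReport]],
    topic: str,
    value: bytes,
) -> DeliveryReport:
    # Await the produce call, then convert an error-carrying result into an
    # exception so the failure cannot be silently ignored.
    report = await produce(topic, value)
    if report.error is not ErrorCode.NO_ERROR:
        raise ProduceFailed(report)
    return report


async def ok_produce(topic: str, value: bytes) -> DeliveryReport:
    # Simulated successful delivery.
    return DeliveryReport(topic, value)


async def flaky_produce(topic: str, value: bytes) -> DeliveryReport:
    # Simulated broker transport failure that still completes "successfully".
    return DeliveryReport(topic, value, ErrorCode.TRANSPORT)


async def demo() -> str:
    try:
        await produce_checked(flaky_produce, "events", b"hello")
    except ProduceFailed as exc:
        return str(exc)
    return "delivered"


if __name__ == "__main__":
    print(asyncio.run(demo()))  # delivery to 'events' failed: TRANSPORT
```

Doing the check in one wrapper means individual call sites cannot forget it; with 1.0-beta, where a failed produce surfaces as an exception, such a wrapper would no longer be needed.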

@mhowlett, that is a very good point. I'm not checking the result message right now; I was assuming exceptions would be raised. Thank you for explaining how it actually works. I'll start checking the message for now.

Another thing that's not obvious: OnError events should mostly not be acted on. In 0.11.x, no error emitted by this event should be considered fatal; the client will recover from all of them. 1.0-beta adds an IsFatal flag to this event to make it clear how these errors should be handled.
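The handling described in the comment above can be sketched in Python with hypothetical names (`ClientError`, `ErrorMonitor`) rather than the real client types. Recoverable errors are only logged; an error flagged as fatal (modeling the 1.0-beta IsFatal flag) is recorded so the application can decide to recreate or stop the producer. That response to a fatal error is an assumption of this sketch, not something the thread prescribes.

```python
import logging
from dataclasses import dataclass
from typing import List

log = logging.getLogger("producer")


@dataclass
class ClientError:
    # Hypothetical model of the error passed to an OnError-style handler.
    # is_fatal mirrors the IsFatal flag added in 1.0-beta; under 0.11.x
    # semantics every error is recoverable, so it would always be False.
    reason: str
    is_fatal: bool = False


class ErrorMonitor:
    """Logs recoverable errors and records fatal ones."""

    def __init__(self) -> None:
        self.fatal: List[ClientError] = []

    def on_error(self, err: ClientError) -> None:
        if not err.is_fatal:
            # The client recovers on its own; log for visibility only.
            log.warning("recoverable client error: %s", err.reason)
            return
        # Assumption: a fatal error means this producer should be replaced,
        # so record it for the application to act on outside the callback.
        log.error("fatal client error: %s", err.reason)
        self.fatal.append(err)

    @property
    def must_restart(self) -> bool:
        return bool(self.fatal)
```

Recording the fatal error instead of raising inside the handler keeps the callback short and lets the application's main loop decide what to do.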
