Confluent-kafka-dotnet: Support of SSL/Kerberos

Created on 17 Feb 2017 · 96 comments · Source: confluentinc/confluent-kafka-dotnet

Hello!

Could someone tell me whether this library supports SSL/Kerberos? At the moment I can't find any .NET Kafka client that supports SSL and/or Kerberos.

Labels: bug, librdkafka, question

Most helpful comment

Thanks for the advice! I'll look into it.
The handshake issue was fixed; producer performance is a story for another day.

UPD: Producing was so slow because we produced each message synchronously :-| My bad. Now it sends more than 25k messages in 5-6 seconds.
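The slowdown described above came from waiting for each message's acknowledgement before sending the next one. The sketch below is a pure asyncio simulation, not the Kafka client API: `send_and_wait_for_ack` is a hypothetical stand-in with an assumed 5 ms round trip, used only to show why sequential waiting costs roughly n times the round-trip time while pipelined sends overlap.

```python
import asyncio
import time

# Assumed, simulated broker round trip; not a measured Kafka latency.
ROUND_TRIP_S = 0.005


async def send_and_wait_for_ack(msg: str) -> str:
    """Hypothetical stand-in for one produce request plus its acknowledgement."""
    await asyncio.sleep(ROUND_TRIP_S)
    return msg


async def produce_sequential(msgs):
    # Each send waits for the previous acknowledgement: about len(msgs) * RTT.
    return [await send_and_wait_for_ack(m) for m in msgs]


async def produce_pipelined(msgs):
    # All sends are in flight at once; acknowledgements are collected together.
    return await asyncio.gather(*(send_and_wait_for_ack(m) for m in msgs))


def timed(coro_fn, msgs):
    """Run one strategy and return (results, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(msgs))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    msgs = [f"msg-{i}" for i in range(100)]
    _, seq = timed(produce_sequential, msgs)
    _, pipe = timed(produce_pipelined, msgs)
    print(f"sequential: {seq:.3f}s  pipelined: {pipe:.3f}s")
```

With a real client, not blocking on every message also gives librdkafka the chance to batch messages; which .NET call achieves that depends on the Confluent.Kafka version in use.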

All 96 comments

Hi!

Yes, the upcoming first version of Confluent's .NET client will support both SSL and SASL.

For instructions on how to use SSL, see https://github.com/edenhill/librdkafka/wiki/Using-SSL-with-librdkafka
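As a rough illustration of the client side of that setup, the sketch below assembles the librdkafka SSL properties that appear later in this thread (`security.protocol`, `ssl.ca.location`, `ssl.certificate.location`, `ssl.key.location`, `ssl.key.password`, `debug`). The helper `ssl_client_config` is hypothetical, and the file paths are placeholders.

```python
def ssl_client_config(bootstrap_servers, ca_location, cert_location=None,
                      key_location=None, key_password=None, debug=None):
    """Hypothetical helper: build a librdkafka-style SSL config dict."""
    config = {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "ssl",
        # CA certificate used to verify the brokers' certificates.
        "ssl.ca.location": ca_location,
    }
    # A client certificate and key are only needed when the broker
    # requests client authentication (ssl.client.auth=required).
    if cert_location or key_location:
        if not (cert_location and key_location):
            raise ValueError("client certificate and key must be given together")
        config["ssl.certificate.location"] = cert_location
        config["ssl.key.location"] = key_location
        if key_password is not None:
            config["ssl.key.password"] = key_password
    if debug:
        # e.g. "security,broker,protocol", as suggested later in this thread
        config["debug"] = debug
    return config


if __name__ == "__main__":
    print(ssl_client_config("broker1:9093", "ca-cert", "client.pem",
                            "client.key", "secret", debug="security"))
```

The resulting dict could, for example, be merged with a `group.id` and passed to a consumer in the Python client; the same keys go into the .NET client's configuration dictionary.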

For SASL authentication the support is platform dependent:

The client will be released very soon.

@edenhill will the client be released with Confluent 3.2.0 or independently? Is there a version we can test in the meantime?

@simplesteph It will be released with Confluent 3.2.0
You could try the latest preview here: https://www.nuget.org/packages/Confluent.Kafka/0.9.4-preview4

Continuing from my issue ah-/rdkafka-dotnet#112:
I'm getting the same SSL errors:
3|2017-03-01 15:23:23.161|Test@localhost#consumer-2|FAIL| [thrd:ssl://x.x.x.x:9093/1]: ssl://x.x.x.x:9093/1: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)

Error: Local_Ssl ssl://x.x.x.x:9093/1: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)

Should I open a new issue here?

@TheMidgardWatcher Did you check the broker logs as instructed?

@edenhill I'm not sure what you mean, but yes, we checked the broker logs and also the Java SSL debug logs. I believe I already described the results in the previous issue.

@edenhill I would like to use SASL/PLAIN for authentication. Is there an example of how to do that with the Confluent.Kafka .NET client? Thanks

Depends on your platform.
SASL/PLAIN is currently not supported on Windows, but works on OSX and Linux, etc.

What platform are you on?

@edenhill I will run Kafka on Linux, but the publisher and consumer, which use the Confluent.Kafka .NET client, will run on Windows servers. Would SASL/PLAIN work in this case?

Thanks

@simon406 Sorry to say that SASL/PLAIN is not yet implemented for Windows (exists on Linux and OSX).

You could use SSL authentication or SASL Kerberos though (Windows AD)
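The platform situation described above (SASL/PLAIN on Linux and OSX only, Kerberos on Windows via AD) can be sketched as a small config builder. `sasl_config` is a hypothetical helper; the property names `sasl.mechanisms`, `sasl.username`, `sasl.password` and `sasl.kerberos.service.name` are librdkafka settings, and the Windows restriction only reflects what this thread states for the 2017 releases.

```python
import sys


def sasl_config(mechanism, username=None, password=None,
                service_name="kafka", use_tls=True, platform=None):
    """Hypothetical helper returning librdkafka-style SASL settings."""
    platform = platform or sys.platform
    # PLAIN sends the password itself, so TLS (sasl_ssl) is the default here.
    protocol = "sasl_ssl" if use_tls else "sasl_plaintext"
    if mechanism == "PLAIN":
        # Per this thread (2017): SASL/PLAIN not yet implemented on Windows.
        if platform.startswith("win"):
            raise NotImplementedError("SASL/PLAIN not available on Windows here")
        if not (username and password):
            raise ValueError("SASL/PLAIN needs a username and password")
        return {
            "security.protocol": protocol,
            "sasl.mechanisms": "PLAIN",
            "sasl.username": username,
            "sasl.password": password,
        }
    if mechanism == "GSSAPI":
        # Kerberos; on Windows this thread points to Windows AD.
        return {
            "security.protocol": protocol,
            "sasl.mechanisms": "GSSAPI",
            "sasl.kerberos.service.name": service_name,
        }
    raise ValueError(f"unsupported SASL mechanism: {mechanism}")
```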

Thanks @edenhill

I am trying to integrate Kafka authentication with a token-based authentication web API. My plan is to write my own implementation of javax.security.sasl.SaslServer that sends tokens to the auth web API for authentication. This might not be possible without SASL/PLAIN.

Could you please add a code sample which shows KafkaConsumer with enabled SSL


This would be nice, because I'm still getting this annoying SSL handshake exception.

Any updates on using SSL on Windows? I suggest someone change the type of this issue to a bug, at least.

Hi there! I just checked topic consumption with an SSL-configured kafka-console-consumer against both local and remote Kafka environments. Both environments were read successfully. However, a locally built librdkafka_example didn't read from either environment. So the cause of 'SSL handshake failed' is definitely not a bad environment or its configuration, nor the confluent-kafka-dotnet wrapper. I should add that the producer works fine, without any problems, on both the local and remote Kafka environments.

I also verified with self-signed certificates on my local machine: same issue, "Consume failed: Local: Communication failure with broker".

If you enable SSL debugging in the broker (-Djavax.net.debug=all) and look at stderr, does it tell you anything interesting about the connecting client?

Should I look for something specific? All the output looks fine. I got these lines at the end of the output:

kafka-network-thread-0-ListenerName(SSL)-SSL-4, READ: TLSv1.2 Alert, length = 26
Padded plaintext after DECRYPTION:  len = 2
0000: 01 00                                              ..
kafka-network-thread-0-ListenerName(SSL)-SSL-4, RECV TLSv1.2 ALERT:  warning, close_notify
kafka-network-thread-0-ListenerName(SSL)-SSL-4, closeInboundInternal()
kafka-network-thread-0-ListenerName(SSL)-SSL-4, closeOutboundInternal()
kafka-network-thread-0-ListenerName(SSL)-SSL-4, SEND TLSv1.2 ALERT:  warning, description = close_notify
Padded plaintext before ENCRYPTION:  len = 2
0000: 01 00                                              ..
kafka-network-thread-0-ListenerName(SSL)-SSL-4, WRITE: TLSv1.2 Alert, length = 26
kafka-network-thread-0-ListenerName(SSL)-SSL-4, called closeOutbound()
kafka-network-thread-0-ListenerName(SSL)-SSL-4, closeOutboundInternal()
[Raw write]: length = 31

Okay, so that looks like it might be the client closing the connection.

Is debug=security,broker,protocol telling you anything on the client?

This is what I got:

LOG-7-TOPBRK: [thrd::0/internal]: :0/internal: Topic topic.test [0]: joining broker (rktp 0000024322141700)
Consume failed: Local: Communication failure with broker
LOG-7-UPDATE: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/bootstrap: NodeId changed from -1 to 0
LOG-7-UPDATE: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Name changed from ssl://localhost:9093/bootstrap to ssl://localhost:9093/0
LOG-7-LEADER: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Mapped 1 partition(s) to broker
LOG-7-STATE: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Broker changed state UP -> UPDATE
LOG-7-TOPBRK: [thrd::0/internal]: :0/internal: Topic topic.test [0]: leaving broker (0 messages in xmitq, next leader ssl://localhost:9093/0, rktp 0000024322141700)
LOG-7-SEND: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Sent MetadataRequest (v0, 44 bytes @ 0, CorrId 2)
LOG-7-STATE: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Broker changed state UPDATE -> UP
LOG-7-TOPBRK: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Topic topic.test [0]: joining broker (rktp 0000024322141700)
LOG-7-RECV: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Received MetadataResponse (v0, 78 bytes, CorrId 2, rtt 16.00ms)
LOG-7-DESTROY: [thrd:app]: Terminating instance
LOG-7-DESTROY: [thrd:main]: Destroy internal
LOG-7-DESTROY: [thrd:main]: Removing all topics
LOG-7-TERMINATE: [thrd::0/internal]: :0/internal: Handle is terminating: failed 0 request(s) in retry+outbuf
LOG-7-BROKERFAIL: [thrd::0/internal]: :0/internal: failed: err: Local: Broker handle destroyed: (errno: No error)
LOG-7-STATE: [thrd::0/internal]: :0/internal: Broker changed state UP -> DOWN
LOG-7-TOPBRK: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Topic topic.test [0]: leaving broker (0 messages in xmitq, next leader (none), rktp 0000024322141700)
LOG-7-TOPBRK: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Topic topic.test [0]: no next leader, failing 0 message(s) in partition queue
LOG-7-TERMINATE: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Handle is terminating: failed 0 request(s) in retry+outbuf
LOG-7-BROKERFAIL: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: failed: err: Local: Broker handle destroyed: (errno: No error)
LOG-7-STATE: [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/0: Broker changed state UP -> DOWN

There are no SSL errors in the above log; it successfully connects to broker 0 and manages to acquire metadata, which means the SSL layer is working.

After that (there are no timestamps, so it's hard to say how long) it seems you are bringing down the consumer (Terminating instance), possibly triggered by "Consume failed: Local: Communication failure with broker".

Could you provide the full logs from client startup to shutdown?

Do you mean client logs or from broker?

Client


Here you are:

client.log.txt

There are no errors in the log, which makes the Consume error weird.
I'm guessing you are shutting down your app when you get a consume error; can you let it keep running to see whether it works despite the consume error, and possibly provide more logs?

It would also be great if you could provide the source code of your program, the parts interacting with the Kafka library

I'm using only the librdkafka sources, no other apps, just the console command. I compiled it locally with VS 2013.
Windows 10 Enterprise x64
OpenSSL 1.0.2k
Latest Kafka 2.12-0.10.2.1
Zookeeper 3.4.10

Console command I'm running:
C:\Sources\librdkafka-master\win32\outdir\v120\x64\Release>rdkafka_example -C -t topic.test -b localhost:9093 -X security.protocol=ssl -X ssl.ca.location=d:\\Certificates\\TEST\\ca-cert -X ssl.certificate.location=d:\\Certificates\\TEST\\client.pem -X ssl.key.location=d:\\Certificates\\TEST\\client.key -X ssl.key.password=123456 -X debug=security,broker,protocol -p 0 -o 0 2>client.log
Full console output:
consumer.log.txt

Hi there, any updates on this Issue?

This looks like a problem with rdkafka_example.

Is topic.test an existing topic?
Can you try -L (metadata list) or -P (produce) instead of -C ?


Yes it exists.
-L works fine.

Okay, good, that means your SSL configuration is working as it should.
I think you can disregard the consumer error (please file a librdkafka issue for it!); it is a bug in the example tool.

OK, I'll open an issue in the librdkafka repo. But what should I do next about the "SSL handshake" error?

rdkafka_example isn't showing the SSL handshake error, right?
Do you get the SSL handshake error with dotnet using the exact same configuration you use for rdkafka_example?
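One way to make sure both tools really get the same configuration is to derive the rdkafka_example arguments from the same key/value dict that feeds the .NET client. The helper below is hypothetical; it only uses the `-C`, `-t`, `-p`, `-o`, `-b` and `-X key=value` flags that appear in the rdkafka_example command earlier in this thread.

```python
import shlex


def rdkafka_example_args(config, topic, partition=0, offset="0",
                         exe="rdkafka_example"):
    """Hypothetical helper: turn a config dict into rdkafka_example -C args."""
    args = [exe, "-C", "-t", topic, "-p", str(partition), "-o", str(offset)]
    for key, value in config.items():
        if key == "bootstrap.servers":
            args += ["-b", str(value)]
        else:
            # Every other property is passed through verbatim with -X.
            args += ["-X", f"{key}={value}"]
    return args


def as_command_line(args):
    """POSIX-shell quoting for display; Windows cmd quoting rules differ."""
    return " ".join(shlex.quote(a) for a in args)
```

The argument list can be passed straight to `subprocess.run`, which avoids shell quoting issues with Windows paths.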

Hi, @edenhill !

After testing, I found that:

  1. On my local Kafka everything works fine, both the .NET consumer and Kafka's console consumer.
  2. On the remote Confluent 3.2.0 platform, the .NET client throws 'SSL handshake' and 'Shutdown while init' errors, but Kafka's console consumer works properly!

I'm going crazy with this issue...

Can you provide logs from the non-working client with debug=security,protocol,broker?

The client might be unable to verify the broker's certificate because there are no default CA certificate locations; specifying ssl.ca.location might do the trick.

This commit fixes that:
https://github.com/edenhill/librdkafka/commit/10f9de510cf088381d7199bea8bd7a65b97ab5f1#diff-226483c4c83ab9400939e815bcb19564


I'm passing ssl.ca.location; you can see it in the attached config dump.
client_config_dump.txt
client_debug_log.txt

Thanks,

What if you run the openssl client tool? It should give you more information about what is wrong:
openssl s_client -connect <broker>:<port>

openssl output:
C:\OpenSSL-Win64\bin>openssl s_client -connect broker1:9093
CONNECTED(000001A8)
depth=1 C = XX, L = YYY, O = ZZZ
verify error:num=19:self signed certificate in certificate chain
79952:error:1408E0F4:SSL routines:ssl3_get_message:unexpected message:.\ssl\s3_both.c:408:

That's strange, because if I call
openssl s_client -CAfile D:\Certificates\Sandbox\ca.crt -cert D:\Certificates\Sandbox\client.crt -key D:\Certificates\Sandbox\client.key -connect broker1:9093
I get:

CONNECTED(000001AC)
depth=1 C = XX, L = YYY, O = ZZZ
verify return:1
depth=0 CN = *.zzz.com
verify return:1
[...]
Verify return code: 0 (ok)

Looking at the logs it seems to be able to connect with SSL to broker1 and broker3, but not broker2.
Try running the s_client thing for all three brokers and see if you get different results for broker2

All brokers are returning:
Verify return code: 0 (ok)
That's only if I pass -cert and -key.
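The observation that verification only succeeds with `-cert` and `-key` points to the broker requiring a client certificate. A rough Python equivalent of that `s_client -CAfile/-cert/-key` check, using only the standard `ssl` and `socket` modules, is sketched below. The function names are hypothetical, and the sketch also checks the hostname, which `s_client` does not do by default.

```python
import socket
import ssl


def make_client_context(cafile=None, certfile=None, keyfile=None, password=None):
    """Roughly mirrors s_client -CAfile/-cert/-key on the client side."""
    # Verifies the broker certificate against cafile (or system CAs if None).
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile:
        # Present a client certificate, needed when ssl.client.auth=required.
        ctx.load_cert_chain(certfile, keyfile=keyfile, password=password)
    return ctx


def probe(host, port, ctx, timeout=10.0):
    """Complete a TLS handshake; return (protocol version, cipher name)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Usage would look like `probe("broker1", 9093, make_client_context("ca.crt", "client.crt", "client.key"))`, run once per broker.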

Try running the Kafka client with just one bootstrap broker at a time, and see if there is a difference.
If it manages to connect and acquire metadata it means the config/certs are correct and it would indicate an issue with reusing the same SSL context.
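The one-broker-at-a-time experiment suggested above can be driven from a single base config. `single_broker_configs` is a hypothetical helper that yields one copy of the config per bootstrap broker, leaving every other setting unchanged.

```python
def single_broker_configs(base_config):
    """Hypothetical helper: yield (broker, config) with one bootstrap broker each."""
    brokers = [b.strip() for b in base_config["bootstrap.servers"].split(",")
               if b.strip()]
    for broker in brokers:
        cfg = dict(base_config)  # shallow copy; other settings stay identical
        cfg["bootstrap.servers"] = broker
        yield broker, cfg
```

This only isolates the bootstrap connection: once metadata is fetched, the client still connects to the other brokers it learns about, which fits the later report that pointing the client at one broker produced errors from all the others.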

Nothing changed. Here are the logs and config:
client_config_dump.txt
client_debug_log.txt

Is this with Kafka v 0.10.2.1 and librdkafka v0.9.5?

As far as I know, it is:
Confluent 3.2.0
Kafka 0.10.2.0

And librdkafka v0.9.5

If you enable SSL debugging on the broker, does it tell you anything interesting?
Add the JVM flag: -Djava.security.debug=all

Sorry for the delay. Here is the error from the broker logs (if you need them, I can send you the full logs from all 3 brokers):

[2017-05-25 18:20:15,693] DEBUG SSLEngine.closeInBound() raised an exception. (org.apache.kafka.common.network.SslTransportLayer:733)
javax.net.ssl.SSLException: Inbound closed before receiving peer's close_notify: possible truncation attack?
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666)
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634)
    at sun.security.ssl.SSLEngineImpl.closeInbound(SSLEngineImpl.java:1561)
    at org.apache.kafka.common.network.SslTransportLayer.handshakeFailure(SslTransportLayer.java:731)
    at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:314)
    at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:69)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:350)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
    at kafka.network.Processor.poll(SocketServer.scala:494)
    at kafka.network.Processor.run(SocketServer.scala:432)
    at java.lang.Thread.run(Thread.java:745)
[2017-05-25 18:20:15,694] DEBUG Connection with workstation/10.6.XX.XX disconnected (org.apache.kafka.common.network.Selector:375)
javax.net.ssl.SSLHandshakeException: certificate verify message signature error
    at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
    at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
    at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
    at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
    at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
    at org.apache.kafka.common.network.SslTransportLayer.handshakeWrap(SslTransportLayer.java:382)
    at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:243)
    at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:69)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:350)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
    at kafka.network.Processor.poll(SocketServer.scala:494)
    at kafka.network.Processor.run(SocketServer.scala:432)
    at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: certificate verify message signature error
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:292)
    at sun.security.ssl.ServerHandshaker.clientCertificateVerify(ServerHandshaker.java:1658)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:293)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
    at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
    at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
    at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:336)
    at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:417)
    at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:270)
    ... 6 more

Did you look into the exception being thrown?

[2017-05-25 18:20:15,694] DEBUG Connection with workstation/10.6.XX.XX disconnected (org.apache.kafka.common.network.Selector:375)
javax.net.ssl.SSLHandshakeException: certificate verify message signature error

Our DevOps team ran some tests. This is what they reported:

  1. The Java client, with -Djava.security.debug=all enabled on the brokers, doesn't throw any SSL errors.
  2. The openssl tool works fine.
  3. librdkafka_example compiled on Linux works fine.

So they are certain the problem is not in the brokers or the certificates. The Python librdkafka client also works fine.

Is that python librdkafka on Windows or Linux?

Also, did you find the cause of the broker exception?

[2017-05-25 18:20:15,694] DEBUG Connection with workstation/10.6.XX.XX disconnected (org.apache.kafka.common.network.Selector:375)
javax.net.ssl.SSLHandshakeException: certificate verify message signature error

Is that python librdkafka on Windows or Linux?

The Python librdkafka client runs on Linux.

@TheMidgardWatcher @edenhill
Is there an example of a .NET consumer/producer using SASL?

@kavyashivakumar Sorry - we don't use SASL consumers.

@edenhill Also, did you find the cause of the broker exception?

No. After a long investigation, I'm still not able to determine its cause...

@edenhill Hi. I also hit this issue in my server environment. The first run of Telegraf (with kafka_consumer rewritten to use librdkafka for the SSL connection) succeeds, but after restarting the service once, the SSL handshake always fails. It reports:

%3|1497259160.209|FAIL|rdkafka#consumer-1| [thrd:ssl://158.85.44.247:9093/bootstrap]: ssl://158.85.44.247:9093/bootstrap: SSL handshake failed: s3_both.c:406: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)

I used:
librdkafka : master branch
kafka : kafka_2.11-0.10.2.0.tgz
telegraf : telegraf_1.1.0_amd64.deb
openssl : OpenSSL 1.0.2g 1 Mar 2016

I checked the source code: the error happens in the rd_kafka_transport_ssl_handshake function in rdkafka_transport.c. When executing SSL_do_handshake, it always returns "unexpected message" with error return value 2, which means SSL_ERROR_WANT_READ.
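To interpret that return value, the sketch below maps the `SSL_get_error()` codes defined in OpenSSL's `ssl.h` (0 to 8) to their names. The table and `classify` are an illustration, not part of librdkafka. On a non-blocking socket, `SSL_ERROR_WANT_READ` and `SSL_ERROR_WANT_WRITE` mean "call again when the socket is ready" rather than a failure, while the "unexpected message" text is read from OpenSSL's per-thread error queue, which is separate from this return code.

```python
# SSL_get_error() return codes, with values as defined in OpenSSL's ssl.h.
SSL_GET_ERROR_CODES = {
    0: "SSL_ERROR_NONE",
    1: "SSL_ERROR_SSL",
    2: "SSL_ERROR_WANT_READ",
    3: "SSL_ERROR_WANT_WRITE",
    4: "SSL_ERROR_WANT_X509_LOOKUP",
    5: "SSL_ERROR_SYSCALL",
    6: "SSL_ERROR_ZERO_RETURN",
    7: "SSL_ERROR_WANT_CONNECT",
    8: "SSL_ERROR_WANT_ACCEPT",
}

# Codes that, on a non-blocking socket, mean "retry later", not "failed".
RETRYABLE = {"SSL_ERROR_WANT_READ", "SSL_ERROR_WANT_WRITE"}


def classify(code):
    """Return (name, retryable) for an SSL_get_error() return value."""
    name = SSL_GET_ERROR_CODES.get(code, f"UNKNOWN({code})")
    return name, name in RETRYABLE
```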

My openssl connection result is OK, but it does report "unexpected message".

openssl s_client -connect *:9093 :

CONNECTED(00000003)
verify return:1
140414289491608:error:1408E0F4:SSL routines:ssl3_get_message:unexpected message:s3_both.c:406:

Server public key is 1024 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : DHE-DSS-AES128-GCM-SHA256
Key-Arg : None
PSK identity: None
PSK identity hint: None
SRP username: None
Start Time: 1497260475
Timeout : 300 (sec)
Verify return code: 0 (ok)

I found the two clues below; maybe something is wrong with ssl3's strange behavior.
https://stackoverflow.com/questions/28011581/websocket-ssl-handshake-failure
https://www.openssl.org/docs/man1.1.0/ssl/SSL_get_error.html

Is there any way to avoid using an ssl3 connection?

Hi there!

@edenhill We found a couple of environments where the simple consumer example works (with the same Confluent Kafka platform). Now we are investigating why, and what differs between the working and non-working workstations.

@SStar1314 Have you tried running it on different machines, and did any of them not get the handshake error?

UPD (2017/6/14): @edenhill We found that the people who reported that the example works fine simply hadn't added the OnError handler, so they didn't see any errors in the console, but the errors were there.

@SStar1314 Have you tried updating Kafka to version 0.10.2.1?
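The OnError finding above is a reminder that client errors are reported through a callback, so nothing is printed unless a handler is registered. In the Python client the counterpart is the `error_cb` configuration entry, which receives the error object. The `ErrorCollector` class below is a hypothetical helper that records and prints every error it is given; the config keys besides `error_cb` are placeholders.

```python
import sys


class ErrorCollector:
    """Hypothetical callable for use as an error callback: records and prints."""

    def __init__(self, stream=None):
        self.errors = []
        self.stream = stream or sys.stderr

    def __call__(self, err):
        text = str(err)
        self.errors.append(text)
        # Print every error so SSL handshake failures are never silently lost.
        print(f"client error: {text}", file=self.stream)


collector = ErrorCollector()
config = {
    "bootstrap.servers": "broker1:9093",  # placeholder
    "security.protocol": "ssl",
    "error_cb": collector,  # Python-client callback key; .NET uses OnError
}
```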

@TheMidgardWatcher I ran the command on another, similar environment and got the same handshake error as shown above. After two days of struggling, the issue disappeared for no apparent reason.
I didn't update Kafka's version. I rebuilt librdkafka many times to dump the error path, with no effect. Then, by chance, I added error-message dumping to Telegraf's kafka-consumer plugin and rebuilt Telegraf; after restarting Telegraf, the issue disappeared and the handshake error was no longer reported. Using the openssl command to connect directly also no longer reports an error.
So I have two environments that both got the SSL handshake error: one was fixed by rebuilding Telegraf, and the other is on hold for further investigation. No more clues.
That fix is quite strange and makes no sense; I had rebooted the machine several times with no change.

@SStar1314 We've been fighting this issue since rdkafka-dotnet, with no result. But I found the Kafka issue KAFKA-4959, which might be the cause of the SSL handshake errors. So now we are upgrading our environments to check whether the issue is gone on Kafka 0.10.2.1. I'd recommend you do the same.

@TheMidgardWatcher Thanks. I tried Kafka 0.10.2.1 today; it didn't fix my environment and the issue still exists.
If you configure Kafka's server.properties with ssl.client.auth=none, the handshake error disappears. I'm wondering whether I'm misunderstanding how this setting should be used.

Unfortunately, the 0.10.2.1 update didn't help me either...

I followed the docs and got the exact same problem as SStar1314: as soon as I set ssl.client.auth=required on the broker, I get:
ssl://kafka1.XXXXX.com:9093/bootstrap: SSL handshake failed

Happy to provide any info required, just let me know what :)

EDIT: I got some certs from our in-house CA instead of using self-signed ones, and this seems to have helped somewhat. I only intermittently get the handshake error from each of the brokers in my cluster, but I can still consume everything fine.

EDIT2: If I point my test client directly at a single broker, I get the handshake/shutdown errors for every other broker in the cluster. This seems to be the case regardless of which one I point it at.

Still battling with this, things I've tried:

  • a cluster on Ubuntu 16.04 LTS
  • a cluster on Windows Server 2012
  • enabling only TLSv1, then TLSv1.1 and finally TLSv1.2
  • disabling hostname verification (ssl.endpoint.identification.algorithm)
  • explicitly setting ssl.secure.random.implementation to SHA1PRNG
  • re-created certs and triple checked the keystores & truststores

I can consume all records from all topics the majority of the time despite the errors but it does occasionally fail completely with "5/5 brokers down".

Has anyone got any further?

@edenhill Could you comment on the posts above? It seems this issue is broader than just one person's local environment or configuration...

Please try librdkafka v0.11.0-RC2 which has some SSL error propagation fixes

@edenhill I just checked and got a bunch of these:

Error: Local_Ssl ssl://broker1:9093/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
Error: Local_Ssl ssl://broker2:9093/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
Error: Local_Transport ssl://broker1:9093/bootstrap: Receive failed: .\ssl\ssl_lib.c:1075: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init: 
Error: Local_Ssl ssl://broker3:9093/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
Error: Local_AllBrokersDown 3/3 brokers are down
Error: Local_Ssl ssl://broker2:9093/bootstrap: SSL handshake failed: SSL syscall error number: 5: No error
Error: Local_Ssl ssl://broker1:9093/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
Error: Local_Ssl ssl://broker3:9093/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
Error: Local_Transport ssl://broker2:9093/bootstrap: Receive failed: .\ssl\ssl_lib.c:1075: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init: 
Error: Local_AllBrokersDown 3/3 brokers are down

When is this occurring? Directly after connecting? At regular intervals (say, the broker's idle connection reaper time, 10 minutes by default)? Or suddenly?
Does it happen for all brokers simultaneously?
Are there any hints in the broker logs?
Are there any occasions where this does not occur?
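Questions like "directly after connect?" and "all brokers simultaneously?" can be answered from the client's own debug output. The sketch below assumes the `level|timestamp|client|FACILITY| [thrd:...]: broker: message` line format shown in the logs in this thread; the regex and `failures_by_broker` are hypothetical helpers that count FAIL lines per broker and record when each broker first failed.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the librdkafka log line format shown in this thread, e.g.
# 3|2017-06-29 23:53:40.245|rdkafka#producer-1|FAIL| [thrd:...]: ssl://host:9093/0: ...
LOG_LINE = re.compile(
    r"^(?P<level>\d)\|(?P<ts>[\d-]+ [\d:.]+)\|(?P<client>[^|]+)\|(?P<fac>[A-Z]+)\|"
    r" \[thrd:[^\]]*\]: (?P<broker>\S+?): (?P<msg>.*)$"
)


def failures_by_broker(lines):
    """Return (FAIL count per broker, first FAIL timestamp per broker)."""
    counts, first_seen = Counter(), {}
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m["fac"] != "FAIL":
            continue  # ignore non-matching lines and non-FAIL facilities
        broker = m["broker"]
        counts[broker] += 1
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        first_seen.setdefault(broker, ts)
    return counts, first_seen
```

If the first-failure timestamps of all brokers fall within milliseconds of client startup, that supports the "only at startup" pattern rather than, say, the idle connection reaper interval.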

@edenhill

  • It's on first connect and reconnect (I paused the debugger earlier and it had to re-establish a connection). Once connected and consuming/producing it doesn't seem to re-occur.
  • It can happen for all brokers simultaneously which results in the "AllBrokersDown X/X brokers down" but it does manage to connect on the next attempt usually.
  • Nothing shows up in the broker logs.
  • If I delete all zookeeper and kafka data so I effectively have a brand new cluster, I sometimes won't get it right away but it appears later.
  • It seems completely random, but setting the client config to a single broker (any in the cluster) will usually result in errors from all other brokers. Errors from the single broker are then rare.

It would be great if you could find the most minimal test case to reproduce this, preferably a single broker on localhost or similar, with a trivial client application.

I managed to reproduce it on localhost using two brokers.

Everything runs on Windows 10, using Kafka 0.11.0.0 and librdkafka 0.11.0-RC2.

Broker 0: PLAINTEXT://:9092,SSL://:9093
Broker 1: PLAINTEXT://:9095,SSL://:9094

SSL configuration done with https://github.com/edenhill/librdkafka/wiki/Using-SSL-with-librdkafka, using openssl version mentioned here: https://github.com/edenhill/librdkafka/blob/master/README.win32

server.properties :

broker.id=0
listeners=PLAINTEXT://:9092,SSL://:9093
ssl.keystore.location=D:/kafka/ssl/broker_localhost_server.keystore.jks
ssl.keystore.password=abcdefgh
ssl.keystore.type=JKS
ssl.key.password=abcdefgh
ssl.truststore.location=D:/kafka/ssl/broker_localhost_server.truststore.jks
ssl.truststore.password=abcdefgh
ssl.truststore.type=JKS
ssl.protocol = TLS
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.client.auth=required

server2.properties:

broker.id=1
listeners=PLAINTEXT://:9095,SSL://:9094
ssl.keystore.location=D:/kafka/ssl/broker_localhost2_server.keystore.jks
ssl.truststore.location=D:/kafka/ssl/broker_localhost2_server.truststore.jks
...

librdkafka config (C#, Confluent.Kafka):

    var config = new Dictionary<string, object>
    {
        { "bootstrap.servers", brokerList },
        { "security.protocol", "ssl" },
        { "ssl.ca.location", @"D:/kafka/ssl/ca-cert" },
        { "ssl.certificate.location", @"D:/kafka/ssl/client_local_client.pem" },
        { "ssl.key.location", @"D:/kafka/ssl/client_local_client.key" },
        { "ssl.key.password", "abcdefgh" },
        { "debug", "security" }
    };

Using the SimpleProducer example (only modifying the config and reporting errors). The behaviour seems the same on 0.9.5 and 0.11.0-RC2 (tested on the Confluent.Kafka 0.11.x branch, but that shouldn't change anything).

7|2017-06-29 23:53:40.027|rdkafka#producer-1|SSL| [thrd:app]: Loading CA certificate(s) from file D:/kafka/ssl/ca-cert
7|2017-06-29 23:53:40.083|rdkafka#producer-1|SSL| [thrd:app]: Loading certificate from file D:/kafka/ssl/client_local_client.pem
7|2017-06-29 23:53:40.084|rdkafka#producer-1|SSL| [thrd:app]: Loading private key file from D:/kafka/ssl/client_local_client.key
7|2017-06-29 23:53:40.084|rdkafka#producer-1|SSLPASSWD| [thrd:app]: Private key file "D:/kafka/ssl/client_local_client.key" requires password
rdkafka#producer-1 producing on test2. q to exit.
7|2017-06-29 23:53:40.143|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/bootstrap: Broker SSL certificate verified
7|2017-06-29 23:53:40.238|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified
3|2017-06-29 23:53:40.245|rdkafka#producer-1|FAIL| [thrd:ssl://DESKTOP-LNQ6K3V:9093/0]: ssl://DESKTOP-LNQ6K3V:9093/0: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
3|2017-06-29 23:53:40.246|rdkafka#producer-1|FAIL| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Receive failed: .\ssl\ssl_lib.c:1075: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init:
ssl://DESKTOP-LNQ6K3V:9093/0: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
ssl://DESKTOP-LNQ6K3V:9094/1: Receive failed: .\ssl\ssl_lib.c:1075: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init:
7|2017-06-29 23:53:40.593|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9093/0]: ssl://DESKTOP-LNQ6K3V:9093/0: Broker SSL certificate verified
7|2017-06-29 23:53:40.597|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified

I can produce messages normally, with no other errors.

With only one broker up when launching the app, the "SSL handshake failed" error does not appear:

7|2017-06-30 00:06:38.758|rdkafka#producer-1|SSL| [thrd:app]: Loading CA certificate(s) from file D:/kafka/ssl/ca-cert
7|2017-06-30 00:06:38.764|rdkafka#producer-1|SSL| [thrd:app]: Loading certificate from file D:/kafka/ssl/client_local_client.pem
7|2017-06-30 00:06:38.764|rdkafka#producer-1|SSL| [thrd:app]: Loading private key file from D:/kafka/ssl/client_local_client.key
7|2017-06-30 00:06:38.764|rdkafka#producer-1|SSLPASSWD| [thrd:app]: Private key file "D:/kafka/ssl/client_local_client.key" requires password
rdkafka#producer-1 producing on test2. q to exit.
7|2017-06-30 00:06:38.802|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9094/bootstrap]: ssl://localhost:9094/bootstrap: Broker SSL certificate verified
7|2017-06-30 00:06:38.867|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified
3|2017-06-30 00:06:39.780|rdkafka#producer-1|FAIL| [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/bootstrap: Connect to ipv4#127.0.0.1:9093 failed: No connection could be made because the target machine actively refused it.

ssl://localhost:9093/bootstrap: Connect to ipv4#127.0.0.1:9093 failed: No connection could be made because the target machine actively refused it.

Bringing the other broker up, or any later connection/disconnection, produces no error. The errors only happen at startup, and not always the same one. Below are 4 consecutive runs with both brokers alive: the SSL handshake failure sometimes comes with a Receive failure, and sometimes there is no error at all:

$ dotnet run
7|2017-06-30 00:13:50.015|rdkafka#producer-1|SSL| [thrd:app]: Loading CA certificate(s) from file D:/kafka/ssl/ca-cert
7|2017-06-30 00:13:50.020|rdkafka#producer-1|SSL| [thrd:app]: Loading certificate from file D:/kafka/ssl/client_local_client.pem
7|2017-06-30 00:13:50.021|rdkafka#producer-1|SSL| [thrd:app]: Loading private key file from D:/kafka/ssl/client_local_client.key
7|2017-06-30 00:13:50.021|rdkafka#producer-1|SSLPASSWD| [thrd:app]: Private key file "D:/kafka/ssl/client_local_client.key" requires password
rdkafka#producer-1 producing on test2. q to exit.
7|2017-06-30 00:13:50.046|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/bootstrap: Broker SSL certificate verified
ssl://localhost:9094/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
7|2017-06-30 00:13:50.062|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9093/0]: ssl://DESKTOP-LNQ6K3V:9093/0: Broker SSL certificate verified
7|2017-06-30 00:13:50.063|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified
7|2017-06-30 00:13:51.067|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9094/bootstrap]: ssl://localhost:9094/bootstrap: Broker SSL certificate verified
q
7|2017-06-30 00:14:34.851|rdkafka#producer-1|DESTROY| [thrd:app]: Terminating instance
7|2017-06-30 00:14:34.851|rdkafka#producer-1|DESTROY| [thrd:main]: Destroy internal
7|2017-06-30 00:14:34.851|rdkafka#producer-1|DESTROY| [thrd:main]: Removing all topics

$ dotnet run
7|2017-06-30 00:14:38.129|rdkafka#producer-1|SSL| [thrd:app]: Loading CA certificate(s) from file D:/kafka/ssl/ca-cert
7|2017-06-30 00:14:38.134|rdkafka#producer-1|SSL| [thrd:app]: Loading certificate from file D:/kafka/ssl/client_local_client.pem
7|2017-06-30 00:14:38.134|rdkafka#producer-1|SSL| [thrd:app]: Loading private key file from D:/kafka/ssl/client_local_client.key
7|2017-06-30 00:14:38.134|rdkafka#producer-1|SSLPASSWD| [thrd:app]: Private key file "D:/kafka/ssl/client_local_client.key" requires password
rdkafka#producer-1 producing on test2. q to exit.
7|2017-06-30 00:14:38.149|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9094/bootstrap]: ssl://localhost:9094/bootstrap: Broker SSL certificate verified
ssl://localhost:9093/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
ssl://localhost:9094/bootstrap: Receive failed: .\ssl\ssl_lib.c:1075: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init:
2/2 brokers are down
7|2017-06-30 00:14:39.172|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/bootstrap: Broker SSL certificate verified
7|2017-06-30 00:14:39.180|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9094/bootstrap]: ssl://localhost:9094/bootstrap: Broker SSL certificate verified
7|2017-06-30 00:14:39.192|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9093/0]: ssl://DESKTOP-LNQ6K3V:9093/0: Broker SSL certificate verified
7|2017-06-30 00:14:39.201|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified
q
7|2017-06-30 00:15:51.985|rdkafka#producer-1|DESTROY| [thrd:app]: Terminating instance
7|2017-06-30 00:15:51.985|rdkafka#producer-1|DESTROY| [thrd:main]: Destroy internal
7|2017-06-30 00:15:51.985|rdkafka#producer-1|DESTROY| [thrd:main]: Removing all topics

$ dotnet run
7|2017-06-30 00:15:56.326|rdkafka#producer-1|SSL| [thrd:app]: Loading CA certificate(s) from file D:/kafka/ssl/ca-cert
7|2017-06-30 00:15:56.331|rdkafka#producer-1|SSL| [thrd:app]: Loading certificate from file D:/kafka/ssl/client_local_client.pem
7|2017-06-30 00:15:56.331|rdkafka#producer-1|SSL| [thrd:app]: Loading private key file from D:/kafka/ssl/client_local_client.key
7|2017-06-30 00:15:56.331|rdkafka#producer-1|SSLPASSWD| [thrd:app]: Private key file "D:/kafka/ssl/client_local_client.key" requires password
rdkafka#producer-1 producing on test2. q to exit.
7|2017-06-30 00:15:56.355|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9094/bootstrap]: ssl://localhost:9094/bootstrap: Broker SSL certificate verified
ssl://localhost:9093/bootstrap: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
ssl://localhost:9094/bootstrap: Receive failed: .\ssl\ssl_lib.c:1075: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init:
2/2 brokers are down
7|2017-06-30 00:15:57.382|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9094/bootstrap]: ssl://localhost:9094/bootstrap: Broker SSL certificate verified
7|2017-06-30 00:15:57.387|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/bootstrap: Broker SSL certificate verified
7|2017-06-30 00:15:57.402|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified
ssl://DESKTOP-LNQ6K3V:9093/0: SSL handshake failed: .\ssl\s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
ssl://DESKTOP-LNQ6K3V:9094/1: Receive failed: .\ssl\ssl_lib.c:1075: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init:
7|2017-06-30 00:15:57.673|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified
7|2017-06-30 00:15:57.678|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9093/0]: ssl://DESKTOP-LNQ6K3V:9093/0: Broker SSL certificate verified

No error:

$ dotnet run
7|2017-06-30 00:18:12.710|rdkafka#producer-1|SSL| [thrd:app]: Loading CA certificate(s) from file D:/kafka/ssl/ca-cert
7|2017-06-30 00:18:12.716|rdkafka#producer-1|SSL| [thrd:app]: Loading certificate from file D:/kafka/ssl/client_local_client.pem
7|2017-06-30 00:18:12.716|rdkafka#producer-1|SSL| [thrd:app]: Loading private key file from D:/kafka/ssl/client_local_client.key
7|2017-06-30 00:18:12.716|rdkafka#producer-1|SSLPASSWD| [thrd:app]: Private key file "D:/kafka/ssl/client_local_client.key" requires password
rdkafka#producer-1 producing on test2. q to exit.
7|2017-06-30 00:18:12.734|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9094/bootstrap]: ssl://localhost:9094/bootstrap: Broker SSL certificate verified
7|2017-06-30 00:18:12.742|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://localhost:9093/bootstrap]: ssl://localhost:9093/bootstrap: Broker SSL certificate verified
7|2017-06-30 00:18:12.751|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9093/0]: ssl://DESKTOP-LNQ6K3V:9093/0: Broker SSL certificate verified
7|2017-06-30 00:18:12.757|rdkafka#producer-1|SSLVERIFY| [thrd:ssl://DESKTOP-LNQ6K3V:9094/1]: ssl://DESKTOP-LNQ6K3V:9094/1: Broker SSL certificate verified

I will try to do more tests this weekend, but I assume there is some kind of race when contacting multiple brokers at startup.

Also, I get no more errors with ssl.client.auth=none. Those are just debug messages (OnError does get called).
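The logs above show the client loading a CA certificate, a client certificate and a password-protected private key, which is what a broker needs when ssl.client.auth is set to required or requested. A minimal sketch of the matching client settings, assuming the Python client (confluent-kafka-python, which forwards librdkafka property names unchanged). The paths and ports are copied from the logs; the helper name build_ssl_config is hypothetical, and the key password is yours to supply.

```python
def build_ssl_config(key_password: str) -> dict:
    """Hypothetical helper: librdkafka settings for TLS with a client certificate.

    Paths and ports mirror the debug logs above; adjust them for your setup.
    """
    return {
        "bootstrap.servers": "localhost:9093,localhost:9094",
        "security.protocol": "ssl",
        # CA used to verify the brokers' certificates
        "ssl.ca.location": "D:/kafka/ssl/ca-cert",
        # Client certificate and key, needed when the broker requires client auth
        "ssl.certificate.location": "D:/kafka/ssl/client_local_client.pem",
        "ssl.key.location": "D:/kafka/ssl/client_local_client.key",
        # The key file is encrypted ("requires password" in the log)
        "ssl.key.password": key_password,
        # Security-related debug logging (TLS setup, certificate checks)
        "debug": "security",
    }
```

The resulting dict would be passed straight to `confluent_kafka.Producer(...)`.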

@edenhill did you try linking openssl 1.1.0 instead of 1.0.2?

Thanks a lot, this is very helpful and leads me to believe there is a
concurrency problem with multiple simultaneous SSL sessions.
Will investigate.


Hi @edenhill!

Any updates on this issue?

Hi @edenhill, I'm working on connecting a .NET client on a Windows host using SASL_SSL. I can connect successfully with the Java client on a Unix host over SASL_SSL. Is there a configuration template for Windows hosts using this protocol?

In addition, tests are being run using the kafka-console-consumer bat file.

Hi @edenhill!

Any updates on this issue?

Hi guys, any news about fixing this issue?

This issue is a mix of SSL problems and feature request for SASL Kerberos support, the latter is explained here: https://github.com/edenhill/librdkafka/wiki/Using-SASL-with-librdkafka-on-Windows

The former should have its own issue.
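For the SASL Kerberos part, the wiki page linked above describes librdkafka on Windows using SSPI, so (as we understand it) the client authenticates as the logged-on Windows user rather than from a keytab. A minimal client-side sketch for a SASL_SSL listener, assuming the Python client; the helper name, broker hostname and CA path are hypothetical placeholders, and sasl.kerberos.service.name must match the service part of the brokers' Kerberos principal (kafka is librdkafka's default).

```python
def build_sasl_ssl_config(bootstrap: str, ca_path: str) -> dict:
    """Hypothetical helper: SASL_SSL + Kerberos (GSSAPI) client settings."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "sasl_ssl",
        # GSSAPI is the Kerberos mechanism; on Windows librdkafka serves it via SSPI
        "sasl.mechanisms": "GSSAPI",
        # Must match the service name in the brokers' principal (e.g. kafka/host@REALM)
        "sasl.kerberos.service.name": "kafka",
        # CA that signed the brokers' TLS certificates
        "ssl.ca.location": ca_path,
    }
```

For example, `build_sasl_ssl_config("broker1.example.com:9093", "C:/kafka/ssl/ca-cert")`, where both arguments are placeholders.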

Hi there @edenhill, is there any news about this "SSL handshake failed" issue?

How can we get this fixed ASAP? This issue has been open since January ((

@TheMidgardWatcher Can you try out librdkafka master and verify this fixes the problem?
Artifacts are available here:
https://ci.appveyor.com/project/edenhill/librdkafka/build/job/tdlfq2w6jii8t1y1/artifacts

Thanks

@edenhill It seems to work. But to be 100% sure, could you publish this package to the pre-release NuGet feed?

Hi @edenhill! I'm testing your fix, and I don't see any SSL or handshake exceptions. That's great!
We are using Confluent 3.2.1 with 3 brokers.

PS: the producer is horribly slow: 1k Avro records take 15-20 minutes to send.

SSL: That's great news, thanks!

Perf: try setting linger.ms to 100ms or more.
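linger.ms is how long the producer waits for more messages to fill a batch before sending it, so a larger value trades a little latency for fewer, larger ProduceRequests. A minimal sketch, assuming the Python client and a hypothetical helper that layers the setting onto an existing configuration dict. Batching only helps if the application does not wait for each message's delivery before producing the next one.

```python
def with_batching(base_conf: dict, linger_ms: int = 100) -> dict:
    """Hypothetical helper: return a copy of base_conf with linger.ms set."""
    if linger_ms < 0:
        raise ValueError("linger.ms must be >= 0")
    conf = dict(base_conf)  # leave the caller's dict untouched
    # Wait up to linger_ms for more messages before sending a batch
    conf["linger.ms"] = linger_ms
    return conf
```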

I've started the producer with linger.ms=1000.

But, as the consumer log screenshot shows, the producer sends ~3-5 messages in ~3-5 seconds, i.e. about 1 message/sec.
[Screenshot: consumer_log]

I suggest focusing only on the producer if you are troubleshooting producer performance.
Register a delivery report handler and measure the message rate there.
To get insight into what is happening under the hood, set the debug property to msg,protocol and keep an eye on the number of messages per MessageSet (batch) and the size of the ProduceRequests.
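A minimal sketch of that advice, assuming the Python client, whose delivery callbacks take `(err, msg)` and are served from `poll()` or `flush()`. DeliveryRateMeter is a hypothetical helper; the clock is injectable only so it can be exercised without a broker.

```python
import time
from typing import Callable


class DeliveryRateMeter:
    """Hypothetical helper: counts delivery reports and derives a message rate."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()  # create the meter right before producing
        self.delivered = 0
        self.failed = 0

    def on_delivery(self, err, msg) -> None:
        # Same (err, msg) shape as confluent-kafka-python delivery callbacks
        if err is not None:
            self.failed += 1
        else:
            self.delivered += 1

    def rate(self) -> float:
        """Successful deliveries per second since the meter was created."""
        elapsed = self._clock() - self._start
        return self.delivered / elapsed if elapsed > 0 else 0.0


# Debug contexts suggested above: per-message and protocol-level logging
DEBUG_CONF = {"debug": "msg,protocol"}
```

In use, one would merge `DEBUG_CONF` into the producer configuration, pass `on_delivery=meter.on_delivery` to each `produce("test2", value, ...)` call, and read `meter.rate()` after `flush()`.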

Thanks for the advice! I'll look into it.
The handshake issue was fixed, and producer performance is a story for another day.

UPD: producing was so slow because we produced each message synchronously :-| My bad. Now it sends >25k messages in 5-6 seconds.
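Waiting for each message to be acknowledged before producing the next caps throughput at roughly one broker round trip per message, while producing asynchronously lets librdkafka batch messages. A minimal sketch of the asynchronous pattern, assuming the Python client's `produce()`/`poll()`/`flush()` methods; the helper name produce_all is hypothetical, and the producer is passed in so any object with those methods works.

```python
from typing import Callable, Iterable, Optional


def produce_all(producer, topic: str, values: Iterable[bytes],
                on_delivery: Optional[Callable] = None) -> int:
    """Hypothetical helper: enqueue every message, then wait once at the end.

    The slow pattern is calling producer.flush() after each produce(), which
    blocks until that single message is acknowledged.
    """
    count = 0
    for value in values:
        # produce() only enqueues the message; librdkafka sends it in a batch
        producer.produce(topic, value, on_delivery=on_delivery)
        # poll(0) serves pending delivery callbacks without blocking
        producer.poll(0)
        count += 1
    # Block once until all queued messages are delivered (or have failed)
    producer.flush()
    return count
```

A production version would also catch `BufferError` from `produce()` (raised when the local queue is full), call `poll()` to drain it, and retry.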

Hi @edenhill, could you please speed up the release of this handshake fix? We really need it; it's kind of a blocker for our team ((

The final release will be in a week or two, but we can get an RC up on NuGet mid this week.

Oh, that would be perfect!

"we can get an RC up on NuGet mid this week"

Hi @edenhill, any news??

Sorry for the delay. I can assure you he's actively working on this. We're doing a lot of work to streamline librdkafka releases in general, and this is part of that effort.

Hey there!

@edenhill or @mhowlett, will the next release be compatible with .NET Core 2.0?

I will test that, yes. Related: #291.

Please share how this issue was resolved. I am using Confluent 3.3.0 and I am seeing a similar issue with my Python Avro producer and consumer:

%3|1540503888.323|FAIL|rdkafka#producer-1| [thrd:ssl://xxxx.hostname.com:9093/bootstrap]: ssl://xxxx.hostname.com:9093/bootstrap: Connect to ipv4#x.x.x.x:9093 failed: Connection refused
%3|1540503888.323|ERROR|rdkafka#producer-1| [thrd:ssl://xxxx.hostname.com:9093/bootstrap]: ssl://xxxx.hostname.com:9093/bootstrap: Connect to ipv4#x.x.x.x:9093 failed: Connection refused
%3|1540503888.409|FAIL|rdkafka#producer-1| [thrd:ssl://yyyy.hostname.com:9093/bootstrap]: ssl://yyyy.hostname.com:9093/bootstrap: SSL handshake failed: s3_both.c:408: error:1408E0F4:SSL routines:ssl3_get_message:unexpected message: : client authentication might be required (see broker log)
.......................................

@buntyray, did you find the culprit?
