Beats: Heartbeat using closed connections for body checks with TLS

Created on 5 Oct 2018 · 4 Comments · Source: elastic/beats

Looking at this thread (https://discuss.elastic.co/t/heartbeat-http-use-of-closed-network-connection/147876), it appears we may have a bug in our socket lifecycle for TLS connections.

It seems we may be closing the connection too early.

Heartbeat bug

andrewvc

Most helpful comment

OK, well, it took me longer to find than I'd like to admit, but the root cause is here: https://github.com/elastic/beats/blob/master/heartbeat/monitors/active/http/simple_transp.go#L88

Essentially we close the conn before we return from the RoundTripper. This is bad because the body is actually read later.

The two options are to:

Link the closing of the connection with the closing of the body
Immediately read the body within RoundTrip

andrewvc on 19 Oct 2018


All 4 comments

This is a tricky one to track down. It turns out the bug is at least as old as 6.2.0.

One avenue to investigate: https://github.com/jasonwbarnett/fileserver/commit/040494994846890aa8b472a0f5339faa69253ee4

Will update when I have more...

andrewvc on 19 Oct 2018

I should mention this isn't TLS-specific; it simply has to do with bodies over a certain size. It appears we terminate the connection early for larger bodies.

andrewvc on 19 Oct 2018

Well, it's definitely somewhere in our custom dialer. I'm going to tear it apart tomorrow and isolate the failure.

@urso if you have an idea where the failure might be off the top of your head, LMK. If not, I'm glad to dive in and get my hands dirty with the innards of the dialer chain.

One note: I had thought it might be timeouts being incorrectly set to some tiny value that triggered after a few hundred bytes, but that doesn't seem to be it.

andrewvc on 19 Oct 2018

OK, well, it took me longer to find than I'd like to admit, but the root cause is here: https://github.com/elastic/beats/blob/master/heartbeat/monitors/active/http/simple_transp.go#L88

Essentially we close the conn before we return from the RoundTripper. This is bad because the body is actually read later.

The two options are to: