Based on this thread, https://discuss.elastic.co/t/heartbeat-http-use-of-closed-network-connection/147876, we may have a bug in our socket lifecycle for TLS connections.
It looks like we may be closing the conn too early.
This is a tricky one to track down. Turns out this is at least as old as 6.2.0.
One avenue to investigate: https://github.com/jasonwbarnett/fileserver/commit/040494994846890aa8b472a0f5339faa69253ee4
Will update when I have more.
I should mention this isn't TLS-specific; it only depends on the body being over a certain size. It appears we terminate the connection early for larger bodies.
Well, it's definitely somewhere in our custom dialer, I'm going to tear it apart tomorrow and isolate the failure.
@urso if you have an idea where the failure might be off the top of your head, LMK. If not, I'm glad to dive in and get my hands dirty with the innards of the dialer chain.
One note, I had thought it might be timeouts being incorrectly set to some tiny value that triggered after a few hundred bytes, but that doesn't seem to be it.
OK, well, it took me longer to find than I'd like to admit, but the root cause is here: https://github.com/elastic/beats/blob/master/heartbeat/monitors/active/http/simple_transp.go#L88
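Essentially we close the conn before we return from the RoundTripper. This is bad because the body is actually read later, after RoundTrip has returned, so anything not already buffered at that point gets read from a closed conn.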
The two options are to:
RoundTrip
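To make the failure mode concrete, here is a minimal sketch in Go. It is not the heartbeat code: `oneShotTransport`, `connClosingBody`, `newLargeBodyServer` and `fetch` are hypothetical names made up for illustration, and it only handles plain HTTP. With `closeEarly` set it closes the conn before returning from `RoundTrip`, as described above. Otherwise it ties the conn's lifetime to the response body, which is one possible way to avoid the early close and not necessarily either of the options listed above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// connClosingBody (hypothetical) closes the conn only when the caller
// closes the response body.
type connClosingBody struct {
	io.ReadCloser
	conn net.Conn
}

func (b connClosingBody) Close() error {
	bodyErr := b.ReadCloser.Close()
	if err := b.conn.Close(); err != nil {
		return err
	}
	return bodyErr
}

// oneShotTransport (hypothetical) dials a fresh plain-TCP conn per request.
// closeEarly reproduces the bug; otherwise the conn lives as long as the body.
type oneShotTransport struct{ closeEarly bool }

func (t oneShotTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Assumes req.URL.Host already contains a port, as httptest URLs do.
	conn, err := net.Dial("tcp", req.URL.Host)
	if err != nil {
		return nil, err
	}
	if err := req.Write(conn); err != nil {
		conn.Close()
		return nil, err
	}
	// ReadResponse parses only the status line and headers; the body is
	// streamed from conn later, when the caller reads resp.Body.
	resp, err := http.ReadResponse(bufio.NewReader(conn), req)
	if err != nil {
		conn.Close()
		return nil, err
	}
	if t.closeEarly {
		conn.Close() // the bug: the body has not been read yet
		return resp, nil
	}
	resp.Body = connClosingBody{ReadCloser: resp.Body, conn: conn}
	return resp, nil
}

// newLargeBodyServer starts a local test server that returns size bytes.
func newLargeBodyServer(size int) *httptest.Server {
	payload := strings.Repeat("x", size)
	return httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			io.WriteString(w, payload)
		}))
}

// fetch reads the whole body and returns its length.
func fetch(url string, closeEarly bool) (int, error) {
	client := &http.Client{Transport: oneShotTransport{closeEarly: closeEarly}}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return len(body), err
}

func main() {
	srv := newLargeBodyServer(1 << 20)
	defer srv.Close()

	n, err := fetch(srv.URL, true)
	fmt.Println("closed early:  ", n, err)
	n, err = fetch(srv.URL, false)
	fmt.Println("closed on body:", n, err)
}
```

In this sketch a small body can still come through with `closeEarly` set, because `bufio.Reader` may already have buffered it while reading the headers. That would be consistent with only larger bodies failing, though I have not confirmed that this is exactly what happens in our dialer chain.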