Runtime: HttpClient.GetStringAsync Suppressed Exceptions Cause Deadlock

Created on 26 Mar 2020  路  10Comments  路  Source: dotnet/runtime

HttpClient.GetStringAsync Suppressed Exceptions Cause Deadlock

.NET Core 3.1 running on Windows 10. Console application using default template, entire application code is included below.

```c#
using System.Net.Http;
using System.Threading.Tasks;
namespace tmp
{
class Program
{
static async Task Main(string[] args)
{
string test_site = "https://beautymatter.com/";
HttpClient client = new HttpClient();
await client.GetStringAsync(test_site); //permanent deadlock 10% of the time
await client.GetAsync(test_site); //throws exception 10% of the time
}
}
}

```

I found a website, beautymatter.com, that causes an unusual deadlock when using HttpClient. The site has some odd configurations that result in random I/O connection failures (~10% of the time) that incidentally make it an excellent platform to test for edge-cases.

What I discovered is that the HttpClient.GetStringAsync appears to be suppressing exceptions and entering permanent deadlock when the transport stream fails to close properly. HttpClient.GetAsync correctly throws exceptions for these transport errors.

area-System.Net.Http bug

Most helpful comment

I reported this issue before at #31259 馃憤

Here is the relevant code: https://github.com/dotnet/runtime/blob/4b03513e20e0caf9ee34817eb74356903f1bb33e/src/libraries/System.Net.Http/src/System/Net/Http/HttpClient.cs#L158

In dotnet core 3.1 these methods do not have the cancellation token overloads yet, but still specify HttpCompletionOption.ResponseHeadersRead.

The fact is that any usage of HttpCompletionOption.ResponseHeadersRead without a timed cancellation token has the potential to permanently deadlock your app.

Here is a live repro of the issue that shows the dangerous behavior https://dotnetfiddle.net/UpFwOD - I had to set the timeout to 3 seconds due to dotnet fiddle runtime limitations but if you copy the code you will see that it permanently deadlocks indefinitely.

Full framework had a 300 second default timeout for idle sockets, but for some reason that was never added to dotnet core.

I strongly maintain that in order to make httpclient safe for customers to use, either a way to set a global idle socket timeout must be provided or all calls that use HttpCompletionOption.ResponseHeadersRead must be forced to specify a timed cancellation token ( which at this point would obviously be a huge breaking change ).

Let me give an example of how many of your customers this current behavior is endangering:

Did you know that the AWS SDK, with 52,965,649 downloads total and close to 30k downloads per day forces HttpCompletionOption.ResponseHeadersRead on all outbound .net core requests?

And how ( until 8 days ago - omg they added their first async overload ) their MD5 Wrapper stream that wraps all their responses does not provide async method overloads

And how the base Stream.cs Async methods that are then used evaluate the cancellation token on method call and then completely ignore it, meaning it can never do it's job?

And how this all results in customers thinking their usage of a Stream is safe against deadlocks because they passed in a cancellation token, which silently and unbeknownst to them actually doesn't do anything because some stream implementation in the chain didn't overload one of the async methods?

AWS Services are fairly reliable, but I've experienced multiple time - on production services mind you - that S3 got into a bad state where it never closes the connection, deadlocking our production service. Worse yet, it seems to take a few minutes for AWS to realize something is wrong and take the faulty S3 service out of rotation. Since calls to S3 are behind a round robin load balancer it means that literally every service in the entire AZ gets deadlocked. Oh yeah, restarting the services right away doesn't work either since the faulty S3 service is still in rotation.

The point I am trying to make is that if AWS makes a single mistake with the configuration of their servers, all dotnet core apps using AWS will deadlock. Period. And will remain deadlocked until they are restarted - even if AWS fixes the issue on their end right away.

Just ... let us configure a timeout for idle sockets & give us a IAsyncStream interface so we can start building modern, fully-async streams 馃槩 Or at the very least, for non-overloaded async methods with cancellation token register a callback that closes the stream if the cancellation token is cancelled prior to completion, like so: cancellationToken.Register(src => ((Stream)src).Close(), this)

Edit: also this has been reported over a year ago at https://github.com/dotnet/runtime/issues/28404#issuecomment-456418490 but the root issues have yet to be addressed.

All 10 comments

When you say "deadlock" you mean the request simply never returns?

What happens when you decrease HttpClient.Timeout property? It defaults to 100 seconds.

https://www.ssllabs.com/ssltest/analyze.html?d=beautymatter.com

reports some unusual errors in the TLS server certificate chain. Not sure if this might be related to the reported "hang" problem with HttpClient,

image

including an additional certificate being returned from the server that doesn't appear to be in the proper chain of trust for the server certificate:

image

@wfurt @bartonjs

I believe that "Certificate #2" is "the certificate we get if we don't include the SNI header", which matches what I see in openssl s_client -connect beautymatter.com:443 (no SNI) vs openssl s_client -connect beautymatter.com:443 -servername beautymatter.com (with SNI)

When you say "deadlock" you mean the request simply never returns?

What happens when you decrease HttpClient.Timeout property? It defaults to 100 seconds.

Decreasing timeout has no discernable effect.

The GetAsync call throws the following exceptions. I'm not experienced enough to read into it.

The issue is not so much about the exception or hang, it is the missing exception by the GetStringAsync that is the issue. Deadlock is the worst outcome.

Unhandled exception. System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at System.Net.Security.SslStream.<FillBufferAsync>g__InternalFillBufferAsync|215_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.Net.Http.HttpConnection.FillAsync()
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionResponseContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at tmp.Program.Main(String[] args) in C:\Users\JOEL\desktop\tmp\Program.cs:line 13
   at tmp.Program.<Main>(String[] args)

and this...

Unhandled exception. System.Net.Http.HttpRequestException: Error while copying content to a stream.
 ---> System.IO.IOException:  Received an unexpected EOF or 0 bytes from the transport stream.
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.Net.Http.HttpConnection.FillAsync()
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionResponseContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at tmp.Program.Main(String[] args) in C:\Users\JOEL\desktop\tmp\Program.cs:line 13
   at tmp.Program.<Main>(String[] args)

I reported this issue before at #31259 馃憤

Here is the relevant code: https://github.com/dotnet/runtime/blob/4b03513e20e0caf9ee34817eb74356903f1bb33e/src/libraries/System.Net.Http/src/System/Net/Http/HttpClient.cs#L158

In dotnet core 3.1 these methods do not have the cancellation token overloads yet, but still specify HttpCompletionOption.ResponseHeadersRead.

The fact is that any usage of HttpCompletionOption.ResponseHeadersRead without a timed cancellation token has the potential to permanently deadlock your app.

Here is a live repro of the issue that shows the dangerous behavior https://dotnetfiddle.net/UpFwOD - I had to set the timeout to 3 seconds due to dotnet fiddle runtime limitations but if you copy the code you will see that it permanently deadlocks indefinitely.

Full framework had a 300 second default timeout for idle sockets, but for some reason that was never added to dotnet core.

I strongly maintain that in order to make httpclient safe for customers to use, either a way to set a global idle socket timeout must be provided or all calls that use HttpCompletionOption.ResponseHeadersRead must be forced to specify a timed cancellation token ( which at this point would obviously be a huge breaking change ).

Let me give an example of how many of your customers this current behavior is endangering:

Did you know that the AWS SDK, with 52,965,649 downloads total and close to 30k downloads per day forces HttpCompletionOption.ResponseHeadersRead on all outbound .net core requests?

And how ( until 8 days ago - omg they added their first async overload ) their MD5 Wrapper stream that wraps all their responses does not provide async method overloads

And how the base Stream.cs Async methods that are then used evaluate the cancellation token on method call and then completely ignore it, meaning it can never do it's job?

And how this all results in customers thinking their usage of a Stream is safe against deadlocks because they passed in a cancellation token, which silently and unbeknownst to them actually doesn't do anything because some stream implementation in the chain didn't overload one of the async methods?

AWS Services are fairly reliable, but I've experienced multiple time - on production services mind you - that S3 got into a bad state where it never closes the connection, deadlocking our production service. Worse yet, it seems to take a few minutes for AWS to realize something is wrong and take the faulty S3 service out of rotation. Since calls to S3 are behind a round robin load balancer it means that literally every service in the entire AZ gets deadlocked. Oh yeah, restarting the services right away doesn't work either since the faulty S3 service is still in rotation.

The point I am trying to make is that if AWS makes a single mistake with the configuration of their servers, all dotnet core apps using AWS will deadlock. Period. And will remain deadlocked until they are restarted - even if AWS fixes the issue on their end right away.

Just ... let us configure a timeout for idle sockets & give us a IAsyncStream interface so we can start building modern, fully-async streams 馃槩 Or at the very least, for non-overloaded async methods with cancellation token register a callback that closes the stream if the cancellation token is cancelled prior to completion, like so: cancellationToken.Register(src => ((Stream)src).Close(), this)

Edit: also this has been reported over a year ago at https://github.com/dotnet/runtime/issues/28404#issuecomment-456418490 but the root issues have yet to be addressed.

Triage:

  1. We'll fix GetStringAsync and GetByteArrayAsync and make them respect HttpClient's Timeout and CancelPendingRequests to address the initial issue
  2. Fine-grained socket's timeout control (including, option to set idle timeout) requires a separate discussion

it's surprising to me that this is 'up-for-grabs' and not higher priority. this bug (and the other related missing timeout/token API issues) renders HttpClient unfit for purpose in real-world network conditions.

Just confirm, seems current issue has been fixed in PR. Is it correct? Or still has something we can participate on this, thanks :)

Was this page helpful?
0 / 5 - 0 ratings