Runtime: HttpClient throws unexpected TaskCanceledException when HttpClient is reused

Created on 8 Dec 2017 · 58 comments · Source: dotnet/runtime

This is the raw stack trace from the exception that occurs:

FATAL: Unhandled exception (A task was canceled.).
DEBUG:    at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncUnbuffered>d__59.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at QuarkHttp.QNewHttpClient.<SendNewAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at QuarkHttp.QHttpClient.<SendAsync>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DataApi.<GetDataAsync>d__4.MoveNext() in C:\...\DataApi.cs:line 84
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DataPoller.<RefreshChangesAsync>d__18.MoveNext() in C:\...\DataPoller.cs:line 121
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at DataPoller.<RunPollCycle>d__17.MoveNext() in C:\...\DataPoller.cs:line 98
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

The issue:
My problem originates with the HttpWebRequest API being deprecated and replaced by a significantly lower-performing implementation in .NET Core 2.0. I use a library (QuarkHttp) that is a tiny wrapper around the HttpWebRequest API. When porting this wrapper library to .NET Core, I used a shared HttpClient instance to avoid the overhead of creating new connections each time (following @geoffkizer's comment /cc @davidsh). My code looks like this:

```c#
static QNewHttpClient()
{
    _defaultHttpClient = CreateHttpClient((false, null));
    _proxyHttpClients = new ConcurrentDictionary<(bool, IWebProxy), HttpClient>();
}

public static async Task SendNewAsync(this QHttpRequest request, bool fetchResponse = true, bool isAllowRedirects = false, int timeout = 100000, IWebProxy proxy = null)
{
    HttpClient httpClientToUse = _defaultHttpClient;
    if (isAllowRedirects || proxy != null)
        httpClientToUse = _proxyHttpClients.GetOrAdd((isAllowRedirects, proxy), CreateHttpClient);

    DateTime startTime = DateTime.Now;

    HttpRequestMessage requestMessage = CreateHttpRequestMessage(request);
    HttpCompletionOption completionOption = (fetchResponse ? HttpCompletionOption.ResponseContentRead : HttpCompletionOption.ResponseHeadersRead);
    HttpResponseMessage responseMessage;

    try
    {
        responseMessage = await httpClientToUse.SendAsync(requestMessage, completionOption, (new CancellationTokenSource(timeout)).Token).ConfigureAwait(false);
    }
    catch (Exception ex) when (ex is TaskCanceledException || ex is OperationCanceledException)
    {
        Log.Error($"{ex.GetType()} thrown (Time taken: {(DateTime.Now - startTime).TotalSeconds}s).");
        Log.Debug($"URL: {request.Url}\tMethod: {request.Method}");
        throw;
    }

    return new QHttpWrapNewResponse(responseMessage, fetchResponse);
}

private static HttpClient CreateHttpClient((bool allowRedirects, IWebProxy proxy) properties)
{
    HttpClientHandler clientHandler = new HttpClientHandler
    {
        AllowAutoRedirect = properties.allowRedirects,
        AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip,
        UseCookies = false
    };

    if (properties.proxy != null)
    {
        if (!clientHandler.SupportsProxy)
            throw new Exception("Proxy not supported.");

        clientHandler.UseProxy = true;
        clientHandler.Proxy = properties.proxy;
    }

    return new HttpClient(clientHandler)
    {
        Timeout = TimeSpan.FromMilliseconds(Int32.MaxValue)
    };
}

private static HttpRequestMessage CreateHttpRequestMessage(QHttpRequest request)
{
    HttpRequestMessage requestMessage = new HttpRequestMessage(new HttpMethod(request.Method.Name()), request.Url);
    requestMessage.Version = request.Version;
    if (request.Referrer != null)
        requestMessage.Headers.TryAddWithoutValidation("referer", request.Referrer);

    foreach (string curHeader in request.Headers)
    {
        string normalizedHeader = curHeader.ToLower();
        if (normalizedHeader != "content-type")
            //requestMessage.Headers.Add(curHeader, request.Headers[curHeader]);
            if (!requestMessage.Headers.TryAddWithoutValidation(curHeader, request.Headers[curHeader]))
                throw new Exception("Header invalid.");
    }

    if (request.Payload != null)
    {
        requestMessage.Content = request.Payload;
        requestMessage.Content.Headers.ContentType = MediaTypeHeaderValue.Parse(request.ContentType);
    }

    return requestMessage;
}


```

The exception occurs on this line:

```c#
responseMessage = await httpClientToUse.SendAsync(requestMessage, completionOption, (new CancellationTokenSource(timeout)).Token).ConfigureAwait(false);
```

In the exception handler, I measure the time taken myself to ensure that I wasn't missing something obvious. This is the output of that logging:

```
[2017/12/08 16:37:43.742 QuarkHttp.QNewHttpClient] ERROR: System.Threading.Tasks.TaskCanceledException thrown (Time taken: 100.0041412s).
[2017/12/08 16:37:43.742 QuarkHttp.QNewHttpClient] DEBUG: URL: https://some.site/url	Method: Get
```

One point to note is that this issue occurs across at least 5 different domains that I know of, but I have never seen it anywhere in the past. It works as expected on .NET Framework 4.5.x, 4.6.x, and 4.7.x and on Mono 4.x.x/5.x.x. However, on .NET Core, the issue occurs very often (many times an hour in my application).

This is why I think this is a framework issue rather than my own error:
In the SendNewAsync method, the line `HttpClient httpClientToUse = _defaultHttpClient;` causes the default HttpClient to be reused. When this line is changed to `HttpClient httpClientToUse = CreateHttpClient((false, null));` (which is what initializes the default HttpClient in the first place, as seen in the constructor), a new HttpClient is used for every request. This makes things slower, but the issue disappears. It only occurs when the same HttpClient is reused.
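To illustrate, the one-line difference (a sketch using the names from the code above):

```c#
// Variant A: reuse the shared instance created in the static constructor (the issue occurs).
HttpClient httpClientToUse = _defaultHttpClient;

// Variant B: create a fresh client (and handler) for every request (slower, but the issue disappears).
HttpClient httpClientToUse = CreateHttpClient((false, null));
```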

I have no code that modifies the HttpClient in any way after it is initially created. In my specific application, the proxy and allowRedirects options are never changed, so the HttpClient that's used wasn't taken from _proxyHttpClients either.

I'm not sure how to debug this issue further. I could definitely test things if anyone has ideas.

My .NET Core version is 2.0.2, and it runs on Ubuntu 16.04.

Labels: area-System.Net.Http, bug, needs more info, os-linux


All 58 comments

As an update, the issue appears to occur mostly on requests that have fetchResponse set to false, i.e., where the HttpCompletionOption.ResponseHeadersRead option is used.

Though there have been a few exceptions to this.

@shravan2x Can you provide a small repro app for the issue, with the exact combination of options for which it repros? Going by your description, the proxy code is not required for the repro, right? It would be great if only the relevant code is included.

@Priya91 I'm trying to get one up, but haven't had success yet. Making the same request over and over (with HttpCompletionOption.ResponseHeadersRead) doesn't lead to the error even after 2000 requests. I believe I need a combination of factors to get this to happen, but I'm really not sure which.

I've currently pushed a version of the app that replaces all common occurrences of HttpCompletionOption.ResponseHeadersRead with HttpCompletionOption.ResponseContentRead. If the issue does not occur in a few hours, I can confirm that this is one aspect of the problem.

I'll post back when I have more information.

I was finally able to reproduce it!

It has nothing to do with the HttpCompletionOption. It appears to happen because of this:

  1. Send a request to xyz.com
  2. Wait some time (like 10m)
  3. Send another request to xyz.com
  4. An exception is thrown because the HttpClient tries to reuse the old connection, which fails since the connection was already terminated by some timeout.

It only happens on Linux.

The correct behavior would be to detect the closed connection and create a new one. I remember @davidsh mentioning that .NET Core on Linux uses a libcurl wrapper, while Windows uses something else. That might explain the differing behavior.

This explains why it didn't occur when I used a new HttpClient every time - no attempt was made to reuse old connections.

Reproduction:
```c#
HttpClient hc = new HttpClient();

while (true)
{
    Console.Write(DateTime.Now + " ...");
    await hc.GetAsync("https://api.steampowered.com/", (new CancellationTokenSource(TimeSpan.FromSeconds(30))).Token);
    Console.WriteLine("done");
    Console.ReadKey();
}
```

Run this, wait 5-6 minutes, then enter a key to send another request.

Result on Windows:

```
12/9/2017 2:12:06 AM ...done
12/9/2017 2:12:07 AM ...done
12/9/2017 2:12:08 AM ...done
12/9/2017 2:18:36 AM ...done
```

Result on Ubuntu 16.04:

```
12/9/17 8:12:02 AM ...done

12/9/17 8:12:10 AM ...done

12/9/17 8:12:10 AM ...done

12/9/17 8:18:34 AM ...
Unhandled Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncBuffered>d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestLabCore.Experiments.Http.HttpClientTimeoutExperiment.<Play>d__4.MoveNext() in C:\...\Data.cs:line 28
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestLabCore.Program.<RunExperiment>d__11.MoveNext() in C:\...\Data.cs:line 25
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TestLabCore.Program.<Main>(String[] args)
Aborted (core dumped)
```

The exception occurs after 30s, as expected from the CancellationToken we passed in.

P.S. I made a separate issue for the extra newlines in the Ubuntu output at https://github.com/dotnet/corefx/issues/25809.

I am also having the same problem, but only after I deploy to Azure (Azure Container Service, Ubuntu 16.04).
In my case, I am using Docker for Windows and cannot reproduce it in my local environment, so it only happens in Azure.
I thought that by using Docker I would be able to reproduce it locally as well ("It works on my machine!")... but that is not the case.
It's probably also worth mentioning that I was initially using .NET Core 1.1.4 and then updated to the latest .NET Core 2.0.3.

Run this, wait 5-6 minutes, then enter a key to send another request.

In my case, waiting for 4 minutes is enough to reproduce it. (TCP/IP TIME_WAIT 240 seconds?)

So, my current workaround (until the bug is fixed) is to remember the last invocation time and dispose and recreate the HttpClient instance after 4 minutes as necessary, so the same instance is still reused within 4 minutes of the last invocation.
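A minimal sketch of that workaround (hypothetical names; the 4-minute window is the empirical value mentioned above, not a documented limit):

```c#
using System;
using System.Net.Http;

// Reuse one HttpClient, but dispose and recreate it once it has been idle
// longer than a fixed window, so stale pooled connections are not reused.
static class RenewingHttpClient
{
    private static readonly TimeSpan MaxIdle = TimeSpan.FromMinutes(4); // empirical value from this thread
    private static readonly object _gate = new object();
    private static HttpClient _client = new HttpClient();
    private static DateTime _lastUsedUtc = DateTime.UtcNow;

    public static HttpClient Get()
    {
        lock (_gate)
        {
            if (DateTime.UtcNow - _lastUsedUtc > MaxIdle)
            {
                _client.Dispose();          // note: simplification, assumes no request is still in flight
                _client = new HttpClient(); // fresh handler => fresh connections
            }

            _lastUsedUtc = DateTime.UtcNow;
            return _client;
        }
    }
}
```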

@jo-ninja I'm not sure a fixed 4 minute window would always work, since (apart from .NET Core's own timeout) different servers may have different connection timeouts.

@shravan2x Thanks for coming up with a more helpful repro. It sounds like a bug, we will investigate.

@karelz @Priya91 This issue is fairly serious, as it prevents reuse of HttpClient instances (there is no guarantee of when a connection will be lost, and a full 100s delay occurs when it is). Could a fix be prioritized?

This does seem like a pretty serious issue.

@shravan2x I tried this repro with the latest changes from master on ubuntu1704 with curl 7.52.1, and was not able to repro it. Can you try with the latest master bits?

12/14/17 11:13:00 PM ...12/14/17 11:13:00 PM ...Before enqueue: 0
  Before enqueue: 0
  After enqueue: 1
  After enqueue: 1
  Multiagent new worker being created.
  Multiagent new worker being created.
  done
  done

  12/14/17 11:13:39 PM ...Before enqueue: 0
  After enqueue: 1
  Multiagent new worker being created.
  done

  12/14/17 11:28:43 PM ...Before enqueue: 0
  After enqueue: 1
  Multiagent new worker being created.
  done

@Priya91 Does .NET Core use the version of CURL I can find with curl --version?

If not, how would I check this?

@shravan2x
curl -V

I got the same issue.
I'm using a Docker container based on microsoft/aspnetcore:2.
The curl version in the container is 7.52.1.

I'm on 7.47.0.

@Priya91 If you can't reproduce it, it might be something that was fixed on master since at least 2.0.2. However, I can't compile and test at the moment.

I changed my code so it does not always create a new HttpClient, and it runs well now.

@Priya91 can you check your test on 2.0? It would be good to have confirmation that we are able to reproduce it on 2.0; the fact that we can't reproduce it on master (2.1) would then mean it is fixed.

@tiayi I don't follow - the code snippet above does not create new HttpClient instances. Just one.

@shravan2x I was not able to repro this for 2.0 either:

lasekar@ubuntu1704:~/httpclienthang$ dotnet --version
2.0.2

lasekar@ubuntu1704:~/httpclienthang$ dotnet run
12/15/17 11:37:44 PM ...done

12/15/17 11:37:47 PM ...done

12/15/17 11:52:45 PM ...done

12/15/17 11:54:36 PM ...done

12/15/17 11:54:38 PM ...done

@shravan2x Can you share more information about your runtime environment? Which Linux distro version are you on, and is it a Docker container? Can you try your code with an updated version of curl?

@Priya91

Runtime Environment: .NET Core 2.0.2 (A regular installation, not preview or anything)
Linux: Ubuntu 16.04.3 LTS
Is Container: No

I can't try an updated CURL until next Wednesday or so.

I was not able to repro on a vanilla Ubuntu 16.04 VM with .NET Core 2.0.2 either:

lasekar@ubuntu1604:~/httpclientperf$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

lasekar@ubuntu1604:~/httpclientperf$ dotnet --version
2.0.2

lasekar@ubuntu1604:~/httpclientperf$ dotnet run
12/18/17 8:30:39 PM ...done

12/18/17 8:30:41 PM ...done

12/18/17 9:00:52 PM ...done

12/18/17 9:00:57 PM ...done

@shravan2x Are you able to reproduce on other Ubuntu 16.04 VMs? It looks like your machine could be in a corrupted state, or some other issue is interfering with your requests.

@Priya91 I've tested on 3 VMs so far. All of them have the issue. It might have something to do with the fact that I upgraded most of these from the 1.0.0 preview versions. I'll try on a vanilla VM and post back.

I was able to reproduce a similar issue with the AWS SDK using HttpClient underneath on Ubuntu 16.04. Details can be found at https://github.com/aws/aws-sdk-net/issues/796

@danielmarbach why do you think it is the same problem? There is no concurrency in this case.
Were you able to reproduce the same problem with exactly the code posted here?

I was not able to reproduce it on the same machine with the code provided here, not even when taking concurrency into account. When you look at the stack trace, the exception is also raised from FinishSendAsyncUnbuffered. I was thinking they might be related.

When you look at the stack trace the exception is also raised from FinishSendAsyncUnbuffered

Almost all exceptions will come from there; that's the method that awaits the task that sends the request.

@Priya91 @karelz I'm not able to reproduce this on a vanilla VM. The issue still occurs on my existing machines. I will try to update to 2.0.4 and report back.

I don't think 2.0.4 will make any difference for your case.
It seems to be a configuration/environment problem. Until you can identify what exactly that is, or someone else is able to reproduce the problem, there is sadly nothing we can do on our side :(

Closing for now. Feel free to reopen if there is actionable repro / more people hitting the problem.

@karelz One thing is that I have a side-by-side installation of dotnet 1.1.4 on the VMs with the issue. I'll try removing it to see if that makes a difference.

@shravan2x there may be even more machine-wide configuration involved. Removing things one by one is usually a good strategy to understand how to build a new VM from scratch that reproduces it. If you succeed, just post more details and let us know, or reopen the issue.

@karelz It doesn't happen reliably even on the same PC. The same program above fails in some cases and passes in others. I'm entirely stumped by this.

In a simple case where it's the only running program, it works as expected. But if I run it when some other .NET Core apps (the real apps) are running, it fails.

Any ideas?

But if I run it when some other .NET Core apps (the real apps) are running, it fails.

Do you mean that even your small repro fails, when other things are running on the box? Or that the code fails only in the context of larger real-world app?

Do you mean that even your small repro fails, when other things are running on the box? Or that the code fails only in the context of larger real-world app?

The code fails only in the context of larger real-world app.

Right now, I'm trying to strip away parts of the larger app to see if that gets me anywhere.

@karelz I have an interesting update to this.

  1. I took one of my new VMs where I was NOT able to reproduce the issue - i.e. it seemed to be working fine. The latest dotnet-sdk-2.1.4 was installed. I ran the test app and everything worked correctly.
  2. I moved my "larger app" to this new machine and let it run. It crashed at some point overnight due to the same issue as before.
  3. Now I ran my test app again and voila - it fails.

(screenshot: the test app output showing the failure)

Note that this is the new, vanilla VM ^.

So I can confirm that it is an issue with the .NET Core runtime itself and not simply a corruption problem. Can you provide ideas to isolate the issue?

It is weird/suspicious that step [2] is needed to reproduce it.
If you have steps that lead to repro on any new VM and you can share them, we could take a look.

Any update or work-around on this?

I'm experiencing this issue with dotnet core runtime v2.0.4 running on Raspberry Pi with Debian 9 (Stretch).

Message: Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Net.Http.HttpClient.<FinishSendAsyncBuffered>d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

@1iveowl I've been trying to find a workable repro for months with no luck.

I just switched to Mono for now. It seems to work correctly.

@karelz Could we leave the issue open to let people know that it hasn't been fixed?

@1iveowl if you have a repro, or something we can reproduce here and investigate, we would be happy to look at it as I mentioned above.
The callstack is sadly not sufficient and @shravan2x's repro/steps were insufficient for us to reproduce locally and investigate further.

@karelz It's a tough nut. I tried to create a dumbed-down repro, but I can only provoke the error when I connect to a web service running on a particular server in my local environment. The server is not exposed on the internet.

Is it possible to try to replicate the server setup somewhere on the internet (e.g. Azure)? We could run the small repro against the existing server.
The other option is to investigate it on your side - catch it under a debugger, instrument the CoreFX code to add additional logging, etc.

Would it be so far off to try my AWS repro? It might still lead to some insights.

It should not matter where it is hosted as long as we can reproduce the problem. It is primarily about the client anyway.

@danielmarbach AFAIK your repro used the AWS SDK - that is something I would like to remove from the repro (removing the chance that it is some weirdness introduced by the SDK).
Whether the server is hosted in AWS or Azure does not matter, as long as the main app to debug is a local repro (with "minimal" dependencies).

We encountered the same issue on Windows Server 2008 R2.

System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncBuffered>d__58.MoveNext()

I reproduced the issue on an Ubuntu 16.04 x64 VM with runtime 2.0.6 and SDK 2.1.101.

The issue came up with the default completionOption parameter of the HttpClient.GetAsync method, which is HttpCompletionOption.ResponseContentRead. When I changed it to HttpCompletionOption.ResponseHeadersRead, everything went fine.

It seems to have something to do with the call to LoadIntoBufferAsync on the response's content in the HttpClient.FinishSendAsyncBuffered method, but I have not figured out how. The direct cause of the timeout appears to be that Interop.Http.MultiInfoRead in the CurlHandler.WorkerBodyLoop method always returns false when called 4-5 minutes after the last request (LoadIntoBufferAsync has been called at least once).

@lukazh do you have a simplified repro you can share?

@karelz FYI. No big difference from what shravan2x provided.
```c#
static HttpClient hc = new HttpClient();

static void Main(string[] args)
{
    while (true)
    {
        Get();
        Console.ReadKey();
    }
}

static async void Get()
{
    try
    {
        Console.Write(DateTime.Now + " ... ");
        var r = await hc.GetAsync("https://api.steampowered.com/", /*HttpCompletionOption.ResponseHeadersRead,*/
            (new CancellationTokenSource(TimeSpan.FromSeconds(30))).Token);
        Console.WriteLine("done");
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
        Console.WriteLine(e.StackTrace);
    }
}
```
I will try it out with 2.0.3 later to check if the issue comes out again.

@lukazh we were not able to reproduce the original problem in house or on vanilla VM (see https://github.com/dotnet/corefx/issues/25800#issuecomment-352556764 and https://github.com/dotnet/corefx/issues/25800#issuecomment-354729309).
If you have a small repro that reproduces on vanilla install, please let us know. We can try to reproduce it again.

@karelz If this is too difficult to reproduce on a vanilla VM, perhaps we could try to reproduce it in a container, freeze it, and upload it?

@shravan2x that is a good idea. It would still be good to minimize the steps that are installed/executed in the container prior to the repro.
I am very curious what @lukazh's repro steps look like for the Ubuntu VM.

I ran some more tests on the same VM with different URLs and the exception never came up. Even with the URL I used for the repro, hitting the issue isn't guaranteed.
I captured the requests sent to https://api.steampowered.com/. The first request established a connection with the server and the following requests reused it. After waiting for a few minutes, when the issue is likely to be encountered, another request was sent that also tried to reuse the connection. However, due to the time passed, the connection's status had already changed somehow, and the server just hung for a while (a few seconds, unstable) after sending an ACK, then sent a RST, which led to a new connection. If the timeout of the task's CancellationToken was shorter than this idle time, the task was canceled and TaskCanceledException was thrown. So I guess the cause of this issue is that the request genuinely does not complete within the given time for some reason (network conditions, OS environment, etc.) rather than HttpClient itself. @shravan2x maybe you can make the timeout a little longer to see if the server eventually sends a response.
BTW, the reason it runs well when completionOption is set to HttpCompletionOption.ResponseHeadersRead is that each request then establishes its own new connection, which is also weird. @karelz do you know something about this?

@lukazh My timeout was 100s for an endpoint that usually completes <1s. I'm fairly confident waiting longer wouldn't change much.

Any updates on this issue?

Any updates here? Our API service (ASP.NET API) runs on Azure Classic App Service, and out of every 100,000 requests about 200 fail with an Internal Server Error with the same exception. Please assist.

TaskCanceledException: A task was canceled.. ExceptionDetails : System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__0.MoveNext()

[EDIT] Format call stack by @karelz

@dshevani your callstack seems to be from .NET Framework (System.Web.* usage), not from .NET Core. Moreover, it does not seem to use HttpClient at all, so it is likely not related to this issue. For .NET Framework help, you can use community channels (like Stack Overflow), or if you have a repro / specific bug report, the VS Developer Community.

@sepehr1014 in general, we are still blocked on getting anything actionable here - repro, or deeper analysis/debugging from a repro.

The team I work with has been observing this issue on a host of .NET Core services we run - all of them have had the 100s timeout on various calls, with no correlation to call size and minimal correlation to load. We also experienced the frustration of low reproducibility and the inability to get the issue to occur in our dev environments. I suspect part of that stemmed from developing on Windows boxes, while our services run in containers on Linux.

Luckily, we have seen the issue go away when we built against .NET Core 2.1 RC. I haven't had the time to completely isolate it, but since the issue was still present when we built against 2.1 Preview 1, we believe it is related to the fact that since Preview 2 the default is to use SocketsHttpHandler instead of the native handler (which uses libcurl). After almost 3 days running so far we haven't seen any timeouts, so we're pretty confident it's resolved.

@booradlus that's great news, thanks for confirmation that 2.1 RC helps!

@karelz I had the exact same issue on Debian Stretch Docker images. Roughly 1-3 calls in 1000 from a static HttpClient would randomly fail with the exact same TaskCanceledException / timeout, with the same stack trace (ending with FinishSendAsyncBuffered).

The HTTP call would never leave the box, so the target service never even recorded an incoming request. It was happening fairly consistently, yet it was impossible to reproduce on demand and appeared to happen with no apparent pattern.

The problem disappeared after upgrading to the .NET Core 2.1 RC runtime, so it might have been some issue with the curl handler.

Do you have a standalone simple repro, @filipw? Something I can just run and get a repro?
As for the handler: you can force it back and forth via an environment variable.
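For reference, a sketch of switching handlers on .NET Core 2.1; the switch and environment variable names below are my recollection of the 2.1 opt-out mechanism and should be treated as assumptions to verify:

```c#
using System;
using System.Net.Http;

class Program
{
    static void Main()
    {
        // Opt back into the pre-2.1 native handler (libcurl on Linux, WinHTTP on Windows).
        // Must run before the first HttpClient/HttpClientHandler is created.
        AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);

        // Equivalent environment variable, set before launching the process:
        //   DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0
        var client = new HttpClient(); // now backed by the native handler
    }
}
```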

Thanks @filipw for letting us know it does not reproduce for you on .NET Core 2.1 RC anymore!

The root-cause might have been in libcurl (specific version), or how we interact with it in CurlHandler. We introduced SocketsHttpHandler (default in 2.1) to help us address exactly this class of problems and make behaviors on all platforms uniform. It's good to see it is paying off.
