Runtime: WinHttpExceptions resulting in OutOfMemoryException

Created on 4 Oct 2017 · 27 Comments · Source: dotnet/runtime

Possibly related to dotnet/runtime#22516

A few days ago our Azure website crashed because of an OutOfMemoryException. This happened during peak load of our application (.NET Core 1.2). Looking at the stdout logs, I saw a pattern where requests were starting to fail:

System.Net.Http.HttpRequestException: Error while copying content to a stream. ---> System.IO.IOException: The read operation failed, see inner exception. ---> System.Net.Http.WinHttpException: The connection with the server was terminated abnormally
   --- End of inner exception stack trace ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.StreamToStreamCopy.<CopyAsyncAnyStreamToAnyStreamCore>d__3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpContent.<LoadIntoBufferAsyncCore>d__48.MoveNext()
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpContent.<LoadIntoBufferAsyncCore>d__48.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsync>d__58.MoveNext()

Eventually the dotnet process crashed, so our website was down. Looking at the logs, the last entry was an OutOfMemoryException:

fail: Microsoft.AspNetCore.Server.Kestrel[0]
      UvAllocCb
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Infrastructure.MemoryPoolSlab.Create(Int32 length)
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Infrastructure.MemoryPool.AllocateSlab()
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Infrastructure.MemoryPool.Lease()
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.SocketInput.IncomingStart()
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Connection.OnAlloc(UvStreamHandle handle, Int32 suggestedSize)
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Connection.<>c.<.cctor>b__47_1(UvStreamHandle handle, Int32 suggestedsize, Object state)
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvStreamHandle.UvAllocCb(IntPtr handle, Int32 suggested_size, uv_buf_t& buf)
fail: Microsoft.AspNetCore.Server.Kestrel[0]
      Disposing listeners failed

I suspect this is not a coincidence and the two are somehow related. I was looking at how these WinHttpExceptions might cause an OutOfMemoryException. The only thing I can come up with is not disposing of HttpRequestMessage and HttpResponseMessage. It seems there is a bit of confusion about this. Am I correct that in .NET Core 1.2 the HttpRequestMessage is disposed automatically after calling HttpClient.SendAsync? Or is there no relation between the exceptions and the OutOfMemoryException?
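For context on the disposal question: a common pattern is to dispose both the request and the response deterministically, so the underlying connection and stream are released even when reading the body throws. This is only a sketch (not the poster's code; the URL is a placeholder parameter):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class DisposalSketch
{
    // Single shared instance, as the poster describes using.
    static readonly HttpClient Client = new HttpClient();

    static async Task<string> GetBodyAsync(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        // Disposing request and response in using blocks returns the
        // connection to the pool even if ReadAsStringAsync throws
        // (for example with the IOException from the original report).
        using (request)
        using (HttpResponseMessage response = await Client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```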

The other question is: what could cause the WinHttpExceptions? I'm quite sure we use the HttpClient correctly (we use a single instance for all requests). All it does is make REST calls to another Azure web app, also .NET Core 1.2. I am looking for a way to debug this properly.

All 27 comments

System.Net.Http.WinHttpException: The connection with the server was terminated abnormally

I think the root problem is that the WinHttpHandler is getting these exceptions because the server is terminating the connection. That is what this exception above is saying.

So it appears there may be issues in the Kestrel server that cause the connections to the client (HttpClient) to get aborted. Eventually, the problems with Kestrel get worse and end in an OutOfMemory condition.

But I do not think that HttpClient/WinHttpHandler is causing the exceptions in the Kestrel server. Rather, it's quite possible that problems in the Kestrel server are causing the connection to be aborted, and thus these WinHttpHandler exceptions are "by-design" because the server is aborting.

Can you repro HttpClient/WinHttpHandler exceptions without using Kestrel?

Can you repro HttpClient/WinHttpHandler exceptions without using Kestrel?

Unfortunately not. This happens only in production, when load is high. We are not able to reproduce it on a test environment running a simple load test. And I have no idea what the impact would be of switching to WebListener. So this is not something we can easily do, if at all.

Would a memory dump give some insights? Would we have to look in the w3wp or dotnet process?

/cc @shalter @cesarbs

Would a memory dump give some insights? Would we have to look in the w3wp or dotnet process?

A dump of the dotnet process would help.

@davidfowl You pinged the wrong Halter, Stephen is @halter73 😄

@halter73 @davidfowl @cesarbs

So the issue happened again and I got a good memory dump. No OutOfMemoryException this time, but I do see a lot of WinHttpExceptions. Using WinDbg I can see many threads (400+) with the following stacktrace:

Frame: 5835c890] 
5835c92c 7706081c [HelperMethodFrame_1OBJ: 5835c92c] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
5835c9ac 6a6b3203 System.Threading.Monitor.Wait(System.Object, Int32, Boolean)
5835c9bc 6a6b321c System.Threading.Monitor.Wait(System.Object, Int32)
5835c9c0 6a76adc3 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
5835ca14 6a73a6ac System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
5835ca54 6a73a579 System.Threading.Tasks.Task.InternalWait(Int32, System.Threading.CancellationToken)
5835cab0 6a778b66 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
5835cabc 1ee81b0b Microsoft.AspNetCore.Mvc.Rendering.HtmlHelperPartialExtensions.Partial(Microsoft.AspNetCore.Mvc.Rendering.IHtmlHelper, System.String, System.Object, Microsoft.AspNetCore.Mvc.ViewFeatures.ViewDataDictionary)
5835cad8 1ee88f48 Microsoft.AspNetCore.Mvc.Rendering.HtmlHelperPartialExtensions.Partial(Microsoft.AspNetCore.Mvc.Rendering.IHtmlHelper, System.String, System.Object)
5835caec 1eea78f3 AspNetCore._Views_Shared_Molecules_SiteNav_MainNav_cshtml+d__41.MoveNext()
5835cb28 1eea7344 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[AspNetCore._Views_Shared_Molecules_SiteNav_MainNav_cshtml+d__41, Nedwin.Staatsloterij.PrecompiledViews]](d__41 ByRef)
5835cb58 1eea72da AspNetCore._Views_Shared_Molecules_SiteNav_MainNav_cshtml.ExecuteAsync()
5835cb94 1eebf73b Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageCoreAsync(Microsoft.AspNetCore.Mvc.Razor.IRazorPage, Microsoft.AspNetCore.Mvc.Rendering.ViewContext)
5835cbac 1eeba653 Microsoft.AspNetCore.Mvc.Razor.RazorView+d__14.MoveNext()
5835cbec 1eeba34c System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib]].Start[[Microsoft.AspNetCore.Mvc.Razor.RazorView+d__14, Microsoft.AspNetCore.Mvc.Razor]](d__14 ByRef)
5835cc20 1eeba2de Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageAsync(Microsoft.AspNetCore.Mvc.Razor.IRazorPage, Microsoft.AspNetCore.Mvc.Rendering.ViewContext, Boolean)
5835cc80 1eeba0b4 Microsoft.AspNetCore.Mvc.Razor.RazorView+d__13.MoveNext()
5835ccb0 1eeb9fd4 System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Mvc.Razor.RazorView+d__13, Microsoft.AspNetCore.Mvc.Razor]](d__13 ByRef)
5835cce0 1eeb9f6d Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderAsync(Microsoft.AspNetCore.Mvc.Rendering.ViewContext)
5835cd28 1ee8223a Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper+d__60.MoveNext()
5835cd60 1ee8208c System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper+d__60, Microsoft.AspNetCore.Mvc.ViewFeatures]](d__60 ByRef)
5835cd90 1ee82027 Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper.RenderPartialCoreAsync(System.String, System.Object, Microsoft.AspNetCore.Mvc.ViewFeatures.ViewDataDictionary, System.IO.TextWriter)
5835cdf0 1ee81d8e Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper+d__57.MoveNext()
5835ce24 1ee81c74 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib]].Start[[Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper+d__57, Microsoft.AspNetCore.Mvc.ViewFeatures]](d__57 ByRef)
5835ce58 1ee81c06 Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper.PartialAsync(System.String, System.Object, Microsoft.AspNetCore.Mvc.ViewFeatures.ViewDataDictionary)
5835ceb4 1ee81aef Microsoft.AspNetCore.Mvc.Rendering.HtmlHelperPartialExtensions.Partial(Microsoft.AspNetCore.Mvc.Rendering.IHtmlHelper, System.String, System.Object, Microsoft.AspNetCore.Mvc.ViewFeatures.ViewDataDictionary)
5835ced0 1ee88f48 Microsoft.AspNetCore.Mvc.Rendering.HtmlHelperPartialExtensions.Partial(Microsoft.AspNetCore.Mvc.Rendering.IHtmlHelper, System.String, System.Object)
5835cee4 1eea3f40 AspNetCore._Views_Shared_Components_SiteNavBar_Default_cshtml+d__50.MoveNext()

The MainNav_cshtml looks like this:

@model SomeModel

<main-nav inline-template>
    <nav class="main-nav" v-bind:class="{ 'sub-nav-open' : subNavIsOpen }" v-on:click.stop="onClick">
        <ul class="items-list">
            @for (int i = 0; i < Model.MainNav.Items.Count; i++)
            {
                string ariaId = "sub-nav-items-" + i;

                <li class="item">
                    <button class="btn-toggle-sub-nav" :aria-expanded="openSubNav == '@ariaId' ? 'true' : 'false'" aria-controls="@ariaId" v-on:click="toggleSubNav('@ariaId')">@Model.MainNav.Items[i].LabelText</button>
                    @{
                        ViewData["ariaId"] = ariaId;
                    }
                    @Html.Partial("Molecules/SiteNav/MainSubNav", Model.MainNav.Items[i])
                </li>
            }
        </ul>
    </nav>
</main-nav>

So we call the MainSubNav Partial, guess that corresponds with the HtmlHelperPartialExtensions.Partial call in the dump. That Partial is slightly larger, I won't share it publicly. I can say however that we invoke a ViewComponent in there (which performs an Http call every minute if cache expires) and use a Taghelper. But I assume that if these components would be to blame, I could see it in the dump, right? How I read the dump is that the Partial itself throws an exception and then it blocks the thread?

The only thing we are doing wrong (I guess) is that MainNav.cshtml (the code above) is rendered by an async ViewComponent, but it in turn calls Html.Partial (not async), and that partial calls an async function. Not sure if that is the root cause though. I'm also not sure yet how this relates to the WinHttpExceptions. Maybe threads are getting blocked because of the sync / async issue? Which in turn causes Http calls to fail because of thread starvation? I really want to find the faulty line of code. Thanks.

It looks to me like there is an async hang when rendering the MainSubNav. This very well could be explained by long-running HttpClient requests. You can verify this by checking if there are a large number of Task<T> objects using !dumpheap -stat where T is some HttpClient-related type.
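For reference, the heap inspection suggested above can be done with SOS commands along these lines in WinDbg (the type filter below is only an example; the descriptions are annotations, not part of the commands):

```
!dumpheap -stat
    histogram of all managed types on the heap
!dumpheap -stat -type System.Net.Http
    narrow the histogram to HttpClient-related types
!dumpheap -mt <method-table-address>
    list the instances behind one histogram row
!threads
    thread overview, to compare against the number of blocked stacks
```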

Ensure the Tasks holding up the calls to @Html.Partial(MainSubNav... complete quickly, and you should be golden. It's also probably better to call @await Html.PartialAsync instead of @Html.Partial to ensure running Tasks don't block threads. When each running Task uses up an entire thread/stack's worth of memory, you can run out of memory pretty quickly.
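As a sketch of that change against the MainNav view posted earlier:

```cshtml
@* Before: blocks a threadpool thread while the partial's async work completes *@
@Html.Partial("Molecules/SiteNav/MainSubNav", Model.MainNav.Items[i])

@* After: frees the thread while the partial awaits *@
@await Html.PartialAsync("Molecules/SiteNav/MainSubNav", Model.MainNav.Items[i])
```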

This is more @NTaylorMullen's area, so he might have some corrections.

The MainNav_cshtml looks fine. Any shenanigans going on in your MainSubNav partial?

Thanks for your time, really appreciate it. I can share dumps, give access to our repositories, anything. This issue where our whole production environment is only serving ugly 503 errors has been going on for way too long (months) :-)

@halter73 I checked the dump and could see this:

1bf4ab40 1420 90880 System.Net.Http.HttpClient+<FinishSendAsync>d__58

So 1420 objects, which is quite a lot I guess, but also not really surprising since our application uses a lot of external APIs. I don't see many HttpClient instances themselves in the dump, which is correct since we use it as a singleton.

@NTaylorMullen Ok so here is the source code of MainSubNav. As you can see we invoke a component (which does do an HttpRequest and caches the result for a minute) but I would assume that we would see many threads stuck in the component instead of the partial view. The same goes for the taghelpers that we use in the partial. Instead it seems to go wrong after the call to HtmlHelperPartialExtensions.Partial so I read that as something in the view itself. I could be wrong though.

@model MainNavItem

<nav class="main-sub-nav" id="@ViewData["ariaId"]" :aria-hidden="openSubNav == '@ViewData["ariaId"]' ? 'false' : 'true'">
    <div class="sub-nav">
        <div class="site-wrap grid-wrapper">
            @foreach (var item in Model.Items)
            {
                <section class="grid">
                    <h1>@item.CategoryName</h1>

                    <ul>
                        @if (item.ShowLiveData) {
                            @await Component.InvokeAsync(nameof(LiveDataComponent))
                        }

                        @if (item.SubPages != null)
                        {
                            foreach (var link in item.SubPages)
                            {
                                <li>
                                    @* TagHelpers *@
                                    <cmslink to="link" class="link-with-icon"><svg-icon-razor name="arrow-next"></svg-icon-razor> @link.Text</cmslink>
                                </li>
                            }
                        }
                    </ul>
                </section>
            }

            @if (Model.NavImage != null)
            {
                <section class="grid upsell-image">
                    <cmslink to="Model.NavImage.Link"> @* Another taghelper *@
                        @*Added empty gif so no image is loaded on mobile/tablet*@
                        <img src="@AssetUrlHelper.UrlForAssetId(Model.NavImage.Image.Id, 225)"
                             srcset="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw== 1w, @AssetUrlHelper.UrlForAssetId(Model.NavImage.Image.Id, 225) 225w, @AssetUrlHelper.UrlForAssetId(Model.NavImage.Image.Id, 450) 450w"
                             sizes="(min-width: 1024px) 225px, 0vw"
                             alt="@Model.NavImage.Image.Alt" />
                    </cmslink>
                </section>
            }
        </div>

        <footer>
            <h1 class="label-link-overview">@Model.LabelText</h1>
            <nav class="site-wrap grid-wrapper">
                <div class="grid one-whole">
                    <cmslink to="Model.MainItemLink" class="link-with-icon"><svg-icon-razor name="arrow-next"></svg-icon-razor> @Model.MainItemLink.Text</cmslink>
                </div>
            </nav>
        </footer>
    </div>
</nav>

From the Razor perspective everything seems in place.

@razzemans Can you show the outgoing http call?

@davidfowl I'd rather not share the URL publicly. I can say upfront that it is a very simple GET request with no payload. Total size of the response is about 1 or 2 KB, with an average response time of 100ms.
Most of the WinHttpExceptions happen in other parts of our application.
Fact is that so many threads are hanging in this razor view, according to the dump. Again I fail to see a connection between it and those exceptions, but maybe it is just a coincidence and the exceptions simply occur more often under our peak load. We depend on third parties and use a cancellation token in our SendAsync requests that we limit to 10 seconds.
I'm not sure how to continue from here. We've been dealing with these CGI errors for months. It is quite possible that we do _something_ stupid, _somewhere_, but we've reviewed the code so many times and never found anything. Anyone care to have a look at our repository? 😄
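The 10-second cancellation mentioned above is presumably something like the following sketch (not the poster's code). One subtlety worth noting: under threadpool starvation the continuation after the await can itself be delayed, so a cancelled call here does not prove the remote server was slow.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class TimeoutSketch
{
    static readonly HttpClient Client = new HttpClient();

    static async Task<HttpResponseMessage> SendWithTimeoutAsync(HttpRequestMessage request)
    {
        // Cancels the request if the third party has not answered within 10 seconds.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10)))
        {
            // If the threadpool is starved, this await's continuation may wait
            // for a free thread; the elapsed time then reflects starvation,
            // not the remote server's response time.
            return await Client.SendAsync(request, cts.Token);
        }
    }
}
```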

All the HttpClient.FinishSendAsync display classes on the heap lead me to believe that you're running into threadpool starvation. You have a bunch of concurrent SendAsync calls, 400+ of which are blocking threadpool threads because the call to @Html.Partial(MainSubNav... is not async.

This issue is exacerbated because the continuation in HttpClient.FinishSendAsync has very few or no threadpool threads to continue on and complete the call to SendAsync. So the HTTP request might very well be completing reasonably quickly, but SendAsync might still take a very long time simply because there are no threads available to complete on.

So the longer SendAsync takes, the fewer threads are available, because more time is spent in Html.Partial. And the fewer threads available, the longer SendAsync takes, because it has to wait longer for a thread on which the FinishSendAsync continuation can run. Simply put, it's a vicious cycle very similar to the one I describe in this comment where I compare it to a deadlock.

I think changing the call to @await Html.PartialAsync(MainSubNav... might fix your entire problem.

@razzemans I just wanted to see the callsite but @halter73 beat me to the punch. I was going to say that something is probably doing a .Result on the outgoing http call.

Thanks all. We will scrutinize our entire codebase for mixing up rendering razor views using Partial and PartialAsync and fix any issues there. Then we'll have to wait and see if the issue is fixed or not. Unfortunately this always takes a while, as it happens only during peak load, which, for our platform, happens only once a month.
I'd really wish there was some static analyzer to warn developers when they accidentally mix these things up in their code, especially if it might lead to such errors. It's such an easy mistake to make.
I will report back in a few weeks tops :-)

@razzemans Also be careful using singleton HttpClients. It's nice because you get connection reuse, but you're then capped at two concurrent connections/requests.

It's nice because you get connection reuse, but you're then capped at two concurrent connections/requests.

That is not true with .NET Core. The default number of concurrent connections per server is much higher than that (int.MaxValue).

And with .NET Framework, you can adjust that using ServicePointManager.DefaultConnectionLimit property.
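For .NET Framework, that adjustment looks roughly like this (the value 100 is an arbitrary example, not a recommendation):

```csharp
using System.Net;

// .NET Framework only: raise the per-server connection cap, which historically
// defaulted to 2 for non-interactive apps. On .NET Core this is unnecessary,
// since the default is already int.MaxValue as noted above.
ServicePointManager.DefaultConnectionLimit = 100;
```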

I'd really wish there was some static analyzer to warn developers when they accidentally mix these things up in their code, especially if it might lead to such errors. It's such an easy mistake to make.

👍

@davidfowl @halter73 We had another occurrence, this time the culprit seemed to be the Console logging:

(screenshot: blocking_logging)

Now it could well be that we can leave out the Console logging, but I'm rather surprised this is apparently blocking? Any thoughts? This happened during the night, when there was practically no load on the platform, which makes it even more interesting, as it would also mean there is hardly anything to log.

The console logger does block if the processing loop falls behind. Are you logging everything? What log filters do you have enabled?

Side note: This bug seems unrelated to the actual title. It would be more useful if there were more specific issues filed in the appropriate places

Sorry about the title, but I already had many issues open and didn't want to create many more.

This is our current logging config:

  "Logging": {
    "IncludeScopes": false,
    "LogLevel": {
      "Default": "Warning",
      "System": "Warning",
      "Microsoft": "Warning"
    }
  }

For one application we have the following in the Startup:

loggerFactory.AddConsole(Configuration.GetSection("Logging"));
loggerFactory.AddDebug();
loggerFactory.AddApplicationInsights(
    (TelemetryClient)app.ApplicationServices.GetRequiredService<TelemetryClient>(),
    Configuration.GetValue<LogLevel>("Logging:LogLevel:Default"));

Again though, at the time the issue started it was the middle of the night so there was practically no load on our platform. I've not seen this specific issue before even when there is more load on the platform.

If you want me to start a new issue for this case, please let me know, I will.

If you want me to start a new issue for this case, please let me know, I will.

This bug doesn't look like a bug in CoreFx HttpClient components. So, please open a new issue in ASP.NET repo.

I will report back in a few weeks tops :-)

@razzemans any update on whether this solved your issue?

@zeiphon you struggling with the same thing?

I am cautiously optimistic that the Razor view sync / async mix-up was indeed the cause of our problems and that it is now fixed. I won't pop open the champagne quite yet, as we usually only run into this issue during peak load (which is once a month), and we've only experienced peak load once since our fix, so it could still be a coincidence.

This doesn't fix the logger issue which I reported later, which I hope was just a glitch and won't see again anytime soon.

For anyone interested (@zeiphon ?): the logger issue was caused by having stdout logs enabled in the web.config. We disabled this and timeouts are no longer occurring. This only happened during extremely high load, but it happened nonetheless.

@Html.Partial can still cause downtime in 1.1.5. Is there any reason to ever use this as opposed to @await Html.PartialAsync? It's a long road to find out what the issue is :) . Could @Html.Partial be changed to something like @Html.PartialDangerous?
