We have an asp.net core 2.1 app running on .net 4.7 in azure web app service.
Recently we started getting A LOT of the following error:
Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
Our app is under pretty constant heavy load (from incoming webhooks mostly). During big spikes, we tend to see more of these.
We used to run on asp.net core 2.0 / .net 4.6.1 for many months and we never saw that error before.
It seems to have started happening following our recent upgrade to asp.net core 2.1 / .net 4.7.
We'd like to get to the bottom of this issue, but we are not even sure where to start.
Any thoughts?
@sebastienros have you worked up an FAQ wiki yet?
I wonder if it's because of thread pool starvation... Is your application fully async? Do you have any other logs from Kestrel? Can you capture a dump or a profile? Is the application CPU bound or IO bound?
It just happened a couple of times earlier; usually it happens for several successive requests in a short time.
I'm monitoring cpu, memory, thread count and a bunch of other metrics in Azure monitor and it all looks healthy.
So even when there is no apparent bottleneck, and the app is very responsive, it can still happen occasionally.
Yes it is fully async.
> Do you have any other logs from Kestrel?
No other warnings/errors. There are probably info-level entries, but I'm not persisting those, although there is probably a way to get them.
> Can you capture a dump or a profile?
I should be able to capture a dump from the azure portal if you think this will help?
Not sure what a profile is?
Any thoughts on how to get to the bottom of this issue?
We're not sure if the errors are caused by legit requests.
Is it possible to find out the URL that was requested when this message gets generated?
We have been getting a couple hundred of these per day, even though all metrics look healthy.
Any particular log source that I can turn on in azure/asp.net core maybe?
The stack trace doesn't give us much.
Exception: Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.TryRead(ReadResult& result)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.OnConsumeAsync()
There was just one log entry with a slightly better call stack including some of our calling code.
The stack is pointing to our webhook action.
In order to verify the webhook is valid, we must hash the request body and we use the following code.
Is there anything wrong with this code?
[HttpPost]
public async Task<OkResult> Webhook()
{
    var memoryStream = new MemoryStream();
    await Request.Body.CopyToAsync(memoryStream);
    bool isValidRequest = await AuthorizationService.IsAuthenticWebhook(Request.Headers, memoryStream, _shopifyLibOptions.SecretKey);
    if (!isValidRequest)
    {
        throw new UnauthorizedAccessException("This request is not an authentic webhook request.");
    }
    memoryStream.Position = 0;
    string body = new StreamReader(memoryStream).ReadToEnd();
    //omitted code here unrelated to asp.net
    return Ok();
}
The call stack is pointing to the line
bool isValidRequest = await AuthorizationService.IsAuthenticWebhook(Request.Headers, memoryStream, _shopifyLibOptions.SecretKey);
Although in my experience, line numbers in call stacks are sometimes a bit off, and I wonder if the real issue is the line before: await Request.Body.CopyToAsync(memoryStream);
Then the stack below is
Exception: Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.GetReadAsyncResult()
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.d__22.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.d__23.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at LINE FROM OUR CODE
.ReadToEnd(); is a red flag, that's blocking sync IO. This data rate error has often been reported in applications that are having trouble with too many blocked threadpool threads. In this case there's an easy ReadToEndAsync() replacement. Have you checked for thread starvation issues elsewhere?
But the stream here is a memory stream. That's why it's using the sync version. No I/O right?
The number of threads is low and metrics are healthy.
This issue seems quite sporadic but we get these errors in batches of 5 or 10...
Could it be caused by the GC causing a pause?
Ah, you got me. I wasn't reading closely enough.
How big are the posts?
They are pretty small json payloads. Order, Customer, Product, this kind of thing...
What size is pretty small?
Typically under 10k characters, although there are a few outliers over 100k characters.
We get a lot of requests though. Roughly 500k per day.
Are there any further logs that can be turned on to get to the bottom of this issue?
This is really polluting our log files and we're still not sure why it occurs in the first place.
We appear to have gotten rid of the issue.
We are not entirely sure but most likely it was caused by some code accessing the request object from a background thread.
> We are not entirely sure but most likely it was caused by some code accessing the request object from a background thread.
Using the IHttpContextAccessor? What did that code look like?
Sorry I had to re-open.
I thought it was fixed after our deployment on Sunday evening, but the absence of errors was simply due to the very low activity at this time of the week.
Once Monday kicked in, this error started flooding again.
David, that deployment included a fix related to the IHttpContextAccessor that you helped me resolve here. I incorrectly thought this might have fixed this issue as well.
The call stack varies slightly when the error is captured in the log. Usually it contains only asp.net frames, but it also sometimes happens in this new piece of code, which is very similar to the one I posted above handling webhooks:
Simplified a bit for clarity:
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Http.Extensions;
using Microsoft.AspNetCore.Http.Internal;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Middlewares
{
    public class RequestLogScopeMiddleware
    {
        private readonly RequestDelegate _next;
        private readonly ILogger<RequestLogScopeMiddleware> _logger;

        public RequestLogScopeMiddleware(RequestDelegate next, ILogger<RequestLogScopeMiddleware> logger)
        {
            _next = next;
            _logger = logger;
        }

        public async Task Invoke(HttpContext context)
        {
            using (_logger.BeginScope(await GetHttpInfo(context)))
            {
                await _next(context);
            }
        }

        private async Task<HttpInfo> GetHttpInfo(HttpContext ctx)
        {
            try
            {
                var req = ctx.Request;
                req.EnableRewind();
                string body = await new StreamReader(req.Body).ReadToEndAsync();
                req.Body.Position = 0;
                try
                {
                    body = JsonConvert.SerializeObject(JsonConvert.DeserializeObject(body), Formatting.Indented, new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore });
                }
                catch
                {
                    //body may not be json
                }
                return new HttpInfo(req.Method, req.GetDisplayUrl(), body, string.Join(Environment.NewLine, req.Headers.Select(kv => $"{kv.Key}: {kv.Value}")));
            }
            catch (Exception ex)
            {
                _logger.LogWarning(ex, "Failed to extract http info from request");
                return null;
            }
        }
    }
}
Logged error:
Failed to extract http info from request
HttpInfo:
Exception: Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.TryRead(ReadResult& result)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.OnConsumeAsync()
I'm pretty sure that it is related to await new StreamReader(req.Body).ReadToEndAsync();
Here is another stack clearly showing that reading the stream is the issue:
Error from Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware:
An unhandled exception has occurred while executing the request.
HttpInfo:
Exception: Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.ReadAsync(CancellationToken token)
at System.IO.Pipelines.Pipe.DefaultPipeReader.ReadAsync(CancellationToken cancellationToken)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.WebUtilities.FileBufferingReadStream.d__35.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.IO.StreamReader.d__97.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.IO.StreamReader.d__62.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Middlewares.RequestLogScopeMiddleware.d__5.MoveNext()
@halter73 any ideas?
PS: are you using that middleware in production? It looks terribly inefficient.
Yes we are using this in prod.
We haven't suffered any performance issues but I'm listening if you have ideas on how to make this more performant.
Having a log of the body is very useful for investigating why a given request had errors...
> Having a log of the body is very useful for investigating why a given request had errors...
Do you have structured logging in place? You don't need to attach this much information to the scope, you just need enough information attached to every request so that you can find all correlated logs. You would log the body in chunks and then look at the entire payload by looking at all logs for a specific request id.
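Roughly something like the sketch below (untested, just to illustrate the idea; the middleware name, the 4 KB chunk size and the message shape are arbitrary choices, not something from this thread):

// Illustrative only: logs the request body in fixed-size chunks, each entry
// carrying the request's TraceIdentifier so the payload can be reassembled
// by querying the logs for that id.
using System.Text;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Http.Internal; // EnableRewind() in 2.1
using Microsoft.Extensions.Logging;

public class ChunkedBodyLoggingMiddleware
{
    private const int ChunkSize = 4096;
    private readonly RequestDelegate _next;
    private readonly ILogger<ChunkedBodyLoggingMiddleware> _logger;

    public ChunkedBodyLoggingMiddleware(RequestDelegate next, ILogger<ChunkedBodyLoggingMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task Invoke(HttpContext context)
    {
        context.Request.EnableRewind(); // buffer so the rest of the pipeline can re-read the body

        var buffer = new byte[ChunkSize];
        int chunkIndex = 0;
        int read;
        while ((read = await context.Request.Body.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // One log entry per chunk, all correlated by TraceIdentifier.
            // (Note: a multi-byte UTF-8 character could be split across chunks;
            // good enough for diagnostics, not for exact reconstruction.)
            _logger.LogInformation("Request {RequestId} body chunk {Chunk}: {Content}",
                context.TraceIdentifier, chunkIndex++, Encoding.UTF8.GetString(buffer, 0, read));
        }

        context.Request.Body.Position = 0; // rewind for the rest of the pipeline
        await _next(context);
    }
}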
> We haven't suffered any performance issues but I'm listening if you have ideas on how to make this more performant.
It's a DoS waiting to happen. I can just send a large payload to your application and boom, out of memory. (see https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/blob/master/Scenarios/Controllers/BigJsonInputController.cs for an example). It'll also be one of the ASP.NET Core no-no (see https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/blob/master/AspNetCoreGuidance.md#avoid-reading-the-entire-request-body-or-response-body-into-memory)
> Having a log of the body is very useful for investigating why a given request had errors...
Yes, I've seen it enough times now that I think we should build something more efficient out of the box https://github.com/aspnet/AspNetCore/issues/3700
> Do you have structured logging in place? You don't need to attach this much information to the scope, you just need enough information attached to every request so that you can find all correlated logs. You would log the body in chunks and then look at the entire payload by looking at all logs for a specific request id.
I guess you could say it is structured, yes.
Logging the body in chunks? I'm not too sure what you mean?
Does that mean that I have to reconstruct the body from multiple log entries?
It certainly would be much simpler to have a single entry with the full body.
Also, the vast majority of requests we get are webhooks that can take various shapes (but always in json format), and we must validate that they are authentic by hashing the body.
(see the Webhook() method I posted above)
So we can't use [FromBody] because the payload can differ a lot from request to request.
> It's a DoS waiting to happen. I can just send a large payload to your application and boom, out of memory.
I see how this could be a problem.
Should I use something like BigContentManualGood() from your linked sample?
As far as I can tell it still loads the entire json in memory, into a JObject, so I fail to see how this is different?
I guess there must be some sensible max-request-size limit default as well in asp.net core?
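(If I understand the docs correctly, Kestrel's default MaxRequestBodySize is around 30 MB, and it can be tightened per action with something like the snippet below; I haven't verified the exact behavior, but I believe an oversized body gets rejected with a 413.)

[HttpPost]
[RequestSizeLimit(200_000)] // hypothetical cap (~200 KB); oversized bodies should get a 413
public async Task<OkResult> Webhook()
{
    // ... same body-hashing logic as in the action above ...
    return Ok();
}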
Going back to the original issue, I just wanted to point out that the pattern we're seeing is that these errors occur in batch.
There won't be any issues for a while (5 minutes or even up to a couple of hours) and then all of a sudden we get that error many times in rapid succession (say between 3 and 10+).
That may be a useful piece of information.
This got me thinking that it might be caused by a longer than usual GC pause... Just an idea...
Is there a difference between the following two?
await new StreamReader(Request.Body).ReadToEndAsync();
await new HttpRequestStreamReader(Request.Body, Encoding.UTF8).ReadToEndAsync();
We deployed yet again today to try to fix this.
We do not collect the body anymore when logging, based on @davidfowl 's good comments about perf and security.
We removed all usages of HttpAccessor and we are left with the only other piece of code that reads the entire body and causes the error, in our Webhook() action, which now looks like this:
[HttpPost]
public async Task<OkResult> Webhook()
{
    var body = await new HttpRequestStreamReader(Request.Body, Encoding.UTF8).ReadToEndAsync();
    bool isValidRequest = AuthorizationService.IsAuthenticWebhook(Request.Headers, body, _options.SecretKey);
    if (!isValidRequest)
    {
        throw new UnauthorizedAccessException("This request is not an authentic webhook request.");
    }
    //handle webhook
    return Ok();
}
Using HttpRequestStreamReader hasn't fixed the problem.
We also wanted to rule out temporary thread pool exhaustion (even though the thread count counter in the azure portal looks OK), because we thought that the thread pool might take time to ramp up threads, and maybe a spike of requests might cause temporary starvation.
But, it looks like there are plenty of threads and the error is still happening...
public override void OnException(ExceptionContext context)
{
    context.ExceptionHandled = true;
    ThreadPool.GetAvailableThreads(out var workerThreads, out var completionPortThreads);
    ThreadPool.GetMaxThreads(out var maxWorkerThreads, out var maxCompletionPortThreads);
    _logger.LogError(context.Exception, $"Error in global mvc exception filter. Available Threads = ({workerThreads:N0}/{maxWorkerThreads:N0}, {completionPortThreads:N0}/{maxCompletionPortThreads:N0}).");
}
Error in global mvc exception filter. Available Threads = (32,757/32,767, 1,000/1,000).
Exception: Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.TryRead(ReadResult& result)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.OnConsumeAsync()
We are running out of ideas.
@clement911 What is your App Service scale?
We are having what looks like the same issue. ASP.NET Core 2.1 app in an Azure App Service (Windows) deployment with similar behavior - fine for a few minutes, then a group of connections all throw this exception, then fine again.
All request are POSTs with a JSON body that is being read and deserialized in a middleware.
Hi @tstuts,
It's a deployment with 3 to 5 S2 instances and we get a ton of requests, mostly webhooks.
Several 100k per day.
I'm not sure if it is related to that issue, but we also noticed we have been getting a lot more of the following errors. Nowhere near as many, but we used to get it maybe once a day and now we get it a lot more, and I feel like they might be happening around the same time as the original issue.
Warning from Microsoft.AspNetCore.Server.Kestrel:
Heartbeat took longer than "00:00:01" at "11/01/2018 19:48:02 +00:00".
That usually means there's thread pool starvation... Hmm
@davidfowl I don't understand how this is possible given that ThreadPool.GetAvailableThreads is reporting there are plenty of threads available?
We don't create manual threads.
We have a few Task.Run() and also several periodic background jobs scheduled via Observable.Timer, which uses the ThreadPool under the hood.
Here is a screenshot from Azure Portal showing our thread count over the last 12 hours.
I don't believe 200+ threads is excessive?
200 threads is a lot more than you have cores. If those threads are all active then some of them may not be getting enough CPU time (similar to thread pool starvation). We're making some changes in the data rate timers to mitigate these scenarios. https://github.com/aspnet/KestrelHttpServer/pull/3070. That said, your system is likely still overtaxed.
If that is the case then I guess increasing the number of instances might fix the issue.
Pardon my ignorance here but I don't understand.
CPU and memory seems healthy to me (see screenshots below).
We are running S2 instances but Azure portal is recommending S1.
I tried to find out how many cores S2 has but all I can tell is that they have 200 ACU (Azure compute units), whatever that means.
I went through the most recent MinRequestBodyDataRate errors and the difference between the available threads and the maximum threads is always less than 100, so there are fewer than 100 busy worker threads out of a supposedly sensible (?) default maximum of 32,767! 100/32,767 is less than 1/3 of 1%.
I suppose I could increase the number of instances and maybe this will fix the issue, but I would like to understand what the bottleneck is, as it is not obvious to me at all (rather than throwing money at the problem).
See https://blogs.msdn.microsoft.com/vancem/2018/10/16/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall/ for details on how to diagnose starvation.
Are you using Task.Wait or Task.Result anywhere?
Thank you @davidfowl that was a good refresher.
> Are you using Task.Wait or Task.Result anywhere?
No.
I will keep looking for any blocking calls I might have missed.
By the way we don't use .net core.
We use .net 4.7.1, but I assume that the thread pool works in a similar way.
@clement911 yes its fundamentally the same (core is a bit more optimized)
> The .NET Threadpool DOES try to inject more threads, but does it at a modest rate (e.g 1 or 2 a second), so it makes little difference in the short term (you need many minutes to get up to 1000).
I wonder if this is a problem. That doesn't sound like a great strategy to handle temporary spikes.
Thanks, I will check Ben.BlockingDetector.
All our webhook method does is validate that it is authentic (CPU bound) and save it in the db with ef core (async IO).
I noticed that when the MinRequestBodyDataRate error occurs, it is likely (but not always the case) to happen in conjunction with warnings for multiple other webhooks saying the save to the database took an absurd amount of time (say, over 40 seconds and even up to 180 seconds!).
I thought it might be the EF core retry strategy that does multiple retries, but as far as I can tell the database is very responsive and I haven't found an event in EF core to log when a retry occurs...
It seems like a weird coincidence so I'll assume it might be due to the same thread exhaustion.
Yes those symptoms sound exactly like there's some starvation going on. Taking a memory dump when the problem occurs can help verify. If you have that many threads, you can also look at what those threads are doing and it may hint at the problematic area.
Will do thanks guys, I really appreciate the help
What HTTP Status code will be returned to the client when a MinRequestBodyDataRate / BadHttpRequestException occurs?
I would think a 4xx? (even though it is likely an internal issue in my case)
We get a lot of these errors in our logs, but the Azure portal metrics claim there are hardly any HTTP 4xx or 5xx responses. I don't get it.
I got a memory dump from the azure portal (twice) hoping it would shed some light, but nothing obvious is showing up in the parallel stacks window. There are very few threads (7) with a managed stack and nothing blocking besides Program.Main and a couple of other usual-suspect threads (such as the ConsoleLoggerProcessor queue).
All the other threads are showing 'Not available' in the threads window.
I tried another way of getting stacks from all threads, which worked, but that showed the same thing.
I also tried to replicate locally by creating a little program sending fake webhooks, and I wasn't able to reproduce the problem, but at least it showed all the threads (though async call stacks are hardly readable, as usual).
Out of desperation I (reluctantly) enabled Application Insights, which I read could collect metrics without re-deploying. It didn't work; none of the metrics are showing any data. I might have to look into the build-time Application Insights configuration.
I don't know if I can trust the azure metrics anyway. As I mentioned above, Azure monitor reports no http errors...
The issue seems to be getting more and more severe with the increasing activity that we are getting over time.
:weary: :confused:
The metric below from the dump caught my attention though, but the finalizer thread stack has no user code.
By the way, I was surprised to see that 3 of the threads were taken by the ConsoleLogger MessageProcessor Queue. That got me wondering why it is enabled in prod by default...
So I've been battling with this and wanted to report my findings.
We had to act quickly, so we decided to deploy an exact copy of our app into a new App Service Plan.
Then, we moved all our webhook subscriptions to point to the new instance.
So, the original app service only handles user-facing requests, and that made it very stable. No errors.
The new app service handles webhooks and DOES NOTHING ELSE.
The new app service is not even exposed to users, and the only requests hitting it go to a simple Webhook() action that hashes the body and stores it in the db by making an async call via SqlCommand with an await cmd.ExecuteNonQueryAsync();
Well, the MinRequestBodyDataRate error still occurs! (on the new instance of course).
Given that there is very little code being exercised in the new instance, I'm confident that we don't accidentally call any sync API or block any threads.
In the logs, we can clearly see that the errors occur when there is a spike of requests.
I'm thinking that the issue may be either in the sql connection pool or the thread pool.
With regular user traffic, one doesn't expect huge sudden peaks of requests. It usually ramps up slowly and the thread pool seems pretty good at handling that.
On the other hand, with webhooks, we can sometimes get extremely sudden peaks because some system somewhere generated a ton of events.
But the thread pool spawns threads very slowly.
I'm thinking of experimenting with ThreadPool.SetMinThreads and the ADO.NET 'MinPoolSize' setting.
It might be a good strategy to keep a lot of threads and/or connections running to handle peaks.
If anyone has experience in that area I'm listening.
Those settings are easy to set, but there is very little documentation on how they work behind the scenes and it feels very black-boxy. Probably for a good reason, except when you run into a scenario like this...
Sounds like you're on the right track. One alternative we've considered for scenarios like these is request throttling. This would reduce the demand on the threadpool.
I'm not sure exactly how you'd implement the throttling and how this would solve the problem?
One issue is that most webhook providers (the systems/APIs sending the requests) typically require listeners (like us) to respond with an HTTP 200 pretty quickly or they consider it a failure.
In our case it's 5 seconds.
It's only theoretical at this point, but ASP.NET 4 avoided this class of issues using a similar approach. Your requests are not extremely expensive to process, there's just a lot of them and they compete for resources like threads, CPU time, memory, etc.. If you put those requests in a queue and only process a few at a time then there's no longer contention and they can complete quickly. A naive version of this would be a middleware that used a mechanism like SemaphoreSlim to limit the number of concurrently processed requests. As long as everything is flowing smoothly there's no reason you shouldn't be able to handle large bursts in a timely fashion. You just have to keep the burst from crushing you.
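A rough, untested sketch of that naive version (the class name and the limit of 64 are arbitrary; it would be registered with app.UseMiddleware<ConcurrencyLimitingMiddleware>()):

using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class ConcurrencyLimitingMiddleware
{
    // Middleware is constructed once, so a single semaphore caps the whole app.
    private readonly SemaphoreSlim _throttle = new SemaphoreSlim(64);
    private readonly RequestDelegate _next;

    public ConcurrencyLimitingMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task Invoke(HttpContext context)
    {
        // WaitAsync doesn't block a thread pool thread; excess requests simply
        // queue here until one of the 64 slots frees up.
        await _throttle.WaitAsync();
        try
        {
            await _next(context);
        }
        finally
        {
            _throttle.Release();
        }
    }
}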
@clement911 can you reproduce the issue on the site with webhooks if you hammer it with similar traffic?
@Tratcher that's an interesting suggestion. But if a request waits on a SemaphoreSlim, will this count towards the ThreadPool's IO threads?
I think the hard part will be finding the sweet spot for the number of threads in the pool to achieve maximum throughput.
@davidfowl both app services host exactly the same code. I tried to reproduce the issue locally with a naive local program issuing requests but that didn't work. It's probably possible to reproduce with a proper load testing tool if that is what you are asking.
SemaphoreSlim has an async waiter so it doesn't block threads.
@davidfowl I think I've got a very similar issue with 2.1.5 handling Azure DevOps webhooks in a test web service. Number of requests is low (bursts of 6 only when queuing builds). In my case it repros on maybe 5-10% of bursts. Again I'm reading/hashing the body to validate against a signature.
@clement911 @mmitche Are you guys seeing a lot of this exception too?
Microsoft.AspNetCore.Connections.ConnectionResetException: An existing connection was forcibly closed by the remote host
@tstuts yes.
I'm guessing that a request that is not getting processed fast enough will lead to the client-side timing out and closing the connection, leading to this exception. Just another symptom of the same root issue. Does that sound right @davidfowl?
Given the very spiky nature of this error, I've also been wondering if it might be Azure running some periodic tasks on the VM that might be causing this... I remember reading somewhere about some diagnostics jobs that can spike the machine every so often...
I don't recall that exception, but I'm going to test my service a little more now so I'll let you know what falls out.
We're also experiencing this with an API that takes GeoJSON and writes it to an Azure SQL database. The requests come in bursts and there are hundreds of requests at once, because we split the data in batches to avoid hitting request length limits.
It also looks like thread starvation, we're running on a single S3 instance:
I'll try to resolve the problem based on the information in this discussion, and if successful I'll report my findings here.
Not a lot of progress to report so far. I do need to correct my previous comment: we host a number of ASP.NET and ASP.NET Core applications in a single App Service, as Virtual Applications.
I assumed the problem to be in one specific API since that one receives most of the load, but I haven't found any obvious blocking calls. I tried out Ben.BlockingDetector on my local machine, but nearly all reported instances were apparently caused by logging ('ETL' appeared in the stack trace).
I also tried PerfView but didn't succeed in getting the desired output from it.
Regarding the thread count, we reach an average of 180 when idle due to w3wp + the various applications running on it. So perhaps there's not a huge issue in this area. I'll continue load testing and doing diagnostics to locate the source of the problem.
We're seeing the same error messages, but we were able to reproduce them using Chrome's Network Throttling profiles. After creating one at 1kb/s (up/down) and 5K ms latency, we were able to see the effects of getting BadHttpRequestExceptions with "Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate".
In our production case the issue was a mobile client, with very spotty internet connection, downloading data.
@davidfowl Is a hard exception the best way to deal with this? Do we just need middleware to handle slow clients, and swallow the exception?
As in our case, the app services are being recycled (assumed to be due to something like rapid fail protection). I don't recall anything like this in IIS 7 running ASP.NET 4.5, although IIS would have disabled the app pool, taking the application offline completely...
The same problem here as well.
Here is my code serving photo uploads from several cameras connected via GPRS (powering on then off to save energy):
[HttpPost("photo/post")]
public async Task PostLegacyAsync()
{
var content = new MultipartFormDataContent();
foreach (var pair in this.Request.Form)
{
content.Add(new StringContent(pair.Value), pair.Key);
}
foreach (IFormFile ff in this.Request.Form.Files)
{
byte[] data;
using (var br = new BinaryReader(ff.OpenReadStream()))
{
data = br.ReadBytes((int)ff.OpenReadStream().Length);
}
content.Add(new ByteArrayContent(data), ff.Name, ff.FileName);
}
// ...
this.Ok();
}
This service is considered legacy for our system, but there are still several devices to upgrade. Suggestions to improve it are welcome.
As a side note, I could accept a random failure sometimes, but I don't like the application crashing for that.
Updating to latest System.IO.Pipelines fixed the issue for me.
@highfield you should avoid using Request.Form and instead use ReadFormAsync
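Roughly like this (untested sketch, keeping the rest of the original action unchanged):

[HttpPost("photo/post")]
public async Task PostLegacyAsync()
{
    // Read the form asynchronously instead of touching Request.Form,
    // which buffers the body with synchronous IO.
    var form = await Request.ReadFormAsync();

    var content = new MultipartFormDataContent();
    foreach (var pair in form)
    {
        content.Add(new StringContent(pair.Value), pair.Key);
    }
    foreach (IFormFile ff in form.Files)
    {
        using (var ms = new MemoryStream())
        {
            await ff.CopyToAsync(ms); // async copy instead of BinaryReader.ReadBytes
            content.Add(new ByteArrayContent(ms.ToArray()), ff.Name, ff.FileName);
        }
    }
    // ...
    this.Ok();
}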
Hi All,
We have the same issue in production, 2 S1 instances. In our environment it seems to be related to bursts in connections/requests, see attached diagram. The server errors in the chart are the ones discussed in this issue. I.e:
BadHttpRequestException: Reading the request body timed out due to data arriving too slowly
This is just to say that we are still very much struggling with this issue.
I'm hoping future releases will be able to handle spikes better.
We're seeing the same issue in production, along with a related BadHttpRequestException triggering a 400 response: "Unexpected end of request content". We have little insight into what is causing this problem, but as little as 1 request/second with a 600KB request body will trigger it. It seems to be exacerbated by unreliable client connections, such as over mobile/cellular 3G networks. We see no related spikes in CPU or memory consumption at the time when these occur.
I've been trying to track down why our ASP.NET Core 2.1 containers keep crashing on K8s. I thought this would just abort the request, but I'm surprised that it seems to crash the process...
Aborting the request doesn't crash the process
> This is just to say that we are still very much struggling with this issue.
> I'm hoping future releases will be able to handle spikes better.
@clement911 I was able to fix this issue in my work project by removing the HTTP/2 usage from the Kestrel configuration in Program.cs. Could you try it and report back on the effect? I'm really curious about it because in our company only our project has this issue.
@Chakalaka11 we have not enabled http 2 but could you please copy your code here?
@clement911 So, over the past hours the team made some changes: removing UseKestrel and using ConfigureKestrel to re-attach HTTP/2 also works fine. The guy who made the fix says that UseKestrel could overwrite some standard config and cause the error.
Maybe this helps.
Today I will run some regression tests to check how everything works and update the status if it works fine.
UPDATE: works just fine.
Thanks but I'm hesitant to apply changes that don't seem to be related to the original issue.
We'd like to use in-process host but I see it's not available for .net framework...
OK, if I have any updates on this error I'll let you know. I hope you find your solution.
I'd like to join the "party".
In our code, we do:
await context.Request.Body.CopyToAsync(requestBodyStream);
Nothing synchronous against the request stream. We get the error "Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate."
Any findings that could help us? I read the thread above and it sounds like the root cause is unknown so far.
So, I'm looking at this. Has anybody tried setting a lower value on MinRequestBodyDataRate?
Also, our application deals with slow network connections, indeed (by design). So maybe we should instruct the front end to retry when the connection is improved? Again, this may apply to only certain kinds of apps, and perhaps we are among them.
I did try the following in the Startup.cs ConfigureServices method...
services.Configure<KestrelServerOptions>(options =>
{
    options.Limits.MinRequestBodyDataRate = new MinDataRate(1, TimeSpan.FromMinutes(5));
    options.Limits.MinResponseDataRate = new MinDataRate(1, TimeSpan.FromMinutes(5));
    options.Limits.MaxRequestBodySize = 100000000;
});
Prior to making this change, I had asp.net core containers in Kubernetes that were crashing/restarting 90x an hour, with the last message in the container log being the BadHttpRequestException stack trace. (As @davidfowl said above, this seemed really odd because I would expect this exception to occur at request level and not crash the process... so I'm inclined to think that perhaps there was something else also contributing to the crash?)
After making this change, have not had this issue. If this issue continued, my plan was to try and put some sort of reverse proxy in front of the Kestrel server that would buffer the slow incoming data stream, but haven't had to go to that point yet.
Good to hear that some are finding solutions.
Personally, I don't like to apply work arounds I don't really understand without understanding the issue first.
As far as I understand, under heavy load, the "Reading the request body timed out due to data arriving too slowly" error is not the real cause. It is a consequence of the system being stressed.
What I still haven't figured out is WHY the system is stressed, given that the metrics look healthy (CPU, memory and thread count mostly).
I'm still suspecting that it might be caused by the ThreadPool algorithm not creating threads fast enough when a spike of requests occurs, but as far as I know we cannot tweak this algorithm.
It would be fantastic if the asp.net core team could create a spike stress test on a simple web app to confirm if there is indeed a bottleneck at the thread pool level, or somewhere else.
We've had to upscale our azure instances even though they look healthy, just because we get tons of these, as well as 'Connection forcibly closed' and 'Connection processing ended abnormally' errors.
This happens for us in
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.<PumpAsync>
called by
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.<ReadAsync>
called by
Microsoft.AspNetCore.WebUtilities.FormReader.<ReadFormAsync>
called by
Microsoft.AspNetCore.Antiforgery.Internal.DefaultAntiforgeryTokenStore.<GetRequestTokensAsync>
which means that it happens at the CSRF check.
More stack trace:
Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.<PumpAsync>d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.GetReadAsyncResult()
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.<ReadAsync>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.<ReadAsyncInternal>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.IO.StreamReader.<ReadBufferAsync>d__97.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.IO.StreamReader.<ReadAsyncInternal>d__64.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.WebUtilities.FormReader.<BufferAsync>d__41.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.WebUtilities.FormReader.<ReadNextPairAsyncImpl>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.WebUtilities.FormReader.<ReadFormAsync>d__43.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Http.Features.FormFeature.<InnerReadFormAsync>d__18.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Antiforgery.Internal.DefaultAntiforgeryTokenStore.<GetRequestTokensAsync>d__3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Antiforgery.Internal.DefaultAntiforgery.<ValidateRequestAsync>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNetCore.Mvc.ViewFeatures.Internal.ValidateAntiforgeryTokenAuthorizationFilter.<OnAuthorizationAsync>d__3.MoveNext()
We had this once now. The operation sends an ID string and an answer string, together with some cookies, so a really small body.
2019-05-09 11:31:50.9573|30131|TRACE|9.5.3.0|189|Base|BaseController.QuestionsSaveAnswer|27 ms|
2019-05-09 11:32:17.6812|30131|TRACE|9.5.3.0|97 |Base|BaseController.QuestionsSaveAnswer|24 ms|
2019-05-09 11:32:27.2114|30131|ERROR|9.5.3.0|106|Base|BaseController.GetErrorMessage|Error Page Path=/questionsSaveAnswer, error = Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1MessageBody.PumpAsync()
at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.GetReadAsyncResult()
at System.IO.Pipelines.Pipe.DefaultPipeReader.GetResult(Int16 token)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)
at Microsoft.AspNetCore.WebUtilities.FileBufferingReadStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
at Microsoft.AspNetCore.WebUtilities.StreamHelperExtensions.DrainAsync(Stream stream, ArrayPool`1 bytePool, Nullable`1 limit, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Mvc.Formatters.JsonInputFormatter.ReadRequestBodyAsync(InputFormatterContext context, Encoding encoding)
at Microsoft.AspNetCore.Mvc.ModelBinding.Binders.BodyModelBinder.BindModelAsync(ModelBindingContext bindingContext)
at Microsoft.AspNetCore.Mvc.ModelBinding.ParameterBinder.BindModelAsync(ActionContext actionContext, IModelBinder modelBinder, IValueProvider valueProvider, ParameterDescriptor parameter, ModelMetadata metadata, Object value)
at Microsoft.AspNetCore.Mvc.Internal.ControllerBinderDelegateProvider.<>c__DisplayClass0_0.<<CreateBinderDelegate>g__Bind|0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
We are running in-process ASP.NET Core 2.2.0 on IIS. In the log, the number before |Base| is the thread ID.
We finally found a fix!
As we had suspected, it had to do with the thread pool not ramping up threads quickly enough.
We added the following two lines and this fixed the problem entirely and immediately:
ThreadPool.SetMaxThreads(32_767, 2_000);
ThreadPool.SetMinThreads(1_000, 1_000);
The thread pool ramp up algorithm is very conservative, and so are the default minimum threads.
I'm not actually sure if we needed to raise the worker threads. Possibly the IO threads would be enough, as the CPU is at a very low utilization.
So my reading of this is that the IO requests were being queued up internally by the thread pool during spikes.
I'm glad we found a solution, but even though the fix is tiny, it took a while to find it.
IMO the defaults are too conservative.
For a regular website that might work fine, but for an API or webhook receiver, there will be spikes that need to ramp up much faster than 2 threads per second.
It doesn't help that the ThreadPool is opaque and the documentation is very limited.
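For reference, here is roughly where we put those calls (a minimal sketch assuming the stock ASP.NET Core 2.1 Program.cs template; the numbers themselves would obviously need tuning per workload):

using System.Threading;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        // Raise the thread pool floor before the host starts taking traffic,
        // so the pool doesn't have to ramp up at ~1-2 threads per second
        // in the middle of a burst.
        ThreadPool.SetMaxThreads(32_767, 2_000);
        ThreadPool.SetMinThreads(1_000, 1_000);

        CreateWebHostBuilder(args).Build().Run();
    }

    public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .UseStartup<Startup>();
}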
cc @stephentoub @kouvel
This is an excellent article describing what is causing this and why increasing the minimum should only be a stopgap measure:
https://blogs.msdn.microsoft.com/vancem/2018/10/16/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall/
Yes thank you, I had read that one and it is indeed a great article.
It says that increasing the minimum number of threads is a stopgap measure if the thread starvation is caused by blocking calls.
But in our case, we don't ever block. We only use async IO APIs.
So, as far as I can tell, the only way to handle sudden bursts is to increase the minimum number of threads.
We have now scaled down to a cheaper azure instance and scaled in from 4 instances to 2 instances and we are not getting a single error anymore. It's pretty remarkable!
Interestingly, the number of threads has not increased substantially but it can just ramp up much quicker than before.
Also note that the number of threads is below the minimum of 1,000 that we set, because the thread pool doesn't spawn them unless there are items waiting in the queue.
At least, that's my understanding.
It might be useful to be able to instantiate thread pools and give them different options for different workloads...
I'm extremely surprised that you're seeing starvation without blocking, even for bursts.
@clement911 maybe try to use https://github.com/benaadams/Ben.BlockingDetector to see which framework code is blocking
The only things that I'm aware of that would cause thread starvation are blocking, or long-running CPU/memory-bound work items in the thread pool. @clement911 since you mentioned that CPU utilization is low, without more information I would guess that blocking is occurring.
Blocking doesn't necessarily have to occur from synchronous IO. It could be a lock that is held for longer than expected, forward dependencies in queued tasks (a task schedules another task and synchronously waits for it to complete), or generally any synchronous activity that takes some time.
> We finally found a fix!
> As we had suspected, it had to do with the thread pool not ramping up threads quickly enough.
> We added the following two lines and this fixed the problem entirely and immediately
That is the suggested workaround when it's not possible to fix the root cause. It may well be an appropriate solution if blocking is occurring, due to deficiencies in the thread pool when it comes to work items that block for relatively long spans of time. The solution may be counterproductive if blocking is not occurring. Again since that's not what you're seeing it sounds like blocking is occurring.
It may help to look at blocking that is likely occurring with a PerfView trace including Thread Time (which includes context-switch events and time spent blocking).
Thanks I appreciate the comments.
A while back we didn't know what to do with this issue but we knew it was related to our webhook receiver controller action receiving a lot of requests.
So we deployed another separate azure app service instance, specifically to receive webhooks.
So, what we have is an asp.net core service with a single controller and a single action being called.
(We left the rest of the code, but the only action being called is the webhook one and it doesn't serve user requests or run any jobs.)
All it does is validate the headers to make sure the request is authentic and save the payload to a SQL Azure database with an async call.
That's it.
That's why I'm quite confident we don't have any blocking going on.
To your point, sure async improves scalability but there is still a limit to how many requests can be processed, right?
In our case, our spike can be very sudden and very large.
If a user initiates a CSV import in Shopify, Shopify will send us 1 request per row instantly.
We could get 1,000 requests hitting a single instance in a minute. Isn't it possible that under these conditions the thread pool will start queueing requests?
When running on IIS there’s a request queue deep in the kernel (http.sys) but there’s no request queuing happening in ASP.NET core. The only queue is the thread pool queue. The exception you’re seeing is coming from kestrel (so I’m assuming you’re using the out of process IIS module). If you can still reproduce the issue, a trace would be great. We’re just adding counters in .NET Core 3.0 which is of no help to you right now but this sorta thing should be more obvious in the future with those counters.
btw, I was referring to the thread pool queue.
Perhaps you could put the requests from Shopify in your own background queue and process them one by one. That way, you never have the spike.
https://docs.microsoft.com/en-us/aspnet/core/fundamentals/host/hosted-services?view=aspnetcore-2.2
I have done that in a similar scenario where the user uploads a spreadsheet and then lots of work has to be done per row. Happy to share the code somewhere.
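In rough strokes it looks like the sketch below (simplified from the pattern in those docs; the type names are illustrative, and it would be registered with services.AddSingleton<IBackgroundTaskQueue, BackgroundTaskQueue>() plus services.AddHostedService<QueuedHostedService>()):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Work items get queued by the controller and drained one at a time here,
// so a burst of requests never competes for resources all at once.
public interface IBackgroundTaskQueue
{
    void Enqueue(Func<CancellationToken, Task> workItem);
    Task<Func<CancellationToken, Task>> DequeueAsync(CancellationToken cancellationToken);
}

public class BackgroundTaskQueue : IBackgroundTaskQueue
{
    private readonly ConcurrentQueue<Func<CancellationToken, Task>> _workItems =
        new ConcurrentQueue<Func<CancellationToken, Task>>();
    private readonly SemaphoreSlim _signal = new SemaphoreSlim(0);

    public void Enqueue(Func<CancellationToken, Task> workItem)
    {
        _workItems.Enqueue(workItem);
        _signal.Release();
    }

    public async Task<Func<CancellationToken, Task>> DequeueAsync(CancellationToken cancellationToken)
    {
        await _signal.WaitAsync(cancellationToken);
        _workItems.TryDequeue(out var workItem);
        return workItem;
    }
}

public class QueuedHostedService : BackgroundService
{
    private readonly IBackgroundTaskQueue _queue;

    public QueuedHostedService(IBackgroundTaskQueue queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var workItem = await _queue.DequeueAsync(stoppingToken);
            await workItem(stoppingToken); // process one item at a time
        }
    }
}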
@vankampenp
The issue is that we must be sure that the webhook requests are persisted before we can respond HTTP 200 to Shopify.
If we just put it in a memory queue and the app restarts for any reason, then we would lose the requests.
@clement911 I don't know your particular situation. In our scenario we persist a key, then push it on the queue, which sets a flag once it is done. We run a daily catch-all to see if there are any keys without a flag.
This only makes sense if the work that has to be done takes a long time.
Even if it is just a key, the point is you need to persist at least something, so IO is needed.
We had the same issue and solved it by moving the reading of the request body out of the controller.
In our case we had a high-volume REST API where the request body was first read inside the controller with _HttpRequestStreamReader_. Under heavier load we saw this error every few days:
BadHttpRequestException: Reading the request body timed out due to data arriving too slowly
We also faced a heavy drop in throughput during this time. This occurred randomly and recovered by itself after about 10 minutes.
With the BlockingDetector tool we were able to identify some minimal blocking somewhere in the framework while reading the request body that way.
Because of that we created an InputFormatter that allows reading the json-formatted request body as a string, and we retrieve the request body as a parameter in the controller like this:
public async Task<IActionResult> ProcessAsync([FromBody]string jsonBody = null)
After applying this workaround, the BlockingDetector did not detect blocking anymore and we had no more BadHttpRequestExceptions for three weeks now.
> With the BlockingDetector tool we were able to identify some minimal blocking somewhere in the framework while reading the request body that way.
Can you share the results of this?
Yes, that was the output of the tool:
And here is the previous controller code that triggered this:
using (HttpRequestStreamReader reader = new HttpRequestStreamReader(Request.Body, Encoding.UTF8))
{
    _jsonBody = await reader.ReadToEndAsync();
}
> using (HttpRequestStreamReader reader = new HttpRequestStreamReader(Request.Body, Encoding.UTF8)) { _jsonBody = await reader.ReadToEndAsync(); }
Was this code you wrote in your controller?
Yes, this was the code with the blocking.
@berndku why do manual JSON de-serialization from a string instead of using the built in formatter?
@davidfowl: performance, not all of our requests require complete de-serialization of the entire body. But I believe we also tested it quickly with the built-in formatter by replacing the string with a JObject parameter and didn't see blocking there either.
> @davidfowl: performance, not all of our requests require complete de-serialization of the entire body. But I believe we also tested it quickly with the built-in formatter by replacing the string with a JObject parameter and didn't see blocking there either.
Performance is overloaded? Better throughput trading off memory usage? Buffering the entire body into memory means you have this temporary string just for the purpose of turning that into another object. The built in formatters don't block, they go out of their way to avoid buffering into a single contiguous block of memory (like a string or byte[]).
Regardless, I'll file an issue for the ReadToEndAsync behavior https://github.com/aspnet/AspNetCore/issues/13834
Thanks for filing the issue!
Fully understand your point about built-in formatters and buffering, but we have to trace each body completely within the controller, no matter if we deserialize it or not. So we need a string representation of the body anyway.
> Fully understand your point about built-in formatters and buffering, but we have to trace each body completely within the controller, no matter if we deserialize it or not. So we need a string representation of the body anyway.
https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/blob/master/AspNetCoreGuidance.md#avoid-reading-large-request-bodies-or-response-bodies-into-memory
😄
@davidfowl That link provides context as to _why_ not to do this (great! 😃), but not _what to do instead_ (oh 😞). Let's say you need to accept arbitrary file bodies in a request, and send that arbitrary file request body to another system, such as a database or a message queue. How do you avoid reading that request into memory?
Sure, this is not a good practice, but requests are about 20-40KB, so no LOH fragmentation issues are expected. I recommended moving the body tracing to a different place in the architecture a while ago, but that will take longer and we needed a short-term solution.
Since #13834 is about _HttpRequestStreamReader_, I checked our custom formatter (code below) again, but there we just used _StreamReader_. Could that be the reason why we don't see the blocking with the custom formatter?
If so, does that mean we should avoid _HttpRequestStreamReader_ until the issue is fixed?
public class RawJsonBodyInputFormatter : InputFormatter
{
    public RawJsonBodyInputFormatter()
    {
        this.SupportedMediaTypes.Add("application/json");
    }

    public override async Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        var request = context.HttpContext.Request;
        using (var reader = new StreamReader(request.Body))
        {
            var content = await reader.ReadToEndAsync();
            return await InputFormatterResult.SuccessAsync(content);
        }
    }

    protected override bool CanReadType(Type type)
    {
        return type == typeof(string);
    }
}
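For completeness, it is registered in Startup roughly like this (simplified from memory):

services.AddMvc(options =>
{
    // Insert at index 0 so this formatter is tried before the built-in JSON
    // formatter for actions that bind the body as a plain string.
    options.InputFormatters.Insert(0, new RawJsonBodyInputFormatter());
});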
Hmm, sorry if this is not related, but I get this output from the BlockingMonitor when IIS is restarted. It occurs multiple times. I use the asp-append-version tag helper. After temporarily removing that, the blocking call is no longer reported.
2019-09-10 15:06:25.0604|35.230.16.48||WARN |9.9.19.0|39 |Base|BlockingMonitor.BlockingStart|Blocking method has been invoked and blocked, this can lead to threadpool starvation.
at System.IO.FileStream.Read(Byte[] array, Int32 offset, Int32 count)
at System.Security.Cryptography.HashAlgorithm.ComputeHash(Stream inputStream)
at Microsoft.AspNetCore.Mvc.Razor.Infrastructure.DefaultFileVersionProvider.GetHashForFile(IFileInfo fileInfo)
at Microsoft.AspNetCore.Mvc.Razor.Infrastructure.DefaultFileVersionProvider.AddFileVersionToPath(PathString requestPathBase, String path)
at Microsoft.AspNetCore.Mvc.TagHelpers.ScriptTagHelper.Process(TagHelperContext context, TagHelperOutput output)
at Microsoft.AspNetCore.Razor.TagHelpers.TagHelper.ProcessAsync(TagHelperContext context, TagHelperOutput output)
at Microsoft.AspNetCore.Razor.Runtime.TagHelpers.TagHelperRunner.RunAsync(TagHelperExecutionContext executionContext)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Razor.Runtime.TagHelpers.TagHelperRunner.RunAsync(TagHelperExecutionContext executionContext)
at AspNetCore.Views_Shared__Layout.<>c__DisplayClass14_0.<
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at AspNetCore.Views_Shared__Layout.<>c__DisplayClass14_0.
at Microsoft.AspNetCore.Razor.Runtime.TagHelpers.TagHelperExecutionContext.SetOutputContentAsync()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Razor.Runtime.TagHelpers.TagHelperExecutionContext.SetOutputContentAsync()
at AspNetCore.Views_Shared__Layout.ExecuteAsync()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at AspNetCore.Views_Shared__Layout.ExecuteAsync()
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageCoreAsync(IRazorPage page, ViewContext context)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageCoreAsync(IRazorPage page, ViewContext context)
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageAsync(IRazorPage page, ViewContext context, Boolean invokeViewStarts)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderPageAsync(IRazorPage page, ViewContext context, Boolean invokeViewStarts)
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderLayoutAsync(ViewContext context, ViewBufferTextWriter bodyWriter)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderLayoutAsync(ViewContext context, ViewBufferTextWriter bodyWriter)
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderAsync(ViewContext context)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.Razor.RazorView.RenderAsync(ViewContext context)
at Microsoft.AspNetCore.Mvc.ViewFeatures.ViewExecutor.ExecuteAsync(ViewContext viewContext, String contentType, Nullable`1 statusCode)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.ViewFeatures.ViewExecutor.ExecuteAsync(ViewContext viewContext, String contentType, Nullable`1 statusCode)
at Microsoft.AspNetCore.Mvc.ViewFeatures.ViewExecutor.ExecuteAsync(ActionContext actionContext, IView view, ViewDataDictionary viewData, ITempDataDictionary tempData, String contentType, Nullable`1 statusCode)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.ViewFeatures.ViewExecutor.ExecuteAsync(ActionContext actionContext, IView view, ViewDataDictionary viewData, ITempDataDictionary tempData, String contentType, Nullable`1 statusCode)
at Microsoft.AspNetCore.Mvc.ViewFeatures.ViewResultExecutor.ExecuteAsync(ActionContext context, ViewResult result)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.ViewFeatures.ViewResultExecutor.ExecuteAsync(ActionContext context, ViewResult result)
at Microsoft.AspNetCore.Mvc.ViewResult.ExecuteResultAsync(ActionContext context)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.ViewResult.ExecuteResultAsync(ActionContext context)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeResultAsync(IActionResult result)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeResultAsync(IActionResult result)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.ResultNext[TFilter,TFilterAsync]
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResultFilterAsync[TFilter,TFilterAsync]
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResultFilterAsync[TFilter,TFilterAsync]
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.ResultNext[TFilter,TFilterAsync]
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeResultFilters()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeResultFilters()
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResourceFilter()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Runtime.CompilerServices.TaskAwaiter.<>c.
at System.Threading.QueueUserWorkItemCallback`1.<>c.<.cctor>b__6_0(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.QueueUserWorkItemCallback`1.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
@alastairs
@davidfowl That link provides context as to why not to do this (great! 😃), but not what to do instead (oh 😞). Let's say you need to accept arbitrary file bodies in a request, and send that arbitrary file request body to another system, such as a database or a message queue. How do you avoid reading that request into memory?
By chunking it into < 85K buffers then transferring it chunk by chunk. It's one of the tricks that Stream uses to copy data from one Stream to another Stream https://github.com/dotnet/corefx/blob/ee9995f31b684a0c6e5488eceb2500bf0057da89/src/Common/src/CoreLib/System/IO/Stream.cs#L31-L34.
That combined with pooling those buffers would result in better performance overall as the large object heap wouldn't be churning aggressively. Also there's a potential security risk in exposing an endpoint that reads everything into memory and transfers it to another system. Any place you can take arbitrary client input is a potential place where clients can potentially cause a denial of service if you're not careful.
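To make that concrete, here is a minimal sketch of that chunked, pooled copy. The helper name and the 81920-byte chunk size are my own choices (just under the ~85K LOH threshold); it is an illustration of the approach, not code from any framework.
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StreamCopyHelper
{
    // Copies source to destination in small pooled chunks, so no contiguous
    // buffer (string or byte[]) large enough to land on the LOH is ever allocated.
    public static async Task CopyInChunksAsync(Stream source, Stream destination, CancellationToken cancellationToken = default)
    {
        const int ChunkSize = 81920; // stays below the ~85K large object heap threshold
        byte[] buffer = ArrayPool<byte>.Shared.Rent(ChunkSize);
        try
        {
            int bytesRead;
            while ((bytesRead = await source.ReadAsync(buffer, 0, ChunkSize, cancellationToken)) > 0)
            {
                await destination.WriteAsync(buffer, 0, bytesRead, cancellationToken);
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
The same idea applies to forwarding a request body: pass Request.Body as the source and the outgoing stream (blob, queue message, proxied request) as the destination, instead of materializing the whole payload first.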
@berndku
Since #13834 is about HttpRequestStreamReader, I checked our custom formatter (code below) again, but there we just used StreamReader. Could that be the reason, why we don´t see the blocking with the custom formatter ?
Yes, StreamReader overrides the right methods.
.UseKestrel(opt =>
{
    // Setting this to null disables Kestrel's minimum request body data rate check entirely.
    opt.Limits.MinRequestBodyDataRate = null;
})
Could this work?
I have the same problem when two services communicate with each other (within localhost). @HengzheLi, that didn't help me...
Having the same problem
Hi all,
We're starting a new project that involves again saving webhooks to the database for further processing.
We expect small payloads but a high number of requests.
Is the following code optimal?
[HttpPost]
public async Task<OkResult> Webhook()
{
    var body = await new HttpRequestStreamReader(Request.Body, Encoding.UTF8).ReadToEndAsync();
    // save to DB asynchronously?
    return Ok();
}
I am getting this error too, but I think my issue is here:
blob.UploadFromByteArrayAsync(fileByteArray, 0, fileByteArray.Length).Wait();
vs
await blob.UploadFromByteArrayAsync(fileByteArray, 0, fileByteArray.Length);
Using Wait() will cause the upload to block the thread, whereas using await won't.
@clement911 that is a good way to process webhooks, and it would be great if your processing is then done by an event queue instead of the web server.
My issue is receiving a url, downloading it, then uploading it! Stupid to do all of that in the web request. So fixing the Wait() issue will be moot once I rework this code to just verify that the image url is good, then dump the actual db saves and image uploads to an event processor.
But thank you for keeping up with this issue!
FYI, since the time of my post here, I started a new project with ASP.NET Core 2.2 (the previous one was 2.1). Furthermore, the app is now running on Ubuntu 18 (previously on 14).
Not a single failure since that time.
@davidfowl Is the piece of code below correct for forwarding an upload to another service? Each upload is basically an image (10-15 MB) plus a few key-value pairs.
[HttpPost("photo/post3")]
public async Task PostV3Async()
{
IFormCollection fcoll = await this.Request.ReadFormAsync();
var content = new MultipartFormDataContent();
foreach (var pair in fcoll)
{
content.Add(new StringContent(pair.Value), pair.Key);
}
foreach (IFormFile ff in fcoll.Files)
{
byte[] data;
using (var br = new BinaryReader(ff.OpenReadStream()))
{
data = br.ReadBytes((int)ff.OpenReadStream().Length);
}
content.Add(new ByteArrayContent(data), ff.Name, ff.FileName);
}
var client = this._httpClientFactory.CreateClient(ClientName);
await client.PostAsync("/photo/post3", content);
this.Ok();
}
I'm still wondering what's the best way to read the whole request body asynchronously.
We were doing await new HttpRequestStreamReader(Request.Body, Encoding.UTF8).ReadToEndAsync();
but after upgrading to .NET Core 3.1 we hit this issue: https://github.com/aspnet/AspNetCore/issues/13834
Since the fix is planned for .NET 5.0, what is the right way to do this on 3.1?
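Based on the earlier comment that plain StreamReader overrides the right async methods, would something like the following be a reasonable stopgap on 3.1? Just a sketch; the leaveOpen and buffer size values are my own assumptions, and it uses System.IO, System.Text and Microsoft.AspNetCore.Mvc.
[HttpPost]
public async Task<OkResult> Webhook()
{
    // Plain StreamReader (not HttpRequestStreamReader); its ReadToEndAsync does real async I/O.
    // leaveOpen: true is assumed so later code could still touch Request.Body if needed.
    using (var reader = new StreamReader(Request.Body, Encoding.UTF8, detectEncodingFromByteOrderMarks: false, bufferSize: 1024, leaveOpen: true))
    {
        var body = await reader.ReadToEndAsync();
        // save to DB asynchronously...
        return Ok();
    }
}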
Any commit for the issue?
Since this issue has been marked as a medium security vulnerability by WhiteSource, can anyone tell me a quick summary of what the exploit is here? This is a VERY long thread going back to 2018. Thank you.
I'm having the same problem
Having the same issue as well.
For those who are still experiencing the issue, did you read my post? https://github.com/dotnet/aspnetcore/issues/4707#issuecomment-557538742
As of today, I have had NO issues for over a year on two different servers.
Any update on this issue? I am also facing the same exception. The request is pretty small, but Kestrel is throwing a 408 with the message "request arriving too slow".
"message": "Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.", "exception": {"Type":"Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException", "StatusCode":408, "TargetSite":"Void Throw(Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.RequestRejectionReason)", "StackTrace":" at Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException.Throw(RequestRejectionReason reason)\r\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1ContentLengthMessageBody.ReadAsyncInternal(CancellationToken cancellationToken)\r\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)\r\n at Microsoft.AspNetCore.WebUtilities.FileBufferingReadStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)\r\n at Microsoft.AspNetCore.WebUtilities.StreamHelperExtensions.DrainAsync(Stream stream, ArrayPool`1 bytePool, Nullable`1 limit, CancellationToken cancellationToken)\r\n at Microsoft.AspNetCore.Mvc.Formatters.NewtonsoftJsonInputFormatter.ReadRequestBodyAsync(InputFormatterContext context, Encoding encoding)\r\n at Microsoft.AspNetCore.Mvc.ModelBinding.Binders.BodyModelBinder.BindModelAsync(ModelBindingContext bindingContext)\r\n at Microsoft.AspNetCore.Mvc.ModelBinding.ParameterBinder.BindModelAsync(ActionContext actionContext, IModelBinder modelBinder, IValueProvider valueProvider, ParameterDescriptor parameter, ModelMetadata metadata, Object value)\r\n at Microsoft.AspNetCore.Mvc.Controllers.ControllerBinderDelegateProvider.<>c__DisplayClass0_0.<<CreateBinderDelegate>g__Bind|0>d.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeInnerFilterAsync>g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextExceptionFilterAsync>g__Awaited|25_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ExceptionContextSealed context)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextResourceFilter>g__Awaited|24_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ResourceExecutedContextSealed context)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|19_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)\r\n at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)\r\n
[HttpPost]
[ProducesResponseType(typeof(BaseResponse<bool>), 1)]
public async Task<BaseResponse> SyncMsg()
{
    string data = string.Empty;
    try
    {
        using (var sr = new StreamReader(Request.Body))
        {
            data = await sr.ReadToEndAsync();
            // TODO: if length is more than 1024, decompress
            data = StringHandleHelper.Decompress(data);
            var url = ConfigurationManager.GetValue("CallBackUrl");
            // fire-and-forget processing of the message
            Task.Factory.StartNew((obj) =>
            {
                _messageService.SyncMsg((string)obj, url);
            }, data);
            var j_data = JToken.Parse(data);
            var type = j_data["type"].ToObject<PcWxMsgType>();
            var wxid = j_data["wxid"].ToObject<string>();
            return Response2WeChatRobot(wxid, type);
        }
    }
    catch (Exception ex)
    {
        return new BaseResponse() { Code = -2, Msg = "Site: " + ex.Message };
    }
}
I've seen all the discussions. Is there an effective solution now?
Is this going to get fixed in the upcoming release?
Hi all,
We're starting a new project that involves again saving webhooks to the database for further processing.
We expect small payloads but a high number of requests. Is the following code optimal?
[HttpPost] public async Task<OkResult> Webhook() { var body = await new HttpRequestStreamReader(Request.Body, Encoding.UTF8).ReadToEndAsync(); //save to Db asynchronously? return Ok(); }
@clement911 would moving to DB processing first solve the ReadToEndAsync issue?
Or did you try using a structured class instead of reading the object from the Stream?
OK, I started reading this thread an hour ago and I'm here at the end, yet to find a solution. What is the solution now for .NET Core 3.1? I'm having the same issue.
It's because the issue has turned into lots of different issues 😄. There's no repro for this issue and it depends on the environment and client speed (at least the original issue does). We know there's an issue somewhere, but it's not clear how we can make progress with the information provided here.
@Gopichandar I came across this and solved the issue as follows:
Refactor most methods to use the await + async Task Method() pattern.
It seems the root cause in our case was all the CPU being used up waiting for tasks (mostly DB/IO related) to finish, and this caused the exception BadHttpRequestException: Reading the request body timed out due to data arriving too slowly.
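A minimal illustration of the kind of refactor described above (the repository interface and method names here are hypothetical, not from our code base):
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical data-access abstraction, used only for this illustration.
public interface IOrderRepository
{
    Task<IReadOnlyList<string>> GetOrdersAsync();
}

public class OrdersController : ControllerBase
{
    private readonly IOrderRepository _repository;

    public OrdersController(IOrderRepository repository) => _repository = repository;

    // Before: Ok(_repository.GetOrdersAsync().Result) blocked a thread pool thread
    // for the whole duration of the DB/IO call.
    // After: the thread goes back to the pool while the DB/IO work is in flight.
    [HttpGet]
    public async Task<IActionResult> Get() => Ok(await _repository.GetOrdersAsync());
}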
@davidfowl I think I was able to reproduce this issue over and over when I debugged my own code. I made a short video and Wireshark captures, but I cannot share them here in this public GitHub repo. What other, more private channels can I use to reach you and the MS team?
The longer story:
I was making a few million PUT calls to our servers and around 0.08% of them failed, even after multiple retry attempts – all had the same message on the server side: "Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate". The server was certainly not idle during those calls. But the day after, the server was almost idle, and I was able to take the failed calls and re-execute them one request at a time. To my big surprise, the same payloads consistently failed and I still saw the same error message in our server logs.
I also tried increasing the timeout, to give the server time to find a free thread, but I still got the same errors, just after 10 seconds instead of the default 5:
options.Limits.MinRequestBodyDataRate = new MinDataRate(120, TimeSpan.FromSeconds(10));
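(For reference, that tweak sits roughly here in a generic-host (3.x) setup. The surrounding builder code below is just illustrative scaffolding; only the MinDataRate line is the actual change, and the defaults being overridden are 240 bytes/second with a 5-second grace period.)
using System;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Server.Kestrel.Core;
using Microsoft.Extensions.Hosting;

public class Program
{
    public static void Main(string[] args) => CreateHostBuilder(args).Build().Run();

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder.ConfigureKestrel(options =>
                {
                    // Minimum of 120 bytes/second, enforced only after a 10-second grace period.
                    options.Limits.MinRequestBodyDataRate =
                        new MinDataRate(bytesPerSecond: 120, gracePeriod: TimeSpan.FromSeconds(10));
                });
                webBuilder.Configure(app => { }); // app pipeline omitted in this sketch
            });
}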
After hours of investigation I found out that there were two things I could change to make the call succeed:
1) Disconnect the Cisco VPN.
The VPN splits the payload into smaller TCP packets (here it is 5 packets). It's always the last packet that needs to be re-transmitted. See the image: ".135" is my pc and ".57" is the server. (When I'm disconnected from the VPN it sends the payload in a single TCP packet.) I was also connected to the VPN when I made the few million calls.
2) Change the payload size by 1 byte – either add or remove one character in the json payload (of course on a field that is not analyzed on the server side).
The payload is still split into 5 TCP packets, due to being connected to the VPN.
The payload is ~6 KB.
It's not only from my pc with VPN that we see this error; we see it from our end customers' requests around the world without VPN. But this investigation is the closest I have been to debugging, analyzing and understanding this error message that we see in our server logs.
I don't think it's a thread pool issue on the server side, because I am able to reproduce it over and over, and a 1-byte change in the payload size can affect the result. And the other day the server handled way more requests.
I have seen the error message "Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate" both on routes with simple ASP.NET Core model binding ([FromBody]) and on routes that read the body themselves with await Request.Body.ReadAsBytesAsync().
So the question is: what is it in aspnetcore that can cause this issue, and how do I share more data with you guys?
It seems like I found a workaround until this gets fixed.
I had the same issues as @rasmus-s. It kept working until the TCP packets got split into separate ones, which (for me) was exactly at 1398 bytes. Adding the overhead, that's oddly close to the ~1500 bytes of an ethernet frame.
After doing some research I found a post where someone suggested checking the MTU size, which I did with ping ip -f -l xxx. Sure enough, -l 1473 returned the message "Packet needs to be fragmented but DF set", so 1472 seems to be the limit below which packets won't be split and where the exception does not occur.
To compare my result with something else, I hopped onto a system where it worked fine and ran that command again – this time no "Packet needs to be fragmented but DF set" was returned. I even used -l 65500, which is the maximum – no message.
It actually does seem like the MTU is the problem here.
I'll change the MTU of the first system where it doesn't work and keep this updated.
Good to hear that some are finding solutions.
Personally, I don't like to apply work arounds I don't really understand without understanding the issue first.
As far as I understand, under heavy load, the "Reading the request body timed out due to data arriving too slowly" error is not the real cause. It is a consequence of the system being stressed.
What I still haven't figured out is WHY the system is stressed given that the metrics look healthy (CPU, memory and thread count, mostly).
I'm still suspecting that it might be caused by the ThreadPool algorithm not creating threads fast enough when a spike of requests occurs, but as far as I know we cannot tweak this algorithm.
It would be fantastic if the asp.net core team could create a spike stress test on a simple web app to confirm if there is indeed a bottleneck at the thread pool level, or somewhere else.