Aspnetcore: IIS Application Pool Recycle returns unexpected 503

Created on 9 May 2019 · 9 Comments · Source: dotnet/aspnetcore

Describe the bug / Steps To Reproduce

I have an ASP.NET (full .NET Framework) MVC application (the default Visual Studio MVC template) hosted on my local IIS 10 (Windows 10, not IIS Express) with "Disable Overlapped Recycle = false" (the default value).
If I recycle the application pool while a tool continuously sends HTTP requests, the web server responds correctly to every request: no errors (500/503) and no timeouts.

If I do the same with ASP.NET Core 2.2 (in-process hosting), I get 503 responses.

I have the same problem if I change the physical path of my website.
Full Framework: no 503s. AspNetCoreModuleV2: many 503s.
The old "AspNetCoreModule" (ASP.NET Core 2.1) had a similar problem but returned fewer 503 responses.
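The reporter does not say which tool generated the load. A minimal Python stand-in, assuming only the standard library (`probe` is a hypothetical helper, not the reporter's tool), sends sequential GET requests, tallies the status codes so that 503s during a recycle are counted, and tracks the longest request time:

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Tuple


def probe(url: str, count: int, timeout: float = 5.0) -> Tuple[Counter, float]:
    """Send `count` sequential GET requests to `url`.

    Returns a Counter keyed by HTTP status code (or "error" for timeouts and
    connection failures) and the longest request time in seconds.
    """
    outcomes: Counter = Counter()
    max_elapsed = 0.0
    for _ in range(count):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
                outcomes[resp.status] += 1
        except urllib.error.HTTPError as exc:
            # Non-2xx answers such as 503 during a recycle are raised as HTTPError.
            outcomes[exc.code] += 1
        except OSError:
            # URLError, socket timeouts and refused connections all derive from OSError.
            outcomes["error"] += 1
        max_elapsed = max(max_elapsed, time.perf_counter() - start)
    return outcomes, max_elapsed
```

Run it against the site (e.g. `probe("http://localhost/", 1000)`), recycle the pool from IIS Manager mid-run, and compare `outcomes[503]` and the maximum time with a baseline run. Requests are sequential here; a real load test would usually run several in parallel.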

Expected behavior

The Expected behavior would be:

  • no 503 errors on recycle (with "Disable Overlapped Recycle = false")
  • no 503 responses if the physical path changes.

This is important for high availability and deployment in my case.

Screenshots

Full Framework 4.7.2 with overlapped recycling, not hosted via AspNetCoreModule
[screenshot]
(left Baseline, right recycle test)

AspNetCore 2.2 with overlapped recycling via AspNetCoreModuleV2
[screenshot]
(left Baseline, right recycle test)

Labels: affected-few, area-servers, enhancement, severity-minor

All 9 comments

Maybe the application pool's 'Queue Length' is too small? The default is 1,000, and if the queue is exhausted, 503 errors are returned.
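A toy model of that suggestion, assuming a fixed number of arrivals per tick and no worker dequeuing requests while the pool is unavailable (both simplifications; `rejected_during_pause` is a hypothetical helper): the queue only returns 503 once the arrivals during the pause exceed the queue length.

```python
def rejected_during_pause(arrivals_per_tick: int, pause_ticks: int, queue_length: int) -> int:
    """Count requests rejected with 503 while no worker is dequeuing.

    Toy model: a bounded request queue (like the app pool's 'Queue Length')
    accepts requests until full; every further arrival is rejected.
    """
    queued = 0
    rejected = 0
    for _ in range(pause_ticks):
        for _ in range(arrivals_per_tick):
            if queued >= queue_length:
                rejected += 1  # queue full -> 503
            else:
                queued += 1
    return rejected


# 500 arrivals fit in the default queue of 1000: nothing is rejected.
print(rejected_during_pause(50, 10, 1000))     # 0
# 2000 arrivals overflow it by 1000.
print(rejected_during_pause(200, 10, 1000))    # 1000
# Raising the queue length to 65000 absorbs the same burst.
print(rejected_during_pause(200, 10, 65000))   # 0
```

In this model a larger queue only helps if the burst actually exceeded the old limit; if the 503 count stays the same after raising it, queue exhaustion is unlikely to be the cause.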

@rockerinthelocker The 'Queue Length' in my tests was 1000 (the default) in both cases, .NET Framework and .NET Core. I ran one more test with Queue Length = 65000, but nothing changed: nearly the same number of request errors.
[screenshot]

One more thing I've noticed is the difference in max request time. This could explain the problem, or at least give us a hint:

  • .NET Framework
    If I hit recycle, requests that have already started are processed to completion.
    The new worker process is started (overlapped).
    New incoming requests are queued.
    Once the new process is ready, they are processed there without errors.

    Result: some requests have a long request time (long wait in the queue).

  • .NET Core
    If I hit recycle, requests that have already started are processed to completion.
    The new worker process is started (overlapped).
    New incoming requests are either queued, or still handled by the old worker, which returns 503 while it is shutting down.
    Once the new process is ready, the queued requests are handed to the new worker, but this happens too early: the process is not yet ready to handle requests and returns 503.

    Result: in both cases we get some 503s, but fewer long response times (less time in the queue).

Of course, this is only a possible explanation, not a concrete result of debugging!
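The two timelines above can be sketched as a toy simulation. It is hypothetical and deliberately simplified: `recycle_at` and `new_ready_at` are made-up timestamps, in-flight requests are ignored, and the .NET Core case is reduced to its "old worker returns 503" branch. It reproduces the described trade-off: queueing gives zero errors but long waits, while rejecting gives short waits but 503s.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    status: int  # HTTP status returned to the client
    wait: float  # time spent queued before being served, in seconds


def simulate(arrivals: List[float], recycle_at: float, new_ready_at: float, mode: str) -> List[Outcome]:
    """Toy model of an overlapped recycle window [recycle_at, new_ready_at).

    mode="queue":  requests in the window wait until the new worker is ready
                   (the .NET Framework behaviour described above).
    mode="reject": requests in the window reach the draining old worker and get
                   an immediate 503 (the simplified .NET Core behaviour).
    """
    if mode not in ("queue", "reject"):
        raise ValueError(f"unknown mode: {mode!r}")
    results = []
    for t in arrivals:
        in_window = recycle_at <= t < new_ready_at
        if not in_window:
            results.append(Outcome(200, 0.0))
        elif mode == "queue":
            results.append(Outcome(200, new_ready_at - t))
        else:
            results.append(Outcome(503, 0.0))
    return results
```

With arrivals every 0.5 s over 10 s (`[i * 0.5 for i in range(20)]`), a recycle at 2.0 and the new worker ready at 5.0, `"queue"` yields no 503s and a maximum wait of 3.0 s, while `"reject"` yields six 503s and no waiting.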

We'll do some looking at if we can improve this. However, in general we recommend using a pattern like blue-green deployments for zero-downtime deployments. Features like Overlapped Recycle help, but don't guarantee that you can do a zero-downtime deployment.

@anurse Thanks for taking a look at this.

It's OK if it doesn't guarantee zero downtime for deployments, and you're right that there can be no guarantee, but it would be nice if it worked like the old .NET.

But I think zero downtime should be guaranteed for application pool recycles (triggered by a time interval, by a memory limit, or manually via the IIS UI), right?

Nothing is guaranteed when it comes to downtime, that's what makes high-availability applications so tricky :). But my understanding is that yes we can try to make this better. In the situation where the app pool is recycling, IIS should be emitting the appropriate events to ensure we keep downtime low.

We'll look into this for 3.0 if we can, but we have a lot of high-priority work, so I can't guarantee how far we'll get.

Moving to the Backlog. We've done some initial investigation and this is going to be very challenging. For high-availability scenarios we need this and some kind of shadow copying mechanic which is a very costly feature. We might be able to revisit later. Using separate app pools and a load balancer is our recommended approach for high-availability as it allows you full flexibility over deployment process and the ability to easily revert versions.
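A minimal sketch of the switch step in such a setup, assuming two hypothetical app pools (or sites) named "blue" and "green" behind a load balancer and a caller-supplied `healthy` check; `BlueGreenRouter` is illustrative and not tied to any particular load balancer's API. Traffic moves to the idle pool only if it passes the health check, and reverting is just switching back.

```python
from typing import Callable


class BlueGreenRouter:
    """Tracks which of two pools receives traffic (hypothetical helper)."""

    def __init__(self, healthy: Callable[[str], bool], active: str = "blue") -> None:
        if active not in ("blue", "green"):
            raise ValueError("active must be 'blue' or 'green'")
        self.active = active
        self.healthy = healthy  # e.g. wraps an HTTP request to a health-check URL

    @property
    def idle(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def switch(self) -> bool:
        """Point traffic at the idle pool if it is healthy; return True on success."""
        target = self.idle
        if not self.healthy(target):
            return False  # keep serving from the current pool
        self.active = target
        return True
```

Deploy the new build to `router.idle`, then call `router.switch()`; calling `switch()` again reverts to the previous version. The pool being updated is never the one serving traffic.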

@anurse thanks for the investigation. I agree with your conclusions about high-availability scenarios and deployment.

But for me the more important case is the normal periodic application pool recycle.
Correct me if I'm wrong, but in this case no shadow copying should be necessary, because nothing changes in the web folder.

During the recycle, IIS returns "IIS 10.0 Detailed Error - 503.0 - Service Unavailable". This could be caused by https://github.com/aspnet/AspNetCore/blob/3477daf3c4f530dff80f197e0642cb39a26fb07b/src/Servers/IIS/AspNetCoreModuleV2/AspNetCore/proxymodule.cpp#L121 (ERROR_SERVER_SHUTDOWN_IN_PROGRESS), but I'm not sure about this.

It seems that some requests are still handled by the worker process during its shutdown phase instead of being queued in the IIS request queue for the new worker process. This issue is probably different from the shadow-copy problem during deployment, and it happens much more often (every 29 hours, i.e. 1740 minutes, with the default configuration).

So, given this issue and https://github.com/dotnet/aspnetcore/issues/8775, is there a way to get zero-downtime application pool recycles and deployments with IIS-only tools, or should we use other tools? We're facing exactly the same issue.

@ferrarimartin FYI: in our company we decided to stay with the full .NET Framework 4.x to provide these zero-downtime services.
We were not able to work around this issue for application pool recycles. For publishing we used a kind of "blue-green deployment" strategy, but for periodic recycles we were not able to find a workaround.