I honestly am at a complete loss and not sure where to go to log this issue I am having, but I will try my best to provide information and I hope someone can help guide me.
We published a .NET Core 3.1 API on an internal Windows Server 2019 machine. We set up a new machine (fresh install). The exact same API ran previously on Windows Server 2016.
However, we noticed that there were some serious issues on the new server. We could not remote in and the API was not accessible.
Finally we managed to see what was happening:
The `w3wp.exe` process that hosted the ASP.NET Core API looked to be stuck and permanently allocating memory. So much so that there was no memory left for anything else, effectively taking the server offline. It also seemed to be doing some processing on the SQL Server Express instance that was also installed on that machine, but the traces we ran of what SQL was running did not point us in any relevant direction.
Restarting the IIS App Pool did not immediately stop the problem as the process continued to spin away. Eventually the rogue process would stop and we could regain control of the server.
Now we removed so much of our code to try and establish where the problem was to no avail. And unfortunately it has led me here seeking some help.
One thing that we did find in our efforts to fix this was that if we run `OutOfProcess` the issue vanishes. The second we run `InProcess`, the process will run out of control at some point in time (probably 2-4 times a day).
Now this does not happen as soon as the process has started, but seems to occur randomly. I am not able to trace it to any specific API that is called (as they have all been disabled). The API really is simple, just a bunch of Controllers that query SQL. The process will respond and serve traffic like normal, and then you go for a jog and the next thing you know your phone is blowing up because things are not working. This inconsistency is extremely stressful. However, I did find that when I restart the server and then remote onto it, the process has a tendency to start going crazy.
I do not know what traces or debug tools to run or how to further resolve this issue. Sorry, I know this may not be the right place and I have tried to explain my story. It is hard to do huge amounts of diagnostics on a live server but I am willing to try to get to the bottom of it. I am hoping there is some strange config somewhere...?
To summarise:
- The API `w3wp` process spins up a core and allocates huge amounts of memory, making the host unresponsive.
- This does not happen with `<AspNetCoreHostingModel>OutOfProcess</AspNetCoreHostingModel>`; it happens _only_ with InProcess.
Startup is simple:
Looks to be using Workstation GC - Using Server GC makes no difference.
- ASP.NET Core version 3.1
Host (useful for support):
Version: 3.1.3
Commit: 4a9f85e9f8
.NET Core SDKs installed:
No SDKs were found.
.NET Core runtimes installed:
Microsoft.AspNetCore.All 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.1 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.3 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.3 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
> Looks to be using Workstation GC - Using Server GC makes no difference.
How did you end up using Workstation GC? That would be hard to do unless you set it that way.
> API w3wp process spins up a core and allocates huge amounts of memory making the host unresponsive.
It would make sense to capture a memory dump when things are headed south. You can take a look at the dotnet diagnostics tools to get more information as well:
https://docs.microsoft.com/en-us/dotnet/core/diagnostics/#net-core-dotnet-diagnostic-global-tools
cc @Maoni0
Been doing some testing with `dotnet-counters` as requested. It was quite hard as the InProcess w3wp.exe consumes all memory on the machine.
Again this _only_ happens with InProcess. OutOfProcess works perfectly.
Additional info:
(`GCSettings.IsServerGC`)
@sywhang how to interpret this dotnet-counters output? the GenX GC count / 60 sec means during the last 60 seconds this is the # of GenX GCs observed? how did we choose these values (60 secs/1 sec)? this looks different from the doc I found. for example, it could be very normal that we do gen2 GCs less often than once every 60s.
> the GenX GC count / 60 sec means during the last 60 seconds this is the # of GenX GCs observed?
Not exactly. It depends on the update interval: whatever the update interval is, we adjust the value to the "/ 60 seconds" format. So if you requested this counter to be published every second (via the --refresh-interval flag) and there was 1 gen 2 GC within that one-second interval, the value will show up as "60", which doesn't necessarily mean that 60 gen 2 GCs happened in the past 60 seconds. Setting --refresh-interval to a longer interval would yield more useful info here.
The doc you linked does seem outdated though, so I will go update that.
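To make the scaling described above concrete, here is a minimal Python sketch of the arithmetic. It is not dotnet-counters code: the function name `normalize_to_60s` is hypothetical, and the formula is only assumed from the description in this comment (a count observed during one refresh interval, scaled to a 60-second window).

```python
def normalize_to_60s(events_in_interval: int, refresh_interval_sec: float) -> float:
    """Scale a count observed during one refresh interval to a per-60-seconds figure.

    Assumption: the displayed value is count * (60 / refresh interval),
    as described in the comment above.
    """
    if refresh_interval_sec <= 0:
        raise ValueError("refresh interval must be positive")
    return events_in_interval * (60.0 / refresh_interval_sec)


# One gen 2 GC seen during a 1-second interval would be displayed as 60.
one_second_view = normalize_to_60s(1, 1)

# The same single GC seen during a 60-second interval would be displayed as 1.
sixty_second_view = normalize_to_60s(1, 60)
```

With a short refresh interval a single GC is magnified by the scaling factor, which is why a longer interval gives a reading closer to the real count.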
> you saw 1 gen 2 GC between that one second of update interval, the value will show up as "60" which doesn't necessarily mean that 60 Gen 2 GCs happened in the past 60 seconds.
that seems confusing. why would we do this? might as well just display 1 for the interval that you specified since you can't extrapolate to 60s anyway.
Anything I can try to figure out how to get to the bottom of this? Any further information I can provide?
I did take a dump from Task Manager while the process was misbehaving. It's too big to attach. I did open the dump in VS, but I didn't see anything relevant (although I'm not 100% confident in what to look for).
when you open the dump in VS did it show what's on the heap?
CC @brianrob if he has cycles to help.
We are also facing the exact same issue on our production servers (Windows Server 2019). We had to change the hosting model to _outofprocess_ to keep it running for now. Interestingly, there is now a memory build-up on w3wp.exe, but dotnet.exe runs fine.
@dylanvdmerwe, having skimmed this issue, if I understand correctly, you see the machine run out of memory 2-4 times per day, is that right? How quickly does the memory rate increase from when you reset it? Ideally we can attempt to capture what's happening with some ETW traces, but we probably will also want to capture a dump or two as the memory is increasing.
To capture a trace, I would recommend using PerfView (https://aka.ms/perfview). Please use the command: PerfView.exe collect /KernelEvents=Default+VirtualAlloc /CircularMB:4096 /BufferSize:1024
You'll probably want to capture these and then put them on a file sharing service such as OneDrive. You can link to them from here, or if you would like to keep them private, you can send them to me in e-mail - brianrob [at] microsoft.
> @dylanvdmerwe, having skimmed this issue, if I understand correctly, you see the machine run out of memory 2-4 times per day, is that right?
> How quickly does the memory rate increase from when you reset it? Ideally we can attempt to capture what's happening with some ETW traces, but we probably will also want to capture a dump or two as the memory is increasing.
Today, I was able to replicate this issue on my development machine. Memory increases very rapidly, 4GB within 3 or 4 seconds. I have created a GIF of what happens. Strangely, this happens only when I browse with older IE; on Edge Chromium it is fine.
https://drive.google.com/file/d/1h31cbs5UGyXKa9fFJD7fKZ3TzclneBps/view
_I will try to capture a trace with PerfView and share it in some time._
I used Fiddler to find the difference in the requests between IE and Edge Chromium.
Edge: Accept-Encoding: gzip, deflate, _br_
IE: Accept-Encoding: gzip, deflate
So brotli (_iisbrotli.dll_), from the IIS Compression module, is the problem; everything works fine after uninstalling it.
@dylanvdmerwe Can you confirm?
https://docs.microsoft.com/en-us/iis/extensions/iis-compression/iis-compression-overview#installing-iis-compression
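The header difference found with Fiddler can be replayed without either browser. Below is a minimal Python sketch using only the standard library. The helper names are hypothetical, and any URL passed in is a placeholder for the affected site, not something taken from this thread. It builds one request that advertises `br` and one that does not, and can report which `Content-Encoding` the server picked.

```python
import urllib.request

# The two Accept-Encoding values observed in Fiddler.
EDGE_ACCEPT_ENCODING = "gzip, deflate, br"
IE_ACCEPT_ENCODING = "gzip, deflate"


def offers_brotli(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists the 'br' coding.

    Simplification: q-values (e.g. 'br;q=0') are stripped and ignored.
    """
    codings = [part.split(";")[0].strip().lower() for part in accept_encoding.split(",")]
    return "br" in codings


def build_request(url: str, accept_encoding: str) -> urllib.request.Request:
    """Build a GET request that advertises exactly the given codings."""
    return urllib.request.Request(url, headers={"Accept-Encoding": accept_encoding})


def chosen_encoding(url: str, accept_encoding: str, timeout: float = 10.0) -> str:
    """Send the request and return the response's Content-Encoding, or 'identity' if absent."""
    request = build_request(url, accept_encoding)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Encoding", "identity")
```

Calling `chosen_encoding(url, EDGE_ACCEPT_ENCODING)` and then `chosen_encoding(url, IE_ACCEPT_ENCODING)` against the site while watching w3wp.exe memory would be one way to test the `br` request in isolation.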
@balajigunasekaran now that is very interesting. I will try when the server is quiet and can handle some testing a bit later. I have brotli installed atm.
I uninstalled the `Microsoft IIS Compression` module and I can no longer reproduce the runaway memory usage on w3wp when running InProcess.
Obviously the server is now not able to use gzip and br, but well done @balajigunasekaran on figuring out the issue. This probably needs to be raised here, but I am not sure of the turnaround as that repo looks very quiet.
_gzip & deflate_ are natively supported by IIS. Only _brotli_ compression won't work.
https://docs.microsoft.com/en-us/iis/extensions/iis-compression/iis-compression-overview
We will live with gzip for now and wait until this issue is fixed, because it's not a showstopper.
On my side, after uninstalling the compression module, things that were previously br compressed are now not being compressed at all, not even with gzip.
Nice job @balajigunasekaran. I have sent an e-mail to see about getting some eyes on this from the right folks. Stay tuned.
Oh, that is bad. Let me also check once now that at least gZip works.... fingers crossed
Yes, it works fine: _IIS default compression_ kicks in and gzip is used.
You have to enable it here.
I have dynamic compression enabled on the server:
But now, since I uninstalled the IIS Compression module, things are not even being deflate compressed.
No content-encoding: gzip.
I uninstalled the module from Add or Remove Programs. I have a feeling that something has been unregistered somewhere.
We used to have a URL Rewrite configuration in web.config to force _br_ compression. Do you have anything like that?
Is it possible to test by quickly hosting a new localhost site with some static dummy content and checking whether default compression is applied?
Yes, there was a URL Rewrite rule for brotli. Disabling this re-enabled gzip.
Very interesting evening and thanks for all your input @balajigunasekaran! Let's see what updates can happen on IIS Compression Module repo, and then we can test further to make sure this is resolved.
My server hangs too after installing the brotli compression module, with InProcess ASP.NET Core on IIS 10.
After removing it, everything works normally. The module was downloaded from here:
https://docs.microsoft.com/en-us/iis/extensions/iis-compression/iis-compression-overview