Aspnetcore: Asp.net Core not Collecting Garbage

Created on 10 Aug 2018  路  66Comments  路  Source: dotnet/aspnetcore

Copied from https://github.com/aspnet/Home/issues/1976

_@sinapis commented on Dec 19, 2017_

Hi,

I have the same issue with ASP.NET Core 2. I've took a memory dump and tried analyzing. From what I see the problem is exactly as the OP said. My application starts with allocation about 75 MB, and very quickly it goes all the way to ~750MB, out of it 608MB is "Unused memory allocated to .NET".

First snapshot at app start:
image

Second snapshot after 3 minutes and 100 requests:
image

_@sebastienros commented on Apr 5_

  • We only measure this on 2.1, I will look into adding the same thing for 2.0
  • All comments seem to be related to to the LOH issue that was mentioned, which should be taken into account in the application and pool large arrays as much as possible

_@sinapis commented on Apr 8_

@sebastienros, several questions:

  1. I used Ants profiler to measure memory usage, according to it, no LOH fragmentation was detected. Can you advise how can I verify if my application suffers from LOH fragmentation issues?
  2. What are the results on .net core 2.1? Is the issue resolved because Kestrel is using Span?
  3. What if we can't pool arrays - can you provide a workaround? Should we use GCSettings.LargeObjectHeapCompactionMode.CompactOnce?

_@davidfowl commented on Apr 8_

What are the results on .net core 2.1? Is the issue resolved because Kestrel is using Span?

We personally haven't seen any evidence that the issue is in Kestrel. It still looks like an application problem.

What if we can't pool arrays - can you provide a workaround? Should we use GCSettings.LargeObjectHeapCompactionMode.CompactOnce?

_@sebastienros commented on Apr 10_

@sinapis
Would any of you be able to share an app that shows the issue? If I could repro it we would be able to find the reason, or at least remove some code piece by piece and isolate a minimal repro.

_@sinapis commented on Apr 10_

@sebastienros I can't share the app, but I can share the session from PerfView session + memory dump.
Some description: I have a ASP.NET Core 2 Web API, I've created a load test of 200 users all sending the same request over 10 seconds. Overall 775 requests were processed.

This app jumped to almost 1 GB memory usage in task manager and stayed like that. Looking at the dump I can count about 18 MB:

image

So the questions is where did almost 1 GB go?

_@sebastienros commented on Apr 10_

@sinapis Thanks

The behavior you are describing is not unexpected, the GC will allocated some memory as necessary on the peak load, and just release it over time. It's the GC Server mode, and usually wait for idle periods to release it and not affect your app perf. The amount of memory it will reserve depends of the total memory available on the system.

We would definitely see an issue if it kept increasing. I assume that if you don't send anymore requests and let your app run you will see the memory usage going down.

Could you run the same thing until it consumes most of your system memory? Or at least long enough with the same load that it will show it growing continuously? I will still get a look at your current dumps.

_@sebastienros commented on Apr 10_

Also can you take dumps during and at the end of the jobs, so we can see the detals.

_@Maoni0 commented on Apr 11_

@sinapis I looked at your ETW trace. it is puzzling to me - you survived very little in the last induced gen2 GC yet we still chose to keep that much memory committed. your application seems edge case (you mostly just did only background GCs due to LOH allocations) - I wonder if we have some accounting errors there (another possibility is errors in the numbers reported but if you already verified that you have that much committed that's a smaller possibility). if you could repro with something you can share with me, that'd be fantastic; otherwise if it's possible to run your app with some logging from GC (I can give you a commit that does that) that'd be helpful too.

_@sinapis commented on Apr 11_

@Maoni0 please share how should I enable GC logging
If there is some other data you would like me to provide in order to disprove accounting error please let me know what should I provide you with and how (maybe tell perfview to collect more data?)
I'll try creating a minimum repro, but not sure I'll succeed since I don't know where the problem is.

@sebastienros hopefully I will provide another dump with more memory consumption today

investigate

Most helpful comment

We have very few pinned objects (only what Kestrel pins) - and my demo app above demonstrates that this happens with or without. Assuming servers come with RAM increases is not safe. Intel releases X performance with new cores (or AMD, whatever floats your boat!) but we don't need more RAM. We get no benefit from it. As an SRE and the person who specs most of our server hardware, I'm going to have a hard time justifying "because the GC doesn't run" as the cause for needing to spend thousands more per box on memory.

Intel is planning on doubling the number of cores coming up, we don't need double the RAM. At least, not in the foreseeable future. I think any assumption that RAM goes with cores is not a valid one we should be basing technical decisions on here, FWIW.

All 66 comments

@sinapis

Are you still waiting from us to make progress on this issue?

Hi,

That project was abandoned but it's very important for us to get to the bottom of this issue. I could use the info how to record more info on GC.

Also, please describe what exactly I can provide that will help got to the bottom of this issue.

@sinapis are you trying to find out the issue you described above or just interested in recording more info in general? for the former, are you in a position to build a private coreclr.dll and use that in your project that's showing the symptom you described? if so I can just give you a commit with proper logging enabled.

@Maoni0 Yes, I can build a private coreclr.dll and use it. If there is are additional logging you would like me to add, I'd be more than happy to do so. In general, tell me what you need. I'll do whatever I can.

Environment

We are currently on .NET Core, version 2.1.300, but are installing 2.2 preview 3 for testing likely this week. I'll re-visit this issue then to confirm or show changes.

Details

We are seeing this at Stack Overflow with our canary .NET Core web socket server as well. We are observing 2 behaviors:

  1. GC does not appear to be running on it's own. We are seeing Gen 0 grow to gigabytes (at which point we have to kill it).
  2. When GC is forced to run, overall process memory is not freed.

Here's a before and after of a forced all-generations GC run:

|Before|After|
|------|-----|
|before|after|

Note that gen 0 is cleared, the long-lived websocket object trees (expected) are moved to gen 2, and overall memory usage drops to 530MB. However, overall process memory was at 2.8GB and remains at 2.8GB. Even though only 530MB is used internally.

We currently have a timer running inside the application forcing GC to get by for other testing, but that's not a great idea and really shouldn't be necessary. I currently consider this a blocker to deploying Stack Overflow on .NET Core. What can we do to help? We have memory dumps and can readily do analysis and testing to assist.

How's the fragmentation? That can cause symptoms like this when there's lots of pinned buffers.

How much memory is on the system?

There should be no new pinned buffers when using ASP.NET Core. The only pinned buffers are the ones Kestrel uses.

What we'll need:

  • Something we can repro consistently, and ideally locally
  • Memory dumps
  • Perfview traces under load

@Tratcher Attaching !dumpheap -stat and !eeheap-gc here: SocketServerDbg.txt (happy to do any other breakdowns as well) - this is on the 530MB dump (assuming you wanted highest fragmentation).

@sebastienros Specs:

  • 24 Cores, 48 logical processors
  • 64GB of RAM (43.3GB in use)

I don't think we can provide memory dumps here as they contain user cookies due to the nature of the application and that's a security issue, but we're more than happy to assist as best we can. On the perfview front: send instructions and happy to help.

How's the fragmentation? That can cause symptoms like this when there's lots of pinned buffers.

Looking at the dump doesn't look huge?

Total 5432168 objects
Fragmented blocks larger than 0.5 MB:
           Addr     Size      Followed by
00000218476b92e8    1.2MB 00000218477e12d8 System.Byte[]

GC does not appear to be running on it's own. We are seeing Gen 0 grow to gigabytes (at which point we have to kill it).

When you say "grow to gigabytes", this is not something that is unexpected. I understand you are aware that the Server GC will be triggered by memory pressure. If there is no pressure (you have 64GB) it doesn't have to be triggered. So it's all about how much available memory was left when you decided to kill it. Are these numbers the ones you shared before or were they higher that that?

Quoting the documentation:

Garbage collection occurs when one of the following conditions is true:

  • The system has low physical memory. This is detected by either the low memory notification from the OS or low memory indicated by the host.
  • The memory that is used by allocated objects on the managed heap surpasses an acceptable threshold. This threshold is continuously adjusted as the process runs.

These explanations are very fuzzy: "acceptable threshold". It doesn't tell what is acceptable or not, and you might have other acceptance thresholds ;)

When GC is forced to run, overall process memory is not freed.

Same. I assume that's an expected behavior, though it doesn't mean it's what should have happened, but nothing tells me it shouldn't behave this way from the data you shared.

@Maoni0 is this is an expected behavior based on the numbers that were provided?

I have never tried to run an ASP.NET app and experience how the GC behaves with such an amount of RAM to see what these thresholds look like.

Are these numbers the ones you shared before or were they higher that that?

Oh no, it just takes a while to grow (due to the long-lived nature of sockets, connections ramp up slowly). It grows well beyond what's here.
I've let it run since posting ~4 hours ago and I just took another dump. It's at 5GB from the Windows view. Here's the breakdown:
image

I understand being lazy and using available memory...but that's way beyond anything I've ever seen in Gen 0.

With every .NET full framework application we have today: GC runs as expected, and memory is freed instantaneously or very shortly after a run. This is critical behavior for a shared web tier. If every app only runs GC when there's memory pressure, you're lighting 8 sticks of dynamite at once all to go off at the same time and pegging CPU when they do get a pressure signal. I'm really hoping this is a GC bug, because if it's expected behavior we have legitimate concerns and blockers with such behavior.

Standing by to run whatever diagnostics will help. And we're more than happy to run a custom build with better logging, etc. Whatever we can do - this is my highest priority to fix. I'm happy to force a GC and get numbers, but will await feedback here before doing so - lest we lose data you'd like to see.

My assumption; perhaps incorrect, was a Gen0 would be triggered if it needed to allocate more from the OS (or after xN OS allocations)?

Would you like to try the workstation GC as a mitigation for now?

@sebastienros we have a mitigation (GC on a timer), but are more concerned with solving the root issue. I'm loading the latest dump up now to see what state the GC itself is in and see if I can find anything. Potentially relevant: we're using libuv as the base layer here, which may or may not be common to the other issues? I'll update with more info as I dig here.

Thread dump for the curious: SocketServerThreads.txt

@NickCraver are you on Windows?

and @sinapis I apparently completely dropped the ball on the logging part...really sorry about that! I dunno if it still helps to give you a commit to log now...

@Maoni0 Yep, Windows Server 2016 - happy to test a custom build if it'll give you more data.

@NickCraver could you please collect some GC ETW traces? you can use this commandline for perfview:

perfview /GCCollectOnly /nogui /accepteula /NoV2Rundown /NoNGENRundown /NoRundown /merge:true /zip:true /BufferSize:3000 /CircularMB:3000 /MaxCollectSec:7200 collect

this collects for 7200s. you can start it right before you start your process. if 2 hours isn't enough feel free to replace it with longer. please share the resulting trace.

@Maoni0 A few qeustions:

  • It's a shared web tier - is there a switch to restrict it to a single process/PID to eliminate lots of noise? (Last I'm aware, it's only machine-wide, but maybe there's a filter on the logging side?)
  • Does the process need to be restarted, or should I capture as it's running now? Or both?

You can't restrict to a single process, but share the process id and/or the startup dll name.
You don't have to restart the process.

Alrighty - capture is running now, I'll have results in a few hours - can we please arrange a secure/NDA (we already have NDA in place) upload in the mean time? craver at stackoverflow on my end

capturing it running while it already has a big gen0 is not going to help.

GC is not a point in time thing, in order to figure out why you are in a situation, we need to see the history. so we need a trace to capture enough history that led up to that point. so if it takes 2 hours before the gen0 gets to be a couple of GB, we need those 2 hours captured. or you could capture at least a while before it got to that point.

Understood - restarting the process and capture here.

Edit: Now restarted. I set the capture to 3 hours for a few GB (lower traffic at night...takes longer), and I received the folder invite to upload to - thanks!

@sebastienros @Maoni0 The .etl output is in the dropbox.

Process details:

  • PID: 14704
  • Command line: dotnet.exe c:\SocketServer\SocketServer2.dll --environment=NY-Prod --service
  • ~2.02GB windows memory at the end of the capture

is in the dropbox

Well, OneDrive you mean? ;)

that's one very odd trace...now I am very curious. the commandline was fine....it detects GCs in your other processes but none in the dotnet process. the only events it detected are some thread start/stop and some exceptions. you mentioned you had some dumps, if you could share one of them I'll check if there were indeed GCs happening. you could also capture another trace with this command line that collects more events -

perfview /nogui /accepteula /NoV2Rundown /NoNGENRundown /NoRundown /MaxCollectSec:300 /OnlyProviders:Clr:2:4,Clr:1:5 /KernelEvents=default /ClrEvents:GC+Stack+GCHandle /BufferSize:3000 /CircularMB:3000 /Merge:true /ZIP:true collect

it only collects for 300s which should be enough to give some idea of what's happening in the process.

According to the output of !eeheap -gc, this is a server with 40 cores. That means that each ephemeral segment is 1GB. Assuming that allocations are well-balanced across the heaps, isn't that expected that gen 0 will grow to multiple GBs in size?
I don't want to meddle with the investigation, but the base hypothesis seems to contradict what I know, so I'd be more than happy if somebody could point out where I'm wrong :)

Following @kevingosse remark, I have 2 comments:

  1. did you cut the output of !dumpheap -stat?
    the number of free objects seems large (+300.000 objects) even though only 1 is larger than 500KB
    we had this kind of scenario happening due to pinned buffers for asynchronous I/Os
    I would recommend to use the tiny tool I wrote to visualy "see these pinned objects (available from https://github.com/chrisnas/DebuggingExtensions)
    look for the vertical red lines and see if you find some at the end of your gen0 segments like in this screenshot (the white spaces are the "free" blocks):
    image
    this could explain why the "size" of gen stays high even after GC (due to the number of segments 40/48 ~ number of cores - see my next bullet point)
  1. did you cut the output of !eeheap?
    if your server has 48 cores, you should have 48 heaps but his output shows only 40
    --> did you use the settings provided by Maoni to limit the number of heaps + core affinity?

I hope this helps and good luck for your investigation ;^)

@Maoni0 A new ETL dump is in the folder (PerfViewData-Round2-PID-31100.etl.zip), details:

  • PID: 31100
  • Command line: dotnet.exe c:\SocketServer\SocketServer2.dll --environment=NY-Prod --service
  • ~360MB windows memory at the end of the capture

@chrisnas No truncation there on !dumpheap -stat, it was just after a full GC run, so a compaction happened...but the process didn't free the memory. I'm not sure what happened on the !eeheap copy, here's a re-run and me being sure everything got copied: eeheap.txt

And I'll give the tool a shot when at my desk, just doing some keep-it-going here from the laptop :)

Update: @mgravell is working on a synthetic load to hopefully get the same behavior now - I'll join in a bit.

If that's successful, we'll have a dump we can share. It might also result in a portable repro. If it does, we'll land that on GitHub. It's underway and we'll share everything we can.

Thanks for the eeheap.txt @NickCraver: it seems complete with the 48 heaps.
Even after compaction, you still might have some free objects (mostly due to pinned objects - maybe @Maoni0 could comment).
BTW, you can see the 300.000+ "free objects" in the same !dumpheap -stat.
The one that appears at the end of the output is just the only one larger than 500 KB.

Agree on all of that but...we're talking 2.8GB on 530MB used...that's over a 5:1 fragmentation ration. The root problem remains: GC doesn't run, which gets us into this state in the first place :)

Following @kevingosse comment, Budget for Gen0 is per heap, if allocation is well balanced you can allocate gen0budget * 48 before triggering a GC.

= Warning this is high speculation: =

When you start a server, usually you initialize (and then allocate) a bunch of objects that are long lived (mean survived). Budget depends on survival rate, so with a high survival rate, you increase your gen0 budget. When initialization is done, you may enter in a state where you have a large budget (AKA 100MB per heap =4.8GB total) and if per request you don't allocate a lot, you can have a long period before reaching the budget limit. again the allocation need to be well balanced across heap to reach this state.

just my 2 cents :)

Following @kevingosse comment, Budget for Gen0 is per heap, if allocation is well balanced you can allocate gen0budget * 48 before triggering a GC.

And with 64GB of RAM (43.3GB in use), system memory load won't trigger GCs either.

A lot of comments about gen 0 budget here and elsewhere, so I wrote the following simple test to start exploring that:
```c#
public static void Main(string[] args)
{
WriteLine("GC Info");
WriteLine(" GC Mode: " + (GCSettings.IsServerGC ? "Server" : "Client"));
WriteLine(" GC Latency Mode: " + GCSettings.LatencyMode.ToString());
WriteLine("Press any key to begin!");
ReadKey();
WriteLine("Starting allocation run...");

long allocatedBytes = 0;

while (true)
{
    Parallel.For(0, 1_000, _ =>
    {
        // Throw-away allocation - Gen 0 ahoy!
        var tc = new TestClass()
        {
            Object1 = new object(),
            Object2 = new object(),
            Object3 = new object(),
            Object4 = new object(),
            Object5 = new object(),
        };
    });
    long newAllocatedBytes = GC.GetTotalMemory(false);
    WriteLine($"Allocated: {newAllocatedBytes:N0} bytes");

    if (newAllocatedBytes < allocatedBytes)
    {
        WriteLine($"GC Detected (drop from {allocatedBytes:N0} to {newAllocatedBytes:N0} bytes) - press any key to continue!");
        if (ReadKey().Key == ConsoleKey.Escape)
        {
            return;
        }
    }
    allocatedBytes = newAllocatedBytes;

    Thread.Sleep(15);
}

}

public class TestClass
{
public object Object1 { get; set; }
public object Object2 { get; set; }
public object Object3 { get; set; }
public object Object4 { get; set; }
public object Object5 { get; set; }

public int Int1 { get; set; }
public int Int2 { get; set; }
public int Int3 { get; set; }
public int Int4 { get; set; }
public int Int5 { get; set; }

public Guid Guid1 { get; set; }
public Guid Guid2 { get; set; }
public Guid Guid3 { get; set; }
public Guid Guid4 { get; set; }
public Guid Guid5 { get; set; }

}

I have compiled this (Release) under `net472` and `netcoreapp2.1`. Results vary a bit, but always very close. Here's 3 runs from each scenario, on the same server config as all the other dumps and such. To replicate this on a client OS, control with the app.config in .NET 4.7.2:
```xml
<configuration><runtime><gcServer enabled="true" /></runtime></configuration>

Or in the .csproj for netcoreapp2.1:

<ServerGarbageCollection>true</ServerGarbageCollection>

Results

.NET 4.7.2: Client GC

GC Detected (drop from 15,625,584 to 324,640 bytes)
GC Detected (drop from 15,628,560 to 340,960 bytes)
GC Detected (drop from 15,618,312 to 315,464 bytes)

.NET Core 2.1 Client GC (see note below)

GC Detected (drop from 15,518,568 to 329,280 bytes)
GC Detected (drop from 15,488,344 to 271,752 bytes)
GC Detected (drop from 15,523,704 to 313,128 bytes)

.NET 4.7.2 Server GC

GC Detected (drop from 2,641,191,560 to 113,976 bytes)
GC Detected (drop from 2,606,708,392 to 474,768 bytes)
GC Detected (drop from 2,635,277,656 to 346,912 bytes)

.NET Core 2.1 Server GC

GC Detected (drop from 2,369,694,800 to 111,288 bytes)
GC Detected (drop from 2,471,212,872 to 110,472 bytes)
GC Detected (drop from 2,457,686,152 to 138,720 bytes)

Note 1: I disabled other processes on this server to eliminate the external memory pressure variable against runs...so everything was pretty consistent.

Note 2: Since GC mode "auto selects" on a server OS (on Windows Server 2016 but not on Windows 10 for example), I had to set SET COMPlus_gcServer=0 to get client GC working under netcoreapp2.1 on Windows Server 2016. None of the other configuration options worked, e.g. <ServerGarbageCollection>false</ServerGarbageCollection> in .csproj, manually editing the runtimeconfig.json to set it to false, or an app.config. I'm hoping that's not intended behavior and is a separate issue. But on a shared system, it's still an issue here if that's a proposed workaround for anything.

Note 3: Even after collection, in all cases, the process did not release the freed memory.

I'm not sure how Gen0 budget works overall, but my understanding is it has a few things that factor in. Perhaps the ASP.NET Core/Kestrel case with web sockets has slightly different factors that double the budget per segment resulting in our ~4.8GB before any collection kicks in? I'm definitely speculating there.

I'm posting this results because maybe this is "as it should be" (from a point of pure spec adhering), but at the same time isn't really reasonable either. On a shared server with a lot of apps all using obscenely high not-actually-needed amounts of gen 0 memory makes management of all things difficult and predictions and problem identifications much harder. From the outside, we can't tell a leak from a lack of GC until it's too late.

Are we way off base? What next steps should we take here?

I've opened https://github.com/dotnet/coreclr/issues/20911 on the GC Client/Server issue - I believe that's distinct from the issue here as the problem isn't with the switch, it's with running in Server mode in general.

as others have pointed out, you have 48 heaps, 2GB gen0 means ~43mb gen0/heap. that's not that big considering you do have pinning which is shown by the 2nd trace you uploaded. the 2nd trace does show one GC happening very early on when the heap was quite small so I am still not sure why the 1st trace showed no GCs at all.

Server GC does use a lot of memory when memory is plentiful - this is by design because it increases performance (we don't need to do GCs as frequently). pinning can be problem if you have pins that don't go away. with no pinning it should just shrink the heap when memory gets tight. with pinning, because we cannot move pins, and if they keep being alive they will stay there. what GC does is we try to keep as much fragmentation in gen0 as possible because the fragmentation can be used to satisfy allocation requests right away.

Even after collection, in all cases, the process did not release the freed memory.

I don't understand what you mean by "the free memory". could you please elaborate?

If this is really the expected behavior, I assume something like https://github.com/dotnet/coreclr/issues/11338 would be the solution.

if you have a dump from the case where it didn't do any GCs in 2 hours, I can also check the gen0 budget there to see if it's big enough to accommodate all the allocations during those 2 hours (could be possible if you haven't allocated much during those 2 hours).

@sebastienros or you can run it in a container if you simply want to limit that process's memory usage. on Windows this is a job object.

I don't understand what you mean by "the free memory". could you please elaborate?

As in windows still sees the overall memory usage, it doesn't go down any.

The GC behavior here would be good if it was a dedicated machine. But it's not great behavior in a shared server situation (which we have a lot of). This includes local development on Windows 10 (our only option for Visual Studio). As servers get more and more cores (Intel is going to double the amount of cores soon), are users expected to absorb double gen0 budgets as a result? It seems at some point we need this to be adjustable or the inherent defaults to account for overall limits. Core counts are adjusting fast, and with that variable in play without a corresponding control...things will get hairy. It's increasingly difficult to manage things on shared systems as a result, and the more cores that are in play, the more sharing that's likely...or likely to be more complex at least.

On the dumps front: we have sensitive data in there, but I'm happy to pull the data from the existing dumps if you've got any WinDBG pointers or other such that that's reasonable to convey (understandable if not).

@NickCraver: increasing the number of cores should come with a RAM increase. The sizing of servers stays a complicated problem. At work, we are going to linux containers managed by Mesos and I'm not sure that terminating a process when a threshold is detected is the right solution (it would be the same with a Windows Job).
If, as @Maoni0 said, you have pinned objects, feel free to use the tool I mentionned earlier: you will see if red vertical lines are near the end of the gen0 segments + full of white space: it would explain the difference between the size of "used" objects and the size of allocated segments. In addition, you will get the class name of these pinned instances as an hint of possible area of improvement.

@Maoni0: do you have a timeframe for having in .NET Core the settings you described in https://blogs.msdn.microsoft.com/maoni/2018/10/02/middle-ground-between-server-and-workstation-gc/?
The could be a solution to limit the gen0 growth and be part of the sizing of apps on a server.

At Windows level, the memory manager is able to deal with a lot of applications consuming "memory" beyond the available RAM; especially if the heaps contain objects that are no more accessed (would be easily paged out on very fast SSD). I don't have exact numbers about how many complete 4KB pages of gen0 are not accessed at all before a collection but could be important.

We have very few pinned objects (only what Kestrel pins) - and my demo app above demonstrates that this happens with or without. Assuming servers come with RAM increases is not safe. Intel releases X performance with new cores (or AMD, whatever floats your boat!) but we don't need more RAM. We get no benefit from it. As an SRE and the person who specs most of our server hardware, I'm going to have a hard time justifying "because the GC doesn't run" as the cause for needing to spend thousands more per box on memory.

Intel is planning on doubling the number of cores coming up, we don't need double the RAM. At least, not in the foreseeable future. I think any assumption that RAM goes with cores is not a valid one we should be basing technical decisions on here, FWIW.

@NickCraver: my point was more related to the number of applications you want to run on a machine (hence my Mesos example). The more apps, the more RAM.
However, I agree with you that each app should be able to "control" its high end memory consumption (the only way/workaround I know today with the CLR is to use the settings I mentionned). I don't believe in the hard limit that would either trigger an OOM on Windows with Jobs and process termination on Mesos.
This is a kind of new (and stronger) requirement for .NET (maybe @Maoni0 could comment). In addition, with the raise of containers, it would really help to avoid the "terminator" threshold. BTW, one of Criteo developers pushed a PR to take into account the per container (via cgroup) core count in a previous version of Core CLR for this exact same reason. IMHO, memory consumption limit should not be bound to number of CPU cores .

Maybe I'm just being obtuse in this case, but is there any reason the GC doesn't just run every so often? Not running in hours is our case. Is there a way to tell it to run on an interval, regardless of usage? That's effectively what our workaround way, a timer with a GC call...but that seems pretty silly for every application to do. It seems like a sanity check of:

Let's say once every 5 minutes: "have I run in 30 minutes? No? Okay run", otherwise do nothing. Is that an insane idea? I'd be surprised if such a thing hadn't been discussed before...I'm curious what the thinking is there.

Can bring down the SOH segment size via environment variable; prior to startup?

COMPlus_GCSegmentSize

Is hex value in bytes and > 4MB & multiple of 1MB, so 1000000, 2000000, 4000000, etc

According to docs it defaults are the following:

64bit: > 8 CPUs its 1GB on server; whereas 256MB on workstation
32bit: > 8 CPUs its 16MB on server; whereas 16MB on workstation

[Credits to @kevingosse [on twitter](https://twitter.com/KooKiz/status/1060980584611282945), for working out units and that its hex.]

That base level will be then multiplied by the number of heaps? Which leads also to the other knobs:

do you have a timeframe for having in .NET Core the settings you described in "Middle Ground between Server and Workstation GC"

Should be able to set them via the environment variables; prior to execution

COMPlus_GCHeapCount // starts at core count
COMPlus_GCNoAffinitize 
COMPlus_GCHeapAffinitizeMask 

@NickCraver We've implemented basically that in our base framework for WAS hosted services. There's some periodic handle usage by WAS or IIS to monitor the service (something like 3 Event, 1 Thread, and 1 something else handle if I remember correctly) that will eventually get released when the finalizer runs so it eventually gets cleaned up when GC happens.
But, we have some services that receive virtually no traffic and so never get GC run. The process handle count just keeps increasing by 5 every few seconds. After a few days the handle count and associated memory usage start getting absurd (10+GB on a shared web host).
We just check every hour to see if GC has run since the last check, and if not, request a collection.

Having the GC run "every so often" would eliminate the need for us to have this timer code.

Thinking about what's different...

Running raw Kestrel (without a Job object or cglmits) is going to behave to the GC defaults; though I assume @NickCraver is using IIS.

So question is does IIS use Job objects; where the GC maximums would be obeyed; however a difference with ASP.NET Core 2.x is it spawns a second dotnet process to run (2.2 will allow in-proc).

Does that second process then still obey the limits set by IIS?

In this case - no IIS (though we will be elsewhere). The web socket server is netcoreapp2.1 via dotnet.exe listening on a port directly with Kestrel (we can open source this if we need...it doesn't even depend on Stack Overflow code). I'm curious about all this all the same though, and how it'll apply to both this and our next applications (which will be in-proc 2.2).

ANCM uses job objects when creating a new dotnet.exe process: https://github.com/aspnet/AspNetCore/blob/master/src/IISIntegration/src/AspNetCoreModuleV2/OutOfProcessRequestHandler/serverprocess.cpp#L74.

@benaadams I'm confused about:

So question is does IIS use Job objects; where the GC maximums would be obeyed;

IIS doesn't have a GC. How would these maximums be set by IIS? Environment variables would be respected if WAS is restarted.

So question is does IIS use Job objects; where the GC maximums would be obeyed;

IIS doesn't have a GC. How would these maximums be set by IIS?

I'd assume it would use these settings?

image

Hitting those app pool limits just cause an app pool recycle ( spawns a new w3wp.exe while the existing finishes serving the current requests). IIS worker processes don't use job objects, the ASP.NET Core IIS module seems to be, but that's different.

I've seen Job objects mentioned here and at the parent issue as well and did a quick test and it does not seem that setting lets say a 100Mb memory limit on dotnet.exe via adding it to a job object is triggering GC's either. ( I could be doing it wrong).
Did a dummy web api and started hammering it and the memory usage kept climbing up until dotnet.exe died when hitting the limit.
I've used https://github.com/nccgroup/WindowsJobLock as below

.\Win.JobLock.exe -p 21780 -n dotnet -m $(100Mb)

Wanted to share my related experience, Hopefully it is relevant to this discussion.

Over the last one year we moved to a more micro servicey architecture and have 20+ containerized applications deployed on many XXL instances in AWS ECS.

Each deployable unit based on microsoft/dotnet:2.1-aspnetcore-runtime image

We do a lot of caching, and work offloading via queues and workers, hence more than 50% of our http services, except for the user facing ones , are hit at avg ~200 requests per minute.

Keeping this is mind we had set our resource allocation for these sparsely hit services very modestly. 128mb memory soft limit and 512mb hard limit.

As soon as we released, we noticed that the memory graph was consistently at 400%, and the CPU was extremely spikey, perf was also erratic. Since we have hard limits set on our containers, unlike what OP observed, we consistently ran into memory pressure which caused GC, which caused CPU spikes, alerts, restarts, hairloss.

After noticing that other users have had a similar experience , we decided to allocate each service with 1024MB memory limit. Which felt too excessive given what these services do. However, as soon as we did that, everything stabilized. Stable lines, no spikes, no restarts. Even if we turned off caching, the services handled the load flawlessly 馃憦 .

However, this is not sustainable. We are enjoying the small small many independent services, but giving each service minimum 1GB becomes a bottleneck. Our solution so far has been to add more instances to our cluster to accommodate new services. Its getting too expensive :( .

I love C# and .Net, so please forgive me for calling this out, but we recently deployed 3 of our new services in Go, and they have been chugging along fine with just 128mb hard memory limit. For this reason primarily, it is becoming hard to justify using aspnet core for our newer services.

@arishlabroo could that be this bug - https://github.com/dotnet/coreclr/issues/19060 and corresponding PR https://github.com/dotnet/coreclr/pull/19518 ?

@kierenj 馃, seems possible, but wouldn't the latest microsoft/dotnet:2.1-aspnetcore-runtime image include that fix? Since the bug was closed 3 months ago.

@arishlabroo I think that fix landed in 2.1.5 that was released month ago October 2, 2018.

@arishlabroo IMO this kind of many small services that are limited in memory should use the workstation GC. The processes would then hardly go over 100MB, and most probably less than 50MB.

@arishlabroo as @sebastienros pointed out, have you tried Workstation GC? if not, would you mind trying it out?

also I've seen this WindowsJobLock tool used by folks - this is NOT going to work for limiting GC heap usage because the physical limit is read when the runtime is loaded. if you wait till the process is already running and then specify the limits they are not going to get read, period.

I'm actually looking at processes running within a job with small memory limits this week so I will publish a blog entry to talk about my findings. for folks who want to specify a small limit, Workstation GC is certainly a good starting point if you are currently using Server GC.

@Maoni0 @sebastienros is it possible to provide some recommendations?
If I'm running Kestrel with 1 cpu assigned. At a low memory limit I should be using workstation GC. Is there a higher limit (what?), at which it makes sense to change to server GC?
If I'm running Kestrel on N cpus with server GC. Is there a recommended amount of RAM I should provision per core?

@tmds for sure...I will be providing guidelines in my blog entry. thanks for your patience!

Was this page helpful?
0 / 5 - 0 ratings