Here are the reasons I'm aware of that a .NET process could raise an `OutOfMemoryException`:

- An `OutOfMemoryException` thrown explicitly by developer code (like `Bitmap.Clone`).
- A single allocation that exceeds the maximum object size, e.g. `new string('c', int.MaxValue)`.
- The GC being unable to find enough memory to satisfy a `new` operation.

I'm sure this is not an exhaustive list though. I'm curious what the other reasons are that a process could OOM in .NET. Is there a good list somewhere I can read through?
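For concreteness, a minimal sketch (my own illustration) that deliberately triggers the oversized-allocation case from the list above:

```csharp
using System;

class MaxObjectSizeSketch
{
    static void Main()
    {
        try
        {
            // A string of int.MaxValue chars needs ~4 GB of character data,
            // which exceeds the runtime's maximum object size, so this throws
            // OutOfMemoryException regardless of how much free RAM exists.
            string s = new string('c', int.MaxValue);
            GC.KeepAlive(s);
        }
        catch (OutOfMemoryException ex)
        {
            Console.WriteLine($"Oversized allocation: {ex.Message}");
        }
    }
}
```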
Asking because recently I've been looking into a number of OOM bugs for Roslyn and I'm having problems identifying the cause. Specifically we're seeing the following pattern: unit test runs occasionally fail with an `OutOfMemoryException`. There is no pattern to which test causes the OOM, and it happens infrequently enough that we don't have good historical data on it (this is changing going forward). Hoping that learning more about `OutOfMemoryException` will give me new insight into these issues.
Are pinned buffers combined with heap fragmentation leading to an OOM exception still a thing, or is that only relevant for 32-bit?
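For anyone unfamiliar with the pattern, a minimal sketch of what's meant by a pinned buffer (the buffer size here is arbitrary):

```csharp
using System;
using System.Runtime.InteropServices;

class PinnedBufferSketch
{
    static void Main()
    {
        // Pinning fixes the buffer's address (e.g. so native I/O can write
        // into it), but the GC can no longer compact the region around it,
        // which is how long-lived pins fragment the heap.
        byte[] buffer = new byte[4096];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine($"Pinned at 0x{address.ToInt64():X}");
        }
        finally
        {
            handle.Free(); // freeing promptly limits the fragmentation window
        }
    }
}
```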
Another reason for OOM that I've seen recently is hitting the open-files limit on Unix. That is also reported as out of memory. Sockets count against that limit too. On my Ubuntu 16.04, the default limit is 1024 files. See https://github.com/dotnet/coreclr/issues/25645 for an example of such an OOM.
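A minimal sketch of how one might hit that limit (assuming `/etc/hostname` exists and is readable; the linked issue saw the failure through sockets, and the exact exception type can vary by code path):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class FdLimitSketch
{
    static void Main()
    {
        var handles = new List<FileStream>();
        try
        {
            while (true)
            {
                // Each stream holds an open file descriptor; keeping them all
                // alive eventually exhausts the per-process limit (ulimit -n).
                handles.Add(File.OpenRead("/etc/hostname"));
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Failed after {handles.Count} descriptors: {ex.GetType().Name}");
        }
    }
}
```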
On 32-bit platforms, the OOMs due to limited VA space can also happen when too many threadpool threads are created. Each thread allocates VA space for its stack, and multiplied by the number of threads, that can eventually consume the whole available VA space. I was looking into such a case on ARM some time ago too.
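A minimal sketch of that failure mode (intended for a 32-bit process; the thread count needed to hit it depends on stack size and available VA):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class ThreadStackVaSketch
{
    static void Main()
    {
        var threads = new List<Thread>();
        var block = new ManualResetEvent(false);
        try
        {
            while (true)
            {
                // Each thread reserves VA for its stack (1 MB by default),
                // even while it sits idle waiting on the event.
                var t = new Thread(() => block.WaitOne()) { IsBackground = true };
                t.Start();
                threads.Add(t);
            }
        }
        catch (OutOfMemoryException)
        {
            // On a 32-bit process this typically fires after a few thousand
            // threads, once the stack reservations exhaust the VA space.
            Console.WriteLine($"OOM after {threads.Count} threads.");
        }
    }
}
```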
when you get an OOM from a managed process, the first and foremost thing to identify is whether you are getting it from a managed allocation or a native allocation. !ao will tell you. you could set your post mortem debugging to run this sos command (and better yet, capture a dump if it's not easy to repro).
you can of course get OOM from running out of physical storage, VA is not the only limiting factor when you need more memory.
over the years I fixed a few premature OOM bugs in GC (mostly in Server GC, which VBCSCompiler does use, I think, last I heard); I haven't seen any in a while.
Also, on Windows it's quite possible to OOM by running out of commit space. In general, when memory is allocated on Windows using VirtualAlloc, it isn't actually physically assigned at allocation time; instead, when the memory is first touched, the OS assigns physical pages to the application, and the memory-usage number goes up. However, even before the memory is touched, a VirtualAlloc that commits memory increases the committed-memory counter. If that committed number ever exceeds the amount of physical RAM + swap space on the machine, the OS treats it the same as running out of physical memory, and an OOM is triggered. This is most commonly a problem on machines where the number of threads is high relative to the amount of RAM+swap, or if swap is disabled, as each thread in Windows commits 1MB by default.
As an example, as I'm typing now, I have a MicrosoftEdgeCP.exe that is using 59,276KB of RAM (private working set), but has 234,908KB of commit in use.
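To make the commit-vs-working-set distinction concrete, here's a minimal sketch using VirtualAlloc through P/Invoke (the 256 MB size is arbitrary): the commit counter rises at the MEM_COMMIT call, but the private working set only grows once the pages are touched.

```csharp
using System;
using System.Runtime.InteropServices;

class CommitVsWorkingSet
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                      uint flAllocationType, uint flProtect);

    const uint MEM_COMMIT = 0x1000;
    const uint MEM_RESERVE = 0x2000;
    const uint PAGE_READWRITE = 0x04;

    static void Main()
    {
        const int size = 256 * 1024 * 1024; // 256 MB, arbitrary

        // Commit charge rises here; no physical pages are assigned yet.
        // If this call pushed total commit past RAM + page file, it would
        // fail, which is the commit-space OOM described above.
        IntPtr p = VirtualAlloc(IntPtr.Zero, (UIntPtr)(uint)size,
                                MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (p == IntPtr.Zero)
            throw new OutOfMemoryException("VirtualAlloc failed: out of commit space.");

        Console.WriteLine("Committed. Compare commit size vs. private working set now.");
        Console.ReadLine();

        // Touching one byte per 4 KB page is what finally grows the working set.
        for (int i = 0; i < size; i += 4096)
            Marshal.WriteByte(p, i, 1);

        Console.WriteLine("Touched. The working set has grown too.");
        Console.ReadLine();
    }
}
```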
yep, that's what I meant by "physical storage". on Windows, as you pointed out, you get OOM at commit time if you don't have enough physical mem + page file to accommodate the commit. this means as long as commit succeeds, you are guaranteed not to get OOM. it's very predictable behavior as far as OOM is concerned; on Linux it's different - you can OOM as you actually touch pages for the first time to bring them into physical memory.
Yeah, I just wanted to drop in some explanation that might help if @jaredpar is looking at perf counters or something. There are such a multitude of them that it's fairly easy not to realize, when looking for OOM on a Windows box, that the most obvious one (private working set, displayed in Task Manager by default) isn't enormously useful.
Are there any hints in a dump file that would help me determine if any of these commit situations were hit?
!ao would show you whether you have a commit failure if you are getting OOM due to an allocation on the GC heap, or if the GC has trouble getting memory for its own bookkeeping purposes.
for other scenarios where you might get OOM, like from native allocations, "!address -summary" will show you how much memory is already committed.
another thing that can help is to set the env var `COMPlus_GCBreakOnOOM` to 1, which will do a DebugBreak() as soon as the GC observes an OOM, so you don't end up running a lot of stuff (which may very well change the memory situation) before you actually get to process the OOM exception.
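Since the variable has to be in the environment of the process being diagnosed before the runtime starts, one way to arrange that from a test harness is something like this sketch (`MyApp.exe` is a placeholder):

```csharp
using System.Diagnostics;

class GcBreakOnOomLauncher
{
    static void Main()
    {
        // COMPlus_GCBreakOnOOM is read at runtime startup, so set it on the
        // child process's environment rather than inside the target process.
        var psi = new ProcessStartInfo("MyApp.exe") // placeholder target
        {
            UseShellExecute = false // required to pass a modified environment
        };
        psi.EnvironmentVariables["COMPlus_GCBreakOnOOM"] = "1";
        Process.Start(psi)?.WaitForExit();
    }
}
```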
There is one special OOM that seems to be somewhat unknown. It has a different text than the usual OOM: "Insufficient memory within specified address space range to continue the execution of the program."
Some of it is documented in this KB (for which I had to do some arm wrestling to get published, including writing most of the content):
https://support.microsoft.com/kb/3152158
Edit: The content seems to have been heavily modified since then.
Short story: the late-bound jumps in an assembly are +/- 2 GB (signed 32-bit) relative jumps. If the jump has to go further, a 64-bit absolute-jump trampoline is created. But if the CLR can't find space for the trampoline within +/- 2 GB, we get this OOM from the CLR.
BTW if anyone's interested, the BotR page on jump stubs is an interesting and detailed read
@jaredpar hmm... I was hitting OOM quite a lot when compiling recently; turns out I was actually running out of commit space as the page file was set too low. Did take me a little while to work out why 😄
@jaredpar Did you figure out why you got the OOMs in Roslyn?
@WebDancer69 the @benaadams scenario is a plausible reason for why we get some of our customer-reported OOMs (via, say, Watson). It's hard to verify from a Watson report, though, that this is indeed the cause.
As for our unit tests, I haven't found a good answer for that yet.
I think this is answered @jaredpar. Please reopen if not.