Here are the reasons I'm aware of that a .NET process could raise an `OutOfMemoryException`:

- An `OutOfMemoryException` thrown explicitly by developer code (like `Bitmap.Clone`).
- A single allocation that exceeds the maximum object size, e.g. `new string('c', int.MaxValue)`.
- The GC being unable to find enough memory to satisfy a `new` operation.

I'm sure this is not an exhaustive list though. I'm curious what the other reasons are that a process could OOM in .NET. Is there a good list somewhere I can read through?
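For concreteness, a minimal sketch (my own illustration) that deliberately triggers the oversized-allocation case from the list above:

```csharp
using System;

class MaxObjectSizeSketch
{
    static void Main()
    {
        try
        {
            // A string of int.MaxValue chars needs ~4 GB of character data,
            // which exceeds the runtime's maximum object size, so this throws
            // OutOfMemoryException regardless of how much free RAM exists.
            string s = new string('c', int.MaxValue);
            GC.KeepAlive(s);
        }
        catch (OutOfMemoryException ex)
        {
            Console.WriteLine($"Oversized allocation: {ex.Message}");
        }
    }
}
```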
Asking because recently I've been looking into a number of OOM bugs for Roslyn and I'm having problems identifying the cause. Specifically we're seeing the following pattern: unit test runs occasionally fail with an `OutOfMemoryException`. There is no pattern to which test causes the OOM, and it happens infrequently enough that we don't have good historical data on it (this is changing going forward). Hoping that learning more about `OutOfMemoryException` will give me new insight into these issues.
Are pinned buffers combined with heap fragmentation leading to an OOM exception still a thing, or is that only relevant for 32-bit?
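For anyone unfamiliar with the pattern, a minimal sketch of what's meant by a pinned buffer (the buffer size here is arbitrary):

```csharp
using System;
using System.Runtime.InteropServices;

class PinnedBufferSketch
{
    static void Main()
    {
        // Pinning fixes the buffer's address (e.g. so native I/O can write
        // into it), but the GC can no longer compact the region around it,
        // which is how long-lived pins fragment the heap.
        byte[] buffer = new byte[4096];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine($"Pinned at 0x{address.ToInt64():X}");
        }
        finally
        {
            handle.Free(); // freeing promptly limits the fragmentation window
        }
    }
}
```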
Another reason for OOM that I've seen recently is hitting the open-files limit on Unix. That is also reported as out of memory. Sockets count against that limit too. On my Ubuntu 16.04, the default limit is 1024 files. See https://github.com/dotnet/coreclr/issues/25645 for an example of such an OOM.
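A minimal sketch of how one might hit that limit (assuming `/etc/hostname` exists and is readable; the linked issue saw the failure through sockets, and the exact exception type can vary by code path):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class FdLimitSketch
{
    static void Main()
    {
        var handles = new List<FileStream>();
        try
        {
            while (true)
            {
                // Each stream holds an open file descriptor; keeping them all
                // alive eventually exhausts the per-process limit (ulimit -n).
                handles.Add(File.OpenRead("/etc/hostname"));
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Failed after {handles.Count} descriptors: {ex.GetType().Name}");
        }
    }
}
```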
On 32-bit platforms, the OOMs due to limited VA space can also happen when too many threadpool threads are created. Each thread allocates VA space for its stack, and multiplied by the number of threads, that can eventually consume the whole available VA space. I was looking into such a case on ARM some time ago too.
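A minimal sketch of that failure mode (intended for a 32-bit process; the thread count needed to hit it depends on stack size and available VA):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class ThreadStackVaSketch
{
    static void Main()
    {
        var threads = new List<Thread>();
        var block = new ManualResetEvent(false);
        try
        {
            while (true)
            {
                // Each thread reserves VA for its stack (1 MB by default),
                // even while it sits idle waiting on the event.
                var t = new Thread(() => block.WaitOne()) { IsBackground = true };
                t.Start();
                threads.Add(t);
            }
        }
        catch (OutOfMemoryException)
        {
            // On a 32-bit process this typically fires after a few thousand
            // threads, once the stack reservations exhaust the VA space.
            Console.WriteLine($"OOM after {threads.Count} threads.");
        }
    }
}
```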
when you get an OOM from a managed process, the first and foremost thing to identify is whether you are getting it from a managed allocation or a native allocation. !ao will tell you. you could set your post mortem debugging to run this sos command (and better yet, capture a dump if it's not easy to repro).
you can of course get OOM from running out of physical storage, VA is not the only limiting factor when you need more memory.
over the years I fixed a few premature OOM bugs in GC (mostly in Server GC, which VBCSCompiler does use, I think, last I heard); I haven't seen any in a while.
Also, on Windows it's quite possible to OOM by running out of commit space. In general, when memory is allocated on Windows using VirtualAlloc, it isn't actually physically assigned at allocation time; instead, when the memory is first touched, the OS assigns physical pages to the application, and the memory-usage number goes up. However, even before the memory is touched, a VirtualAlloc that commits memory increases the committed-memory counter. If that committed number ever exceeds the amount of physical RAM + swap space on the machine, the OS treats it the same as running out of physical memory, and an OOM is triggered. This is most commonly a problem on machines where the number of threads is high relative to the amount of RAM+swap, or if swap is disabled, as each thread in Windows commits 1MB by default.
As an example, as I'm typing now, I have a MicrosoftEdgeCP.exe that is using 59,276KB of RAM (private working set), but has 234,908KB of commit in use.
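To make the commit-vs-working-set distinction concrete, here's a minimal sketch using VirtualAlloc through P/Invoke (the 256 MB size is arbitrary): the commit counter rises at the MEM_COMMIT call, but the private working set only grows once the pages are touched.

```csharp
using System;
using System.Runtime.InteropServices;

class CommitVsWorkingSet
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                      uint flAllocationType, uint flProtect);

    const uint MEM_COMMIT = 0x1000;
    const uint MEM_RESERVE = 0x2000;
    const uint PAGE_READWRITE = 0x04;

    static void Main()
    {
        const int size = 256 * 1024 * 1024; // 256 MB, arbitrary

        // Commit charge rises here; no physical pages are assigned yet.
        // If this call pushed total commit past RAM + page file, it would
        // fail, which is the commit-space OOM described above.
        IntPtr p = VirtualAlloc(IntPtr.Zero, (UIntPtr)(uint)size,
                                MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (p == IntPtr.Zero)
            throw new OutOfMemoryException("VirtualAlloc failed: out of commit space.");

        Console.WriteLine("Committed. Compare commit size vs. private working set now.");
        Console.ReadLine();

        // Touching one byte per 4 KB page is what finally grows the working set.
        for (int i = 0; i < size; i += 4096)
            Marshal.WriteByte(p, i, 1);

        Console.WriteLine("Touched. The working set has grown too.");
        Console.ReadLine();
    }
}
```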
yep, that's what I meant by "physical storage". on Windows, as you pointed out, you get OOM at commit time if you don't have enough physical mem + page file to accommodate the commit. this means as long as commit succeeds, you are guaranteed not to get OOM. it's very predictable behavior as far as OOM is concerned; on Linux it's different - you can OOM as you actually touch pages for the first time to bring them into physical memory.
Yeah, I just wanted to drop in some explanation that might help if @jaredpar is looking at perf counters or something. There are such a multitude of them that it's fairly easy not to realize, when looking for OOM on a Windows box, that the most obvious one (private working set, displayed in Task Manager by default) isn't enormously useful.
Are there any hints in a dump file that would help me determine if any of these commit situations were hit?
!ao would show you whether you have a commit failure if you are getting OOM due to an allocation on the GC heap, or if the GC has trouble getting memory for its own bookkeeping purposes.
for other scenarios where you might get OOM, like from native allocations, "!address -summary" will show you how much memory is already committed.
another thing that can help is to set the env var `COMPlus_GCBreakOnOOM` to 1, which will do a DebugBreak() as soon as the GC observes an OOM, so you don't end up running a lot of stuff (which may very well change the memory situation) before you actually get to process the OOM exception.
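Since the variable has to be in the environment of the process being diagnosed before the runtime starts, one way to arrange that from a test harness is something like this sketch (`MyApp.exe` is a placeholder):

```csharp
using System.Diagnostics;

class GcBreakOnOomLauncher
{
    static void Main()
    {
        // COMPlus_GCBreakOnOOM is read at runtime startup, so set it on the
        // child process's environment rather than inside the target process.
        var psi = new ProcessStartInfo("MyApp.exe") // placeholder target
        {
            UseShellExecute = false // required to pass a modified environment
        };
        psi.EnvironmentVariables["COMPlus_GCBreakOnOOM"] = "1";
        Process.Start(psi)?.WaitForExit();
    }
}
```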
There is one special OOM that seems to be somewhat unknown. It has a different text than the usual OOM: "Insufficient memory within specified address space range to continue the execution of the program."
Some of it is documented in this KB (for which I had to do some arm wrestling to get published, including writing most of the content):
https://support.microsoft.com/kb/3152158
Edit: The content seems to have been heavily modified since then.
Short story: the late-bound jumps in an assembly are +/- 2 GB (signed 32-bit) relative jumps. If the jump has to go further, a 64-bit absolute-jump trampoline is created. But if the CLR can't find space for the trampoline within +/- 2 GB, we get this OOM from the CLR.
BTW if anyone's interested, the BotR page on jump stubs is an interesting and detailed read
@jaredpar hmm... I was hitting OOM quite a lot when compiling recently; turns out I was actually running out of commit space as the page file was set too low. Did take me a little while to work out why 😄
@jaredpar Did you figure out why you got the OOMs in Roslyn?
@WebDancer69 the @benaadams scenario is a plausible reason for why we get some of our customer-reported OOMs (via, say, Watson). It's hard to verify from a Watson report, though, that this is indeed the cause.
As for our unit tests, I haven't found a good answer for that yet.
I think this is answered @jaredpar. Please reopen if not.