Introduction
This issue is a follow-up of https://github.com/nodejs/node/issues/23328. The outcome of that issue was the introduction of the `v8.writeHeapSnapshot()` API.
The next step would be to introduce a way of handling an out-of-memory situation in the JS context.
When running Node.js processes in a low-memory environment, every out-of-memory event that occurs is interesting. To figure out why a process ran out of memory, a heapdump can help a lot.
Desired solution
There are several possible solutions which would suffice:
- `process.on('fatal_error')`, which kicks in on an OoM event (see https://github.com/nodejs/diagnostics/issues/239#issuecomment-427600405). The question is whether it's feasible to execute JS code after the 'fatal_error' has occurred.
- a CLI flag which enables automatic heapdumps on OoM. This might be more feasible.
Alternatives
At the moment, we use our own node-oom-heapdump module (https://github.com/blueconic/node-oom-heapdump). This uses native code to hook into the `SetOOMErrorHandler` handler of V8. This works, although it's not very elegant.
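In essence the hook looks roughly like this - a minimal sketch, not the actual module source, assuming the Node 12-era V8 callback signature `void(const char* location, bool is_heap_oom)` (the same shape as the `OnOOMError(char const*, bool)` frame in the stack trace further down this thread); the output file name is illustrative:

```cpp
// Sketch of an addon that hooks V8's OOM error handler and writes a heap
// snapshot before the process goes down. Not the real node-oom-heapdump
// source.
#include <node.h>
#include <v8-profiler.h>
#include <cstdio>

namespace {

// Streams the serialized snapshot JSON to a file chunk by chunk, so the
// JSON itself never has to be held in memory as one big string.
class FileOutputStream final : public v8::OutputStream {
 public:
  explicit FileOutputStream(FILE* fp) : fp_(fp) {}
  int GetChunkSize() override { return 65536; }
  void EndOfStream() override {}
  WriteResult WriteAsciiChunk(char* data, int size) override {
    return fwrite(data, 1, size, fp_) == static_cast<size_t>(size)
               ? kContinue : kAbort;
  }
 private:
  FILE* fp_;
};

void OnOOMError(const char* location, bool is_heap_oom) {
  v8::Isolate* isolate = v8::Isolate::GetCurrent();
  // Building the snapshot graph allocates native memory and triggers a GC;
  // that is exactly the risk discussed in this thread.
  const v8::HeapSnapshot* snapshot =
      isolate->GetHeapProfiler()->TakeHeapSnapshot();
  if (FILE* fp = fopen("oom.heapsnapshot", "w")) {
    FileOutputStream stream(fp);
    snapshot->Serialize(&stream, v8::HeapSnapshot::kJSON);
    fclose(fp);
  }
  // V8 still aborts the process after this handler returns.
}

void Init(v8::Local<v8::Object> exports) {
  v8::Isolate::GetCurrent()->SetOOMErrorHandler(OnOOMError);
}

}  // namespace

NODE_MODULE(node_oom_heapdump_native, Init)
```

Streaming the serialized JSON chunk by chunk keeps the serializer output out of memory, but building the snapshot graph itself still allocates, which is the concern raised below.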
I'm not very keen to have this in Node.js core at the moment because it usually takes a lot of memory to take a heap snapshot. If the system is already in a V8 OOM situation, trying to take a heap snapshot can lead to an operating-system OOM (which might kill other important processes).
If we could drastically reduce the memory used to take the heap snapshot, I would be happy to see this feature introduced in core.
> introduce an event like `process.on('fatal_error')`
It is possible to execute JS at that point, but in general I don't think it's a good idea to open this opportunity up: when there is a fatal error, we need to be very careful about what we execute (it's similar to signal handlers in some ways).
> add a CLI flag which enables automatic heapdumps on OoM. This might be more feasible
This looks more promising (or we could just provide heap snapshots as one of the actions that can be specified for when a fatal error occurs, as we already make it possible to trigger node-report in the fatal error handler), though as @mmarchini points out, it's at the users' own risk if they want to do that.
I would be happy to go for option 2 and make it another configurable CLI option. Memory use of the heapdump is indeed a risk, depending on how you use it.
In my use case, Node.js processes run with a restricted old space size, within Docker, so there is always enough memory to make the heapdump. To be safe, the Docker memory limit has to be at least twice the amount used by the Node.js process.
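For example (illustrative numbers, just applying that rule of thumb): a process expected to use around 100 MB would get a Docker memory limit of at least 200 MB, so the snapshot bookkeeping has room on top of the heap itself.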
I'd agree that a CLI flag makes sense; we should look at the existing option for generating a node-report and make it consistent with that.
@paulrutter if you make a snapshot, you still won't have anything to compare it with, because you need at least one more snapshot. I assume the answer to the question "why do you need a heapdump on OOM" is "to inspect the state of memory". For this case you can use the `--abort-on-uncaught-exception` flag and create a coredump (stack trace + heapdump) on process abort. Then you can explore the memory with llnode. It may take a little longer, but it definitely works.
Thanks.
@matvi3nko Comparison is only needed when you suspect a memory leak, but that is not always the reason for an out-of-memory situation. More often, the code being executed is not memory efficient (for example, reading a whole file into memory at once instead of streaming). A single heapdump would show such issues, without the need for comparison.
I tried llnode in the past and found it not very user-friendly. Of course, this is more of an experience issue on my end, but I still think it would be beneficial to other Node.js users to have a more entry-level heapdump generation process in place.
Compare the use of Chrome DevTools to analyze a heapdump to llnode; it's on a whole different level.
With the latest Node.js 12.11.1, the `node-oom-heapdump` module fails with the following message:
```
<--- JS stacktrace --->
Cannot get stack trace in GC.
Generating Heapdump to 'C:\git\node-oom-heapdump\tests\my_heapdump.heapsnapshot' now...
#
# Fatal error in , line 0
# unreachable code
#
#
#
#FailureMessage Object: 0000006BE0FF7890
```
The APIs for creating a heapdump do not seem to have changed. It seems that calling `createHeapSnapshot` no longer works in the context that it did before. But maybe I should ask the V8 team for help on this issue.
Is there any progress made on adding the functionality to Node.js core?
Does it work on v12.10?
No, it doesn't. Same behavior.
What about 12.0.0, 12.3.0 and 12.5.0 (those are the V8 bumps during Node.js v12)? If the issue is in V8, it's good to narrow down which version it started in.
--
Also, you should be able to get a core dump if this is throwing a Fatal error. That will allow you to print the native call stack, which should help find the issue.
To generate a core dump, run:
```
ulimit -c unlimited
node your-code.js
```
And then open it with gdb or lldb to get the stack trace:
```
gdb core   # lldb /cores/core.PID if you're on OS X
(gdb) bt
```
Post the ~~core dump~~ stack trace here; it should help narrow down the issue.
EDIT: Don't post the core dump, it's a bad idea :sweat_smile:
Thanks, will try to narrow the issue down and come back with the results.
I looked into the issue and found out that on Node.js 12 (regardless of minor version) the `node-oom-heapdump` module works well as long as the following flags are not used:
`--optimize_for_size --always_compact`
When these flags are used, the behavior is a bit unpredictable.
Sometimes it completes, but more often it fails with the following stacktrace:
```
gdb node core.<pid>
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `node --max_old_space_size=40 --optimize_for_size --always_compact --inspect=999'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000ce249d in v8::internal::Heap::GarbageCollectionPrologue() ()
(gdb) bt
#0 0x0000000000ce249d in v8::internal::Heap::GarbageCollectionPrologue() ()
#1 0x0000000000ceba22 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#2 0x0000000000cec25f in v8::internal::Heap::PreciseCollectAllGarbage(int, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#3 0x0000000000f6eb1f in v8::internal::HeapSnapshotGenerator::GenerateSnapshot() ()
#4 0x0000000000f60793 in v8::internal::HeapProfiler::TakeSnapshot(v8::ActivityControl*, v8::HeapProfiler::ObjectNameResolver*) ()
#5 0x00007ff22cf1dba7 in OnOOMError(char const*, bool) () from /nodeapp/work/node12/package/build/Release/node_oom_heapdump_native.node
#6 0x0000000000b32d90 in v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) ()
#7 0x0000000000b33139 in v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) ()
#8 0x0000000000cde455 in v8::internal::Heap::FatalProcessOutOfMemory(char const*) ()
#9 0x0000000000d0c093 in v8::internal::EvacuateNewSpaceVisitor::Visit(v8::internal::HeapObject, int) ()
#10 0x0000000000d13f70 in void v8::internal::LiveObjectVisitor::VisitBlackObjectsNoFail<v8::internal::EvacuateNewSpaceVisitor, v8::internal::MajorNonAtomicMarkingState>(v8::internal::MemoryChunk*, v8::internal::MajorNonAtomicMarkingState*, v8::internal::EvacuateNewSpaceVisitor*, v8::internal::LiveObjectVisitor::IterationMode) ()
#11 0x0000000000d211f8 in v8::internal::FullEvacuator::RawEvacuatePage(v8::internal::MemoryChunk*, long*) ()
#12 0x0000000000d05b5e in v8::internal::Evacuator::EvacuatePage(v8::internal::MemoryChunk*) ()
#13 0x0000000000d05e27 in v8::internal::PageEvacuationTask::RunInParallel(v8::internal::ItemParallelJob::Task::Runner) ()
#14 0x0000000000cfb315 in v8::internal::ItemParallelJob::Task::RunInternal() ()
#15 0x0000000000cfb724 in v8::internal::ItemParallelJob::Run() ()
#16 0x0000000000d154b7 in void v8::internal::MarkCompactCollectorBase::CreateAndExecuteEvacuationTasks<v8::internal::FullEvacuator, v8::internal::MarkCompactCollector>(v8::internal::MarkCompactCollector*, v8::internal::ItemParallelJob*, v8::internal::MigrationObserver*, long) ()
#17 0x0000000000d23784 in v8::internal::MarkCompactCollector::EvacuatePagesInParallel() ()
#18 0x0000000000d2439a in v8::internal::MarkCompactCollector::Evacuate() [clone .constprop.1218] ()
#19 0x0000000000d29587 in v8::internal::MarkCompactCollector::CollectGarbage() ()
#20 0x0000000000ce9fa9 in v8::internal::Heap::MarkCompact() ()
#21 0x0000000000cead13 in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#22 0x0000000000ceb885 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#23 0x0000000000cee298 in v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) ()
#24 0x0000000000cb4bc7 in v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType) ()
#25 0x0000000000feaafb in v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) ()
#26 0x000000000136d539 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit () at ../../deps/v8/../../deps/v8/src/builtins/base.tq:3028
#27 0x000013f93c8c5f5f in ?? ()
#28 0x0000000000000000 in ?? ()
```
So, I'm not sure if I need to follow up on this. Maybe it's just the combination of my test case and the Node.js flags that gives the unpredictable behavior. I'll just need to try it in a more real-life scenario and see what happens.
Found this issue while searching for a solution similar to Java's heap dump on OOM. While llnode may do all the required things, it's not a tool that is familiar to JS developers, whereas DevTools is.
And as for heapdump generation, I think that double memory requirement is not an option for general usage. If your process is already flagged to be terminated, there should be a way to stop the world and stream heap contents directly to fs without creating an intermediate object.
> And as for heapdump generation, I think that double memory requirement is not an option for general usage. If your process is already flagged to be terminated, there should be a way to stop the world and stream heap contents directly to fs without creating an intermediate object.
Heapdumps are essentially graphs of the live objects on the heap. Creating that graph is what takes up a lot of memory. It's unavoidable.
A coredump doesn't have that problem because it doesn't create a graph, it simply dumps the heap as a byte array.
One possible way forward is to create a tool (possibly integrated into the node binary) for transforming a coredump into a heapdump.
I wrote a tool like that for Node.js v0.10 once but that's totally antiquated by now, of course. :-)
> One possible way forward is to create a tool (possibly integrated into the node binary) for transforming a coredump into a heapdump.
I think that is really the way to go. But after playing for some time with core dump generation, I would say that not only the tooling is an issue: core dump generation for a process that runs inside a container is extremely difficult to get right. The main problem is configuring the core dump location, which can be done only on the host machine and is probably not an option, at least until core_pattern namespacing is implemented in the kernel.
Node.js doesn't have direct access to the heap, so generating a heapdump is not something we can do out of the box. This is a feature request for V8 (https://bugs.chromium.org/p/v8/issues/list).
I had failed to find this issue and created #32756 (already closed it). A huge +1 from me (and, I believe, from many Node.js users) for the CLI flag option.
@paulrutter did you already open an issue against V8?
@puzpuzpuz No, I haven't gotten to it yet.
As the heapdump is metadata, not a raw memory dump, why would the dump generator take as much memory as the heap? Maybe there is an improvement opportunity in V8?
> Maybe there is an improvement opportunity in V8?
Open-ended questions like that aren't useful. When is there _not_ room for improvement?
The 2x is a rule of thumb - i.e., a decent assumption, not a hard rule. The lower bound for most programs is about 33% (1 pointer per object where the average object is 3 pointers big.)
But before someone goes "oh, so it's only one-third": lower bound != average.
> Open-ended questions like that aren't useful. When is there _not_ room for improvement?
@bnoordhuis - as I understand it, the effort required to generate a dump involves traversing the object graph and recording the reference tree and the size information (for example, our own `MemoryRetainer`). If the footprint of this effort grows in proportion to the heap size, is it reasonable to expect room for improvement? I don't know how the snapshot is collected; that is why I said *may be* and put it across as a question. It becomes useful when someone from V8 investigates and/or makes observations.
> If the footprint of this effort grows in proportion to the heap size, is it reasonable to expect room for improvement?
Maybe partially. Probably not easily.
The current heap snapshot generator uses additional memory because it creates persistent snapshots. They remain valid after they're created; e.g., JS strings are copied.
A zero-copy, one-shot generator is conceivable, but computing the graph edges is still going to require additional memory: `N` objects that point to `M` other objects on average is `N*M` relations that need to be recorded.
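As a rough worked example of that bookkeeping (illustrative numbers; the 8-byte record size is an assumption, not a measurement):

$$
N \times M = 10^{6}\ \text{objects} \times 3\ \text{edges on average} = 3 \times 10^{6}\ \text{edge records} \approx 24\ \text{MB at 8 bytes each}
$$

The ~33% lower bound mentioned above falls out the same way: recording one 8-byte pointer per object, when the average object is three pointers (24 bytes) big, is 8/24 ≈ 33% of the heap size.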
@bnoordhuis - understood all other points, thanks.
> `N` objects that point to `M` other objects on average is `N*M` relations that need to be recorded.
Agree, but that implies the generator footprint is a function of the number of objects in the heap and their cross-references, not of the size of the heap?
I don't really see much way around the memory requirements with the current snapshot approach. We'd really need to move to a heap-tracing model that can stream out events as they happen, which would allow a graph to be built via post-processing.
> that implies the generator footprint is a function of the number of objects in the heap and their cross-references, not of the size of the heap?
Note that I was careful to write "live objects on the heap" in https://github.com/nodejs/node/issues/27552#issuecomment-600035110. :-)
In an out-of-memory condition they're roughly equivalent though. The heap is so full with live objects that there isn't room for more.
V8 provides `v8::Isolate::AddNearHeapLimitCallback()` for adjusting the heap limit when V8 is approaching it. The debugger implementation in V8 uses this to break in Chrome DevTools at the point where the heap size limit is near. I did a proof-of-concept `--heapsnapshot-near-heap-limit` implementation (https://github.com/nodejs/node/pull/33010) - sketched below - and it does work if I just write a snapshot to disk in that callback. I temporarily raise the heap size limit to a value slightly bigger than the original limit until the snapshot is done, and tell V8 to restore the initial limit later. There are some observations with this approach:
- With `test/fixtures/workload/allocation.js` and `--max-old-space-size=100`, without using `--heapsnapshot-near-heap-limit`, the process crashes in 20s after 73 GCs. With the option on, it crashes in 121s after 130 GCs, leaving 12 snapshots of size 140-170MB on disk (`Heap.20200423.063523.40985.0.001.heapsnapshot`, `Heap.20200423.063524.40985.0.002.heapsnapshot`, ...).

Any comments about these observations?
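For anyone wanting to see the shape of this at the V8 API level, here is a minimal sketch (not the actual PR implementation; the file name and the 16 MB headroom value are illustrative assumptions):

```cpp
// Sketch of the --heapsnapshot-near-heap-limit mechanism: register a
// near-heap-limit callback, write a snapshot synchronously when it fires,
// and return a slightly higher limit so V8 (and the snapshot generation,
// which itself triggers a GC) has headroom until the limit is restored.
#include <v8.h>
#include <v8-profiler.h>
#include <cstdio>

namespace {

// Same streaming OutputStream idea as in the earlier sketch.
class FileOutputStream final : public v8::OutputStream {
 public:
  explicit FileOutputStream(FILE* fp) : fp_(fp) {}
  int GetChunkSize() override { return 65536; }
  void EndOfStream() override {}
  WriteResult WriteAsciiChunk(char* data, int size) override {
    return fwrite(data, 1, size, fp_) == static_cast<size_t>(size)
               ? kContinue : kAbort;
  }
 private:
  FILE* fp_;
};

size_t NearHeapLimit(void* data, size_t current_heap_limit,
                     size_t /*initial_heap_limit*/) {
  v8::Isolate* isolate = static_cast<v8::Isolate*>(data);
  const v8::HeapSnapshot* snapshot =
      isolate->GetHeapProfiler()->TakeHeapSnapshot();
  if (FILE* fp = fopen("near-limit.heapsnapshot", "w")) {
    FileOutputStream stream(fp);
    snapshot->Serialize(&stream, v8::HeapSnapshot::kJSON);
    fclose(fp);
  }
  const_cast<v8::HeapSnapshot*>(snapshot)->Delete();
  // Temporarily raise the limit so execution can continue until the
  // initial limit is restored (see below).
  return current_heap_limit + 16 * 1024 * 1024;
}

}  // namespace

void InstallNearHeapLimitSnapshots(v8::Isolate* isolate) {
  isolate->AddNearHeapLimitCallback(NearHeapLimit, isolate);
  // Ask V8 to restore the initial limit once heap usage drops back below
  // a fraction of it.
  isolate->AutomaticallyRestoreInitialHeapLimit(0.95);
}
```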
@joyeecheung When starting on the node-oom-heapdump code, I used gc-stats to detect when a full GC happens. When the RSS was over a user-defined threshold, a heapdump would be created, in combination with a user-defined maximum number of heapdumps while over the threshold. This is the poor man's version of your native implementation, if I understand correctly.
Observations on this implementation: it was eventually replaced by hooking V8's `SetOOMErrorHandler` method, which had a better result. It would be interesting to know if your WIP can handle the volatile increase of memory usage.
@paulrutter Thanks. From what I can tell, using the NearHeapLimitCallback should work for the more volatile increase, and it should work better than the OOM handler, since the OOM handler is triggered when V8 is about to crash, whereas the NearHeapLimitCallback is, as the name implies, triggered some time before that, when there's still some room in the heap. The snapshot writing would be synchronous, so it's guaranteed that at least one snapshot will be written before the program crashes - whether there will be more depends on how fast the heap grows and how much room we leave for V8 / V8 leaves for us before the callback is invoked the next time, since snapshot generation triggers a GC which in turn might increase heap usage (while promoting objects in the young generation - but not for the snapshot generation itself, like what we had been worried about in this thread).
Also, as discussed in https://github.com/nodejs/node/pull/33010#discussion_r414011746, having this implemented in Node.js core, instead of as an addon, may help us avoid the situation where the snapshot generation triggers a system OOM due to the additional native memory overhead, because our own implementation has access to the parameters used initially to configure the V8 heap, so we can do some calculations with that information to avoid this as much as we can.
Today I tried to run the module recommended in the first comment here (https://github.com/blueconic/node-oom-heapdump) while restricting the memory for the old heap to 100 MB. When crashing, the process's memory rose up to 500 MB and it needed about 7 minutes to gather the heap dump.
This makes me question this technique, asking myself why I couldn't simply use a core dump here instead. Is there a way to create a core dump when running out of memory and later on (on a machine with enough resources available) "transform" it into a heap dump?
@SimonSimCity We're using that module for Node.js processes restricted to between 80 and 160 MB, and when one of those crashes, it never takes more than a few seconds to create the heapdump. 7 minutes is excessive indeed. Was your test case representative?
Yes, core dumps can already be created, by passing the `--abort-on-uncaught-exception` flag. This has been discussed earlier in this thread. I don't know whether that information can be transformed into a heapdump format, though.
I tested it on an application our company is working on, which is a Meteor project running in development mode. Maybe the generated object graph, as mentioned in https://github.com/nodejs/node/issues/27552#issuecomment-612038240, is very complex there ...
It has been noted very often that this requires a significant amount of additional memory (https://github.com/nodejs/node/issues/27552#issuecomment-618079094, https://github.com/nodejs/diagnostics/issues/239#issuecomment-427571478, https://github.com/nodejs/diagnostics/issues/239#issuecomment-426953394, https://github.com/nodejs/node/issues/27552#issuecomment-489272479) - some of the comments also mentioning performance as a problematic factor.
Heap snapshots also seem to be problematic when the heap size is high: https://github.com/nodejs/diagnostics/issues/239#issuecomment-427479988 (the ticket linked in the comment mentions a size of >1.5 GB)
If a core dump, automatically generated by the OS, can help us here, it should be the preferred option in my opinion. At least on Linux it seems to come at almost no memory cost. Windows and other OSes might have a different format or even a different approach here, but I'd rather go in this direction first than pursue a solution which requires significantly more memory.
I'll test out some options here regarding core dumps as I know my application will only run inside a Linux based docker container.