Introduction
This issue is a follow-up of https://github.com/nodejs/node/issues/23328. The outcome of that issue was the introduction of the `v8.writeHeapSnapshot()` API.
The next step would be to introduce a way of handling an out-of-memory situation in the JS context.
When running Node.js processes in a low-memory environment, every out-of-memory event that occurs is interesting. To figure out why a process ran out of memory, a heapdump can help a lot.
Desired solution
There are several possible solutions which would suffice:
- `process.on('fatal_error')`, which kicks in on an OoM event (see https://github.com/nodejs/diagnostics/issues/239#issuecomment-427600405). The question is whether it's feasible to execute JS code after the 'fatal_error' has occurred.
- a CLI flag which enables automatic heapdumps on OoM. This might be more feasible.
Alternatives
At the moment, we use our own node-oom-heapdump module (https://github.com/blueconic/node-oom-heapdump). This uses native code to hook into the `SetOOMErrorHandler` handler of V8. This works, although it's not very elegant.
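In essence the hook looks roughly like this - a minimal sketch, not the actual module source, assuming the Node 12-era V8 callback signature `void(const char* location, bool is_heap_oom)` (the same shape as the `OnOOMError(char const*, bool)` frame in the stack trace further down this thread); the output file name is illustrative:

```cpp
// Sketch of an addon that hooks V8's OOM error handler and writes a heap
// snapshot before the process goes down. Not the real node-oom-heapdump
// source.
#include <node.h>
#include <v8-profiler.h>
#include <cstdio>

namespace {

// Streams the serialized snapshot JSON to a file chunk by chunk, so the
// JSON itself never has to be held in memory as one big string.
class FileOutputStream final : public v8::OutputStream {
 public:
  explicit FileOutputStream(FILE* fp) : fp_(fp) {}
  int GetChunkSize() override { return 65536; }
  void EndOfStream() override {}
  WriteResult WriteAsciiChunk(char* data, int size) override {
    return fwrite(data, 1, size, fp_) == static_cast<size_t>(size)
               ? kContinue : kAbort;
  }
 private:
  FILE* fp_;
};

void OnOOMError(const char* location, bool is_heap_oom) {
  v8::Isolate* isolate = v8::Isolate::GetCurrent();
  // Building the snapshot graph allocates native memory and triggers a GC;
  // that is exactly the risk discussed in this thread.
  const v8::HeapSnapshot* snapshot =
      isolate->GetHeapProfiler()->TakeHeapSnapshot();
  if (FILE* fp = fopen("oom.heapsnapshot", "w")) {
    FileOutputStream stream(fp);
    snapshot->Serialize(&stream, v8::HeapSnapshot::kJSON);
    fclose(fp);
  }
  // V8 still aborts the process after this handler returns.
}

void Init(v8::Local<v8::Object> exports) {
  v8::Isolate::GetCurrent()->SetOOMErrorHandler(OnOOMError);
}

}  // namespace

NODE_MODULE(node_oom_heapdump_native, Init)
```

Streaming the serialized JSON chunk by chunk keeps the serializer output out of memory, but building the snapshot graph itself still allocates, which is the concern raised below.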
I'm not very keen to have this in Node.js core at the moment because it usually takes a lot of memory to take a heap snapshot. If the system is already in a V8 OOM situation, trying to take a heap snapshot can lead to an operating-system OOM (which might kill other important processes).
If we could drastically reduce the memory used to take the heap snapshot, I would be happy to see this feature introduced in core.
> introduce an event like `process.on('fatal_error')`
It is possible to execute JS at that point, but in general I don't think it's a good idea to open this opportunity up: when there is a fatal error, we need to be very careful about what we execute (it's similar to signal handlers in some ways).
> add a CLI flag which enables automatic heapdumps on OoM. This might be more feasible
This looks more promising (or we could just provide heap snapshots as one of the actions that can be specified for when a fatal error occurs, as we already make it possible to trigger node-report in the fatal error handler), though as @mmarchini points out, it's at the users' own risk if they want to do that.
I would be happy to go for option 2 and make it another configurable CLI option. Memory use of the heapdump is indeed a risk, depending on how you use it.
In my use case, Node.js processes run with a restricted old space size, within Docker, so there is always enough memory to make the heapdump. To be safe, the Docker memory limit has to be at least twice the amount used by the Node.js process.
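For example (illustrative numbers, just applying that rule of thumb): a process expected to use around 100 MB would get a Docker memory limit of at least 200 MB, so the snapshot bookkeeping has room on top of the heap itself.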
I'd agree that a CLI flag makes sense; we should look at the existing option for generating a node-report and make it consistent with that.
@paulrutter if you make a snapshot, you still won't have anything to compare it with, because you need at least one more snapshot. I assume the answer to the question "why do you need a heapdump on OOM" is "to inspect the state of memory". For this case you can use the `--abort-on-uncaught-exception` flag and create a coredump (stack trace + heapdump) on process abort. Then you can explore the memory with llnode. It may take a little longer, but it definitely works.
Thanks.
@matvi3nko Comparison is only needed when you suspect a memory leak, but that is not always the reason for an out-of-memory situation. More often, the code being executed is not memory efficient (for example, reading a whole file into memory at once instead of streaming). A single heapdump would show such issues, without the need for comparison.
I tried llnode in the past and found it not very user-friendly. Of course, this is more of an experience issue on my end, but I still think it would be beneficial to other Node.js users to have a more entry-level heapdump generation process in place.
Compare the use of Chrome DevTools to analyze a heapdump to llnode; it's on a whole different level.
With the latest Node.js 12.11.1, the `node-oom-heapdump` module fails with the following message:
```
<--- JS stacktrace --->
Cannot get stack trace in GC.
Generating Heapdump to 'C:\git\node-oom-heapdump\tests\my_heapdump.heapsnapshot' now...
#
# Fatal error in , line 0
# unreachable code
#
#
#
#FailureMessage Object: 0000006BE0FF7890
```
The APIs for creating a heapdump do not seem to have changed. It seems that calling `createHeapSnapshot` no longer works in the context that it did before. But maybe I should ask the V8 team for help on this issue.
Is there any progress made on adding the functionality to Node.js core?
Does it work on v12.10?
No, it doesn't. Same behavior.
What about 12.0.0, 12.3.0 and 12.5.0 (those are the V8 bumps during Node.js v12)? If the issue is in V8, it's good to narrow down which version it started in.
--
Also, you should be able to get a core dump if this is throwing a Fatal error. That will allow you to print the native call stack, which should help find the issue.
To generate a core dump, run:
```
ulimit -c unlimited
node your-code.js
```
And then open it with gdb or lldb to get the stack trace:
```
gdb core   # lldb /cores/core.PID if you're on OS X
(gdb) bt
```
Post the ~~core dump~~ stack trace here; it should help narrow down the issue.
EDIT: Don't post the core dump, it's a bad idea :sweat_smile:
Thanks, will try to narrow the issue down and come back with the results.
I looked into the issue and found out that on Node.js 12 (regardless of minor version) the `node-oom-heapdump` module works well as long as the following flags are not used:
`--optimize_for_size --always_compact`
When these flags are used, the behavior is a bit unpredictable.
Sometimes it completes, but more often it fails with the following stacktrace:
```
gdb node core.<pid>
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `node --max_old_space_size=40 --optimize_for_size --always_compact --inspect=999'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000ce249d in v8::internal::Heap::GarbageCollectionPrologue() ()
(gdb) bt
#0 0x0000000000ce249d in v8::internal::Heap::GarbageCollectionPrologue() ()
#1 0x0000000000ceba22 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#2 0x0000000000cec25f in v8::internal::Heap::PreciseCollectAllGarbage(int, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#3 0x0000000000f6eb1f in v8::internal::HeapSnapshotGenerator::GenerateSnapshot() ()
#4 0x0000000000f60793 in v8::internal::HeapProfiler::TakeSnapshot(v8::ActivityControl*, v8::HeapProfiler::ObjectNameResolver*) ()
#5 0x00007ff22cf1dba7 in OnOOMError(char const*, bool) () from /nodeapp/work/node12/package/build/Release/node_oom_heapdump_native.node
#6 0x0000000000b32d90 in v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) ()
#7 0x0000000000b33139 in v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) ()
#8 0x0000000000cde455 in v8::internal::Heap::FatalProcessOutOfMemory(char const*) ()
#9 0x0000000000d0c093 in v8::internal::EvacuateNewSpaceVisitor::Visit(v8::internal::HeapObject, int) ()
#10 0x0000000000d13f70 in void v8::internal::LiveObjectVisitor::VisitBlackObjectsNoFail<v8::internal::EvacuateNewSpaceVisitor, v8::internal::MajorNonAtomicMarkingState>(v8::internal::MemoryChunk*, v8::internal::MajorNonAtomicMarkingState*, v8::internal::EvacuateNewSpaceVisitor*, v8::internal::LiveObjectVisitor::IterationMode) ()
#11 0x0000000000d211f8 in v8::internal::FullEvacuator::RawEvacuatePage(v8::internal::MemoryChunk*, long*) ()
#12 0x0000000000d05b5e in v8::internal::Evacuator::EvacuatePage(v8::internal::MemoryChunk*) ()
#13 0x0000000000d05e27 in v8::internal::PageEvacuationTask::RunInParallel(v8::internal::ItemParallelJob::Task::Runner) ()
#14 0x0000000000cfb315 in v8::internal::ItemParallelJob::Task::RunInternal() ()
#15 0x0000000000cfb724 in v8::internal::ItemParallelJob::Run() ()
#16 0x0000000000d154b7 in void v8::internal::MarkCompactCollectorBase::CreateAndExecuteEvacuationTasks<v8::internal::FullEvacuator, v8::internal::MarkCompactCollector>(v8::internal::MarkCompactCollector*, v8::internal::ItemParallelJob*, v8::internal::MigrationObserver*, long) ()
#17 0x0000000000d23784 in v8::internal::MarkCompactCollector::EvacuatePagesInParallel() ()
#18 0x0000000000d2439a in v8::internal::MarkCompactCollector::Evacuate() [clone .constprop.1218] ()
#19 0x0000000000d29587 in v8::internal::MarkCompactCollector::CollectGarbage() ()
#20 0x0000000000ce9fa9 in v8::internal::Heap::MarkCompact() ()
#21 0x0000000000cead13 in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#22 0x0000000000ceb885 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#23 0x0000000000cee298 in v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) ()
#24 0x0000000000cb4bc7 in v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType) ()
#25 0x0000000000feaafb in v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) ()
#26 0x000000000136d539 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit () at ../../deps/v8/../../deps/v8/src/builtins/base.tq:3028
#27 0x000013f93c8c5f5f in ?? ()
#28 0x0000000000000000 in ?? ()
```
So, I'm not sure if I need to follow up on this. Maybe it's just the combination of my test case and the Node.js flags that gives the unpredictable behavior. I'll just need to try it in a more real-life scenario and see what happens.
Found this issue while searching for a solution similar to Java's heap dump on OOM. While llnode may do all the required things, it's not a tool that is familiar to JS developers, whereas DevTools is.
And as for heapdump generation, I think that double memory requirement is not an option for general usage. If your process is already flagged to be terminated, there should be a way to stop the world and stream heap contents directly to fs without creating an intermediate object.
> And as for heapdump generation, I think that double memory requirement is not an option for general usage. If your process is already flagged to be terminated, there should be a way to stop the world and stream heap contents directly to fs without creating an intermediate object.
Heapdumps are essentially graphs of the live objects on the heap. Creating that graph is what takes up a lot of memory. It's unavoidable.
A coredump doesn't have that problem because it doesn't create a graph, it simply dumps the heap as a byte array.
One possible way forward is to create a tool (possibly integrated into the node binary) for transforming a coredump into a heapdump.
I wrote a tool like that for Node.js v0.10 once but that's totally antiquated by now, of course. :-)
> One possible way forward is to create a tool (possibly integrated into the node binary) for transforming a coredump into a heapdump.
I think that is really the way to go. But after playing for some time with core dump generation, I would say that not only the tooling is an issue: core dump generation for a process that runs inside a container is extremely difficult to get right. The main problem is configuring the core dump location, which can be done only on the host machine and is probably not an option, at least until core_pattern namespacing is implemented in the kernel.
Node.js doesn't have direct access to the heap, so generating a heapdump is not something we can do out of the box. This is a feature request for V8 (https://bugs.chromium.org/p/v8/issues/list).
I had failed to find this issue and created #32756 (already closed it). A huge +1 from me (and, I believe, from many Node.js users) for the CLI flag option.
@paulrutter did you already open an issue against V8?
@puzpuzpuz No, I haven't gotten to it yet.
As the heapdump is metadata, not a raw memory dump, why would the dump generator take as much memory as the heap? Maybe there is an improvement opportunity in V8?
> Maybe there is an improvement opportunity in V8?
Open-ended questions like that aren't useful. When is there _not_ room for improvement?
The 2x is a rule of thumb - i.e., a decent assumption, not a hard rule. The lower bound for most programs is about 33% (1 pointer per object where the average object is 3 pointers big.)
But before someone goes "oh, so it's only one-third": lower bound != average.
> Open-ended questions like that aren't useful. When is there _not_ room for improvement?
@bnoordhuis - as I understand it, the effort required to generate a dump involves traversing the object graph and recording the reference tree and the size information (for example, our own `MemoryRetainer`). If the footprint of this effort grows in proportion to the heap size, is it reasonable to expect room for improvement? I don't know how the snapshot is collected; that is why I said *may be* and put it across as a question. It becomes useful when someone from V8 investigates and/or makes observations.
> If the footprint of this effort grows in proportion to the heap size, is it reasonable to expect room for improvement?
Maybe partially. Probably not easily.
The current heap snapshot generator uses additional memory because it creates persistent snapshots. They remain valid after they're created; e.g., JS strings are copied.
A zero-copy, one-shot generator is conceivable, but computing the graph edges is still going to require additional memory: `N` objects that point to `M` other objects on average is `N*M` relations that need to be recorded.
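As a rough worked example of that bookkeeping (illustrative numbers; the 8-byte record size is an assumption, not a measurement):

$$
N \times M = 10^{6}\ \text{objects} \times 3\ \text{edges on average} = 3 \times 10^{6}\ \text{edge records} \approx 24\ \text{MB at 8 bytes each}
$$

The ~33% lower bound mentioned above falls out the same way: recording one 8-byte pointer per object, when the average object is three pointers (24 bytes) big, is 8/24 ≈ 33% of the heap size.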
@bnoordhuis - understood all other points, thanks.
> `N` objects that point to `M` other objects on average is `N*M` relations that need to be recorded.
Agree, but that implies the generator footprint is a function of the number of objects in the heap and their cross-references, not of the size of the heap?
I don't really see much way around the memory requirements with the current snapshot approach. We'd really need to move to a heap-tracing model that can stream out events as they happen, which would allow a graph to be built via post-processing.
> that implies the generator footprint is a function of the number of objects in the heap and their cross-references, not of the size of the heap?
Note that I was careful to write "live objects on the heap" in https://github.com/nodejs/node/issues/27552#issuecomment-600035110. :-)
In an out-of-memory condition they're roughly equivalent though. The heap is so full with live objects that there isn't room for more.
V8 provides `v8::Isolate::AddNearHeapLimitCallback()` for adjusting the heap limit when V8 is approaching it. The debugger implementation in V8 uses this to break in Chrome DevTools at the point where the heap size limit is near. I did a proof-of-concept `--heapsnapshot-near-heap-limit` implementation (https://github.com/nodejs/node/pull/33010) - sketched below - and it does work if I just write a snapshot to disk in that callback. I temporarily raise the heap size limit to a value slightly bigger than the original limit until the snapshot is done, and tell V8 to restore the initial limit later. There are some observations with this approach:
- With `test/fixtures/workload/allocation.js` and `--max-old-space-size=100`, without using `--heapsnapshot-near-heap-limit`, the process crashes in 20s after 73 GCs. With the option on, it crashes in 121s after 130 GCs, leaving 12 snapshots of size 140-170MB on disk (`Heap.20200423.063523.40985.0.001.heapsnapshot`, `Heap.20200423.063524.40985.0.002.heapsnapshot`, ...).

Any comments about these observations?
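For anyone wanting to see the shape of this at the V8 API level, here is a minimal sketch (not the actual PR implementation; the file name and the 16 MB headroom value are illustrative assumptions):

```cpp
// Sketch of the --heapsnapshot-near-heap-limit mechanism: register a
// near-heap-limit callback, write a snapshot synchronously when it fires,
// and return a slightly higher limit so V8 (and the snapshot generation,
// which itself triggers a GC) has headroom until the limit is restored.
#include <v8.h>
#include <v8-profiler.h>
#include <cstdio>

namespace {

// Same streaming OutputStream idea as in the earlier sketch.
class FileOutputStream final : public v8::OutputStream {
 public:
  explicit FileOutputStream(FILE* fp) : fp_(fp) {}
  int GetChunkSize() override { return 65536; }
  void EndOfStream() override {}
  WriteResult WriteAsciiChunk(char* data, int size) override {
    return fwrite(data, 1, size, fp_) == static_cast<size_t>(size)
               ? kContinue : kAbort;
  }
 private:
  FILE* fp_;
};

size_t NearHeapLimit(void* data, size_t current_heap_limit,
                     size_t /*initial_heap_limit*/) {
  v8::Isolate* isolate = static_cast<v8::Isolate*>(data);
  const v8::HeapSnapshot* snapshot =
      isolate->GetHeapProfiler()->TakeHeapSnapshot();
  if (FILE* fp = fopen("near-limit.heapsnapshot", "w")) {
    FileOutputStream stream(fp);
    snapshot->Serialize(&stream, v8::HeapSnapshot::kJSON);
    fclose(fp);
  }
  const_cast<v8::HeapSnapshot*>(snapshot)->Delete();
  // Temporarily raise the limit so execution can continue until the
  // initial limit is restored (see below).
  return current_heap_limit + 16 * 1024 * 1024;
}

}  // namespace

void InstallNearHeapLimitSnapshots(v8::Isolate* isolate) {
  isolate->AddNearHeapLimitCallback(NearHeapLimit, isolate);
  // Ask V8 to restore the initial limit once heap usage drops back below
  // a fraction of it.
  isolate->AutomaticallyRestoreInitialHeapLimit(0.95);
}
```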
@joyeecheung When starting on the node-oom-heapdump code, I used gc-stats to detect when a full GC happens. When the RSS was over a user-defined threshold, a heapdump would be created, in combination with a user-defined maximum number of heapdumps while over the threshold. This is the poor man's version of your native implementation, if I understand correctly.
Observations on this implementation: it was eventually replaced by hooking V8's `SetOOMErrorHandler` method, which had a better result. It would be interesting to know if your WIP can handle the volatile increase of memory usage.
@paulrutter Thanks. From what I can tell, using the NearHeapLimitCallback should work for the more volatile increase, and it should work better than the OOM handler, since the OOM handler is triggered when V8 is about to crash, whereas the NearHeapLimitCallback is, as the name implies, triggered some time before that, when there's still some room in the heap. The snapshot writing would be synchronous, so it's guaranteed that at least one snapshot will be written before the program crashes - whether there will be more depends on how fast the heap grows and how much room we leave for V8 / V8 leaves for us before the callback is invoked the next time, since snapshot generation triggers a GC which in turn might increase heap usage (while promoting objects in the young generation - but not for the snapshot generation itself, like what we had been worried about in this thread).
Also, as discussed in https://github.com/nodejs/node/pull/33010#discussion_r414011746, having this implemented in Node.js core, instead of as an addon, may help us avoid the situation where the snapshot generation triggers a system OOM due to the additional native memory overhead, because our own implementation has access to the parameters used initially to configure the V8 heap, so we can do some calculations with that information to avoid this as much as we can.
Today I tried to run the module recommended in the first comment here (https://github.com/blueconic/node-oom-heapdump) while restricting the memory for the old heap to 100 MB. When crashing, the process's memory rose up to 500 MB and it needed about 7 minutes to gather the heap dump.
This makes me question this technique, asking myself why I couldn't simply use a core dump here instead. Is there a way to create a core dump when running out of memory and later on (on a machine with enough resources available) "transform" it into a heap dump?
@SimonSimCity We're using that module for Node.js processes restricted to between 80 and 160 MB, and when one of those crashes, it never takes more than a few seconds to create the heapdump. 7 minutes is excessive indeed. Was your test case representative?
Yes, core dumps can already be created, by passing the `--abort-on-uncaught-exception` flag. This has been discussed earlier in this thread. I don't know whether that information can be transformed into a heapdump format, though.
I tested it on an application our company is working on, which is a Meteor project running in development mode. Maybe the generated object graph, as mentioned in https://github.com/nodejs/node/issues/27552#issuecomment-612038240, is very complex there ...
It has been noted very often that this requires a significant amount of additional memory (https://github.com/nodejs/node/issues/27552#issuecomment-618079094, https://github.com/nodejs/diagnostics/issues/239#issuecomment-427571478, https://github.com/nodejs/diagnostics/issues/239#issuecomment-426953394, https://github.com/nodejs/node/issues/27552#issuecomment-489272479) - some of the comments also mentioning performance as a problematic factor.
Heap snapshots also seem to be problematic when the heap size is high: https://github.com/nodejs/diagnostics/issues/239#issuecomment-427479988 (the ticket linked in the comment mentions a size of >1.5 GB)
If a core dump, automatically generated by the OS, can help us here, it should be the preferred option in my opinion. At least on Linux it seems to come at almost no memory cost. Windows and other OSes might have a different format or even a different approach here, but I'd rather go in this direction first than pursue a solution which requires significantly more memory.
I'll test out some options here regarding core dumps as I know my application will only run inside a Linux based docker container.