A tiny exec call causes spawn ENOMEM when Node.js is using at least half of the otherwise-available memory:
// If your system has more than 4 GB of mem, repeat these two lines until >50% of memory is used:
let x = Buffer.allocUnsafe(2e9);
x.fill(2); // touch the pages: virtual allocation -> resident memory
// Causes ENOMEM even if there's >1 GB of available memory:
require("child_process").exec("pwd", console.log)
# (strace)
[pid 3017] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffae3ea7a10) = -1 ENOMEM (Cannot allocate memory)
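(If you're reproducing this, here's a rough way to watch how close you are while repeating the allocation; a sketch, and note that comparing rss to free memory only approximates whatever accounting the kernel's fork() path actually does:)

const os = require("os");

// Rough progress indicator for the repro above: how much this Node
// process has committed vs. what the system still has available.
const rssGiB = process.memoryUsage().rss / 2 ** 30;
const freeGiB = os.freemem() / 2 ** 30;
console.log(`rss=${rssGiB.toFixed(2)} GiB, free=${freeGiB.toFixed(2)} GiB`);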
Apparently this is a common problem when using fork()/clone(), and using posix_spawn(3) avoids it.
I don't see any upstream issues in libuv for this.
This could be the underlying cause of this very popular SO issue for some users: https://stackoverflow.com/questions/26193654/node-js-catch-enomem-error-thrown-after-spawn
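(For anyone arriving from that SO question: the failure is catchable, it just doesn't surface synchronously on current Node. A minimal sketch of where the ENOMEM actually shows up:)

const { exec, spawn } = require("child_process");

// With exec, the failure arrives in the callback:
exec("pwd", (err, stdout) => {
  if (err && err.code === "ENOMEM") {
    console.error("could not fork: out of memory");
  } else if (err) {
    throw err;
  } else {
    console.log(stdout);
  }
});

// With spawn, it is emitted on the child's 'error' event:
const child = spawn("pwd");
child.on("error", (err) => console.error("spawn failed:", err.code));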
I think something like this would be handled/solved at the libuv level, not node. The libuv issue tracker is here.
Duplicate of https://github.com/nodejs/node/issues/14917?
Not really a dupe. Both are caused by using fork(), but with different symptoms (ENOMEM vs. a blocked loop), and this issue isn't addressed by MADV_DONTFORK as suggested in #14917. I just tried my repro with a buffer whose memory had been madvise(..., MADV_DONTFORK)'d, and the fork() syscall still failed with ENOMEM. Even if the kernel doesn't make the memory available to the child, it apparently still requests the allocation (or goes through some of the same checks).
Edit (May 23, 2019): tried madvise(MADV_DONTFORK) with a 5.1+ kernel and it still causes ENOMEM.
What does sysctl vm.overcommit_memory print on your system?
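(A Linux-only way to check this from inside Node, reading the same value sysctl reports; 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting. A sketch:)

const fs = require("fs");

// Same value that `sysctl vm.overcommit_memory` prints:
const mode = fs.readFileSync("/proc/sys/vm/overcommit_memory", "utf8").trim();
console.log("vm.overcommit_memory =", mode); // 0, 1, or 2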
As you've probably gathered from the issues linked in https://github.com/libuv/libuv/issues/2133, this is not something that is solvable, easily or at all. vfork() and clone() have their own share of issues.
@bnoordhuis Can you point me in the right direction for the issues you've encountered with using vfork? I've tried to search the issue tracker and commit history, but it was hard to dig back that far. I've seen a couple of references to cases where it caused problems (such as libc developers arguing about it: https://sourceware.org/bugzilla/show_bug.cgi?id=10354), but fork shares most of the same limitations (POSIX also specifies that "Consequently, [after fork] to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."), and additionally, on Linux, fork has been observed to be slower and to fail in low-memory situations. Also on Linux, libuv could call clone directly (with the vfork flag, but providing a separate stack) to avoid some of these issues. I did find this interesting graph, though, which suggests that the problems with fork might be Linux-specific: https://github.com/rtomayko/posix-spawn
It's been a couple years, but when I looked into the linux kernel code for fork, my recollection is that it does not respect the overcommit_memory flag(s) and just enforces that process memory < physical memory (plus probably some extra constant fudge factors). I couldn't really say which memory it counts against the limit, however (e.g. whether it includes MADV_DONTFORK'd regions).
The biggest issue for libuv is that vfork() doesn't run pthread_atfork() handlers. That might be surmountable for Node.js but add-ons might be a problem.
It's been a couple years, but when I looked into the linux kernel code for fork, my recollection is that it does not respect the overcommit_memory flag(s)
Linux's overcommit logic has been overhauled several times in recent years. I wouldn't know where exactly it stands now without checking the source (of current and past releases). :-)
True, they aren't run, but they shouldn't often be needed. Although I don't know what Node.js does (or promises to do) in this case. From the rationale of pthread_atfork, running them is necessary to work around design issues with fork arising from the COW semantics in multi-threaded programs that also try to use "unsafe" functions. But we can audit uv_spawn and know that it only uses async-signal-safe functions and then execv (per the fork documentation). Did someone actually need their pthread_atfork handler to run, or is that just a hypothetical concern that someone might try to do something more in their handler?
OTOH, libuv now does register a pthread_atfork handler, which performs some extra operations that add a couple of unnecessary syscalls and thus may increase the cost of launching a process even without vfork. I have no idea whether the impact of that code is measurable, though.
But if the overcommit-related logic has changed, perhaps it might be faster now too? I see the OP is on an older Ubuntu 16.04. If the kernel can now do fork (almost) as fast as vfork and without hitting ENOMEM limits (as seems to be true on Mach, so it's perhaps possible at the hardware level), that would be great! (And it would save me from ever actually needing to upstream the vfork support to libuv, haha.)
@bnoordhuis the default overcommit_memory was 0. Running the OP repro with the three options:
0 - ENOMEM
1 - okay
2 - ENOMEM (with additional difficulty allocating the buffer in the first place)
Aside from that workaround (possibly viable) or adding swap (not viable), for my specific situation I'm looking into a PR to make https://github.com/googleapis/google-auth-library-nodejs either not exec at all, or exec earlier before the process grows.
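As a sketch of the "exec earlier" approach (run-helper.sh here is a hypothetical script, not part of any library): fork once at startup, while the parent's heap is still small, and route later commands through that long-lived child:

const { spawn } = require("child_process");

// Fork at startup, before the heap grows. The hypothetical
// run-helper.sh reads commands on stdin and executes them for us.
const helper = spawn("sh", ["./run-helper.sh"], {
  stdio: ["pipe", "inherit", "inherit"],
});

function runLater(cmd) {
  // No fork() of the now-large parent is needed at this point:
  helper.stdin.write(cmd + "\n");
}

// ...gigabytes of allocations later...
runLater("pwd");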
Although I don't know what Node.js does (or promises to do) in this case.
@vtjnash There is (or was) at least one shared memory add-on module that uses pthread_atfork() so I don't think it's out of the question that other add-ons exist that, directly or indirectly, rely on the current behavior.
I say "indirectly" because add-on X might depend on shared library Y, which in turn depends on a functioning pthread_atfork(). Since that's impossible to audit for, I'd be uncomfortable changing the default, but, as discussed in https://github.com/libuv/libuv/pull/141, an opt-in should be acceptable.
@zbjornson Thanks for checking! I figured that DWIM mode (vm.overcommit_memory=1) would end up working around this.
It's curious that madvise(MADV_DONTFORK) didn't work and I wonder if that's a kernel bug. It's not the behavior I'd expect at any rate.
(Random data point: also seeing this on a 2GB server trying to execute imagemin via CLI)
Hrm, twice previously I tried using MADV_DONTFORK and was still getting ENOMEMs, but looking at the kernel source to figure out why, I saw that it should work. I just tried again and, lo and behold, the ENOMEM goes away. (Maybe I didn't have a page-aligned allocation or maybe I had two advice flags OR'ed together, and wasn't checking the return value? /shrug, and sorry for the wasted cycles thinking about alternate solutions.)
Anyway, getting https://chromium-review.googlesource.com/c/v8/v8/+/1101679 landed would be great. Looks like there's just a typo pending.
(Also, on my system an madvise call only takes ~6 µs.)
can we push https://chromium-review.googlesource.com/c/v8/v8/+/1101679 forward?
cc @bnoordhuis
This would explain why we've had some ENOMEM errors in production since switching to Node 12.
Because the default old-space size was 1.4 GB on Node 10, 1.4 * 2 = 2.8 GB still fit within our 4 GB servers' RAM.
Since Node 12 removed the old-space size limit, the heap was sometimes over 3 GB, so the spawn would require 3 * 2 = 6 GB, and our instances (which have no swap) would just throw an ENOMEM error.
I will keep an eye on this issue!
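(If the now-unbounded heap is what pushes you over the threshold, re-imposing a cap is one possible stopgap; the 1400 MB below just mirrors the old Node 10 default and is not a recommendation. A sketch:)

// Start Node with an explicit old-space cap, e.g.:
//   node --max-old-space-size=1400 app.js
const v8 = require("v8");

// heap_size_limit reflects the configured cap, in bytes:
const limitGiB = v8.getHeapStatistics().heap_size_limit / 2 ** 30;
console.log(`V8 heap limit: ${limitGiB.toFixed(2)} GiB`);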
can we push https://chromium-review.googlesource.com/c/v8/v8/+/1101679 forward?
Hi, is there any progress on this topic?
From what I've read, the cheapest way to do this safely is vfork+exec. A true fork has to set up an address space with most/all pages mapped from the parent, which is not an easy/fast task for the huge memory hogs most Node processes tend to become. So if we're going to exec right after fork, and vfork exists on the system, why not use it?
Stumbled on this issue trying to figure out why child_process.exec and its peers use the spawn syscall instead of execve: https://github.com/nodejs/node/blob/f1ae7ea343020f608fdc1ca77d9cdfe2c093ac72/lib/child_process.js#L237
Given that the execve syscall replaces the machine code, data, heap, and stack of the process with those of the new program, it may be reasonable to expect that calling child_process.exec should not result in an out-of-memory condition: the calling process's machine code, data, heap, and stacks should be freed.
I'm not sure _why_ spawn is called rather than exec, especially given that child_process.spawn and others are available. It seems to me that it's a bad idea to have both spawn and exec available when exec doesn't follow the expected semantics. One path forward may be to create a new set of methods, such as child_process.overlay, which do actually call exec, and to make it clear in the docs that exec doesn't follow the typical exec paradigm.
Further to my comment above, perhaps it would help to consider that using the exec syscall in preference to spawn is a form of tail call elimination, which is useful to limit memory usage.