A tiny exec call causes spawn ENOMEM when Node.js is using at least half of the otherwise-available memory:
// If your system has more than 4 GB of mem, repeat these two lines until >50% of memory is used:
let x = Buffer.allocUnsafe(2e9);
x.fill(2); // touch the pages: virtual allocation -> resident memory
// Causes ENOMEM even if there's >1 GB of available memory:
require("child_process").exec("pwd", console.log)
# (strace)
[pid 3017] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffae3ea7a10) = -1 ENOMEM (Cannot allocate memory)
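(If you're reproducing this, here's a rough way to watch how close you are while repeating the allocation; a sketch, and note that comparing rss to free memory only approximates whatever accounting the kernel's fork() path actually does:)

const os = require("os");

// Rough progress indicator for the repro above: how much this Node
// process has committed vs. what the system still has available.
const rssGiB = process.memoryUsage().rss / 2 ** 30;
const freeGiB = os.freemem() / 2 ** 30;
console.log(`rss=${rssGiB.toFixed(2)} GiB, free=${freeGiB.toFixed(2)} GiB`);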
Apparently this is a common problem when using fork()/clone(), and using posix_spawn(3) avoids it.
I don't see any upstream issues in libuv for this.
This could be the underlying cause of this very popular SO issue for some users: https://stackoverflow.com/questions/26193654/node-js-catch-enomem-error-thrown-after-spawn
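(For anyone arriving from that SO question: the failure is catchable, it just doesn't surface synchronously on current Node. A minimal sketch of where the ENOMEM actually shows up:)

const { exec, spawn } = require("child_process");

// With exec, the failure arrives in the callback:
exec("pwd", (err, stdout) => {
  if (err && err.code === "ENOMEM") {
    console.error("could not fork: out of memory");
  } else if (err) {
    throw err;
  } else {
    console.log(stdout);
  }
});

// With spawn, it is emitted on the child's 'error' event:
const child = spawn("pwd");
child.on("error", (err) => console.error("spawn failed:", err.code));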
I think something like this would be handled/solved at the libuv level, not node. The libuv issue tracker is here.
Duplicate of https://github.com/nodejs/node/issues/14917?
Not really a dupe. Both are caused by using fork(), but with different symptoms (ENOMEM vs. a blocked loop), and this issue isn't addressed by MADV_DONTFORK as suggested in #14917. I just tried my repro with a buffer whose memory had been madvise(..., MADV_DONTFORK)'d, and the fork() syscall still failed with ENOMEM. Even if the kernel doesn't make the memory available to the child, it apparently still requests the allocation (or goes through some of the same checks).
Edit (May 23, 2019): tried madvise(MADV_DONTFORK) with a 5.1+ kernel and it still causes ENOMEM.
What does sysctl vm.overcommit_memory print on your system?
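(A Linux-only way to check this from inside Node, reading the same value sysctl reports; 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting. A sketch:)

const fs = require("fs");

// Same value that `sysctl vm.overcommit_memory` prints:
const mode = fs.readFileSync("/proc/sys/vm/overcommit_memory", "utf8").trim();
console.log("vm.overcommit_memory =", mode); // 0, 1, or 2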
As you've probably gathered from the issues linked in https://github.com/libuv/libuv/issues/2133, this is not something that is solvable, easily or at all. vfork() and clone() have their own share of issues.
@bnoordhuis Can you point me in the right direction for the issues you've encountered with using vfork? I've tried to search the issue tracker and commit history, but it was hard to dig back that far. I've seen a couple of references to cases where it caused problems (such as libc developers arguing about it: https://sourceware.org/bugzilla/show_bug.cgi?id=10354), but fork shares most of the same limitations (POSIX also specifies that "Consequently, [after fork] to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."), and additionally, on Linux, fork has been observed to be slower and to fail in low-memory situations. Also on Linux, libuv could call clone directly (with the vfork flag, but providing a separate stack) to avoid some of these issues. I did find this interesting graph, though, which suggests that the problems with fork might be Linux-specific: https://github.com/rtomayko/posix-spawn
It's been a couple years, but when I looked into the linux kernel code for fork, my recollection is that it does not respect the overcommit_memory flag(s) and just enforces that process memory < physical memory (plus probably some extra constant fudge factors). I couldn't really say which memory it counts against the limit, however (e.g. whether it includes MADV_DONTFORK'd regions).
The biggest issue for libuv is that vfork() doesn't run pthread_atfork() handlers. That might be surmountable for Node.js but add-ons might be a problem.
It's been a couple years, but when I looked into the linux kernel code for fork, my recollection is that it does not respect the overcommit_memory flag(s)
Linux's overcommit logic has been overhauled several times in recent years. I wouldn't know where exactly it stands now without checking the source (of current and past releases). :-)
True, they aren't run, but they shouldn't often be needed. Although I don't know what Node.js does (or promises to do) in this case. From the rationale of pthread_atfork, running them is necessary to work around design issues with fork arising from the COW semantics in multi-threaded programs that also try to use "unsafe" functions. But we can audit uv_spawn and know that it only uses async-signal-safe functions and then execv (per the fork documentation). Did someone actually need their pthread_atfork handler to run, or is that just a hypothetical concern that someone might try to do something more in their handler?
OTOH, libuv now does register a pthread_atfork handler, which performs some extra operations that add a couple of unnecessary syscalls and thus may increase the cost of launching a process even without vfork. I have no idea whether the impact of that code is measurable, though.
But if the overcommit-related logic has changed, perhaps it might be faster now too? I see the OP is on an older Ubuntu 16.04. If the kernel can now do fork (almost) as fast as vfork and without hitting ENOMEM limits (as seems to be true on Mach, so it's perhaps possible at the hardware level), that would be great! (And it would save me from ever actually needing to upstream the vfork support to libuv, haha.)
@bnoordhuis the default overcommit_memory was 0. Running the OP repro with the three options:
0 - ENOMEM
1 - okay
2 - ENOMEM (with additional difficulty allocating the buffer in the first place)
Aside from that workaround (possibly viable) or adding swap (not viable), for my specific situation I'm looking into a PR to make https://github.com/googleapis/google-auth-library-nodejs either not exec at all, or exec earlier before the process grows.
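As a sketch of the "exec earlier" approach (run-helper.sh here is a hypothetical script, not part of any library): fork once at startup, while the parent's heap is still small, and route later commands through that long-lived child:

const { spawn } = require("child_process");

// Fork at startup, before the heap grows. The hypothetical
// run-helper.sh reads commands on stdin and executes them for us.
const helper = spawn("sh", ["./run-helper.sh"], {
  stdio: ["pipe", "inherit", "inherit"],
});

function runLater(cmd) {
  // No fork() of the now-large parent is needed at this point:
  helper.stdin.write(cmd + "\n");
}

// ...gigabytes of allocations later...
runLater("pwd");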
Although I don't know what Node.js does (or promises to do) in this case.
@vtjnash There is (or was) at least one shared memory add-on module that uses pthread_atfork() so I don't think it's out of the question that other add-ons exist that, directly or indirectly, rely on the current behavior.
I say "indirectly" because add-on X might depend on shared library Y, which in turn depends on a functioning pthread_atfork(). Since that's impossible to audit for, I'd be uncomfortable changing the default, but, as discussed in https://github.com/libuv/libuv/pull/141, an opt-in should be acceptable.
@zbjornson Thanks for checking! I figured that DWIM mode (vm.overcommit_memory=1) would end up working around this.
It's curious that madvise(MADV_DONTFORK) didn't work and I wonder if that's a kernel bug. It's not the behavior I'd expect at any rate.
(Random data point: also seeing this on a 2GB server trying to execute imagemin via CLI)
Hrm, twice previously I tried using MADV_DONTFORK and was still getting ENOMEMs, but looking at the kernel source to figure out why, I saw that it should work. I just tried again and, lo and behold, the ENOMEM goes away. (Maybe I didn't have a page-aligned allocation or maybe I had two advice flags OR'ed together, and wasn't checking the return value? /shrug, and sorry for the wasted cycles thinking about alternate solutions.)
Anyway, getting https://chromium-review.googlesource.com/c/v8/v8/+/1101679 landed would be great. Looks like there's just a typo pending.
(Also, on my system an madvise call only takes ~6 µs.)
can we push https://chromium-review.googlesource.com/c/v8/v8/+/1101679 forward?
cc @bnoordhuis
This would explain why we've had some ENOMEM errors in production since switching to Node 12.
Because the default old-space size was 1.4 GB on Node 10, 1.4 * 2 = 2.8 GB still fit within our 4 GB servers' RAM.
Since Node 12 removed the old-space size limit, the heap was sometimes over 3 GB, so the spawn would require 3 * 2 = 6 GB, and our instances (which have no swap) would just throw an ENOMEM error.
I will keep an eye on this issue!
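(If the now-unbounded heap is what pushes you over the threshold, re-imposing a cap is one possible stopgap; the 1400 MB below just mirrors the old Node 10 default and is not a recommendation. A sketch:)

// Start Node with an explicit old-space cap, e.g.:
//   node --max-old-space-size=1400 app.js
const v8 = require("v8");

// heap_size_limit reflects the configured cap, in bytes:
const limitGiB = v8.getHeapStatistics().heap_size_limit / 2 ** 30;
console.log(`V8 heap limit: ${limitGiB.toFixed(2)} GiB`);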
can we push https://chromium-review.googlesource.com/c/v8/v8/+/1101679 forward?
Hi, is there any progress on this topic?
From what I've read, the cheapest way to do this safely is vfork+exec. A true fork has to set up an address space with most/all pages mapped from the parent, which is not an easy/fast task for the huge memory hogs most Node processes tend to become. So if we're going to exec right after fork, and vfork exists on the system, why not use it?
Stumbled on this issue trying to figure out why child_process.exec and its peers use the spawn syscall instead of execve: https://github.com/nodejs/node/blob/f1ae7ea343020f608fdc1ca77d9cdfe2c093ac72/lib/child_process.js#L237
Given that the execve syscall replaces the machine code, data, heap, and stack of the process with those of the new program, it may be reasonable to expect that calling child_process.exec should not result in an out-of-memory condition: the calling process's machine code, data, heap, and stacks should be freed.
I'm not sure _why_ spawn is called rather than exec, especially given that child_process.spawn and others are available. It seems to me that it's a bad idea to have both spawn and exec available when exec doesn't follow the expected semantics. One path forward may be to create a new set of methods, such as child_process.overlay, which do actually call exec, and to make it clear in the docs that exec doesn't follow the typical exec paradigm.
Further to my comment above, perhaps it would help to consider that using the exec syscall in preference to spawn is a form of tail call elimination, which is useful to limit memory usage.