Node: Cancerous memory consumption under specific circumstances

Created on 6 Aug 2019 · 4 comments · Source: nodejs/node

  • Version: all versions from 12.5.0 onward are affected (also occurs on 12.7.0)
  • Platform: tested on macOS (18.7.0 Darwin xnu-4903.270.47~4/RELEASE_X86_64) and Linux
  • Subsystem: something low-level

This is one of the weirdest bugs I've ever debugged.

Under certain circumstances, the Node.js process (starting from version 12.5.0; earlier versions are not affected) will not exit, even if we call process.exit() at the end of the file. Instead, it starts consuming memory until it fills all available memory, using 100% CPU while at it.

Here's a reproduction:

https://gist.github.com/niieani/a527e9b2e6caf28e3021a50b938e4a91

Now, what's really funny is that a few small changes make this reproduction stop triggering the bug:

  • making the strings in reproduction.json shorter makes the process exit normally
  • removing various lines from the reproduction (comments show which) makes the process exit normally
  • replacing path.resolve with a call to a different function, e.g. console.log, makes the process exit normally (note that this originally used jest-haste-map/build/lib/fast_path, which would still trigger the bug)
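
For orientation, here's a minimal sketch of the shape of the reproduction (hypothetical; the real files are in the gist above, and as the list shows, the exact strings and lines matter):

const path = require('path');

// reproduction.json holds an array of long, path-like strings;
// making them shorter makes the bug disappear
const entries = require('./reproduction.json');

for (const entry of entries) {
  // a hot loop over path.resolve, enough for V8's optimizing
  // compiler to pick the function up
  path.resolve(entry);
}

// on affected versions, the process never actually gets past this line
process.exit(0);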

Standard memory or CPU profiling doesn't show anything (literally, no function calls appear in the profiler). Dumping a heap snapshot results in a 36 MB file, even when the process's memory consumption is 10 GB.
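
As an aside, such a heap snapshot can be produced without DevTools: the built-in v8 module has exposed this since Node 11.13. The tiny snapshot next to a multi-gigabyte RSS suggests the growth happens outside the JS heap, which process.memoryUsage() would show as well:

const v8 = require('v8');

// writes Heap.<timestamp>.heapsnapshot to the current directory
// and returns the file name
const snapshotPath = v8.writeHeapSnapshot();

// rss is the OS-level resident set size; heapUsed is V8's own heap —
// in this bug the former balloons while the latter stays small
const { rss, heapUsed } = process.memoryUsage();
console.log(snapshotPath, { rss, heapUsed });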

The backstory is that, in our case, Jest would not exit AFTER all the tests had completed; instead it would start consuming huge amounts of memory (on CI, multiple processes from separate PRs would deplete the machine's 128 GB of RAM). So I started going down the rabbit hole to pinpoint the cause of the issue, which led me to this minimal reproduction scenario.

I've taken a core dump, so here's what I got from lldb for a slightly larger (earlier) version of my reproduction:


core dump (lldb)

(lldb) bt all
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cbd19 node`node::TaskQueue<v8::Task>::BlockingDrain() + 39
    frame #4: 0x00000001000ccf6b node`node::NodePlatform::DrainTasks(v8::Isolate*) + 47
    frame #5: 0x00000001000af530 node`node::NodeMainInstance::Run() + 394
    frame #6: 0x000000010005a28a node`node::Start(int, char**) + 245
    frame #7: 0x00007fff5bca53d5 libdyld.dylib`start + 1
    frame #8: 0x00007fff5bca53d5 libdyld.dylib`start + 1
  thread #2, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bde078e libsystem_kernel.dylib`kevent + 10
    frame #1: 0x00000001006872c2 node`uv__io_poll + 841
    frame #2: 0x0000000100677c27 node`uv_run + 320
    frame #3: 0x00000001000cd437 node`node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Run() + 305
    frame #4: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #5: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #6: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #3, stop reason = signal SIGSTOP
    frame #0: 0x0000000100b17bc7 node`v8::internal::compiler::TypeNarrowingReducer::Reduce(v8::internal::compiler::Node*) + 377
    frame #1: 0x0000000100ad2fe8 node`v8::internal::compiler::(anonymous namespace)::SourcePositionWrapper::Reduce(v8::internal::compiler::Node*) + 60
    frame #2: 0x0000000100a349b9 node`v8::internal::compiler::GraphReducer::Reduce(v8::internal::compiler::Node*) + 83
    frame #3: 0x0000000100a347af node`v8::internal::compiler::GraphReducer::ReduceTop() + 293
    frame #4: 0x0000000100a345bb node`v8::internal::compiler::GraphReducer::ReduceNode(v8::internal::compiler::Node*) + 83
    frame #5: 0x0000000100ad4a85 node`v8::internal::compiler::LoadEliminationPhase::Run(v8::internal::compiler::PipelineData*, v8::internal::Zone*) + 569
    frame #6: 0x0000000100acc567 node`void v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::LoadEliminationPhase>() + 121
    frame #7: 0x0000000100aca405 node`v8::internal::compiler::PipelineImpl::OptimizeGraph(v8::internal::compiler::Linkage*) + 219
    frame #8: 0x0000000100aca2bf node`v8::internal::compiler::PipelineCompilationJob::ExecuteJobImpl() + 381
    frame #9: 0x000000010021a580 node`v8::internal::OptimizedCompilationJob::ExecuteJob() + 30
    frame #10: 0x00000001002190fd node`v8::internal::OptimizingCompileDispatcher::CompileNext(v8::internal::OptimizedCompilationJob*) + 31
    frame #11: 0x0000000100219d95 node`v8::internal::OptimizingCompileDispatcher::CompileTask::RunInternal() + 395
    frame #12: 0x00000001000cbb30 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 301
    frame #13: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #14: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #15: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #4, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cd526 node`node::TaskQueue<v8::Task>::BlockingPop() + 62
    frame #4: 0x00000001000cbb56 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 339
    frame #5: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #6: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #7: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #5, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cd526 node`node::TaskQueue<v8::Task>::BlockingPop() + 62
    frame #4: 0x00000001000cbb56 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 339
    frame #5: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #6: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #7: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #6, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cd526 node`node::TaskQueue<v8::Task>::BlockingPop() + 62
    frame #4: 0x00000001000cbb56 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 339
    frame #5: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #6: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #7: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #7, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bdda266 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x0000000100682fb5 node`uv_sem_wait + 16
    frame #2: 0x000000010010e006 node`node::inspector::(anonymous namespace)::StartIoThreadMain(void*) + 18
    frame #3: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #4: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #5: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13

I couldn't get llnode to work, so this is all I could get from the above.
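
For reference, a working llnode session against the same core would have looked roughly like this (hypothetical, since it failed to load here; the core path is a placeholder):

$ llnode /usr/local/bin/node -c /cores/core.<pid>
(lldb) v8 bt             # JS-aware backtrace
(lldb) v8 findjsobjects  # per-type summary of JS heap objects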

The built-in diagnostic report also wasn't very helpful, but here's the relevant part (the report is from an even earlier reproduction, while running under Jest):


Node Report

{
  "javascriptStack": {
    "message": "No stack.",
    "stack": [
      "Unavailable."
    ]
  },
  "nativeStack": [
    {
      "pc": "0x0000000100147402",
      "symbol": "report::TriggerNodeReport(v8::Isolate*, node::Environment*, char const*, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, v8::Local<v8::String>) [/usr/local/bin/node]"
    },
    {
      "pc": "0x000000010014df5c",
      "symbol": "report::WriteReport(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]"
    },
    {
      "pc": "0x00000001001cf90c",
      "symbol": "v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [/usr/local/bin/node]"
    },
    {
      "pc": "0x00000001001cefd6",
      "symbol": "v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/usr/local/bin/node]"
    },
    {
      "pc": "0x00000001001ce7be",
      "symbol": "v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/usr/local/bin/node]"
    },
    {
      "pc": "0x000000010071ccd9",
      "symbol": "Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/usr/local/bin/node]"
    }
  ],
  "javascriptHeap": {
    "totalMemory": 43360256,
    "totalCommittedMemory": 41433760,
    "usedMemory": 38031208,
    "availableMemory": 2160634808,
    "memoryLimit": 2197815296,
    "heapSpaces": {
      "read_only_space": {
        "memorySize": 262144,
        "committedMemory": 32272,
        "capacity": 261832,
        "used": 31960,
        "available": 229872
      },
      "new_space": {
        "memorySize": 1048576,
        "committedMemory": 4520,
        "capacity": 1047328,
        "used": 3272,
        "available": 1044056
      },
      "old_space": {
        "memorySize": 35008512,
        "committedMemory": 34752088,
        "capacity": 34943712,
        "used": 32316584,
        "available": 2627128
      },
      "code_space": {
        "memorySize": 1212416,
        "committedMemory": 1072384,
        "capacity": 1150976,
        "used": 909120,
        "available": 241856
      },
      "map_space": {
        "memorySize": 2363392,
        "committedMemory": 2107280,
        "capacity": 2359128,
        "used": 1369600,
        "available": 989528
      },
      "large_object_space": {
        "memorySize": 3416064,
        "committedMemory": 3416064,
        "capacity": 3397216,
        "used": 3397216,
        "available": 0
      },
      "code_large_object_space": {
        "memorySize": 49152,
        "committedMemory": 49152,
        "capacity": 3456,
        "used": 3456,
        "available": 0
      },
      "new_large_object_space": {
        "memorySize": 0,
        "committedMemory": 0,
        "capacity": 1047328,
        "used": 0,
        "available": 1047328
      }
    }
  },
  "resourceUsage": {
    "userCpuSeconds": 17.6539,
    "kernelCpuSeconds": 1.12431,
    "cpuConsumptionPercent": 98.8324,
    "maxRss": 1796776919040,
    "pageFaults": {
      "IORequired": 2,
      "IONotRequired": 474408
    },
    "fsActivity": {
      "reads": 0,
      "writes": 0
    }
  },
  "libuv": [
    {
      "type": "async",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000104102f70"
    },
    {
      "type": "timer",
      "is_active": true,
      "is_referenced": true,
      "address": "0x0000000105843328",
      "repeat": 0,
      "firesInMsFromNow": 616,
      "expired": false
    },
    {
      "type": "check",
      "is_active": true,
      "is_referenced": false,
      "address": "0x00000001058433c0"
    },
    {
      "type": "idle",
      "is_active": false,
      "is_referenced": true,
      "address": "0x0000000105843438"
    },
    {
      "type": "prepare",
      "is_active": false,
      "is_referenced": false,
      "address": "0x00000001058434b0"
    },
    {
      "type": "check",
      "is_active": false,
      "is_referenced": false,
      "address": "0x0000000105843528"
    },
    {
      "type": "async",
      "is_active": true,
      "is_referenced": false,
      "address": "0x000000010422b840"
    },
    {
      "type": "async",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000101128958"
    },
    {
      "type": "signal",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000103f05970",
      "signum": 31,
      "signal": "SIGUSR2"
    },
    {
      "type": "tty",
      "is_active": false,
      "is_referenced": true,
      "address": "0x0000000104231848",
      "width": 178,
      "height": 35,
      "fd": 20,
      "writeQueueSize": 0,
      "readable": true,
      "writable": true
    },
    {
      "type": "signal",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000104303240",
      "signum": 28,
      "signal": "SIGWINCH"
    },
    {
      "type": "tty",
      "is_active": false,
      "is_referenced": true,
      "address": "0x0000000104305568",
      "width": 178,
      "height": 35,
      "fd": 22,
      "writeQueueSize": 0,
      "readable": true,
      "writable": true
    },
    {
      "type": "loop",
      "is_active": true,
      "address": "0x000000010112e690"
    }
  ]
}
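
For context, the native stack above ends in report::WriteReport being reached from a JS function callback, i.e. the report was triggered from JavaScript. In Node 12 this API was still experimental; a sketch of how such a report is produced (assuming the process was started with --experimental-report):

// with: node --experimental-report script.js
process.report.writeReport('report.json');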

Label: V8 Engine


All 4 comments

Great to know we have a fix, but it would be useful to understand what was causing the issue in the first place, before closing.

Otherwise we can't know whether the issue has simply shifted around and is now triggered by something slightly different (given how delicate reproducing this bug is).

@niieani I don’t really have an answer for that – maybe somebody in @nodejs/v8 happens to know more. It’s a good sign that this is almost surely a bug in V8 and is fixed by a V8 upgrade; in the worst case, one could bisect V8 to figure out which commit exactly fixed this.

For people looking at this from a "react-native start is slow" perspective: you can either use node <= 12.4.0 or, likely, node >= 12.9.0 (assuming it gets the V8 sync, #28955). Alternatively (new info), based on my testing just now, you might try a workaround in jest-haste-map, documented here with an attached patch-package-compatible patch, until the workaround is integrated into jest-haste-map:

https://github.com/facebook/jest/pull/8787#issuecomment-520091960
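
If you go the patch route: patch-package applies .patch files from a patches/ directory at install time, so a minimal hookup (assuming the patch from the linked comment is saved as patches/jest-haste-map+<version>.patch) is just a postinstall script:

{
  "scripts": {
    "postinstall": "patch-package"
  },
  "devDependencies": {
    "patch-package": "^6.1.2"
  }
}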
