Node: When Node addon creates a new isolate, use it to execute JS code and let it triggers GC, it will crash

Created on 1 Nov 2017  Â·  28Comments  Â·  Source: nodejs/node

  • Version: v8.5.0 or higher
  • Platform: Darwin/Linux/Windows (x64)
  • Subsystem:
  1. Create a simple Node addon.
  2. In the Init(), create a thread and create a new isolate in that thread.
  3. In the Method(), notify the thread to execute some javascript code.
  4. When the javascript code triggers GC, the process will crash.
  • Crash error on Ubuntu 16.04.2

    #
    # Fatal error in ../deps/v8/src/heap/heap-inl.h, line 228
    # Debug check failed: gc_state_ == NOT_IN_GC.
    #

    #
    # Fatal error in ../deps/v8/src/heap/heap.cc, line 452
    # Debug check failed: !AllowHeapAllocation::IsAllowed() && gc_state_ == NOT_IN_GC.
    #

  • Crash error on macOS Sierra 10.12.6

    node(27735,0x7fffad7f03c0) malloc: * error for object 0x10231a980: pointer being freed was not allocated
    set a breakpoint in malloc_error_break to debug
    node(27735,0x7000075f6000) malloc:
    error for object 0x10231a778: incorrect checksum for freed object - object was probably modified after being freed.
    *
    set a breakpoint in malloc_error_break to debug

  • Crash without error outputted to console (Windows7)

Please refer to https://github.com/fs-eire/node-modules-playground/tree/test-node-crash for the code.

Other findings:

  1. If we block the Node main thread, ( put a while-true in the end of the javascript code) the crash will not happen.
  2. The crash always happens when a Full GC is in operation.

We released Napa.js as a Node module sharing the same v8 platform, and created isolates \& manipulated them the same way as the sample addon.

A workaround we could think of is to link with a separate v8 shared library if sharing the same platform is not desired. Actually we started by doing that, but afterwards changed to dynamic linking to V8 from Node for easier build and distribution.

Please kindly suggest if this is a bug in Node, or a behavior by design.

V8 Engine V8 Platform

All 28 comments

Hey! Sounds like you might be interested in something like https://github.com/ayojs/ayo/commit/2f47b5b9aa3239f549b6e2986b4ad19724937158, which I did for a multi-threading Workers implementation? Let me know if that gets you anywhere.

Please kindly suggest if this is a bug in Node, or a behavior by design.

Neither really. What you are doing is simply not supported at the current time. Please keep in mind that native modules can do virtually anything (as any other C or C++ code) with no guarantee that you won't explode Node.

Until recent times there was never a need to support multiple isolates, so that work was never done or completed.

@Fishrock123 I understand that multiple isolates are not explicitly supported by Node. However, this issue only happens on Node 8.5.0 or later. When using Node 6.x, 7.x or 8.0~8.4.0, it works as expected without crash.

Maybe it is a regression in 8.5.0 release?

@fs-eire Can you run git bisect to find a specific commit?

commit found: https://github.com/nodejs/node/commit/9e08695f85d4273f01e010cf384f42030d66b453 @addaleax @matthewloring

This change introduces a new implementation of v8::Platform. It looks related to this issue.

@fs-eire Oh, I’m sorry, I didn’t know you were looking for the cause of this.

Yes, this happens because Node has its own v8::Platform implementation now, which doesn’t care about multiple isolates. I’ve linked a patch above that I think should take care of most things.

I’ve opened https://github.com/nodejs/node/pull/16700 with the patch(es) I mentioned above, please feel free to check that out and see whether it helps resolve the issues you were seeing.

@addaleax I've tested the change. I encountered the following error when calling script->Run():

/Users/eire/nodejs/out/Release/node[78814]: ../src/node_platform.cc:203:node::PerIsolatePlatformData node::NodePlatform::ForIsolate(v8::Isolate *): Assertion `(data) != (nullptr)' failed.
1: node::Abort() [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
2: node::(anonymous namespace)::DomainEnter(node::Environment
, v8::Local) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
3: node::NodePlatform::ForIsolate(v8::Isolate) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
4: node::NodePlatform::CallOnForegroundThread(v8::Isolate
, v8::Task) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
5: v8::internal::IncrementalMarking::Start(v8::internal::GarbageCollectionReason) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
6: v8::internal::LargeObjectSpace::AllocateRaw(int, v8::internal::Executability) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
7: v8::internal::Heap::AllocateRaw(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
8: v8::internal::Heap::AllocateUninitializedFixedDoubleArray(int, v8::internal::PretenureFlag) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
9: v8::internal::Factory::NewFixedDoubleArray(int, v8::internal::PretenureFlag) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
10: v8::internal::(anonymous namespace)::ElementsAccessorBase >::GrowCapacity(v8::internal::Handle, unsigned int) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
11: v8::internal::Runtime_GrowArrayElements(int, v8::internal::Object
, v8::internal::Isolate) [/Users/eire/test-1/node-modules-playground/../../nodejs/out/Release/node]
12: 0x2d73b7c042fd
Abort trap: 6

We have a similar issue during development of a module which uses multiple isolates.
We were trying to trace this issue back to v8 and exchanged some emails with the GC team but without any solution.
we were able to workaround this issue by using "--noincremental-marking" and tested it with recent node versions (v9.0.0, v9.1.0).
@boriskozo

@fs-eire Sorry, missed your message. Which overload of CreateIsolateData are you using? Can you share any code? Edit: Sorry, missed the link to your repo in the OP :)

@YafimK Can you share any code that would help reproduce the issue?

I've used the example provided above (by @fs-eire ) and I got the following:

Using a debug compilation of the node master branch that includes your fixes (on windows)

  1. got the following error:
    image
  2. using the flag "--noincremental-marking" it seems to work.

Using a debug compilation of node v9.1 -(on windows)

  1. I've got another issue -
    image
  1. again the flag "--noincremental-marking" seems to make it disappear.

Right, there is no way to associate an Isolate without grabbing hold of the platform explicitly, so the platform can’t be informed about a new Isolate being created and that’s what’s causing the trouble here.

I’ll think about how to best work around that… it would probably be easiest to expose the default NodePlatform* instance to the public here, if there is one.

Hi @addaleax,

Right, there is no way to associate an Isolate without grabbing hold of the platform explicitly, so the platform can’t be informed about a new Isolate being created and that’s what’s causing the trouble here.

I don't fully understand why multiple Isolate in NodePlatform may cause the crash. Could you please take me to the point?

I’ll think about how to best work around that… it would probably be easiest to expose the default NodePlatform* instance to the public here, if there is one.

Is there a prototype commit that to prove this idea? If so I would like to have a try on my test.

Thanks!

I'm facing the same issue and I see that #16700 is closed. Does it mean that one of the future versions of Node will support separate unrelated v8 isolates loaded within Node modules? If so, what's the planned timeline for this to be released? Thanks!

Oh, I see that it's in 9.3.0. Will give it a try. Thanks.

Just for my understanding - assuming Node is running on thread t1 that contains Isolate i1 how will it manage interoperability with another isolate i2? If a JS code in t1 is driving into a native addon code, neither the current linkage conventions allow an Isolate to be passed to it (it is implicitly understood to be the current Isolate) nor the addon can meaningfully work in a new Isolate (other than the caller's one) without ability to receive input or return output. Or are we talking about a second thread t2 managing a second Isolate? If that is the case then t2 has no bearing with Node.js and its Platform / Isolate instances, and t1 and t2 are fully independent, unless they attempt to touch other's data?

We are talking about the 2nd case.

The class NodePlatform introduced in commit 9e08695 need to know all isolates. If t2 created a new isolate i2 using standard v8 API, there is no way for NodePlatform to know the existence of i2, which breaks this assumption.

Until Node exposes the v8 platform in some way or allows users to register their isolates with the v8 platform, it seems not safe to hook into Node's copy of the v8 and maintain separate isolates in the application. I ended up building the v8 as a static library and using it independently from Node. Here's the discussion in the v8 forums, if anyone is interested:

https://groups.google.com/d/msg/v8-users/wZHFUrGS2ms/OxosgTf4BQAJ

@gh-andre We do have GetMainThreadMultiIsolatePlatform() available now, though. Does that help?

@addaleax Thanks for pointing out the new method. It will certainly help once it makes it into an LTS release.

thanks @fs-eire for the clarification and @gh-andre for more context.

So the main motivation behind using a native add-on is to get hold of the Node's existing Platform, so as to pin the new isolate into the existing and fully booted v8 instance rather than spawning a separate one?

And the main motivation behind having a separate isolate is to run some logically related (but structurally separated) JS workload in that, to improve concurrency and throughput?

And t2 (or t3, ... tn) will be executing only pure JS code (not involving Node.js APis that involve libuv)?

Hope you don't mind these questions - for me this is an interesting use case.

Thanks for pointing out the new method. It will certainly help once it makes it into an LTS release.

I don't know whether @fs-eire and @gh-andre work on the same project or not, but for testing purpose if you can use the API (GetMainThreadMultiIsolatePlatform) and see how it fares, it helps to see if there are other minor gaps and fill those, else it helps us to resolve this case.

@gireeshpunathil Thanks for chiming in.

So the main motivation behind using a native add-on is to get hold of the Node's existing Platform, so as to pin the new isolate into the existing and fully booted v8 instance rather than spawning a separate one?

I have a large native app running on Node and a lot of legacy JavaScript code implemented using MS COM JavaScript that runs a ton of JS in parallel within the same process. The desire was to leverage the knowledge of v8 from the Node-based part of the project and reduce the application footprint by not having to host two JS engines. A separate v8 also poses various challenges, like managing which ICU copy is being plugged in, etc.

but for testing purpose if you can use the API (GetMainThreadMultiIsolatePlatform) and see how it fares

I wish this would came up a couple of months ago. I went through a process of testing Node/v8, Chakra and running a separate v8 instance while looking for a solution and ended up using a statically-linked v8, which is completely isolated from the v8 in Node. I'm in the middle of this project and cannot switch at this point, but I am very interested in this new development and when this new method makes into the version of Node we are using (usually an LTS release), I will revisit using a standalone v8 within a Node process.

just wondering whether landing of worker will make things better here? /cc @addaleax

I think the node worker helps when the requirement can be fit by using worker API (i.e. require('worker-threads')) in javascript code. If seeking for some ways to manipulate those worker threads in native code ( addon or embedding), those are not ready at the moment.

Worker threads, which are similar to how JXCore approached it years ago, require the main script to route all requests and marshal/unmarshal text between isolates. If one wants to run independent scripts in parallel (i.e. no JS data sharing at all), it seems to be impossible in the current Node. I'm loading a separate instance of v8 to do that, which takes up a lot of resources and requires some development work. It would be nice if there was a way in Node to run scripts independently in their own isolates.

@gh-andre @fs-eire I don’t really see which pieces are missing at this point to create an addon that runs scripts in new Isolates on a different thread?

For the context:
$ cat w.js

const w = require('worker_threads')
if (w.isMainThread) {
  new w.Worker(__filename)
  console.log('main')
}
else {
  console.log('worker')
}

$ node --experimental-worker --prof w.js

worker

$ ls -lrt isolate*

-rw-r--r--  1 gireesh user 167250 Jun 14 10:57 isolate-0x37d9ac0-v8.log
-rw-r--r--  1 gireesh user 204231 Jun 14 10:57 isolate-0x3719c70-v8.log

evidently they are running in two different isolates.

@addaleax It's not about create an addon that runs scripts in new Isolates on a different thread. It's about how the addon can manipulate the thread. For my requirement, specificly, I am thinking about building a thread-pool based on current worker. Some requirements are -

  • A long-run worker instead of an one-time worker so that it can be re-usable;
    It should be a trival change
  • Some functions that passing messages through different thread;
    This requirement seems can be fulfilled since we already have MessagePort and MessageChannel.
  • Addon support in worker.
    This is not supported yet but it is on the way.
  • Idle notification: a machenicam to notify the addon when a worker becomes idle.
  • Advanced ability for addon. like force abort the execution, manipulate the uv loop.

I believe that it's debatable for some of the items above for whether those requirement are valid or not. Those are the current pain point. Just trying to understand what is the preferred way to do.

I think https://github.com/nodejs/node/pull/30324 should have addressed the requests here.

Feel free to re-open or open a new issue if there is more, though.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

loretoparisi picture loretoparisi  Â·  3Comments

danielstaleiny picture danielstaleiny  Â·  3Comments

srl295 picture srl295  Â·  3Comments

cong88 picture cong88  Â·  3Comments

jmichae3 picture jmichae3  Â·  3Comments