v14.6.0, v12.14.14.15.0-1091-oem #101-Ubuntu SMP Thu Jun 25 17:55:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linuxworker_threadsI have opened up a repository here that can be used to reliably crash node. Follow the steps in README.md to replicate.
This reproduces so long as a native node module has an active async handle for async completion.
Worker thread should not terminate until all handles are cleaned up. Modules should be able to exit gracefully when terminate is requested.
npm test
> [email protected] test /home/tylerw/github/test/native-node-tests
> node -e "w = new (require('worker_threads').Worker)('./runDoAsyncThing.js'); setTimeout(w.terminate.bind(w), 2000);"
Entering execute thread
npm test
> [email protected] test /home/tylerw/github/test/native-node-tests
> node -e "w = new (require('worker_threads').Worker)('./runDoAsyncThing.js'); setTimeout(w.terminate.bind(w), 2000);"
Entering execute thread
uv loop at [0x7f18151e1ad8] has open handles:
uv loop at [0x7f18151e1ad8] has 0 open handles in total
node[23076]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
1: 0x9fb000 node::Abort() [node]
2: 0x9fb07e [node]
3: 0x98e9f1 [node]
4: 0xab2e44 node::worker::Worker::Run() [node]
5: 0xab2e88 [node]
6: 0x7f1817de06db [/lib/x86_64-linux-gnu/libpthread.so.0]
7: 0x7f1817b09a3f clone [/lib/x86_64-linux-gnu/libc.so.6]
Aborted (core dumped)
npm ERR! Test failed. See above for more details.
I think this is basically looking for async versions of AddEnvironmentCleanupHook()/RemoveEnvironmentCleanupHook(). That鈥檚 possible, but demand has been pretty low since mixing Workers with libuv features is not something that usually makes sense.
Hopefully I can represent demand from 2 of my larger native projects :smile:.
I maintain https://github.com/nodegit/nodegit which I think would still overall benefit from async environment clean up hooks. In that library we leverage a separate thread pool from libuv (more information can be found here).
We kind of have 2 options for enabling nodegit in worker threads:
I maintain https://github.com/axosoft/nsfw.git. This is a custom built recursive file watcher that replaces the standard node capabilities off thread and at an OS level. It implements linux, osx, and windows file watching and requires at least one internal thread.
When file events are arriving, depending on the scale of the operation occurring in the folder you're watching, you might have to do a lot of synchronous work in javascript to find an important event. It would be nice if we could leverage NSFW in a worker thread instead of on the main thread, so that we could handle those events and do more javascript parsing of what's important off main thread and only communicate interesting changes back to the main thread.
Again my example here is imagine you're a text editor built in Electron. If you're watching an entire directory tree, you might care about events like changes to .gitignore. As well as active file changes. You might want to parse changes to the gitignore and calculate changes to full directory structures on a worker thread and have those results be posted to the main thread when the heavy lifting is done.
I think eliminating the potential double hop of retrieving events in the main thread, shipping them off to a worker thread to be parsed, and then having them come back from the worker thread for response can be limiting architecturally.
NSFW also suffers a similar abort crash through it's use of N-API. I'd love to see that resolved as well.
Hopefully these help establish why it could make sense to leverage async behavior even in worker threads. I believe that even though I'm mentioning Electron, providing the async clean up at this level would allow WebWorker in Electron to handle these concerns as well in a future release.
https://github.com/nodejs/node/pull/34572 should provide APIs that enable this, feel free to take a look :)
First I just want to thank you for such a quick response and an unexpectedly quick solution.
Ok I have read over those changes. If I'm understanding everything correctly, the async cleanup hook will provide a finalize function pointer to FinishAsyncCleanupHook to the user of the library via a callback hook provided to the library from the user. When the environment is going to shut down, the user's hook will be called. When the user of the library has completed their async work safely and is ready to shutdown, the user of the library should call FinishAsyncCleanupHook which will complete termination of the environment?
If that's what we're looking at I think that would solve my use-case.
One more question for you, will this keep the v8 context alive until the FinishAsyncCleanupHook has been called?
Part of doing async work is frequently built around persistent handles of v8 objects that have external pointers associated with them (like ObjectWrap). The usual flow is to persist the object that the pointer is attached to long enough for the async work to complete the execute thread. If any of those objects were to be marked for GC while the execute thread is in motion, but before the library has finished shutting down its async work, that would definitely lead to a segfault.
@implausible Just to avoid misunderstandings, FinishAsyncCleanupHook() is not a public API, and not exposed in any way. The intent is that the user library should call the callback (which currently is named FinishAsyncCleanupHook in the PR but might be another function at some point in the future) once it鈥檚 done, yes.
One more question for you, will this keep the v8 context alive until the
FinishAsyncCleanupHookhas been called?
Yes. However, you are not allowed to perform any calls into JS anymore.
Part of doing async work is frequently built around persistent handles of v8 objects that have external pointers associated with them (like ObjectWrap). The usual flow is to persist the object that the pointer is attached to long enough for the async work to complete the execute thread. If any of those objects were to be marked for GC while the execute thread is in motion, but before the library has finished shutting down its async work, that would definitely lead to a segfault.
The persistent handles remain alive, but keep in mind that the cleanup hook should also tear all of them down at some point before it is finished, otherwise you鈥檒l end up with a memory leak.
Perfect, I can understand all of that :smile:! Thank you again for your help.
Most helpful comment
Hopefully I can represent demand from 2 of my larger native projects :smile:.
NodeGit
I maintain https://github.com/nodegit/nodegit which I think would still overall benefit from async environment clean up hooks. In that library we leverage a separate thread pool from libuv (more information can be found here).
We kind of have 2 options for enabling nodegit in worker threads:
NSFW (Node Sentinel File Watcher)
I maintain https://github.com/axosoft/nsfw.git. This is a custom built recursive file watcher that replaces the standard node capabilities off thread and at an OS level. It implements linux, osx, and windows file watching and requires at least one internal thread.
When file events are arriving, depending on the scale of the operation occurring in the folder you're watching, you might have to do a lot of synchronous work in javascript to find an important event. It would be nice if we could leverage NSFW in a worker thread instead of on the main thread, so that we could handle those events and do more javascript parsing of what's important off main thread and only communicate interesting changes back to the main thread.
Again my example here is imagine you're a text editor built in Electron. If you're watching an entire directory tree, you might care about events like changes to
.gitignore. As well as active file changes. You might want to parse changes to the gitignore and calculate changes to full directory structures on a worker thread and have those results be posted to the main thread when the heavy lifting is done.I think eliminating the potential double hop of retrieving events in the main thread, shipping them off to a worker thread to be parsed, and then having them come back from the worker thread for response can be limiting architecturally.
NSFW also suffers a similar abort crash through it's use of N-API. I'd love to see that resolved as well.
Hopefully these help establish why it could make sense to leverage async behavior even in worker threads. I believe that even though I'm mentioning Electron, providing the async clean up at this level would allow
WebWorkerin Electron to handle these concerns as well in a future release.