Emscripten: Support for node.js workers

Created on 22 May 2018 · 28 Comments · Source: emscripten-core/emscripten

Work has been started on workers in node.js (https://github.com/nodejs/node/pull/20876). The idea looks great, and we should make sure our pthreads code works there too. The API (which presumably isn't final) looks more or less like web workers, so it shouldn't be too hard to make it work.

help wanted

All 28 comments

I had forgotten that someone had already filed a bug about this previously: https://github.com/kripken/emscripten/issues/6332. In any case, we should make this work for node.js as well as browsers.

Experimenting with this, it is blocked on https://github.com/nodejs/node/issues/25630 - looks like in node a blocking main does not send postMessages to workers (or perhaps blocks all logging from them).

"or perhaps blocks all logging from them"

Just to confirm: that's what happens.

You can postMessage from main to worker, no problem, but console.log() in a worker sends the string to the main thread, to print it on the worker's behalf.

The main thread won't process that string until control returns to the event loop and fixing that is not trivial.

I see, thanks @bnoordhuis!

Maybe we can work around it by using stdout/stderr directly? Or would that not work either?

No, unfortunately. All stdio is mediated in Node's workers.

I see, thanks. Ok, sadly I think that means we can't use node workers in a simple way here.

I believe bleeding edge Node should have all the support necessary.

Latest node may be enough, yeah. Last I tried it seemed possible, but getting logging working from a pthread (while the main is busy) was hard, so it was difficult to make progress. I have a wip branch here:

https://github.com/emscripten-core/emscripten/tree/node-pthreads

I may not have time for this myself in the near future. If someone has time to look at that branch, that would be great. It might be a matter of finding some hack for debugging (like maybe printing to a file?), then fixing some minor issues.
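
As a rough illustration of the "printing to a file" idea, here is a minimal sketch, assuming a Node.js version that ships the worker_threads module. The log path, the shared flag, and the worker source are hypothetical names chosen for this example; none of them come from the emscripten branch. The worker appends to a file synchronously, so nothing has to be relayed through the blocked main thread.

```javascript
const { Worker } = require('worker_threads');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical log location, chosen only for this sketch.
const logDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pthread-log-'));
const logPath = path.join(logDir, 'worker.log');

// One shared 32-bit flag: 0 = worker still running, 1 = worker has logged.
const flag = new Int32Array(new SharedArrayBuffer(4));

const workerSource = `
  const { workerData } = require('worker_threads');
  const fs = require('fs');
  // Append synchronously to a file instead of calling console.log, so the
  // text does not depend on the main thread's event loop.
  fs.appendFileSync(workerData.logPath, 'hello from worker\\n');
  Atomics.store(workerData.flag, 0, 1);
  Atomics.notify(workerData.flag, 0);
`;

const worker = new Worker(workerSource, {
  eval: true,
  workerData: { logPath, flag },
});

// Block the main thread, much as a busy main() would, until the worker
// signals that it has logged (or 10 seconds pass, so the sketch cannot hang).
Atomics.wait(flag, 0, 0, 10000);

// The main thread never returned to the event loop, yet the log is on disk.
const logged = fs.readFileSync(logPath, 'utf8');
console.log(logged);
```

This only helps with debugging output; it does not change how console.log itself behaves in a worker.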

I looked at this a little more now, and there is some progress in the branch. Overall I keep hitting the many small differences between a Web Worker and a Node Worker.

For example the current blocker is that on the web we use importScripts to load the main x.js from the x.worker.js. Node doesn't have importScripts. The closest I can find is to load the code and do a global eval, but when doing that, require() is not defined, and we need that to be able to read files etc. Any ideas?

@kripken Maybe I’m misunderstanding, but what’s a “global eval” in that context? eval('require') should give the require function as long as the outer context already has it.

@addaleax

It's something like eval.call(null, '...'), which can add stuff to the global scope (and not just the current function scope if there is one), which we need to match importScripts. The emscripten code is here and here's an example of why it's needed:

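function setup() {
  eval('function foo() {}'); // normal (direct) eval: foo is local to setup()
  console.log(typeof foo); // ok: "function"
  eval.call(null, 'function baz() {}'); // global (indirect) eval: baz is global
  console.log(typeof baz); // ok: "function"
}
setup();
console.log(typeof foo); // bad! "undefined", foo did not escape setup()
console.log(typeof baz); // ok: "function"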

Turns out in node a normal eval does see require, but a global one doesn't :( But it does work in V8 and SpiderMonkey...

Maybe there's a better way I'm missing?

Okay, I see – thanks for the explanation.

In that case, maybe doing something like we do for the REPL in Node.js is a good approach? Basically, we copy/provide Node’s special per-module variables, require, module, exports, __dirname and __filename to the global object, and then run eval() on the script.

I think that’s something that makes sense when your goal is to have everything in the global scope, as it would be in a Web Worker?

Interesting, thanks - so there's a way I can copy require to the global scope, and global eval would see it? How would I do that copy?

So, just to avoid any confusion, I’m basically talking about doing this inside the Worker:

global.require = require;
eval.call(null, 'require("console").log("hello, world!")');

The catch is that this picks one require function out of possibly many, namely the one for the main Worker script, but I think that that’s what you might want?

Thanks @addaleax! That works great.

However, we do have more things than require that we'll need (I hit this on Module now), and some of them must be written to (like noExitRuntime, buffer, etc.). Here's a testcase:

var x = 0;
eval('x = x + 1');
console.log(x);
global.x = x; // must be commented out if not in node
eval.call(null, 'x = x + 1');
console.log(x);

That prints 1 and then 2 in non-node environments, but in node it prints 1 twice. Is there a way to make that work?

@kripken I think that’s because var in Node.js does not create global variables, because the individual scripts are run inside function wrappers. Replacing the first line with global.x = 0 seems to make this work?
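
To make the scoping difference concrete, here is a minimal sketch. The explicit moduleWrapper function is a stand-in for the wrapper Node.js puts around each module, and all variable names are hypothetical; writing the wrapper out by hand keeps the behavior the same wherever the snippet is run.

```javascript
// Stand-in for the function wrapper Node.js puts around each module.
(function moduleWrapper() {
  // Local to the wrapper, like a top-level var in a Node.js module.
  var declaredInWrapper = 0;

  // Indirect (global) eval: the var becomes a property of the global object.
  (0, eval)('var declaredByGlobalEval = 0;');

  // Explicit assignment on the global object also creates a global.
  globalThis.assignedExplicitly = 0;

  // Code run through a global eval can update both globals, but it cannot
  // see declaredInWrapper at all.
  (0, eval)('declaredByGlobalEval += 1; assignedExplicitly += 1;');
})();

console.log(typeof declaredInWrapper); // "undefined"
console.log(declaredByGlobalEval);     // 1
console.log(assignedExplicitly);       // 1
```

(0, eval)(...) is just another way to spell an indirect eval; it behaves like the eval.call(null, ...) form used earlier in the thread.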

@addaleax Yeah, replacing all those could work. We do have a bunch of them, though, so it's not obvious what to do here, and I worry about making the code less maintainable while still supporting both the web and node.

I'll think some more on this and try to get back to it when I have time, but if someone else has a good idea of how to proceed that would be great too.

@kripken If you do want everything to work like in the browser scope-wise, you could probably just wrap it all in a global eval (or Node’s vm.runInThisContext() which does a similar thing) except for a small intro bit that copies some variables to the global scope?

(Maybe I’m having a simplistic view of things here, but ultimately I’m invested in making this work, too)

I think in nodejs module scope is different to global scope (I am not a nodejs programmer to be clear).
I got this working by assigning require, __filename etc. to the global object in globalEval, like this:

function globalEval(x) {
  // In a non-strict function called without a receiver, `this` is the global object.
  this.exports = exports;
  this.require = require;
  this.module = module;
  this.__filename = __filename;
  this.__dirname = __dirname;
  eval.call(null, x);
}

And then just changing all of the variable definitions in worker.js to global variables, instead of using var. Not a particularly elegant solution, but it does work (tested on a fairly complicated threaded application).

@addaleax @cambeaney

A problem is that we want our builds to run on both the Web and Node.js. But it seems like in worker.js we need var threadInfoStruct = 0; on the Web but global.threadInfoStruct = 0; on Node?

If there's no other option we can force such builds to be either Web or Node but not both, but this is the first time we've run into such a situation, so it would be something we need to consider carefully.

@kripken A var inside a global eval does result in a global variable, both in Node.js and in the Web; so the main options for consistency between the Web and Node.js seem to be either a) grabbing the global object and then always setting properties on that, or b) wrapping most of the worker.js code itself in a global eval and using var declarations, e.g. along the lines of @cambeaney’s suggestion?
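
Option (b) might look roughly like the following sketch, which uses vm.runInThisContext() (mentioned earlier in the thread) in place of a raw global eval. The workerBody string and the setThreadInfo helper are hypothetical stand-ins for the real worker.js contents; only threadInfoStruct is a name taken from this discussion.

```javascript
const vm = require('vm');

// Intro bit: expose Node's per-module require on the global object so the
// globally evaluated code can see it.
globalThis.require = require;

// Hypothetical stand-in for the body of worker.js. On the web this text
// would simply be the top-level worker script.
const workerBody = `
  var threadInfoStruct = 0;
  var pathModule = require('path');
  function setThreadInfo(value) { threadInfoStruct = value; }
`;

// Runs the code in the global scope, so its var and function declarations
// become globals, as they would in a Web Worker.
vm.runInThisContext(workerBody, { filename: 'worker-body.js' });

setThreadInfo(42);
console.log(globalThis.threadInfoStruct); // 42
console.log(typeof pathModule.join);      // "function"
```

The same var declarations then work unchanged on the web, where no wrapping step is needed.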

Oh thanks @addaleax, I missed the part about running worker.js itself in a global eval. Yes, that seems like it might help!

Meanwhile I am refactoring this code in #9569 which will remove some of those globals, and may remove almost all the ones we need to share between the files. After that lands I'll get back to this and see how things look.

I just had a quick look, and one other issue I ran into was that node.js workers don't support addEventListener. They have a bug open to add support: https://github.com/nodejs/node/issues/26856.

@VirtualTim Node.js currently doesn’t support EventTarget for any built-in, and I wouldn’t count on that changing soon (unless you’re willing to put work into that yourself :slightly_smiling_face:). The only thing we do support is .onmessage for MessagePort instances, for alignment with the web.
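
One way to paper over this difference is runtime feature detection. The sketch below is only an illustration: listenForMessages is a hypothetical helper, not something from emscripten or Node.js, and the EventEmitter is a stand-in for a Node.js Worker. It assumes that web-style targets pass an event object with a data field, while Node.js emitters pass the message value directly.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: attach a message handler to either kind of target.
function listenForMessages(target, handler) {
  if (typeof target.addEventListener === 'function') {
    // Web Worker style: the listener receives an event with a .data field.
    target.addEventListener('message', (event) => handler(event.data));
  } else if (typeof target.on === 'function') {
    // Node.js EventEmitter style: the listener receives the value directly.
    target.on('message', (value) => handler(value));
  } else {
    throw new TypeError('target cannot receive message listeners');
  }
}

// Usage with an EventEmitter standing in for a Node.js Worker.
const fakeNodeWorker = new EventEmitter();
const received = [];
listenForMessages(fakeNodeWorker, (data) => received.push(data));
fakeNodeWorker.emit('message', { cmd: 'run' });
console.log(received);
```

Because the check happens at runtime, the same helper keeps working if a given Node.js version does provide addEventListener on the object in question.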

Ok, after merging in that PR there is good news and bad news. The good news is that it looks like a simple program can work! It creates 3 pthreads and they each do some work and print out their progress :)

The bad news is that it looks like a pthread exiting shuts down the whole node.js instance. Shutdown is done by throwing an exception, which on the web stays in the worker (it's logged to the console, but that's it), while in node it seems to halt the entire application,

a.out.js:154
      throw ex;
      ^

Error [ERR_UNHANDLED_ERROR]: Unhandled error. (55)
    at Worker.emit (events.js:198:17)
    at Worker.[kOnErrorMessage] (internal/worker.js:176:10)
    at Worker.[kOnMessage] (internal/worker.js:186:37)
    at MessagePort.<anonymous> (internal/worker.js:118:57)
    at MessagePort.emit (events.js:209:13)
    at MessagePort.onmessage (internal/worker/io.js:70:8) {
  context: 55
}

and the node.js process exits with code 1. This is a problem since threads should call pthread_exit(), and that method must throw. It's marked as noreturn in C, so clang emits an unreachable at the end, which throws. Normally we don't reach the unreachable, because we throw a JS exception first (that is line 154 in the trace above); with that throw removed, the output just reports that the unreachable was hit. We should be able to handle pthreads exiting and new ones being started etc., so it's not immediately obvious to me what to do.

@kripken Would trying to call process.exit() in Node.js Workers be an option (possibly guarded by a check for the presence of it)? It throws an un-catchable JS exception that terminates the Worker, so it should do exactly what you want.

Otherwise, listening to the 'error' event on the Worker instance should be the way to go…
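
A minimal sketch of both suggestions, assuming a recent Node.js with worker_threads. The exitThread and markRanPastExit functions and the shared state array are hypothetical and are not emscripten's actual pthread_exit() implementation. The worker records that it reached the exit call, then calls process.exit(), and the main thread checks that nothing after the call ran.

```javascript
const { Worker } = require('worker_threads');

// state[0]: 0 = not started, 1 = reached the exit call, 2 = ran past it.
const state = new Int32Array(new SharedArrayBuffer(4));

const threadSource = `
  const { workerData } = require('worker_threads');

  // Hypothetical stand-in for what a pthread_exit() shim might do.
  function exitThread() {
    Atomics.store(workerData.state, 0, 1);
    Atomics.notify(workerData.state, 0);
    // Guarded, as suggested above; inside a Worker this ends the worker
    // thread rather than the whole process.
    if (typeof process !== 'undefined' && typeof process.exit === 'function') {
      process.exit(0);
    }
  }

  function markRanPastExit() {
    Atomics.store(workerData.state, 0, 2);
  }

  exitThread();
  markRanPastExit(); // not expected to run
`;

const thread = new Worker(threadSource, {
  eval: true,
  workerData: { state },
});

// The alternative mentioned above: handle the 'error' event, so a throwing
// worker does not surface as an unhandled error on the main thread.
thread.on('error', (err) => console.error('thread failed:', err));
thread.on('exit', (code) => console.log('thread exited with code', code));

// Wait until the worker reaches the exit call, then give it a short moment
// in which it would run past the call if process.exit() had not stopped it.
Atomics.wait(state, 0, 0, 10000);
Atomics.wait(state, 0, 1, 200);

const finalState = Atomics.load(state, 0);
console.log(finalState);
```

The 'exit' and 'error' listeners only fire once the main thread returns to its event loop, which is why the sketch inspects shared memory instead of waiting on them.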

Excellent, thanks @addaleax! That works :)

PR is finally up, with a first passing test! #9745 Thanks again for all the help here :)
