[Forked from discussion in #9579]
Rather than disallowing blocking on the main thread and giving an escape hatch in form of a setting that still allows it, we could transform pthread_join and friends via Asyncify instead and use Atomics.waitAsync under the hood.
This way the main browser thread wouldn't really be blocked, but the code would be still backwards-compatible and properly wait for thread to finish execution as user expects.
This should be straightforward to add if someone's interested. For examples of asyncified functions, see for example emscripten_sleep and other functions after it. We'd want to do the same to pthread_join which is defined in library_pthread.js. Note that the function it calls, _emscripten_do_pthread_join, already has a parameter to allow nonblocking behavior, so building on top of that should help.
So would this require building with another flag, or would this just modify the existing pthreads code?
Either way being able to join on the main thread without hanging would be really great, that's probably the biggest difference between Emscripten compiled code and native. And since native code doesn't really have a concept of a 'main thread' it can be hard to adjust code around that limitation.
This would be behind the existing ASYNCIFY .. so it would still be opt in in that sense. But we could direct users towards this in the warning message.
@sbc100 Yeah, and after this is implemented, I thought we might flip the recent flag default value, so that users would need to explicitly opt in into either ASYNCIFY or ALLOW_BLOCKING_ON_MAIN_THREAD as means of using pthread_join.
I'm not sure about the particular case of pthread_join, but @RReverser brought this up in the context of workers being stuck in Atomics.wait: Would it be possible for Emscripten to not block workers while waiting for a thread to be scheduled, but instead wait in the main event loop and schedule the thread via postMessage?
@bmeurer I'm not sure I understand your context... But what this would allow is to specifically not have Atomics.wait in your code, and instead in each of these places wait using Atomics.waitAsync, so that it wouldn't actually block the thread, but rather just wait in the corresponding thread's event loop.
@RReverser Right. So I'm wondering if we could extend this in such a way that also Emscripten wouldn't use Atomics.wait for the thread pool workers, but instead just sit in the worker event loop.
Ah yeah, that's what I had in mind too.
Would it be possible for Emscripten to not block workers while waiting for a thread to be scheduled, but instead wait in the main event loop and schedule the thread via postMessage?
I think that could work with Asyncify, yes - I think we always call Atomics.wait from emscripten_futex_wait, so we'd asyncify that function.
However, a downside is that it means anything that can reach a call to emscripten_futex_wait would get asyncified, which means lots of code using locks, malloc, etc. That might make real-world code significantly slower, as it may mean the entire program ends up asyncified (instead of the more common Asyncify use case, where just code that can reach emscripten_sleep or another special function).
(Without Asyncify, I don't think that would work on general code, as it would require the main thread to yield to the event loop before the worker activates.)
@kripken Yes, that's basically what I meant, except that I was thinking only about the logic that deals with waiting on the workers for threads to be scheduled via pthread_start. maybe you could use a different version of emscripten_futex_wait just for that?
On the DevTools side we'd still need to deal with workers that are really blocked in Atomics.wait of course, but at least for the case where the worker is literally idle and not doing anything useful, it'd be parked in the main event loop.
I see, thanks @bmeurer - yes, that does sound much more practical.
Note though that I doubt we can convince most people to use Asyncify by default, because of the significant overhead. However, perhaps they can do this while debugging at least.
I'm not a big fan of having the debug build use different mechanics as that will make it harder to debug subtle issues that were detected in production. I feel like we should use asyncify where it makes sense (in both debug and release) and otherwise come up with a debug story for Atomics.wait
I think asyncify should continue to be opt-in at the point; its a pretty big hammer
I also think the defaults should not be different for debug and release.
@bmeurer can you link specifically the code you are referring to as " the logic that deals with waiting on the workers for threads to be scheduled via pthread_start"? Are you talking specifically about the thread pool, or all thread creation?
I looked into the worker startup wait in depth this morning, and it turns out Emscripten doesn't do Atomics.wait in the worker pool or elsewhere in worker startup. It actually does a postMessage, and the worker's startup waits for that message.
The sync startup issue appears to be confined to Web Worker creation. Once the Worker has been asynchronously created due to PTHREAD_POOL_SIZE, it waits for a message (specifically 'run') to start running.
To verify this, I removed Atomics.wait entirely from emscripten and ran a bunch of tests, and as expected, tests can pthread_create etc., without ever calling Atomics.wait. Only tests that use mutexes and such end up requiring it.
(Note that if the application did not use PTHREAD_POOL_SIZE, then pthread_create becomes async, and the main thread must yield before the worker is responsive, but it still won't use Atomics.wait anywhere.)
@bmeurer Where did we see an application do Atomics.wait during pthread startup? Perhaps that happens in a userspace worker pool - if so, then I think our main options are to document the issue + ask people to refactor, or have Asyncify run on futex_wait (but I agree that's of limited value).
@kripken Uhm, now I also looked through the generated .worker.js in detail (which I should have done before posting...), and you're right, it's all done via the main event loop. We had seen this in an issues that we've been looking into for a while now, and I was somehow convinced that the Atomics.wait originates from Emscripten. Sorry for the confusion.
So this is already working as expected regarding thread creation, which I just verified on a trivial demo.
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.
bump
Most helpful comment
This would be behind the existing ASYNCIFY .. so it would still be opt in in that sense. But we could direct users towards this in the warning message.