If we can refactor Worker, Dispatcher, and Actor into a general purpose background processing utility (or find such an existing utility), GL JS's architecture will simplify nicely, especially around custom sources. We will remove a lot of unnecessary coupling and pave the way for new background processing features like #1042.
I think "stateless" is going to be a tough challenge here, because: (a) efficient communication of large amounts of data to/from worker threads is limited to ArrayBuffers (is this true?); (b) Parsing ArrayBuffers (i.e. VT pbf data) is moderately expensive, so that it's desirable to cache the parsed representation for reuse, e.g. when reloading a tile.
@lucaswoj do you have a vision for how we might manage this?
My vision (originally @ansis' vision) is to use StructArray for all persistent data that needs to be used by the workers.
Oh, cool. That makes sense... and definitely seems more tractable now than it did at midnight!
Did a quick scan and couldn't find any existing WebWorker libraries that might give us the functionality we need.
I'm not necessarily intent on this being stateless, so long as state is somehow associated with the caller's scope rather than managed globally within the worker. The thinner the abstraction the better.
// test.js
var PooledWorker = require('../util/pooled_worker');
var worker = new PooledWorker(require('./test_worker'));
worker.send('marco');
worker.on('polo', function() {
worker.send('marco');
});
// test_worker.js
module.exports = function() {
this.on('marco', function() {
this.send('polo');
});
}
This is almost exactly the Worker API. They key difference is that we have a m:n ratio of PooledWorkers to Workers, allowing us to create arbitrarily many PooledWorkers without any perf consequences.
@lucaswoj Can you please make sure that the following issue is also handled as part of Worker refactoring? It will have a huge impact on productivity.
This is a known issue with web workify. It hasn't been fixed for over an year.
I don't think this is possible to fix, since the worker source is not persistent in the browser โ with every reload, there are new sources generated through Blob URLs. A simple workaround is using debugger statements.
@mourner Are you suggesting that it is acceptable/normal to modify the source code every time one needs to debug a web worker in GL JS codebase? Is this how everyone normally debugs when they have to analyze any issues pertaining to this?
@mb12 yes. Also see https://github.com/substack/webworkify/issues/9#issuecomment-245328915 for a workaround that doesn't require modifying the source.
I have begun implementing the API proposed in https://github.com/mapbox/mapbox-gl-js/issues/3034#issuecomment-241140144. It is doable, though it requires some changes to webworkify (it might be worthwhile to fork and embed into GL JS).
@lucaswoj I can take on webworkify, what changes do you need specifically?
The changes we'd need to webworkify are:
require implementation to allow the addition of new modules by a call to importScripts (this will need to work seamlessly when the new modules come from a 3rd party plugin)Worker's parent thread to avoid re-transferring modules that are already loaded on the WorkerDo you think these make sense to implement in webworkify core @mourner? I'm on the fence.
@lucaswoj it's not clear to me that it will be possible to make the require/importScripts thing work transparently for 3rd party modules, because the script source => blob URL transformation needs to happen at browserify-time. So the 3rd-party module would still need to export a module that was wrapped in something like webworkify.
I think a custom/forked implementation of webworkify might be a good way to go. Right now, webworkify expects a module exporting (self) => void, and bundles it into a script that simply invokes that function on the Worker's global scope (i.e. [module code](self)).
Instead, we could have a gl-js-specific variation of it that expects a module with a more specific, useful signature -- e.g. something like (uid, style) => { methods targetable from main thread }. Then the gl-webworkify wrapper could generate a blob url for a script like self.register([module code]), where register is a method that the main worker sets up.
^ a 3rd party worker-side module would then look like:
var worker = requre('mapbox-gl-js-worker')
module.exports = worker(function (uid, style) { ... } )
@lucaswoj it's not clear to me that it will be possible to make the require/importScripts thing work transparently for 3rd party modules, because the script source => blob URL transformation needs to happen at
browserify-time. So the 3rd-party module would still need to export a module that was wrapped in something likewebworkify.
Agreed. 3rd party modules will need to be wrapped in something like webworkify.
Instead, we could have a gl-js-specific variation of it that expects a module with a more specific, useful signature -- e.g. something like
(uid, style) => { methods targetable from main thread }.
That's the general idea!
Our goal is to build a general purpose utility. The architecture will be cleanest if the concept of styles are introduced at a higher layer of abstraction.
Rather than exposing _methods_ targetable from the main thread, workers will send / receive _messages_. This will result in the simplest implementation because it closely mirrors how Workers work. The GL Native team is implementing a similar architecture.
I sketched my proposed interface above https://github.com/mapbox/mapbox-gl-js/issues/3034#issuecomment-241140144 (recently updated)
Then the gl-webworkify wrapper could generate a blob url for a script like
self.register([module code]), whereregisteris a method that the main worker sets up.
Beautiful. ๐
@lucaswoj a GL-JS-specific webworkify variation sounds good. It's a pretty simple module, and some of the code that GL JS doesn't need could be removed.
adding some state to the Worker's parent thread to avoid re-transferring modules that are already loaded on the Worker
I didn't really understand this part. Can you give an example?
@lucaswoj a GL-JS-specific webworkify variation sounds good. It's a pretty simple module, and some of the code that GL JS doesn't need could be removed.
๐
I didn't really understand this part. Can you give an example?
Suppose we have 3 modules: worker_foo.js, worker_bar.js and util.js.
Both worker_foo.js and worker_bar.js require util.js.
// worker_foo.js
require('./util');
...
// worker_bar.js
require('./util');
...
If we instantiate both worker_foo.js and worker_bar.js on the same Worker, we should transfer the code for util.js exactly once.
PooledWorker.workerCount = 1;
var workerFoo = new PooledWorker(require('./worker_foo'));
var workerBar = new PooledWorker(require('./worker_bar'));
If we're going the route of a GL-JS-specific webworkify variation, I'll keep working on this project. I've got a fair bit of code and momentum on this project already ๐
@lucaswoj I think deduping dependencies will be quite a tricky problem, but I agree it would be pretty awesome. Might be worth looking into https://github.com/substack/factor-bundle
@lucaswoj what do you mean by "transfer the code"? Do you want to create just one global worker blob with the same contents no matter how many different "pooled workers" are used?
Not exactly. I want to transfer code lazily as PooledWorkers are created and ensure no duplicate code is transferred.
PooledWorker.workerCount = 1;
// transfers the code for "worker_foo" and "util" modules to the Worker
var workerFoo = new PooledWorker(require('./worker_foo'));
// transfers the code for "worker_bar" module to the Worker
var workerBar = new PooledWorker(require('./worker_bar'));
Does that answer your question @mourner?
@lucaswoj Thanks! I think I start to grasp the concept... So the code is going to be transferred by creating a blob URL that is later loaded with importScripts and then injected into the browserify sources/cache bundle?
I think I have a good idea of how to implement bundling/deduping logic after getting into webworkify/browserify internals for that recent PR โ if you need a hand on this, I could work on it tomorrow. How much do you have already implemented locally? Want to push anything?
Also, instead of the current situation where we create a single actor/dispatcher that distributes work across workers based on an id (e.g. so that we know what worker to send updates to for a particular tile), we would simply create a PooledWorker instance for every single tile?
we would simply create a
PooledWorkerinstance for every single tile?
This is essentially what native is doing as of https://github.com/mapbox/mapbox-gl-native/pull/6337. Once you have that flexibility, you can have different types if convenient, e.g. VectorTileWorker, RasterTileWorker, GeoJSONTileWorker.
That sounds wonderful and can simplify GL JS code a lot. Thanks for taking time to explain. I'll check out the PR.
The lazy loading logic is indeed tricky but coming along... Meanwhile, there is another question to discuss: suppose we have that custom worker pool module that specifically works with browserify; what do we do with other build systems?
For webpack, currently there's webpack-workify module, and we wouldn't want to write an alternative webpack implementation of our pool module because it would be much more complicated that webworkify, and I'm not even sure lazy loading would be possible there. Do we drop the possibility of requiring mapbox-gl from WebPack and recommend requiring a browserified build instead?
Also, Are we OK with the risk of facing difficulties if we ever decide to adopt ES6 modules (e.g. with rollup bundling instead of browserify)?
suppose we have that custom worker pool module that specifically works with browserify; what do we do with other build systems?
My compatability goals are
PooledWorkers must use browserify Does that sound reasonable to you?
Also, Are we OK with the risk of facing difficulties if we ever decide to adopt ES6 modules (e.g. with rollup bundling instead of browserify)?
I haven't thought much about this... I'm open to ideas.
other developers may continue to use their existing build system
They can't โ e.g. GL JS can only be used with Webpack because of webworkify-webpack (see https://github.com/mapbox/mapbox-gl-js/blob/master/webpack.config.example.js#L45), even without any worker plugins, because it's used internally. If we switch to a custom worker spawner that depends on browserify, will we also maintain a Webpack version in parallel? I think not.
@mourner @lucaswoj this might be a little bit weird, but would it be possible to use browserify to create a bundled distribution that can be consumed by webpack? Basically an analog to what's done to support using gl-js directly from a <script> tag, but with a small wrapper to re-export mapboxgl as a node module...
It should already work without wrappers, since browserify with --standalone switch should bundle with a umd wrapper, as far as I remember. That's what I referred to when suggesting to require browserified build.
Ah, okay. But in that case, any develop who isn't using a plugin that needs PooledWorkers _could_ in fact continue using whatever build system they're already using, right? As long as they just target the browserified bundle?
Yes, although it would be a breaking change that webpack users may not like. @tmcw Do you think it would be fine?
Published an initial implementation of the worker pool module here: https://github.com/mapbox/workerpoolify. A few notes:
webworkify. I didn't see reasons not to, and it may be very useful in other contexts.worker.onmessage = function (type, data) { ... } method to handle messages, rather than doing evented-style worker.on(type, ...). This is for the sake of simplicity for initial proof of concept โ we can change this at any time; also, I didn't want to include a full-blown Evented implementation in the module, and this style is also close to native web workers API.TestWorker.sharedState = 'foo';) โ this should be good enough initially, and we can follow-up with more clever solutions.Waiting for your feedback @lucaswoj @jfirebaugh.
๐ on all the points above @mourner. I appreciate the simplicity of the API and state sharing solutions. I'll do a quick code review of the workerpoolify module now.
workerpoolifyWorker -> "workerside pooled worker" _(there are many different types of workers, Worker is the canonical identifier for a WebWorker)_// make a blog URL out of any worker bundle additions
revokeObjectURL automatically once a bundle has been received by all workers?var Worker = self.require(data.moduleId);
Worker.prototype.send = send;
Worker.prototype.workerId = data.workerId;
workerCache[data.workerId] = new Worker();
Worker's send and workerId properties@lucaswoj awesome feedback, thank you! I agree 100% with all terminology points. The module has some rough edges too โ I wanted to get it working first to get early feedback.
// make a blog URL out of any worker bundle additions
- โ๏ธ did you mean to say "bundle URL"?
Ah, typo โ I meant blob.
- Is it possible to call
revokeObjectURLautomatically once a bundle has been received by all workers?
I think so. Were there some problems with revokeObjectURL in IE11? @anandthakker I remember you adding something related in webworkify and then reverting because of some issues.
- Why did you decide to propagate all bundle additions to all workers rather than track each worker's bundles separately?
For the sake of simplicity. In practice, I think this won't matter because in 99.9% of cases all the dependencies will get there anyway โ if the amount of pooled workers is not less than the amount of native workers, all deps will be across all the pool even if you track independently.
- Modifying the prototype โ๏ธ could change existing instances of
Worker'ssendandworkerIdproperties
This is not ideal, but the only alternative I see is requiring all workers to inherit from a base prototype (that has send etc.), which will make it a bit harder to use, and I also couldn't decide on the terminology โ if the instance you create on the main thread side is of type PooledWorker, how do you call its corresponding worker-side type?
I think an acceptable solution would be to just throw an error if such methods already exist before overwriting them, and make a note in the docs.
I think so. Were there some problems with revokeObjectURL in IE11? @anandthakker I remember you adding something related in webworkify and then reverting because of some issues.
Yeah - the issue was with Edge (https://github.com/substack/webworkify/issues/26), and led to backing off of automatically calling revokeObjectURL, in favor of simply exposing the generated objectURL so that client code could decide when it was appropriate to revoke it (e.g., perhaps after getting a first message back from the worker?)
I love seeing this becoming a reality as a general-purpose module!
@lucaswoj addressed all your terminology points, added throwing an error if you try to define a custom send or workerId property, and cleaned up the code a bit. Next high-level actions for the library include:
webworkify doesn't have tests at all)styleLayerIndex).importScripts and in theory shouldn't be needed after that.@mourner This looks great! The API feedback I have is all around polishing the rough edges:
workerSources, pooledWorkers, and so on -- should be per-pool.PooledWorker API be identical to Worker. I.e. postMessage rather than send, and implements the EventTarget interface. This allows drop-in replacement for Worker and piggybacking on existing Worker documentation.Do you have ideas for how to handle broadcasts / shared state? That sounds like an interesting additional challenge.
- Why did you decide to propagate all bundle additions to all workers rather than track each worker's bundles separately?
For the sake of simplicity. In practice, I think this won't matter because in 99.9% of cases all the dependencies will get there anyway โ if the amount of pooled workers is not less than the amount of native workers, all deps will be across all the pool even if you track independently.
This is an assumption tied to the GL JS usage patterns. GL JS's usage patterns may change with the introduction of this new architecture.
var Worker = self.require(data.moduleId); Worker.prototype.send = send; Worker.prototype.workerId = data.workerId; workerCache[data.workerId] = new Worker();
- Modifying the prototype โ๏ธ could change existing instances of
Worker'ssendandworkerIdpropertiesThis is not ideal, but the only alternative I see is requiring all workers to inherit from a base prototype (that has
sendetc.), which will make it a bit harder to use, and I also couldn't decide on the terminology โ if the instance you create on the main thread side is of typePooledWorker, how do you call its corresponding worker-side type?I think an acceptable solution would be to just throw an error if such methods already exist before overwriting them, and make a note in the docs.
I think there may be some misunderstanding here. The case I'm worried about is when a user creates two workers with the same moduleId. Will both workers will end up with the same workerId and send function? Why not write the code as
var Worker = self.require(data.moduleId);
Worker.send = send;
Worker.workerId = data.workerId;
workerCache[data.workerId] = new Worker();
@lucaswoj oh, I see your point. You're right about overwriting prototype.workerId. I assume you meant the following code instead of the one above?
var Worker = self.require(data.moduleId);
var worker = workerCache[data.workerId] = new Worker();
worker.send = send;
worker.workerId = data.workerId;
I wanted workerId and send to live in the prototype so that send can be called within the constructor, like this:
module.exports = function Worker() {
// do stuff
this.send(...);
};
To make it work, I'd probably have to either make send async (with setTimeout), or make a subclass of the worker for each instance:
var WorkerClass = self.require(data.moduleId);
function Worker() {
WorkerClass.apply(this, arguments);
}
Worker.prototype = Object.create(WorkerClass.prototype);
Worker.prototype.send = send;
Worker.prototype.workerId = data.workerId;
workerCache[data.workerId] = new Worker();
Another option would be to ban calling send from within the constructor. Which option do you prefer? I think I'm inclined towards the subclass one.
@jfirebaugh thanks, good feedback! I'll make it a factory.
implements the EventTarget interface
This is the one I'm not sure about โ would you duplicate the code from evented.js in the library, or pick some other EventTarget implementation?
It's too bad that native EventTarget is not usable or subclassable in JavaScript.
Do you have ideas for how to handle broadcasts / shared state? That sounds like an interesting additional challenge.
I'm thinking of creating something like a GlobalPooledWorker class that creates a single workerside instance in each native worker of the pool, with broadcast(data) sending the data to all instances. We could then send the shared state along with mapId and save it in the global native worker scope (self.layerIndices[mapId] = ...) in specific GlobalPoolWorker implementation, then access it from the usual PooledWorker instances. This is looking pretty ugly, but I haven't yet come up with a nicer approach โ perhaps you have better ideas?
Do you think we could get away with passing send (aka postMessage) as a constructor arg?
var Worker = self.require(data.moduleId);
var worker = workerCache[data.workerId] = new Worker(send, workerId);
@lucaswoj this would make the API clunkier; would you need to always do this.send = send in the constructor to be able to call send from methods?
module.exports = function Worker(send) {
this.send = send;
};
Worker.prototype.doStuff = () {
this.send('foo');
};
Could you get away with a minimal addEventListener and removeEventListener? E.g.:
Worker.prototype.addEventListener = function (type, listener) {
if (type !== 'message') {
return nativeWorker.addEventListener(type, listener);
}
listener.__id__ = listener.__id__ || unique++;
listeners[this.workerId][listener.__id__] = listener;
}
Worker.prototype.removeEventListener = function (type, listener) {
if (type !== 'message') {
return nativeWorker.removeEventListener(type, listener);
}
delete listeners[this.workerId][listener.__id__];
}
However, on the worker side, the typical code would be self.addEventListener('message', ...), and I don't know if you can change what self refers to in that context. You might have to require that addEventListener be called on WorkerClass constructor parameter rather than self, at which point it's no longer a drop-in replacement.
Anyway, not 100% sure what the best solution is, mainly just brainstorming.
I agree that requiring this.send = send in the constructor is about as clunky as requiring a setTimeout in the constructor.
A third option here is to use composition rather than inheritance:
// workerpoolify
var createWorker = self.require(data.moduleId);
var worker = workerCache[data.workerId] = createWorker({send, workerId, on});
// worker implementation
module.exports = function(worker) {
worker.on('marco', function() {
worker.send('polo');
});
};
Another option would be to ban calling
sendfrom within the constructor.
This might be acceptable.
@lucaswoj composition is an option too. I'll try the subclass option first to see whether it works well โ it doesn't appear to have any drawbacks apart from a potentially small performance overhead, but we can switch this at any time if we discover problems.
@jfirebaugh Since targeting a full drop-in Worker replacement is non-trivial, I would opt for a barebones implementation with this.onmessage = instead of addEventListener for now. We can revisit later after we flesh out other details.
Did a bunch of work on workerpoolify today based on your feedback @lucaswoj @jfirebaugh:
workerId collision with prototype inheritance, so multiple workers of the same type on the same native worker work as expected.importScripts. Need to test on Edge though.The only major thing left to address is the shared state challenge. I'm going to try approaching this as described in https://github.com/mapbox/mapbox-gl-js/issues/3034#issuecomment-255880139, @jfirebaugh do you have anything to add/suggest on this? After it's solved, we can try refactoring GL JS with workerpoolify.
Looks great @mourner! Thank you.
@jfirebaugh I just realized that we can implement shared state without any special classes. All it takes is creating N pooled worker instances (where N is the number of native workers) for setting shared state. They will be instantiated in each native worker, and we'll be able to broadcast changes by sending data from each instance in a simple loop.
Outline of a potential future architecture using a worker pool:
VectorTileSource spawns:VectorTileWorker for each tile, which handles:loadreloadabortredoPlacementGeoJSONSource spawns:GeoJSONWorker (creates geojson-vt/supercluster index)GeoJSONTileWorker for each tile (extends VectorTileWorker)StyleStyleWorker (one for each native worker, manages styleLayerIndex)setLayersupdateLayers