One of the main determinants of time-to-first-render (TTFR) is how fast the workers are able to boot and begin processing tile data. On my laptop, the earliest I see tile requests getting made by workers is at about the 2000ms mark. We should try to improve this.
Ideas:
- Reduce the overhead of transferring layers to the worker (e.g. send them only after the style is created, maybe in a StructArray format?)
- Reduce the amount of code included in the worker blob
- Create only one blob, not one per worker
> Reduce the amount of code included in the worker blob

I found a way to reduce it to only the things it needs — see PR https://github.com/substack/webworkify/pull/30. It reduces the size from 1.12MB to 467KB, although I'm not sure whether it actually affects time-to-first-render that much — can you check @jfirebaugh?
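For context, webworkify bundles a required module plus its dependency graph into the worker blob; basic usage (per its README) looks roughly like this:

```js
// main.js: webworkify turns the required module and its dependencies
// into a blob and boots a Worker from it.
var work = require('webworkify');
var w = work(require('./worker.js'));
w.addEventListener('message', function (ev) {
    console.log('from worker:', ev.data);
});
w.postMessage(4);

// worker.js: the module that runs inside the worker.
module.exports = function (self) {
    self.addEventListener('message', function (ev) {
        self.postMessage(ev.data * 2);
    });
};
```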
> Create only one blob, not one per worker
It seems to be too cheap to bother — the whole workify process up to blob URL creation takes just a few milliseconds in my tests.
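The idea, as a sketch (`workerSource` and `workerCount` are placeholders):

```js
// Build the worker source blob and its URL once, then reuse that URL for
// every worker instead of re-running workify per worker.
var blobURL = URL.createObjectURL(new Blob([workerSource], {type: 'text/javascript'}));
var workers = [];
for (var i = 0; i < workerCount; i++) {
    workers.push(new Worker(blobURL));
}
```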
Before: (profiler timeline screenshot)

After: (profiler timeline screenshot)
It definitely helps blob creation and worker boot time, although the overall effect on TTFR is only a few hundred milliseconds. I think much of the benefit is being lost due to poor parallelization, and we can recoup it with improved main thread scheduling (boot workers as early as possible, start style XHR as early as possible, reduce validation overhead).
Also it seems that the first message sent to the worker incurs a significant penalty (seen as the orange "Function Call" bar in the DedicatedWorker timeline of the "After" results). I wonder if Chrome does lazy evaluation of worker source. Maybe we should try posting a no-op message right after creating the worker.
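I.e. something like this (a sketch; `blobURL` is a placeholder, and this is untested at this point):

```js
// Hypothetical warm-up: send a throwaway message right away so the browser
// evaluates the worker source now instead of when the first real job arrives.
// Assumes the worker-side handler ignores unknown message types.
var worker = new Worker(blobURL);
worker.postMessage({type: 'noop'});
```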
> I wonder if Chrome does lazy evaluation of worker source
@jfirebaugh I've been looking into TTFR as well this morning -- I'm seeing a "compile script" block before the first function call:
(profiler screenshot)
Yeah, I see that too... just wondering why processing the first message takes so much time, but it's not attributed to any specific function in gl-js.
Looking at this further, my hunch is the unattributed time is actually deserialization of the message data, so this goes back to "Reduce the overhead of transferring layers to the worker".
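If that's right, transferables would sidestep the structured clone cost on both sides. A sketch (the message shape and `layersAsStructArray` are made up for illustration):

```js
// Structured clone of a big layer object graph is paid on serialize AND
// deserialize; packing the bulk into an ArrayBuffer and transferring it
// hands the memory over without a copy on either side.
var buffer = layersAsStructArray.arrayBuffer; // hypothetical pre-packed layers
worker.postMessage({type: 'setLayers', buffer: buffer}, [buffer]);
```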
Here's what contributes to TTFR if you set up explicit console.log checkpoints across the code (timings are ms since the previous checkpoint; a sketch of the helper follows below):
| thread | event | time since prev |
| --- | --- | --- |
| main | loaded GL JS | 269ms |
| main | created map | 64ms |
| main | style loaded | 191ms |
| main | style created | 47ms |
| worker | worker initialized | 497ms |
| worker | got style layers | 14ms |
| worker | started parsing tile | 247ms |
| worker | parsed non-symbol layers | 85ms |
| worker | got symbol deps | 55ms |
| worker | symbols placed | 90ms |
| main | got tile buffers | 20ms |
You can see here that sending style layers isn't the bottleneck ("got style layers" takes only 14ms). Here's where most of the time goes instead:

- worker initialization (~500ms)
- loading GL JS on the main thread (~270ms)
- the delay before tile parsing starts on the worker (~250ms)

We need to focus on investigating and fixing the first one if possible.
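For reference, the checkpoint helper can be as simple as this sketch (not the exact instrumentation I used):

```js
// Logs a label plus the milliseconds elapsed since the previous checkpoint
// on the same thread; works on the main thread and inside workers alike.
var lastCheckpoint = performance.now();
function checkpoint(label) {
    var now = performance.now();
    console.log(label + ': ' + Math.round(now - lastCheckpoint) + 'ms');
    lastCheckpoint = now;
}

checkpoint('loaded GL JS');
// ... later ...
checkpoint('created map');
```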
If you use a minified GL JS build, worker initialization happens 290ms after creating the style, down from 500ms. So initialization time looks roughly linear in the size of the worker bundle.
Another thing that might help is rewiring some dependencies so that unnecessary code isn't bundled on the worker side. One example is validation code, which accounts for 7% of the bundle — it's required by some StyleLayer methods, but none of those get called on the worker side.
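Roughly the kind of rewiring I mean (file names and shapes are illustrative, not the actual gl-js layout):

```js
// style_layer.js: shared between the main thread and the worker. It no
// longer requires validation, so the worker bundle never pulls it in.
module.exports = function StyleLayer(layer) { /* ... */ };

// style.js: main-thread-only entry. Validation is required here, next to
// the StyleLayer call sites that actually use it.
var validateStyle = require('./validate_style');
var StyleLayer = require('./style_layer');
```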
> Also it seems that the first message sent to the worker incurs a significant penalty (seen as the orange "Function Call" bar in the DedicatedWorker timeline of the "After" results). I wonder if Chrome does lazy evaluation of worker source. Maybe we should try posting a no-op message right after creating the worker.
It doesn't look like a first-message penalty (tried it; it makes no difference). According to my checkpoint measurements, it simply takes a while for a worker to start its thread, load the blob, and evaluate the JS bundle.
Around 120ms of that is spent evaluating the bundle (measured by inserting console.log checkpoints at the beginning and end of the generated bundle in webworkify), which is yet another reason to reduce the worker bundle size and/or break it into parts.
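Concretely, the measurement is just two log lines wrapped around the generated source, roughly:

```js
// Inserted by hand around the bundle text that webworkify generates:
console.log('bundle eval start: ' + performance.now());
/* ...entire generated worker bundle... */
console.log('bundle eval end: ' + performance.now());
```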
Here's a minimal snippet that demonstrates how long a Worker takes to parse its code:

```js
// Worker source: a log statement followed by ~100k no-op IIFEs that make
// parsing expensive. The log only fires after the WHOLE source is parsed.
var src = 'console.log("worker: " + performance.now());' + Array(100000).join('(function(){})();');
new Worker(URL.createObjectURL(new Blob([src], {type: 'text/javascript'})));
// This prints long before the worker's timestamp does.
console.log('main: ' + performance.now());
```
It takes about the same time if you create a barebones worker and then call importScripts with an expensive script from it.
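For completeness, the importScripts variant looks like this (the URL is a stand-in for any large script; note that a blob worker needs an absolute URL here):

```js
// Barebones worker whose only job is to load a big script; the stall simply
// moves inside the synchronous importScripts call on the worker thread.
var src = 'importScripts("https://example.com/expensive.js");' +
    'console.log("worker: " + performance.now());';
new Worker(URL.createObjectURL(new Blob([src], {type: 'text/javascript'})));
console.log('main: ' + performance.now());
```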
Worker startup is paying the cost of both parsing/executing the bundle for the first time and then, when actually doing work, running very slowly until the optimizer kicks in. And all of that is on a per-worker basis: AFAICT there's no sharing of compiler/optimizer data between workers. This is why creating only a single worker at startup time is better for TTFR, even when multiple workers are better once reaching a steady state.
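That suggests a startup strategy along these lines (a sketch; `blobURL` and the pool-growing trigger are placeholders):

```js
// Boot a single worker immediately so TTFR pays only one cold start, then
// grow the pool to steady-state size once the first render is out.
var pool = [new Worker(blobURL)];
map.once('load', function () {
    while (pool.length < navigator.hardwareConcurrency) {
        pool.push(new Worker(blobURL));
    }
});
```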
Just stumbled upon this tweet and it sounds promising for TTFR — we should definitely test it out.
> In Chrome, any JavaScript files in a service worker cache are bytecode-cached automatically.
> This means there is 0 parse + compile cost for them on repeat visits. 🤯
https://v8.dev/blog/code-caching-for-devs#use-service-worker-caches
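The minimal shape of that setup (a sketch; the cache name and asset path are placeholders):

```js
// sw.js: cache-first serving of the GL JS bundle. Per the V8 post, scripts
// served from a service worker cache are bytecode-cached by Chrome, so
// repeat visits skip the parse + compile cost.
self.addEventListener('install', function (e) {
    e.waitUntil(caches.open('js-v1').then(function (cache) {
        return cache.addAll(['/mapbox-gl.js']);
    }));
});
self.addEventListener('fetch', function (e) {
    e.respondWith(caches.match(e.request).then(function (hit) {
        return hit || fetch(e.request);
    }));
});
```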