Mapbox-gl-js: Improve startup performance of workers

Created on 6 Sep 2016 · 11 comments · Source: mapbox/mapbox-gl-js

One of the main determinants of time-to-first-render (TTFR) is how fast the workers are able to boot and begin processing tile data. On my laptop, the earliest I see tile requests getting made by workers is at about the 2000ms mark. We should try to improve this.

Ideas:

  • Reduce the amount of code included in the worker blob (https://github.com/substack/webworkify/issues/12 may be relevant). (My guess is this is the most important thing we can do to improve TTFR.)
  • Reduce the overhead of creating the blob URLs for the worker

    • Create only one, not one per worker

    • Start loading them earlier (at JS load time, rather than the first time a Style is created)

  • Create only one worker at boot, and additional workers when idle

    • In my tests, booting with only a single worker (vs 3) reduces TTFR by several hundred milliseconds

  • Reduce the overhead of transferring layers to the worker

    • Manual JSON.stringify/parse?

    • Reload from style URL on worker?

    • Custom StructArray format?

  • Reduce style validation overhead (#3149, #3151, #3152)
  • Benchmark removing *.tiles.mapbox.com DNS sharding altogether (use api.mapbox.com for everything). This would avoid additional DNS and SSL startup costs.
Labels: performance
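The "create only one blob URL, not one per worker" idea could be sketched roughly as below. All names here are illustrative, not actual gl-js internals; the URL factory is injected so the caching logic can run outside a browser.

```javascript
// Create the worker blob URL once and reuse it for every worker,
// instead of creating one blob URL per worker.
function makeWorkerUrlFactory(createUrl, bundleSource) {
  var cachedUrl = null;
  return function getWorkerUrl() {
    if (cachedUrl === null) {
      cachedUrl = createUrl(bundleSource); // first call pays the cost
    }
    return cachedUrl; // subsequent calls reuse the same URL
  };
}

// Browser wiring (assumed):
//   var getUrl = makeWorkerUrlFactory(function (src) {
//     return URL.createObjectURL(new Blob([src], {type: 'text/javascript'}));
//   }, workerBundleSource);
//   var workers = [new Worker(getUrl()), new Worker(getUrl())]; // one blob URL
```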


All 11 comments

Reduce the amount of code included in the worker blob

I found a way to reduce it to only the things it needs — see PR https://github.com/substack/webworkify/pull/30. It reduces the size from 1.12MB to 467KB, although I'm not sure whether it actually affects time-to-first-render that much — can you check @jfirebaugh?

Create only one blob, not one per worker

It seems to be too cheap to bother — the whole workify process up to blob URL creation takes just a few milliseconds in my tests.

Before: (profiler screenshot)

After: (profiler screenshot)

It definitely helps blob creation and worker boot time, although the overall effect on TTFR is only a few hundred milliseconds. I think much of the benefit is being lost due to poor parallelization, and we can recoup it with improved main thread scheduling (boot workers as early as possible, start style XHR as early as possible, reduce validation overhead).

Also, it seems that the first message sent to the worker incurs a significant penalty (seen as the orange "Function Call" bar in the DedicatedWorker timeline of the "After" results). I wonder if Chrome does lazy evaluation of the worker source. Maybe we should try posting a no-op message right after creating the worker.
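The proposed experiment could look like the sketch below; the message shape is an assumption, since gl-js's actual worker protocol may differ.

```javascript
// Create workers and immediately post a throwaway message to each one,
// to test whether that forces eager evaluation of the worker source.
function createWarmWorkers(workerUrl, count, createWorker) {
  var workers = [];
  for (var i = 0; i < count; i++) {
    var w = createWorker(workerUrl);
    w.postMessage({type: 'noop'}); // hypothetical warm-up message
    workers.push(w);
  }
  return workers;
}

// Browser usage: createWarmWorkers(url, 3, function (u) { return new Worker(u); });
```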

I wonder if Chrome does lazy evaluation of worker source

@jfirebaugh I've been looking into TTFR as well this morning -- I'm seeing a "compile script" block before the first function call:

(profiler screenshot)

Yeah, I see that too... just wondering why processing the first message takes so much time, but it's not attributed to any specific function in gl-js.

Looking at this further, my hunch is the unattributed time is actually deserialization of the message data, so this goes back to "Reduce the overhead of transferring layers to the worker".

Here's what contributes to TTFR if you set up explicit console.log checkpoints across the code (timings are ms since the previous checkpoint):

| thread | event | time since prev |
| --- | --- | --- |
| main | loaded GL JS | 269ms |
| main | created map | 64ms |
| main | style loaded | 191ms |
| main | style created | 47ms |
| worker | worker initialized | 497ms |
| worker | got style layers | 14ms |
| worker | started parsing tile | 247ms |
| worker | parsed non-symbol layers | 85ms |
| worker | got symbol deps | 55ms |
| worker | symbols placed | 90ms |
| main | got tile buffers | 20ms |

You can see here that sending style layers isn't the bottleneck. Here's where most bottlenecks are instead:

  • Getting the worker to run (by far the biggest contributor for some reason)
  • Loading assets ("time to first byte" when requesting things like style, tilejson, sprites, tiles & glyphs)

We need to focus on investigating and fixing the first if possible.
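The checkpoint instrumentation described above could be sketched like this; the clock is injected so the delta logic is testable, and in a browser you would pass `performance.now.bind(performance)`.

```javascript
// Each call logs a label and the milliseconds elapsed since the previous
// checkpoint, matching the "time since prev" column in the table above.
function makeCheckpoint(now) {
  var prev = now();
  return function checkpoint(label) {
    var t = now();
    var delta = t - prev;
    prev = t;
    console.log(label + ': ' + Math.round(delta) + 'ms since previous checkpoint');
    return delta;
  };
}

// Browser usage:
//   var checkpoint = makeCheckpoint(performance.now.bind(performance));
//   checkpoint('loaded GL JS');
//   checkpoint('created map');
```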

If you use a minified GL JS build, worker initialization happens 290ms after creating the style, down from 500ms. So startup time looks roughly linear in the size of the worker bundle.

#3034 should help with this a bit because worker bundle parts will be loaded lazily, e.g. it won't bundle geojson-vt & supercluster until you add a GeoJSON source.

Another thing that might help is rewiring some dependencies so that unnecessary code is not bundled on the worker side. One example is validation code, which takes 7% of the bundle — it's required by some StyleLayer methods but none of those get called on the worker side.

Also, it seems that the first message sent to the worker incurs a significant penalty (seen as the orange "Function Call" bar in the DedicatedWorker timeline of the "After" results). I wonder if Chrome does lazy evaluation of the worker source. Maybe we should try posting a no-op message right after creating the worker.

It doesn't look like a first-message penalty (I tried it; it makes no difference). According to my checkpoint measurements, it simply takes a while for a worker to start a thread, load the blob, and evaluate the JS bundle.

Around 120ms is spent evaluating the bundle (measured by inserting console.log checkpoints at the beginning and end of the generated bundle in webworkify), which is yet another reason to reduce the worker bundle size and/or break it down into parts.

Here's a minimal snippet of code that proves that it takes a long time for a Worker to parse its code:

```javascript
var src = 'console.log("worker: " + performance.now());' +
    Array(100000).join('(function(){})();');
new Worker(URL.createObjectURL(new Blob([src], {type: 'text/javascript'})));
console.log('main: ' + performance.now());
```

It takes about the same time if you create a barebones worker and then call importScripts on an expensive script from it.

Worker startup is paying the cost of both parsing/executing the bundle for the first time, and then when actually doing work, running very slowly at first before the optimizer kicks in. And all of that is on a per-worker basis -- AFAICT there's no sharing of compiler/optimizer data between workers. This is why creating only a single worker at startup time is better for TTFR, even when multiple workers are better once reaching a steady state.
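The "one worker at boot, more when idle" strategy could be sketched as below; the worker factory and idle scheduler are injected (in a browser they would be `function (u) { return new Worker(u); }` and `requestIdleCallback`), and none of the names are actual gl-js internals.

```javascript
// Boot a single worker immediately, for TTFR, then create the remaining
// workers once the main thread is idle, for steady-state parallelism.
function bootWorkers(workerUrl, total, createWorker, scheduleIdle) {
  var workers = [createWorker(workerUrl)]; // pay one startup cost up front
  scheduleIdle(function () {
    while (workers.length < total) {
      workers.push(createWorker(workerUrl)); // the rest boot off the critical path
    }
  });
  return workers;
}

// Browser usage:
//   bootWorkers(url, 3, function (u) { return new Worker(u); },
//               function (cb) { requestIdleCallback(cb); });
```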

Just stumbled upon this tweet and it sounds promising for TTFR — we should definitely test it out.

In Chrome, any JavaScript files in a service worker cache are bytecode-cached automatically.
This means there is 0 parse + compile cost for them on repeat visits. 🤯
https://v8.dev/blog/code-caching-for-devs#use-service-worker-caches
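A minimal sketch of that experiment, assuming a cache name and bundle path that are placeholders: precache the GL JS bundle in a service worker cache so Chrome can bytecode-cache it, as the v8.dev post describes. The handlers take the cache storage and fetch function as parameters so the routing logic can be exercised with fakes outside a browser.

```javascript
// install: put the GL JS bundle into a service worker cache.
function onInstall(event, cacheStorage) {
  event.waitUntil(
    cacheStorage.open('gl-js-v1').then(function (cache) {
      return cache.addAll(['/mapbox-gl.js']);
    }));
}

// fetch: serve from the cache when possible, fall back to the network.
function onFetch(event, cacheStorage, fetchFn) {
  event.respondWith(
    cacheStorage.match(event.request).then(function (hit) {
      return hit || fetchFn(event.request);
    }));
}

// In sw.js (browser) this would be wired up as:
//   self.addEventListener('install', function (e) { onInstall(e, caches); });
//   self.addEventListener('fetch', function (e) { onFetch(e, caches, fetch); });
```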

