Three.js: Implement KHR_parallel_shader_compile support

Created on 23 Apr 2019 · 13 Comments · Source: mrdoob/three.js

The new KHR_parallel_shader_compile extension will move shader compilation off the main thread. Even if the total compile time remains the same, it won't block the main thread, which is a huge benefit.

It will require some modifications to the way three.js handles the compilation and linking of shaders, moving it from sync to async. Some ideas have been proposed on the Khronos mailing list and in the Khronos repo PR.

Just opening the issue so we could discuss ideas on how to address this change.

Enhancement

All 13 comments

Some initial thoughts from my understanding after reading the thread and the spec:

Currently, without parallel compiling, the pipeline looks more or less like this:
[image: pipeline timeline without parallel compilation]
With the extension, even in the worst case of a single thread, the timeline looks the same and takes as much time as without the extension, but it no longer blocks the main thread, so it is still a huge benefit.

Since the status check on shader compilation was recently put behind a debug flag (false by default), we already avoid an unnecessary block there (WebGLShader.js#L23-L40), but even with the extension enabled we will still stall on linkProgram (WebGLProgram.js#L586-L605).
In this case, just enabling the extension without modifying anything else would let the vertex and fragment shaders compile in parallel, while linkProgram waits for both to complete. Assuming 2 threads, the timeline would look similar to this:
[image: timeline with the extension enabled and no code changes, 2 threads]

In order to get the best of it we should group the compilation and linking, as @jdashg proposed on https://github.com/KhronosGroup/WebGL/pull/2855#issuecomment-483486677:

for (const x of shaders)
   gl.compileShader(x);
for (const x of programs)
   gl.linkProgram(x);

This way we could get a timeline similar to:
[image: timeline with the compile calls and link calls grouped]

Grouping the shader compilations together should be easy enough, as we could just compile them as they come, while grouping the linkProgram calls will need substantial modifications to the code, to queue them and issue them after all the compileShader calls have been made.
It could also be done using the approach proposed by Jeff:

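// Generator that yields linking progress in [0, 1]; `gl` is the WebGL context in scope.
function* linkingProgress(programs) {
   const ext = gl.getExtension('KHR_parallel_shader_compile');
   let todo = programs.slice();
   while (todo.length) {
      if (ext) {
         // Non-blocking poll: keep only the programs that have not completed yet.
         todo = todo.filter(x => !gl.getProgramParameter(x, ext.COMPLETION_STATUS_KHR));
      } else {
         // Fallback: querying LINK_STATUS blocks until this program has linked.
         const x = todo.pop();
         gl.getProgramParameter(x, gl.LINK_STATUS);
      }
      yield 1.0 - (todo.length / programs.length);
   }
}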

With this asynchrony in place we need to define an isReady() function to determine whether compilation and linking have finished, similar to what Babylon.js currently does.

We should use that isReady() function in the WebGLRenderer's animation loop so that the loop callback is not executed until isReady() returns true.
Although it could also be interesting to add that check to WebGLRenderer::render().
At the same time it would be handy to provide a callback so that users can be notified when this happens, in case they are not using setAnimationLoop and rely on a requestAnimationFrame/animate() loop instead.
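A minimal sketch of the isReady() plus callback idea follows. The helper name whenProgramsReady, its arguments, and the injectable schedule function are assumptions made for illustration and are not three.js API; only the WebGL calls and the extension's COMPLETION_STATUS_KHR query are real.

```javascript
// Hypothetical helper, not part of three.js: polls once per scheduled tick
// and fires `onReady` when every program reports completion.
function whenProgramsReady(gl, programs, onReady,
                           schedule = (cb) => requestAnimationFrame(cb)) {
  const ext = gl.getExtension('KHR_parallel_shader_compile');

  const isReady = () => {
    // Without the extension there is nothing cheap to poll, so report ready
    // and let the first real use of the program block as it does today.
    if (!ext) return true;
    return programs.every((p) =>
      Boolean(gl.getProgramParameter(p, ext.COMPLETION_STATUS_KHR)));
  };

  const poll = () => {
    if (isReady()) onReady();
    else schedule(poll); // try again on the next frame
  };

  poll();
  return isReady;
}
```

An app driving its own requestAnimationFrame loop could call this once after creating its programs and start rendering from the callback, while a setAnimationLoop-based integration could consult the returned isReady function before each frame.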

I've prototyped splitting the compilation phase and link phase of programs to see whether you can get much benefit from the fact that Chrome compiles shaders asynchronously in a separate GPU thread rather than on the main JS thread. It is not very useful on its own, but the same mechanism could be used with parallel compile.

There are fewer mods than you might expect. Primarily, initMaterial() needs splitting into two halves: the first half sets up the environment for creating a new program if required, and the second half relies on interrogating the program state, so in effect it serialises with the compilation and linking.

WebGLProgram again needs two parts and an isReady status that would check compilation completion. This avoids the need for global lists of shaders and programs in various states and keeps the information in the WebGLProgram() objects where it belongs.
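As a rough illustration of that two-part split, here is a sketch in which the constructor only issues work, isReady() only polls, and finish() does the interrogation. The class DeferredProgram and its uniformCount field are hypothetical and much simpler than the real WebGLProgram; the WebGL calls themselves are standard.

```javascript
// Hypothetical class for illustration; three.js's real WebGLProgram differs.
class DeferredProgram {
  constructor(gl, vertexSource, fragmentSource) {
    this.gl = gl;
    this.ext = gl.getExtension('KHR_parallel_shader_compile');
    this.handle = gl.createProgram();
    this.uniformCount = null; // filled in by finish()

    // Part one: issue compile and link calls, query nothing.
    const stages = [[gl.VERTEX_SHADER, vertexSource],
                    [gl.FRAGMENT_SHADER, fragmentSource]];
    for (const [type, source] of stages) {
      const shader = gl.createShader(type);
      gl.shaderSource(shader, source);
      gl.compileShader(shader);
      gl.attachShader(this.handle, shader);
    }
    gl.linkProgram(this.handle);
  }

  // Non-blocking when the extension exists; otherwise reports ready so that
  // finish() blocks exactly as the synchronous path does today.
  isReady() {
    if (!this.ext) return true;
    return Boolean(
      this.gl.getProgramParameter(this.handle, this.ext.COMPLETION_STATUS_KHR));
  }

  // Part two: everything that interrogates program state.
  finish() {
    const gl = this.gl;
    if (!gl.getProgramParameter(this.handle, gl.LINK_STATUS)) {
      throw new Error(gl.getProgramInfoLog(this.handle));
    }
    this.uniformCount = gl.getProgramParameter(this.handle, gl.ACTIVE_UNIFORMS);
    return this;
  }
}
```

Keeping the readiness state on the program object itself means no global list of pending shaders or programs is needed; a caller just asks each program it holds.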

One interesting question then with a parallel compilation mechanism:

You don't have an atomic change of scene after adding several new materials. Some meshes, or parts of meshes with multiple materials, would initially display in different frames when they were intended to appear at the same time, which could be ugly, especially for very slow compiles. How do you handle this? Preserving the atomic nature could be difficult.

@aardgoose cool! do you have any PR open already for that? It could be interesting to see that approach.

Regarding the atomic change, I agree that things always get weird with async in place. We will need to add some validation before using a material that is not fully ready yet, and probably we could include some helpers for the case when you want to load multiple materials like an explosion effect and you want to render it just when all of them are ready.
Maybe an attribute to the object that you are going to render, or a three-state value on ready: ready, not ready, ready but waiting for a friend program to finish :).
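The three-state idea could be sketched roughly as below. Everything here is hypothetical: the State names, the notion of a "group" as a plain array of materials, and the assumption that each material exposes an isReady() method. None of it is existing three.js API.

```javascript
// Hypothetical sketch: a material is only drawable once every material in
// its group (its "friend" programs) has finished compiling and linking.
const State = Object.freeze({
  NOT_READY: 'not-ready',
  WAITING_FOR_GROUP: 'ready-but-waiting-for-group',
  READY: 'ready',
});

function materialState(material, group) {
  if (!material.isReady()) return State.NOT_READY;
  const groupReady = group.every((m) => m.isReady());
  return groupReady ? State.READY : State.WAITING_FOR_GROUP;
}

// Returns the materials that may be rendered this frame: whole groups only.
function drawableMaterials(groups) {
  return groups
    .filter((group) => group.every((m) => m.isReady()))
    .flat();
}
```

With this shape, an effect built from several materials (the explosion example above) appears in a single frame, at the cost of being delayed until its slowest program is done.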

I have personally started a trial implementation to see how it behaves, in case anyone is interested.

Branch
https://github.com/takahirox/three.js/tree/ParallelShaderExtension

Diffs
https://github.com/mrdoob/three.js/compare/dev...takahirox:ParallelShaderExtension?expand=1

I still need to clean up, optimize, and take care of a few things, but it seems to be working on Canary.

FYI, Chrome Canary, which is currently the only browser supporting the KHR_parallel_shader_compile extension, might have a performance issue with the extension right now. gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR) is slow because it seems to wait for compile/link completion, although it shouldn't.

https://www.khronos.org/webgl/public-mailing-list/public_webgl/1904/msg00042.php
https://bugs.chromium.org/p/chromium/issues/detail?id=957001

FYI, the COMPLETION_STATUS_KHR performance issue seems to have been resolved on Canary: https://www.khronos.org/webgl/public-mailing-list/public_webgl/1905/msg00007.php

Still WIP, but I want to share that I have confirmed locally that KHR_parallel_shader_compile, with some optimizations, reduces the frame rate drop on application startup.

https://twitter.com/superhoge/status/1128438246638333953

FWIW I have a prototype patch for Firefox adding support for KHR_parallel_shader_compile which appears to work quite well with the basic Khronos demo/test. It certainly reduces jank, although it is not quite as smooth as Chrome Canary:

https://www.khronos.org/registry/webgl/sdk/tests/performance/parallel_shader_compile/

The main overhead now on the main thread with FF appears to be the shader translation/validation before submitting to the GPU driver.

Performance trace alternating serial and parallel passes of the above test:

[image: performance trace screenshot, "Annotation 2019-05-19 190807"]

Very excited to see progress on this! Looking forward to this feature

An alternative take.

The Khronos WebGL group recommend not checking shader compilation status, but just doing:

  • compile shaders
  • link program
  • check link error and only check shader errors if link fails

for all cases, not just using KHR_parallel_shader_compile.
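The three steps above can be sketched as follows. The function name buildProgram and the format of the error message are assumptions for illustration; the WebGL calls are the standard ones. Note that querying LINK_STATUS still blocks until linking finishes, so this sketch only removes the per-shader status queries.

```javascript
// Sketch only: buildProgram and its error format are illustrative,
// not three.js API.
function buildProgram(gl, vertexSource, fragmentSource) {
  const compile = (type, source) => {
    const shader = gl.createShader(type);
    gl.shaderSource(shader, source);
    gl.compileShader(shader); // deliberately no COMPILE_STATUS query here
    return shader;
  };

  const vs = compile(gl.VERTEX_SHADER, vertexSource);
  const fs = compile(gl.FRAGMENT_SHADER, fragmentSource);

  const program = gl.createProgram();
  gl.attachShader(program, vs);
  gl.attachShader(program, fs);
  gl.linkProgram(program);

  // The only status query on the success path.
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    // Only now look at the individual shaders to explain the failure.
    const details = [
      'program: ' + gl.getProgramInfoLog(program),
      'vertex: ' + gl.getShaderInfoLog(vs),
      'fragment: ' + gl.getShaderInfoLog(fs),
    ].join('\n');
    gl.deleteProgram(program);
    gl.deleteShader(vs);
    gl.deleteShader(fs);
    throw new Error('Shader program failed to link\n' + details);
  }
  return program;
}
```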

The link operation serialises automatically on the compiles in the background, so at least with Chrome you get the benefit of any built-in parallel/asynchronous operation that already exists. Firefox currently retrieves the completion status and logs as part of the compile and link operations and caches them for later use, so there is no real benefit there.

So I have removed error checking and display from WebGLShader.js and reworked it in WebGLProgram.js. This could be submitted upstream without changing functionality.

https://github.com/aardgoose/three.js/tree/parallel1

Building on this to enable KHR_parallel_shader_compile, but now only needing two states rather than three.

https://github.com/aardgoose/three.js/tree/parallel2

There are still some minor issues, and this is probably something that should be enabled on a per-renderer or per-material basis as required rather than by default. Tested with Chrome Canary and my hacked version of Firefox.

@takahirox

My site freezes on load for a second or two. I can delay compilation, but the freeze is unavoidable, and the shader in question isn't even that many lines of code (500?). Anyway would love to see this in THREE.js.

Been looking into this recently. Put up a pair of PRs (Primarily #19752, which depends on #19745) that offer one way of taking advantage of parallel shader compilation, though to get the most benefit out of it apps would need to make a fairly minor change to their loading code. If you've been following this issue feel free to let me know how/if that approach works for you!

Thank you so much for doing this work! I've been dying for this feature for ages.
