Webrender: PBO copies are slow on Angle

Created on 27 Nov 2017 · 17 Comments · Source: servo/webrender

It appears that Windows/Angle spends tons of time in update_texture_from_pbo: https://perfht.ml/2zJnkg4
Angle decided to defer mapping and filling an actual D3D11 staging buffer up until this call. Since the copy itself is done in a draw call, it has to stall the GPU until the copy is done. This is one issue, but the actual current one is different: Map itself is waiting. I suppose it's waiting for the GPU considering the texture is still in use, which implies our PBO orphaning doesn't work as expected (see Device::orphan_pbo).

cc @jrmuizel @glennw

performance bugzilled bug

All 17 comments

This is showing up during motionmark bouncing circles.

I should also note that to reproduce this slowness I need to run the test case in ramp mode first and then reload with constant complexity. i.e.
Constant complexity @ 400 => 6ms compositor time
Ramp => up to 11ms compositor time
Constant complexity @ 400 => 16ms compositor time

So it does seem like this problem is being made worse by some invisible persistent state (the WebRender texture cache? the ANGLE buffer cache?).

@jrmuizel we should have a separate issue for the (suspected) cache pollution

Thanks to Angle team I got some answers! This is not about orphaning, but rather about some texture formats not being supported by the fast path of buffer->texture copies: https://cs.chromium.org/chromium/src/third_party/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp?type=cs&q=supportsFastCopyBufferToTexture&sq=package:chromium&l=3006

In particular, Angle doesn't like our RGB8 and A8.

Awesome, thanks for following this up!

The profile shows us hitting the fast path, not missing it.

@jrmuizel Are you using the Gecko profiler for this or something else?

@kvark From your investigations of angle, would it be worth considering alternative update strategies (e.g. render target with float points, or perhaps map/unmap or something else?)

Yeah. See the perfht.ml link in the first comment.

When I originally set up the GPU cache, the idea was to be able to use an unsynchronized map to update the texture, on platforms where that made sense (i.e. a persistent pointer into the texture data). It does two things to enable this:

  • Items newer than a fixed number of frames (currently set to 10) are never evicted.
  • When an item is invalidated for update, its old location is orphaned: a new location is selected for the updated data, and the old location is left to be reclaimed by the eviction policy mentioned above.

In theory, this means we don't need any synchronization here - these two policies should guarantee that we never write to a location that is < 10 frames old, which should thus guarantee that the GPU is never reading incorrect data.
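
To make the two policies concrete, here is a minimal CPU-side sketch. It is an illustration only, not WebRender's actual GPU cache code: `GpuCacheSketch`, `request` and `update` are hypothetical names, and slots are plain indices rather than texture locations.

```rust
/// Frames a slot must sit unused before it can be reused (10, as in the thread).
const FRAMES_BEFORE_EVICTION: u64 = 10;

/// Hypothetical stand-in for the GPU cache allocator.
struct GpuCacheSketch {
    /// Per slot: `Some(frame)` is the last frame the slot was written or
    /// requested, `None` means the slot is free.
    last_used: Vec<Option<u64>>,
    frame: u64,
}

impl GpuCacheSketch {
    fn new(capacity: usize) -> Self {
        GpuCacheSketch { last_used: vec![None; capacity], frame: 0 }
    }

    /// Advance the frame counter and free slots unused for long enough.
    fn begin_frame(&mut self) {
        self.frame += 1;
        let frame = self.frame;
        for slot in self.last_used.iter_mut() {
            if let Some(used) = *slot {
                if frame - used >= FRAMES_BEFORE_EVICTION {
                    *slot = None;
                }
            }
        }
    }

    /// Take the first free slot, if any, and stamp it with the current frame.
    fn alloc(&mut self) -> Option<usize> {
        let index = self.last_used.iter().position(|s| s.is_none())?;
        self.last_used[index] = Some(self.frame);
        Some(index)
    }

    /// Mark a live slot as still in use this frame.
    fn request(&mut self, index: usize) {
        if let Some(used) = self.last_used[index].as_mut() {
            *used = self.frame;
        }
    }

    /// Orphan `old` and return a fresh slot for the new data. The old slot is
    /// never overwritten here; it simply stops being requested and ages out.
    fn update(&mut self, old: usize) -> Option<usize> {
        debug_assert!(self.last_used[old].is_some());
        self.alloc()
    }
}
```

With this shape, any slot handed out by `alloc` has either never been used or has gone at least 10 frames without being used, which is the property the unsynchronized write relies on.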

Admittedly, I've never actually tested this in practice - I intended to revisit it at a later time. But perhaps we can signal to ANGLE that it doesn't need to do any blocking and see how it goes?

Another possibility (more of a temporary quick fix / hack) could be to round-robin a series of backing textures for the GPU cache, and update / upload the entire texture each frame. This sounds bad, but that data is actually quite small, and may be perfectly fine as an interim solution, if most of the time is spent blocking on a GPU fence or similar.
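
A rough sketch of the round-robin idea, assuming a CPU-side copy of the cache that is uploaded whole each frame. `RoundRobinCache` is a hypothetical name, and the `Vec<u8>` buffers stand in for real GPU textures; no GL calls are made.

```rust
/// Hypothetical round-robin uploader; `backing` stands in for GPU textures.
struct RoundRobinCache {
    cpu_copy: Vec<u8>,
    backing: Vec<Vec<u8>>,
    current: usize,
}

impl RoundRobinCache {
    fn new(size: usize, count: usize) -> Self {
        assert!(count > 0, "need at least one backing texture");
        RoundRobinCache {
            cpu_copy: vec![0; size],
            backing: vec![vec![0; size]; count],
            current: 0,
        }
    }

    /// Update the CPU-side copy only; nothing touches the "GPU" yet.
    fn write(&mut self, offset: usize, data: &[u8]) {
        self.cpu_copy[offset..offset + data.len()].copy_from_slice(data);
    }

    /// Move to the next backing texture, upload the entire CPU copy into it,
    /// and return its index so the frame can bind it.
    fn upload_frame(&mut self) -> usize {
        self.current = (self.current + 1) % self.backing.len();
        self.backing[self.current].copy_from_slice(&self.cpu_copy);
        self.current
    }
}
```

The texture written in a given frame is not written again until the ring wraps around, so frames still in flight keep reading data that nobody is modifying.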

@kvark @jrmuizel These might be worth pursuing if we're not able to get the current path running well on ANGLE.

@jrmuizel

The profile shows us hitting the fast path, not missing it.

Damn, that is correct. Back to square one!

@glennw

From your investigations of angle, would it be worth considering alternative update strategies (e.g. render target with float points, or perhaps map/unmap or something else?)

Will see after this issue is resolved.

the idea was to be able to use an unsynchronized map to update the texture

TBH, that sounds perfect :) We definitely need to check out persistent mapping for the GPU cache texture. However, this is quite specific (GPU cache only, not the texture cache or other data) and will probably need to wait till we resolve the outstanding performance issues first (assuming we can resolve the Angle problem without rewriting the upload path too much).

Had a long discussion today with Angle peers. Ended up filing an upstream issue - https://bugs.chromium.org/p/angleproject/issues/detail?id=2268

TL;DR:
They don't expect us to use PBOs at all, and STREAM_DRAW on them in particular. Using STATIC_DRAW is our best hope for a minimally-intrusive fix. They'd prefer us just calling glTexSubImage2D directly on the GPU textures...

We can quite easily switch to using glTexSubImage2D. That should only be a few lines to change. However, that definitely introduces stalls on Intel drivers (at least on Linux), so we should support both paths and select one based on the GL renderer. We could start by doing a local test on ANGLE to see how much this improves things?
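
As a sketch of what selecting a path from the GL renderer could look like: the `UploadMethod` enum and `choose_upload_method` function below are hypothetical names, and matching on the substring "ANGLE" in the renderer string is an assumption about how ANGLE identifies itself, not something confirmed in this thread.

```rust
/// Hypothetical upload strategies: go through a PBO, or call
/// glTexSubImage2D directly with client memory.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UploadMethod {
    PixelBuffer,
    Immediate,
}

/// Pick an upload path from the GL_RENDERER string.
/// Assumption: ANGLE reports a renderer name containing "ANGLE".
fn choose_upload_method(renderer: &str) -> UploadMethod {
    if renderer.to_ascii_uppercase().contains("ANGLE") {
        // Direct uploads, as the ANGLE developers suggested.
        UploadMethod::Immediate
    } else {
        // Keep PBOs elsewhere to avoid the stalls seen on Intel/Linux.
        UploadMethod::PixelBuffer
    }
}
```

The renderer string would come from `glGetString(GL_RENDERER)` once at device creation, so the check costs nothing per frame.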

Yes, that's my plan :) We'll definitely need separate paths on Windows/Angle versus the world :)

Oh, didn't mean to close this just yet.
