Wgpu: DX12 backend requests redraws too quickly and render queue fills up

Created on 9 Feb 2020  路  7Comments  路  Source: gfx-rs/wgpu

I'm encountering an issue where, with the DX12 backend, redraws are requested at a much higher pace than they are actually drawn. I could not reproduce it with the Vulkan or the DX11 backends.

With winit, I'm calling window.request_redraw() on the MainEventsCleared event as per the wgpu-rs examples. With the DX12 backend, this results in very fast updates, while what's rendered appears increasingly late (noticeable e.g. by moving imagery appearing in slow motion and inputs appearing delayed). This is accompanied by increasing memory usage, which I assume is a symptom of a growing render queue.

This occurs regardless of VSync or how GSync or max frame rate are configured in the driver. In rare cases the issue doesn't appear directly after starting, only after some time has passed and after some inputs, but I couldn't find out anything deterministic about it. The full code can be seen here, apart from the shaders there's not much more than the relevant code in it currently.


Info output during setup:

2020-02-09 18:31:30,019 INFO  [mathilda] mathilda 0.0.1 (debug build, windows [x86_64])
2020-02-09 18:31:30,372 INFO  [gfx_backend_dx11] DXGI version: Dxgi1_5
2020-02-09 18:31:30,462 INFO  [wgpu_core::instance] Adapter Dx12 AdapterInfo { name: "NVIDIA GeForce RTX 2060", vendor: 4318, device: 7944, device_type: DiscreteGpu }
2020-02-09 18:31:30,610 WARN  [wgpu_core::instance] max_bind_groups limit is missing
2020-02-09 18:31:30,610 INFO  [wgpu_core::device] creating swap chain SwapChainDescriptor { usage: OUTPUT_ATTACHMENT, format: Bgra8Unorm, width: 1024, height: 1024, present_mode: NoVsync }
2020-02-09 18:31:30,611 WARN  [wgpu_core::device] Surface does not support present mode: IMMEDIATE, falling back to FIFO
2020-02-09 18:31:31,126 INFO  [gfx_backend_dx12::device] Creating a pipeline layout with 1 sets and 0 root constants
2020-02-09 18:31:31,127 INFO  [gfx_backend_dx12::device] create_descriptor_pool with 64 max sets
2020-02-09 18:31:31,127 INFO  [gfx_backend_dx12::device] total 64 views and 0 samplers
2020-02-09 18:31:31,127 INFO  [gfx_backend_dx12::resource] allocate_set
2020-02-09 18:31:31,306 INFO  [gfx_backend_dx12::device] Creating a pipeline layout with 1 sets and 0 root constants
2020-02-09 18:31:31,306 INFO  [gfx_backend_dx12::device] create_descriptor_pool with 64 max sets
2020-02-09 18:31:31,307 INFO  [gfx_backend_dx12::device] total 192 views and 0 samplers
2020-02-09 18:31:31,307 INFO  [gfx_backend_dx12::resource] allocate_set
2020-02-09 18:31:31,315 INFO  [gfx_backend_dx12::device] Creating a pipeline layout with 1 sets and 0 root constants
2020-02-09 18:31:31,325 INFO  [gfx_backend_dx12::device] create_descriptor_pool with 64 max sets
2020-02-09 18:31:31,325 INFO  [gfx_backend_dx12::device] total 128 views and 64 samplers
2020-02-09 18:31:31,325 INFO  [gfx_backend_dx12::resource] allocate_set

System is Windows 10 1909 with a NVIDIA GeForce RTX 2060 and driver version 441.87. The issue is accompanied by issue https://github.com/gfx-rs/gfx/issues/3088 after it appears, which might be related.

I'm not perfectly sure which repository this issue best belongs to, feel free transfer it if necessary. Please let me know if you need more information.

help wanted bug

All 7 comments

Thank you for filing!
So we end up queuing too many redraws? I wonder why it doesn't show up as an issue with the other examples then...
Edit: also, do you see it on the actual examples?

I looked into it a bit more, and I assume the issue might be caused by the way my render passes are set up. On RedrawRequested, I first call regenerate where I render into a texture, and then render where this texture is then used and rendered to the swap chain output I get from swap_chain.get_next_texture(). (See here how the pipelines are set up.) Without the call to regenerate, I'm not seeing the update rate increase, so I assume this setup somehow prevents the backend from blocking when rendering to the screen, or the synchronization between the two command buffers gets messed up in some way.

I could not reproduce the exact issue on another system I tried it on (with an NVIDIA GeForce 940MX GPU and driver version 425.45), but there still is a noticeable delay until changes are shown on screen which is not the case with the other backends.

Would you have the exact repro steps for me?

You should be able to just clone the repo and run it with cargo run --bin mathilda - it's set up in a way that it exhibits the issue. (The code might need a somewhat recent Rust nightly.)

On the systems I ran it on it starts to show the problem at least after letting it run for a few seconds if not immediately, making a few inputs (scrolling the mousewheel or hitting spacebar repeatedly) should make it apparent. While the main issue only appears on one system for me, in both cases the app crashes with the same error as in https://github.com/gfx-rs/gfx/issues/3088 upon closing:

The application panicked (crashed).
Message:  Unexpected wait status 0xFFFFFFFF
Location: <::std::macros::panic macros>:5


Backtrace

                         (11 post panic frames hidden)
11: gfx_backend_dx12::device::{{impl}}::wait_for_fences<core::iter::adapters::Map<core::slice::Iter<wgpu_core::device::life::ActiveSubmission<gfx_backend_dx12::Backend>>, closure-0>>
    at <::std::macros::panic macros>:5
12: wgpu_core::device::life::LifetimeTracker<gfx_backend_dx12::Backend>::check_last_done<gfx_backend_dx12::Backend>
    at C:\Users\Chris\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\d4a46cb\wgpu-core\src\device\life.rs:213
13: wgpu_core::device::life::LifetimeTracker<gfx_backend_dx12::Backend>::cleanup<gfx_backend_dx12::Backend>
    at C:\Users\Chris\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\d4a46cb\wgpu-core\src\device\life.rs:254
14: wgpu_core::device::Device<gfx_backend_dx12::Backend>::maintain<gfx_backend_dx12::Backend,lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, wgpu_core::hub::IdentityManager>>
    at C:\Users\Chris\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\d4a46cb\wgpu-core\src\device\mod.rs:257
15: wgpu_core::hub::Global<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, wgpu_core::hub::IdentityManager>>::device_poll<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, wgpu_core::hub::IdentityManager>,gfx_backend_dx12::Backend>
    at C:\Users\Chris\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\d4a46cb\wgpu-core\src\device\mod.rs:1864
16: wgpu_native::device::wgpu_device_poll
    at C:\Users\Chris\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\d4a46cb\wgpu-native\src\device.rs:361
17: core::ptr::drop_in_place<closure-1>
    at /rustc/58b834344fc7b9185e7a50db1ff24e5eb07dae5e\src\libcore\ptr\mod.rs:174
18: core::ptr::drop_in_place<alloc::boxed::Box<FnMut<(winit::event::Event<()>, mut winit::event_loop::ControlFlow*)>>>
    at /rustc/58b834344fc7b9185e7a50db1ff24e5eb07dae5e\src\libcore\ptr\mod.rs:174
19: core::ptr::drop_in_place<winit::platform_impl::platform::event_loop::runner::EventLoopRunner<()>>
    at /rustc/58b834344fc7b9185e7a50db1ff24e5eb07dae5e\src\libcore\ptr\mod.rs:174
20: core::ptr::drop_in_place<core::option::Option<winit::platform_impl::platform::event_loop::runner::EventLoopRunner<()>>>
    at /rustc/58b834344fc7b9185e7a50db1ff24e5eb07dae5e\src\libcore\ptr\mod.rs:174
21: winit::platform_impl::platform::event_loop::runner::ELRShared<()>::destroy_runner<()>
    at C:\Users\Chris\.cargo\git\checkouts\winit-c2fdb27092aba5a7\5f52d7c\src\platform_impl\windows\event_loop\runner.rs:115
22: winit::platform_impl::platform::event_loop::EventLoop<()>::run_return<(),closure-1>
    at C:\Users\Chris\.cargo\git\checkouts\winit-c2fdb27092aba5a7\5f52d7c\src\platform_impl\windows\event_loop.rs:287
23: winit::platform_impl::platform::event_loop::EventLoop<()>::run<(),closure-1>
    at C:\Users\Chris\.cargo\git\checkouts\winit-c2fdb27092aba5a7\5f52d7c\src\platform_impl\windows\event_loop.rs:192
24: winit::event_loop::EventLoop<()>::run<(),closure-1>
    at C:\Users\Chris\.cargo\git\checkouts\winit-c2fdb27092aba5a7\5f52d7c\src\event_loop.rs:148
25: mathilda::main
    at C:\Users\Chris\Development\mathilda\src\main.rs:295
                        (12 runtime init frames hidden)

Changing the backend bit to any other than DX12 in main.rs:253 should make the issue disappear completely.

I added minimal input handling to the wgpu-rs shadow example (see my branch here) to see if I could replicate the issue with it, and it seems that I was wrong in my assumption that it depended on my render pass setup.
I'm seeing the same results there, a noticeable but consistent delay until inputs and consequently changes in the buffer are rendered to the screen on one system, and an ever increasing delay with growing memory usage, a very fast update rate, and a crash on exit on the other - both with debug and release builds in the former case, in the latter everything seems to work as expected with release builds but as the issue sometimes just takes a while to appear I'm not sure if it would just take more time in that case.

Thank you, that's very useful data!
If we want to reduce it further (to, say, a broken back pressure logic in D3D12 new swapchain implementation), we'd need to reproduce it on the quad example in https://github.com/gfx-rs/gfx/

I believe this is fixed by https://github.com/gfx-rs/gfx/pull/3239, which is now published as a patch. You can pick it up by cargo update -p gfx-backend-dx12. Please feel free to re-open if the issue persists!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

simast picture simast  路  15Comments

unrelentingtech picture unrelentingtech  路  14Comments

Aeledfyr picture Aeledfyr  路  23Comments

m4b picture m4b  路  14Comments

lordnoriyuki picture lordnoriyuki  路  21Comments