(apologies if this bug report is sub-par, I haven't yet had any time to narrow this bug down to a more minimal case)
Description
Whenever I allocate more memory to a buffer than the default Limits::max_uniform_buffer_binding_size, Device::drop times out when trying to clean up GPU resources and panics with "GPU got stuck" and then segfaults.
Repro steps
I ran into this bug while creating my first real program that uses wgpu-rs for computation. I've just made it public at https://github.com/MightyAlex200/plife/tree/wgpu (it's quite messy, I know; I was in the middle of converting it from arrayfire to wgpu. Everything relevant is on the wgpu branch.)
The following command causes a segfault on my computer despite not reaching any (user) unsafe-blocks
cargo run -- --headless new --points 5000 --wall-type wrapping --wall-dist 500 template cool
--release is optional, as both release and debug mode segfault.
--points is relevant, as it is what most directly controls the amount of VRAM usage.
--headless is optional, and it might be preferable to omit it if you do run this, as the headless mode currently has nothing limiting its resource consumption and can drastically slow down your computer.
Expected vs observed behavior
Expected behavior: either works fine or panics telling me what I've done wrong
Actual behavior: works fine until stopping, then freezes computer for a bit to free resources and then panics and segfaults:
command line output
note: this was done with 10,000 points as it mysteriously decided to start working on lower values when I wanted to record this
Using discrete GPU GeForce GTX 1660 SUPER (Vulkan)
^CRan 3143 steps for 2.620529926s
SAVING... PLEASE WAIT...
Saved in 1.28µs
thread 'main' panicked at 'Error in Device::drop: GPU got stuck :(', /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:129:9
stack backtrace:
0: rust_begin_unwind
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:495:5
1: std::panicking::begin_panic_fmt
at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:437:5
2: wgpu::backend::direct::Context::handle_error_fatal
at /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:129:9
3: ::device_drop
at /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1252:29
4: ::drop
at /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/lib.rs:1636:13
5: core::ptr::drop_in_place
at /home/taylor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:175:1
6: plife::main_async::{{closure}}
at ./src/main.rs:317:1
7: as core::future::future::Future>::poll
at /home/taylor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
8: futures_executor::local_pool::block_on::{{closure}}
at /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.12/src/local_pool.rs:317:23
9: futures_executor::local_pool::run_executor::{{closure}}
at /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.12/src/local_pool.rs:87:37
10: std::thread::local::LocalKey::try_with
at /home/taylor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:272:16
11: std::thread::local::LocalKey::with
at /home/taylor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:248:9
12: futures_executor::local_pool::run_executor
at /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.12/src/local_pool.rs:83:5
13: futures_executor::local_pool::block_on
at /home/taylor/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.12/src/local_pool.rs:317:5
14: plife::main
at ./src/main.rs:195:5
15: core::ops::function::FnOnce::call_once
at /home/taylor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fish: “RUST_BACKTRACE=1 cargo run -- -…” terminated by signal SIGSEGV (Address boundary error)
Extra materials
Here is the trace that I recorded, zipped up. I tried playing it back myself but it just panics, so I'm not sure how useful this is.
Platform
GPU: GTX 1660S
OS: Manjaro Linux
Backend used: Vulkan
wgpu version: 0.7.0
Just made a discovery, and pushed some of my changes to the repository. For reference the commit was originally 491bb6402fdabddc627d44ca2b3744f677626a84. The repository is currently at 335eb1ef61d8fe0d77e87a9f6ea73f747fe0bb1e, and I have uploaded a video showing that this crash may be related to the GPU being saturated with inputs and not having the time to process them in real-time.
Running the API trace on AMD/Linux doesn't show anything wrong, except for this bit:
VALIDATION [UNASSIGNED-CoreValidation-Shader-InconsistentSpirv (7060244)] : Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x55bd6ee80d90, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: For Vulkan, an OpTypeStruct variable containing an OpTypeRuntimeArray must be decorated with BufferBlock if it has storage class Uniform.
%types = OpVariable %_ptr_Uniform_Types Uniform
It sounds like a bug we fixed not long ago. Interestingly, with today's Naga we have a different problem, filed as https://github.com/gfx-rs/naga/issues/508
NVidia/Vulkan consistently throws Out of Memory errors for me, in both replaying the trace and running the test project:
[1.543024 ERROR]()(no module):
VALIDATION [UNASSIGNED-CoreValidation-Shader-InconsistentSpirv (7060244)] : Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x5562f33d11c8, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: For Vulkan, an OpTypeStruct variable containing an OpTypeRuntimeArray must be decorated with BufferBlock if it has storage class Uniform.
%types = OpVariable %_ptr_Uniform_Types Uniform
object info: (type: DEVICE, hndl: 93883476021704)
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Queue(OutOfMemory)', /hub/moz/wgpu/player/src/lib.rs:344:59
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[2.172167 ERROR]()(gpu_descriptor::allocator): `DescriptorAllocator` is dropped while some descriptor sets were not deallocated
Running `target/debug/plife --headless new --points 5000 --wall-type wrapping --wall-dist 500 template cool`
Using discrete GPU GeForce GTX 1050 Ti with Max-Q Design (Vulkan)
thread 'main' panicked at 'Error in Queue::submit: not enough memory left', /home/kvark/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:129:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Segmentation fault (core dumped)
We are missing the validation for your shader, which isn't correct. The unsized arrays are *only* allowed in storage buffers, while your shader has:
```rust
[[block]]
struct CacheRadius {
    data : [[stride(4)]] array<f32>;
};
[[group(0), binding(4)]] var<uniform> cache_max_r : CacheRadius;
```
We'll have this bug associated with the validation we are missing here, but you should fix the program on your side (use storage buffers there) and see how it goes.
Thanks so much for looking into this. I did as you suggested and converted every uniform buffer that contained an unsized array into a storage buffer. This actually improved performance significantly, allowing the program to run many more particles in a simulation without slowdowns! Unfortunately, that didn't solve the issue of the program building up input latency at higher speeds and then freezing/panicking/segfaulting upon exiting. However, upon looking back I realized that only headless mode, which is unbounded in ticks per second, was affected by this bug at all times! With that I found that running device.poll(Maintain::Wait) fixes the segfaulting in both headless and headed mode, and regulates peak performance to the same level no matter how often it is called. I hope this is the preferred solution to limiting framerate to match throughput constraints?
You are facing a known problem that we don't have a solution for yet. It needs a larger discussion within the WebGPU group (I'll try to raise it!). What you need is some form of back-pressure, to prevent oversaturating the GPU with work. Doing the poll(Wait) is a crude solution to this, and it's not portable to the web.
I think, ideally, you'd keep at most N pieces of work in flight at any time. So, you'd split your work into batches, and you'd wait for the GPU to finish working on the oldest submitted batch before submitting a new one. Today, this can be done with a mapAsync call on a buffer used by a submission. In the upstream spec there is also onSubmittedWorkDone, which is more straightforward than buffer mapping for this case, but we don't have it implemented in wgpu yet (help is appreciated!).
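To make the "N batches in flight" idea concrete, here is a minimal sketch of the bookkeeping only, with no wgpu calls in it. All of the names (`Completion`, `Throttle`, `fake_submit`) are hypothetical and made up for illustration. A channel `Receiver` stands in for whatever completion signal a real submission would offer (for example, a buffer-mapping future), and a worker thread plays the role of the GPU.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical stand-in for "a submitted batch whose completion can be awaited".
trait Completion {
    fn wait(self);
}

/// Assumption: a channel receiver models the completion signal of one batch.
impl Completion for Receiver<()> {
    fn wait(self) {
        // Blocks until the worker signals (or drops its sender).
        let _ = self.recv();
    }
}

/// Keeps at most `max_in_flight` batches outstanding.
struct Throttle<C: Completion> {
    max_in_flight: usize,
    pending: VecDeque<C>,
}

impl<C: Completion> Throttle<C> {
    fn new(max_in_flight: usize) -> Self {
        assert!(max_in_flight > 0, "need at least one slot");
        Self { max_in_flight, pending: VecDeque::new() }
    }

    /// If the window is full, waits for the oldest batch, then submits a new one.
    fn submit(&mut self, submit_batch: impl FnOnce() -> C) {
        if self.pending.len() >= self.max_in_flight {
            if let Some(oldest) = self.pending.pop_front() {
                oldest.wait();
            }
        }
        self.pending.push_back(submit_batch());
    }

    /// Waits for everything still outstanding, oldest first.
    fn drain(&mut self) {
        while let Some(oldest) = self.pending.pop_front() {
            oldest.wait();
        }
    }

    fn in_flight(&self) -> usize {
        self.pending.len()
    }
}

/// Simulated submission: a worker thread signals once the "batch" is done.
fn fake_submit() -> Receiver<()> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        let _ = tx.send(());
    });
    rx
}
```

In a real program, the closure passed to `submit` would do the actual queue submission and return its completion handle. Calling `drain` before shutdown means no work is left outstanding when resources are released, which may help with the teardown hang described above, though that is untested here.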
Also, I just finished working on the appropriate validation that would tell you what's wrong earlier - https://github.com/gfx-rs/naga/pull/510