Wgpu: vulkan validation error: VUID-vkDestroyFramebuffer-framebuffer-00892

Created on 12 Jul 2020 · 18Comments · Source: gfx-rs/wgpu

Occasionally (every few hundred frames on average) I get VUID-vkDestroyFramebuffer-framebuffer-00892 validation errors. The timing seems irregular, so I guess something is racing under the hood.

My render loop is quite straighforward (removed error handling):

let frame = swap_chain.lock().await.get_next_frame().unwrap()
...
let mut encoder =  device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
...
let mut rpass = geometry_pass.begin(&mut encoder, &frame.output.view);
...
let mut rpass = lighting_pass.begin(&mut encoder, &frame.output.view);
...
queue.submit(Some(encoder.finish()));

I played around a little and adding:

std::thread::sleep(std::time::Duration::from_millis(10));
drop(frame);

at the end prevents the validation errors.

bug

Source

FlorianUekermann

👍2

Most helpful comment

I'm running into this validation error as well. I see the validation error go away after applying one of the following changes:

Simplify the scene (by rendering fewer instances, or a mesh with fewer vertices).
Add a sleep after queue.submit as mentioned in https://github.com/gfx-rs/wgpu/issues/785#issue-655407567 (10ms was too short for me, 30ms worked though). EDIT: Adding the sleep before queue.submit also works for me. The location of the sleep doesn't seem to matter too much. Also it might be relevant that device.poll takes about 30ms for me as well, or maybe that's just a coincidence.
Call device.poll(wgpu::Maintain::Wait) before queue.submit.

I also tried calling device.poll _after_ queue.submit, which did get rid of the validation error mentioned in this issue, but triggered a different one instead: UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress. The result was the same when calling device.poll in a loop in a separate thread. I could make a separate issue for that?

benfrankel on 3 Aug 2020

👍2

All 18 comments

Would you have a repro case?

kvark on 12 Jul 2020

I can try to record a trace. Does that help?

FlorianUekermann on 12 Jul 2020

Yes, an API trace would be wonderful, plus your adapter info for the reference (we should include it in the trace).

kvark on 12 Jul 2020

I attached the vulkaninfo output and the trace with current master (had to compress it twice, because github doesn't like .xz and just using zip makes it 20x larger).

lspci says: Advanced Micro Devices, Inc. [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] (rev c5)
trace.zip
vulkaninfo.txt

FlorianUekermann on 13 Jul 2020

👍1

@FlorianUekermann thank you! What wgpu-rs (or wgpu) revision was that from?

kvark on 14 Jul 2020

wgpu-rs master at time of posting. I updated right before recording. I can't check right now.

FlorianUekermann on 14 Jul 2020

I'm having trouble replaying this API trace due to API mismatches. Could you specify the exact revision (of either wgpu-rs or wgpu) this was taken from?

kvark on 22 Jul 2020

After a bit more wrangling, I'm able to replay this API trace. It's panicking on old.is_none() in the HUB registry for buffers. No validation errors before that.

kvark on 22 Jul 2020

I'll play around with some traces on the weekend. Is it possible that a race during recording has a different outcome when being replayed?

FlorianUekermann on 23 Jul 2020

I don't currently see how this would be possible with regards to the buffer IDs. We record Action::DestroyBuffer right before we free the corresponding ID in the hub, and until we do that nothing can technically allocate this ID for a new buffer.
This is still fairly fresh tech though, so I might be missing something

kvark on 23 Jul 2020

This still happens on latest wgpu-rs master.

I fixed my panic on shutdown and tried again on wgpu-rs master, which uses wgpu b35209398823e95d71cce718fd98fa54ce43271d right now. Unfortunately I can't play back the traces with that revision:

$ RUST_LOG=info cargo run --features winit -- ../../../trace/
[0.009707 INFO]()(no module): Loading trace '"../../../trace/"'
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Parser(Eof, Position { col: 1, line: 4780556 })', player/src/bin/play.rs:42:70
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Am I using the player wrong? Its not really a problem right now, so I can try again in a few weeks.

Btw, when shutting down I can also produce the occasional segfault and a few other validation errors. Maybe it is worth mentioning that I am rendering in a separate task, not the winit event loop.

Edit: Another observation - The validation error happens a lot more frequently when I am not recording a trace.

FlorianUekermann on 2 Aug 2020

The trace can't be replayed because it wasn't finished gracefully, since the app has panicked. All you need to finish it is to edit the file by appending "]" to it.

kvark on 2 Aug 2020

I wonder if this is just https://github.com/gfx-rs/gfx/issues/3184

kvark on 2 Aug 2020

All you need to finish it is to edit the file by appending "]" to it.

Ah thanks, now it works. I've tried with a few traces now and I don't get the validation error when playing the trace, even though I get multiple on the original run...

I've been playing around a little more and noticed that I get very noticeable stalls (~1s) right before the validation error prints. introducing a sleep before acquiring a new frame gets rid of the validation error as well as the stalls.

I wonder if this is just gfx-rs/gfx#3184

I hadn't seen that. Seems likely.
There aren't really any examples that are heavy enough to get frames in flight for long. I'll see if I can patch one to reproduce this issue.

FlorianUekermann on 2 Aug 2020

❤1

Filed #861 to make this less painful

kvark on 2 Aug 2020

I modified the hello-triangle example to have trivial reproducer for this. If you think a stress test example is generally useful I can send a pull request:
https://github.com/FlorianUekermann/wgpu-rs
I get the error every few ms with: RUST_LOG=info cargo run --features subscriber --example hello-triangle -- --stress

Btw, it took me longer than I care to admit to figure out how to run the example with logs. Maybe that should be documented in a very obvious place (README?).

FlorianUekermann on 2 Aug 2020

Btw, it took me longer than I care to admit to figure out how to run the example with logs. Maybe that should be documented in a very obvious place (README?).

Sorry about that! I agree, we should definitely put it into README. @cwfitzgerald could you make a small PR with this to wgpu-rs, please?

kvark on 2 Aug 2020

I'm running into this validation error as well. I see the validation error go away after applying one of the following changes:

Simplify the scene (by rendering fewer instances, or a mesh with fewer vertices).
Add a sleep after queue.submit as mentioned in https://github.com/gfx-rs/wgpu/issues/785#issue-655407567 (10ms was too short for me, 30ms worked though). EDIT: Adding the sleep before queue.submit also works for me. The location of the sleep doesn't seem to matter too much. Also it might be relevant that device.poll takes about 30ms for me as well, or maybe that's just a coincidence.
Call device.poll(wgpu::Maintain::Wait) before queue.submit.