OS: Arch Linux
GPU: GTX 960
Driver: Nvidia proprietary driver, version 418
I am hitting deadlocks when I take the compute example and run it multiple times in parallel.
https://github.com/rukai/wgpu/blob/217a76eaa6bdf1468319c0b7147acb2a8a594aae/examples/deadlock/main.rs
It will only complete 1-4 iterations before reaching a deadlock.
However, if I run the same code sequentially, the device fails to initialize on the 64th iteration every time.
This can be reduced down to just the device initialization code.
https://github.com/rukai/wgpu/blob/217a76eaa6bdf1468319c0b7147acb2a8a594aae/examples/initialization_failed/main.rs
You can easily run these examples by cloning https://github.com/rukai/wgpu/tree/bugs
and then doing:
cargo run --release --features vulkan --bin deadlock
cargo run --release --features vulkan --bin initialization_failed
Confirmed: both of these are problematic for me as well, on Arch Linux with an Intel GPU, Intel(R) HD Graphics 620 (Kaby Lake GT2) (type: IntegratedGpu).
Running initialization_failed, I get some validation errors, and the device is lost on the first iteration:
m4b@efrit :: [ /tmp/wgpu/examples ] cargo run --features vulkan --bin initialization_failed
Finished dev [unoptimized + debuginfo] target(s) in 0.14s
Running `/tmp/wgpu/target/debug/initialization_failed`
iteration: 0
Xlib: extension "NV-GLX" missing on display ":0".
ERROR 2019-05-07T05:10:16Z: gfx_backend_vulkan: [Validation] [ VUID_Undefined ] Object: VK_NULL_HANDLE (Type = 0) | vkWaitForFences: parameter fenceCount must be greater than 0.
VUID_Undefined(ERROR / SPEC): msgNum: 0 - vkWaitForFences: parameter fenceCount must be greater than 0.
Objects: 1
[0] 0, type: 0, name: NULL
ERROR 2019-05-07T05:10:16Z: gfx_backend_vulkan: [anv] ../mesa-18.3.1/src/intel/vulkan/anv_queue.c:538: drm_syncobj_wait failed: Invalid argument (VK_ERROR_DEVICE_LOST)
INTEL-MESA: error: ../mesa-18.3.1/src/intel/vulkan/anv_queue.c:538: drm_syncobj_wait failed: Invalid argument (VK_ERROR_DEVICE_LOST)
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DeviceLost(DeviceLost)', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
And confirmed, the deadlock example appears to deadlock after some time:
m4b@efrit :: [ /tmp/wgpu/examples ] cargo run --features vulkan --bin deadlock
Compiling examples v0.1.0 (/tmp/wgpu/examples)
Finished dev [unoptimized + debuginfo] target(s) in 3.16s
Running `/tmp/wgpu/target/debug/deadlock`
0
500
250
375
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
Times: [25, 25, 25]
376
1
Times: [25, 25, 25]
501
Times: [25, 25, 25]
251
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
377
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
252
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
502
Times: [25, 25, 25]
2
Times: [25, 25, 25]
378
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
253
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
503
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
3
Times: [25, 25, 25]
379
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
254
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
504
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
4
Times: [25, 25, 25]
380
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
255
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
505
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
5
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
381
Times: [25, 25, 25]
256
Times: [25, 25, 25]
506
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
6
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
382
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
257
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
507
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
7
Times: [25, 25, 25]
383
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
258
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
508
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
8
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
384
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
259
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
509
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
9
Times: [25, 25, 25]
Times: [25, 25, 25]
Times: [25, 25, 25]
260
510
385
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
10
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
511
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
11
Times: [25, 25, 25]
261
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
386
Times: [25, 25, 25]
512
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
12
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
262
Times: [25, 25, 25]
387
Times: [25, 25, 25]
513
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
13
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
263
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
514
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
388
Times: [25, 25, 25]
14
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
264
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
515
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
389
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
15
Times: [25, 25, 25]
265
Times: [25, 25, 25]
516
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
390
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
16
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
266
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
517
Times: [25, 25, 25]
391
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
17
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
267
Times: [25, 25, 25]
518
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
392
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
18
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
268
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
519
Xlib: extension "NV-GLX" missing on display ":0".
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
393
Times: [25, 25, 25]
19
Xlib: extension "NV-GLX" missing on display ":0".
Times: [25, 25, 25]
Xlib: extension "NV-GLX" missing on display ":0".
Thank you for the testcase! It looks to be exposing 2 different issues:
maintain(), which has the device locked for writing, so the callbacks aren't able to lock it for reading. It's still unclear to me why this works even in a single thread, but it is clear that we are doing something wrong here, and we can do better.
@rukai the deadlock problem is addressed in #154. It would be great to have that logic upstreamed, perhaps as a test that is only enabled when a real backend is enabled. Would you be willing to try making such a PR?
The other problem is now moved out into #155.
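For anyone following along, the locking conflict described above (a callback wanting a read lock while maintain() still holds the write lock) can be modelled in a few lines. This is only a sketch under assumptions: `Device`, `pending_callbacks` and both function names are hypothetical, `std::sync::RwLock` stands in for whatever lock wgpu actually uses, and `try_read` replaces a blocking `read` so the conflict is reported instead of hanging. Releasing the lock before running callbacks is one possible way out, not necessarily what #154 does.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the device state guarded by the lock.
pub struct Device {
    pub pending_callbacks: u32,
}

/// Models a callback that fires while the write lock taken by
/// `maintain()` is still held. Returns whether the callback could
/// acquire a read lock.
pub fn callback_during_maintain(device: &RwLock<Device>) -> bool {
    let mut guard = device.write().unwrap();
    guard.pending_callbacks = 0;
    // A blocking `read()` here could hang or panic; `try_read` just
    // reports that the lock is unavailable.
    let could_read = device.try_read().is_ok();
    drop(guard);
    could_read
}

/// Models the same callback, but fired only after the write lock
/// has been released.
pub fn callback_after_maintain(device: &RwLock<Device>) -> bool {
    {
        let mut guard = device.write().unwrap();
        guard.pending_callbacks = 0;
    } // write lock released here
    let could_read = device.try_read().is_ok();
    could_read
}
```

With a `RwLock::new(Device { pending_callbacks: 3 })`, the first function reports that the read lock is unavailable and the second reports that it is available.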
Are you asking me to add a unittest consisting of https://github.com/rukai/wgpu/blob/217a76eaa6bdf1468319c0b7147acb2a8a594aae/examples/deadlock/main.rs ?
Yes, but this would be an integration test.
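A rough sketch of what such an integration test could look like, in case it helps the discussion. Everything here is an assumption rather than existing wgpu code: `completed_within` is a hypothetical helper, the worker and iteration counts are arbitrary, and the gated test body is left as a placeholder where the compute example would be called. The idea is to count completed iterations with a timeout, so a deadlock shows up as a failed assertion instead of a hung test run, and to gate the test on the `vulkan` feature used in the commands above.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `job` on `workers` threads, `iterations` times each, and returns
/// how many iterations completed before progress stalled for `timeout`.
pub fn completed_within<F>(workers: usize, iterations: usize, timeout: Duration, job: F) -> usize
where
    F: Fn(usize) + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    for worker in 0..workers {
        let tx = tx.clone();
        let job = job.clone();
        // Threads are detached on purpose: a hung worker must not block
        // the caller, it is simply left behind when the process exits.
        thread::spawn(move || {
            for i in 0..iterations {
                job(worker * iterations + i);
                // The receiver may already have given up; ignore send errors.
                let _ = tx.send(());
            }
        });
    }
    // Drop the original sender so `rx` disconnects once all workers finish.
    drop(tx);

    let mut done = 0;
    // The loop ends on a timeout (likely hang) or when all senders are gone.
    while rx.recv_timeout(timeout).is_ok() {
        done += 1;
    }
    done
}

// Only compiled when a real backend is selected, e.g. `--features vulkan`.
#[cfg(feature = "vulkan")]
#[test]
fn parallel_compute_does_not_deadlock() {
    let done = completed_within(4, 25, Duration::from_secs(30), |_index| {
        // Placeholder: run the body of the compute example here.
    });
    assert_eq!(done, 4 * 25);
}
```

The timeout applies to the gap between two completed iterations, not to the whole run, so slow machines are less likely to cause false failures.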
On May 7, 2019, at 17:27, Lucas Kent notifications@github.com wrote:
Are you asking me to add a unittest consisting of https://github.com/rukai/wgpu/blob/217a76eaa6bdf1468319c0b7147acb2a8a594aae/examples/deadlock/main.rs ?