After updating to latest version of wgpu-rs, I've gotten a consistent crash issue in my code. If I roll back to the version of wgpu-rs before gpu-alloc replaced gfx-memory, then the app runs fine. The crash isn't immediate, but does happen consistently after the app renders enough frames.
I'll see what I can do to reproduce in some smaller codebase, but the problem I'm having now is that I've only gotten a segfault once - every other time it just locks up the computer, forcing me to reboot. That's making it more than a little difficult to track down.
The best I've got so far is that it's happening in queue.rs queue_write_buffer when it calls:
stage.memory.write_bytes(&device.raw, 0, data)?;
MemoryBlock::write_bytes has some unsafe code calling into gpu-alloc that is presumably where things go bad.
The system at least logged the segfault before it hung. Here's the relevant portion:
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x000000012651d000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [1038]
VM Regions Near 0x12651d000:
IOAccelerator 1244ed000-1264ed000 [ 32.0M] rw-/rw- SM=SHM
-->
STACK GUARD 700007320000-700007321000 [ 4K] ---/rwx SM=NUL stack guard for thread 1
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_platform.dylib 0x00007fff2037a0a9 _platform_memmove$VARIANT$Haswell + 41
1 strategic 0x000000010d7c51f6 core::intrinsics::copy_nonoverlapping::h1f6aa0a8620d1132 + 8 (intrinsics.rs:1860) [inlined]
2 strategic 0x000000010d7c51f6 gpu_alloc::block::MemoryBlock$LT$M$GT$::write_bytes::hb720e46216fbac3f + 273 (block.rs:248) [inlined]
3 strategic 0x000000010d7c51f6 wgpu_core::device::alloc::MemoryBlock$LT$B$GT$::write_bytes::h21d20e56a84d86ca + 310 (alloc.rs:149)
4 strategic 0x000000010d779dbc wgpu_core::device::queue::_$LT$impl$u20$wgpu_core..hub..Global$LT$G$GT$$GT$::queue_write_buffer::ha0909694bc2b410b + 892 (queue.rs:215)
5 strategic 0x000000010d809a95 _$LT$wgpu..backend..direct..Context$u20$as$u20$wgpu..Context$GT$::queue_write_buffer::h6dccde0659d80adf + 101 (direct.rs:1482)
6 strategic 0x000000010d6a9bc4 renderer::renderer::Renderer::end_frame::h9c20d6895798ee3d + 1028 (renderer.rs:1134)
7 strategic 0x000000010d54efb6 strategic::main::_$u7b$$u7b$closure$u7d$$u7d$::h7f3db76ca6301c7a + 422 (main.rs:89)
I'm calling this every frame:
queue.write_buffer(&vertex_buffer, 0, vertices.as_bytes());
and vertex_buffer was created to hold 600,000 bytes with wgpu::BufferUsage::VERTEX | wgpu::BufferUsage::COPY_DST, and vertices.as_bytes() is 101,640 bytes, and is the same contents for many frames before this segfaults.
To help repro this issue, it might be helpful to have a wgpu trace created with the offending version. https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications#tracing-infrastructure are instructions on how to record one.
Thank you for filing! It's critical to record this API trace for us to reproduce. Filed https://github.com/zakarumych/gpu-alloc/issues/27 in the meantime.
Trace attached. No stack trace on this one, as it never got written to the log. Looks like WindowServer died.
Not sure how helpful this is going to be, but I printed the data in the call to:
gpu_alloc::block::MemoryBlock::write_bytes()
specifically before the segfault call:
copy_nonoverlapping(data.as_ptr(), ptr.as_ptr(), size);
Here are the last 10 frames (so the last line is the call that segfaults) logging data.as_ptr() ptr.as_ptr() size:
0x7fd4f5819000 0x12850bc00 18720
0x7ffee445ef7c 0x128510600 20
0x7ffee445f688 0x128510700 16
0x7fd4f5819000 0x128510800 18720
0x7ffee445ef7c 0x128515200 20
0x7ffee445f688 0x128515300 16
0x7fd4f5819000 0x128515400 18720
0x7ffee445ef7c 0x128519e00 20
0x7ffee445f688 0x128519f00 16
0x7fd4f5819000 0x12851a000 18720
0x7ffee445ef7c 0x12851ea00 20
0x7ffee445f688 0x12851eb00 16
0x7fd4f5819000 0x12851ec00 18720
0x7ffee445ef7c 0x128523600 20
0x7ffee445f688 0x128523700 16
0x7fd4f5819000 0x128523800 18720
0x7ffee445ef7c 0x128528200 20
0x7ffee445f688 0x128528300 16
0x7fd4f5819000 0x128528400 18720
0x7ffee445ef7c 0x12852ce00 20
0x7ffee445f688 0x12852cf00 16
0x7fd4f5819000 0x12852d000 18720
0x7ffee445ef7c 0x128531a00 20
0x7ffee445f688 0x128531b00 16
0x7fd4f5819000 0x128531c00 18720
0x7ffee445ef7c 0x128536600 20
0x7ffee445f688 0x128536700 16
0x7fd4f5819000 0x128536800 18720
And it didn't hang on this run, so here's the stack trace:
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x000000012853a000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [39378]
VM Regions Near 0x12853a000:
mapped file 1224b6000-1225be000 [ 1056K] r--/rwx SM=COW Object_id=42a7e497
-->
STACK GUARD 7000014cf000-7000014d0000 [ 4K] ---/rwx SM=NUL stack guard for thread 1
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_platform.dylib 0x00007fff203740a9 _platform_memmove$VARIANT$Haswell + 41
1 strategic 0x000000010c608b68 core::intrinsics::copy_nonoverlapping::h298f13d14b347174 + 40 (intrinsics.rs:1860)
2 strategic 0x000000010bde9cc3 gpu_alloc::block::MemoryBlock$LT$M$GT$::write_bytes::h850253fbc0424ff9 + 1955 (block.rs:248) [inlined]
3 strategic 0x000000010bde9cc3 wgpu_core::device::alloc::MemoryBlock$LT$B$GT$::write_bytes::h0130ad9960e38064 + 2067 (alloc.rs:149)
4 strategic 0x000000010bb508a2 wgpu_core::device::queue::_$LT$impl$u20$wgpu_core..hub..Global$LT$G$GT$$GT$::queue_write_buffer::h70e000e8b5af4a06 + 2994 (queue.rs:215)
5 strategic 0x000000010bc83080 _$LT$wgpu..backend..direct..Context$u20$as$u20$wgpu..Context$GT$::queue_write_buffer::h52b3769ea2d125d2 + 192 (direct.rs:1485)
6 strategic 0x000000010bd135ff wgpu::Queue::write_buffer::h503124c68151fae3 + 111 (lib.rs:2609)
7 strategic 0x000000010ba87256 renderer::renderer::Renderer::end_frame::hce81f0e7379b54f3 + 2102 (renderer.rs:1135)
The destination memory for the copy starts at 0x128536800 (4_971_522_048). The copy size is 18_720. The copy should finish at 4_971_540_768. The access is for 0x12853a000 (4_971_536_384). So it looks like it copied 14_336 bytes, with 4_384 bytes remaining, when it segfaulted.
BTW. I tried to run "play" (super cool!), and the trace.ron was an invalid RON file as it's missing the closing ']'. I added that and then it panicked with thread 'main' panicked at 'internal error: entered unreachable code', player/src/bin/play.rs:148:29 (and I turned winit feature on).
It looks like the closing bracket is written in Trace::drop, so maybe that's expected that it's missing. It looks like it isn't a buffered write, so it should be complete aside from Trace::drop not getting called. However, I'd expect that, given where it crashes, the last call would be a WriteBuffer, not a Submit?
@lordnoriyuki in addition to the closing ']' you'd also need to run with winit feature:
cargo run --features winit
I did add winit, and got:
thread 'main' panicked at 'internal error: entered unreachable code', player/src/bin/play.rs:148:29
stack backtrace:
...
0: std::panicking::begin_panic
at /Users/ln/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:519:12
1: play::main::{{closure}}
at ./src/bin/play.rs:148:29
2: <alloc::boxed::Box<F,A> as core::ops::function::FnMut<Args>>::call_mut
at /Users/ln/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1325:9
3: <winit::platform_impl::platform::app_state::EventLoopHandler<T> as winit::platform_impl::platform::app_state::EventHandler>::handle_nonuser_event
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/app_state.rs:71:9
4: winit::platform_impl::platform::app_state::Handler::handle_nonuser_event
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/app_state.rs:172:21
5: winit::platform_impl::platform::app_state::AppState::cleared
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/app_state.rs:330:17
6: winit::platform_impl::platform::observer::control_flow_end_handler
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/observer.rs:135:13
7: <unknown>
8: <unknown>
9: <unknown>
10: <unknown>
11: <unknown>
12: <unknown>
13: <unknown>
14: <unknown>
15: <unknown>
16: <unknown>
17: <() as objc::message::MessageArguments>::invoke
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:128:17
18: objc::message::platform::send_unverified
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/apple/mod.rs:27:9
19: objc::message::send_message
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:178:5
20: winit::platform_impl::platform::event_loop::EventLoop<T>::run_return
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/event_loop.rs:102:25
21: winit::platform_impl::platform::event_loop::EventLoop<T>::run
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/event_loop.rs:89:9
22: winit::event_loop::EventLoop<T>::run
at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/event_loop.rs:149:9
23: play::main
at ./src/bin/play.rs:124:9
24: core::ops::function::FnOnce::call_once
at /Users/ln/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
With info logging:
[0.010523 INFO]()(no module): Loading trace '"../../trace"'
[2.767758 INFO]()(no module): Found 22101 actions
[2.767859 INFO]()(no module): Creating a window
[2.851047 INFO]()(no module): Initializing the device for backend: Metal
[2.856255 INFO](Instance::pick_adapter)(wgpu_core::instance): Adapter Metal AdapterInfo { name: "AMD Radeon Pro Vega 56", vendor: 0, device: 0, device_type: DiscreteGpu }
[2.856390 INFO]()(no module): Picked 'AMD Radeon Pro Vega 56'
[2.856415 WARN](Adapter::request_device)(wgpu_core::instance): Feature MAPPABLE_PRIMARY_BUFFERS enabled on a discrete gpu. This is a massive performance footgun and likely not what you wanted
[2.856440 WARN](Adapter::request_device)(wgpu_core::instance): Missing internal features: ROBUST_BUFFER_ACCESS
[2.856613 WARN](Adapter::request_device)(wgpu_core::instance): Shader validation is disabled
[2.856672 INFO](Adapter::request_device)(wgpu_core::command::allocator): Starting on (internal) thread ThreadId(1)
[2.857033 INFO]()(no module): Executing actions
OK, I had to change wgt::Backend::Empty in one place to wgt::Backend::Metal to get it to not fail as above. Post that change, I can reliably reproduce the segfault using the uploaded trace.
Weird, I was able to replay your trace without entered unreachable code, and I don't see any Empty used in it.
Anyway, machine just totally went nuts upon replaying, unfortunately. The driver doesn't catch writing outside of bounds, and we are stomping on some driver userland data.
Interestingly, seeing the following Metal validation errors when replaying the trace:
-[MTLDebugRenderCommandEncoder validateFramebufferWithRenderPipelineState:], line 1288: error 'Framebuffer With Render Pipeline State Validation
For color attachment 0, the texture sample count (1) does not match the renderPipelineState colorSampleCount (4).
The sample count (1) does not match the renderPipelineState's sample count (4)
'
-[MTLDebugRenderCommandEncoder validateFramebufferWithRenderPipelineState:]:1288: failed assertion `Framebuffer With Render Pipeline State Validation
For color attachment 0, the texture sample count (1) does not match the renderPipelineState colorSampleCount (4).
The sample count (1) does not match the renderPipelineState's sample count (4)
'
That's odd. I'm doing:
RenderPassColorAttachmentDescriptor {
attachment: multisample_framebuffer,
resolve_target: resolved_framebuffer,
...
}
where multisample_framebuffer is created with sample count 4, and resolved_framebuffer is created with sample count 1. That should be ok, right?
Is it possible to replay trace on any other backend?
@zakarumych yes, all you need is to open RON file in an editor and search+replace the backend type. Hopefully, we can make this process smoother in the future.
That's odd. I'm doing:
RenderPassColorAttachmentDescriptor { attachment: multisample_framebuffer, resolve_target: resolved_framebuffer, ... }where multisample_framebuffer is created with sample count 4, and resolved_framebuffer is created with sample count 1. That should be ok, right?
if that's what you are doing, then it's our bug. It's strange however that we'd not see it on our msaa-line example.
@zakarumych yes, all you need is to open RON file in an editor and search+replace the backend type. Hopefully, we can make this process smoother in the future.
Unfortunately it doesn't work
Talked over chat, and the problems @zakarumych sees appear to be unrelated to API tracing. It's the annoying surfman on Linux.
FYI. Seeing a new issue after updating for the fix of this one:
https://github.com/gfx-rs/wgpu-rs/issues/638#issue-749960912
@zakarumych yes, all you need is to open RON file in an editor and search+replace the backend type. Hopefully, we can make this process smoother in the future.
Unfortunately it doesn't work
fixed by #1050