Wgpu: Segfault/hang related to gpu-alloc (macOS 11, metal)

Created on 23 Nov 2020 · 21Comments · Source: gfx-rs/wgpu

After updating to latest version of wgpu-rs, I've gotten a consistent crash issue in my code. If I roll back to the version of wgpu-rs before gpu-alloc replaced gfx-memory, then the app runs fine. The crash isn't immediate, but does happen consistently after the app renders enough frames.

I'll see what I can do to reproduce in some smaller codebase, but the problem I'm having now is that I've only gotten a segfault once - every other time it just locks up the computer, forcing me to reboot. That's making it more than a little difficult to track down.

The best I've got so far is that it's happening in queue.rs queue_write_buffer when it calls:
stage.memory.write_bytes(&device.raw, 0, data)?;
MemoryBlock::write_bytes has some unsafe code calling into gpu-alloc that is presumably where things go bad.

bug

Source

lordnoriyuki

👍1

All 21 comments

The system at least logged the segfault before it hung. Here's the relevant portion:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000012651d000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [1038]

VM Regions Near 0x12651d000:
    IOAccelerator               1244ed000-1264ed000    [ 32.0M] rw-/rw- SM=SHM  
--> 
    STACK GUARD              700007320000-700007321000 [    4K] ---/rwx SM=NUL  stack guard for thread 1

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib        0x00007fff2037a0a9 _platform_memmove$VARIANT$Haswell + 41
1   strategic                       0x000000010d7c51f6 core::intrinsics::copy_nonoverlapping::h1f6aa0a8620d1132 + 8 (intrinsics.rs:1860) [inlined]
2   strategic                       0x000000010d7c51f6 gpu_alloc::block::MemoryBlock$LT$M$GT$::write_bytes::hb720e46216fbac3f + 273 (block.rs:248) [inlined]
3   strategic                       0x000000010d7c51f6 wgpu_core::device::alloc::MemoryBlock$LT$B$GT$::write_bytes::h21d20e56a84d86ca + 310 (alloc.rs:149)
4   strategic                       0x000000010d779dbc wgpu_core::device::queue::_$LT$impl$u20$wgpu_core..hub..Global$LT$G$GT$$GT$::queue_write_buffer::ha0909694bc2b410b + 892 (queue.rs:215)
5   strategic                       0x000000010d809a95 _$LT$wgpu..backend..direct..Context$u20$as$u20$wgpu..Context$GT$::queue_write_buffer::h6dccde0659d80adf + 101 (direct.rs:1482)
6   strategic                       0x000000010d6a9bc4 renderer::renderer::Renderer::end_frame::h9c20d6895798ee3d + 1028 (renderer.rs:1134)
7   strategic                       0x000000010d54efb6 strategic::main::_$u7b$$u7b$closure$u7d$$u7d$::h7f3db76ca6301c7a + 422 (main.rs:89)

lordnoriyuki on 23 Nov 2020

I'm calling this every frame:

queue.write_buffer(&vertex_buffer, 0, vertices.as_bytes());

and vertex_buffer was created to hold 600,000 bytes with wgpu::BufferUsage::VERTEX | wgpu::BufferUsage::COPY_DST, and vertices.as_bytes() is 101,640 bytes, and is the same contents for many frames before this segfaults.

lordnoriyuki on 23 Nov 2020

To help repro this issue, it might be helpful to have a wgpu trace created with the offending version. https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications#tracing-infrastructure are instructions on how to record one.

cwfitzgerald on 23 Nov 2020

👍1

Thank you for filing! It's critical to record this API trace for us to reproduce. Filed https://github.com/zakarumych/gpu-alloc/issues/27 in the meantime.

kvark on 23 Nov 2020

trace.zip

Trace attached. No stack trace on this one, as it never got written to the log. Looks like WindowServer died.

lordnoriyuki on 23 Nov 2020

❤1

Not sure how helpful this is going to be, but I printed the data in the call to:
gpu_alloc::block::MemoryBlock::write_bytes()
specifically before the segfault call:
copy_nonoverlapping(data.as_ptr(), ptr.as_ptr(), size);

Here are the last 10 frames (so the last line is the call that segfaults) logging data.as_ptr() ptr.as_ptr() size:

0x7fd4f5819000 0x12850bc00 18720
0x7ffee445ef7c 0x128510600 20
0x7ffee445f688 0x128510700 16
0x7fd4f5819000 0x128510800 18720
0x7ffee445ef7c 0x128515200 20
0x7ffee445f688 0x128515300 16
0x7fd4f5819000 0x128515400 18720
0x7ffee445ef7c 0x128519e00 20
0x7ffee445f688 0x128519f00 16
0x7fd4f5819000 0x12851a000 18720
0x7ffee445ef7c 0x12851ea00 20
0x7ffee445f688 0x12851eb00 16
0x7fd4f5819000 0x12851ec00 18720
0x7ffee445ef7c 0x128523600 20
0x7ffee445f688 0x128523700 16
0x7fd4f5819000 0x128523800 18720
0x7ffee445ef7c 0x128528200 20
0x7ffee445f688 0x128528300 16
0x7fd4f5819000 0x128528400 18720
0x7ffee445ef7c 0x12852ce00 20
0x7ffee445f688 0x12852cf00 16
0x7fd4f5819000 0x12852d000 18720
0x7ffee445ef7c 0x128531a00 20
0x7ffee445f688 0x128531b00 16
0x7fd4f5819000 0x128531c00 18720
0x7ffee445ef7c 0x128536600 20
0x7ffee445f688 0x128536700 16
0x7fd4f5819000 0x128536800 18720

And it didn't hang on this run, so here's the stack trace:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000012853a000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [39378]

VM Regions Near 0x12853a000:
    mapped file                 1224b6000-1225be000    [ 1056K] r--/rwx SM=COW  Object_id=42a7e497
--> 
    STACK GUARD              7000014cf000-7000014d0000 [    4K] ---/rwx SM=NUL  stack guard for thread 1

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib        0x00007fff203740a9 _platform_memmove$VARIANT$Haswell + 41
1   strategic                       0x000000010c608b68 core::intrinsics::copy_nonoverlapping::h298f13d14b347174 + 40 (intrinsics.rs:1860)
2   strategic                       0x000000010bde9cc3 gpu_alloc::block::MemoryBlock$LT$M$GT$::write_bytes::h850253fbc0424ff9 + 1955 (block.rs:248) [inlined]
3   strategic                       0x000000010bde9cc3 wgpu_core::device::alloc::MemoryBlock$LT$B$GT$::write_bytes::h0130ad9960e38064 + 2067 (alloc.rs:149)
4   strategic                       0x000000010bb508a2 wgpu_core::device::queue::_$LT$impl$u20$wgpu_core..hub..Global$LT$G$GT$$GT$::queue_write_buffer::h70e000e8b5af4a06 + 2994 (queue.rs:215)
5   strategic                       0x000000010bc83080 _$LT$wgpu..backend..direct..Context$u20$as$u20$wgpu..Context$GT$::queue_write_buffer::h52b3769ea2d125d2 + 192 (direct.rs:1485)
6   strategic                       0x000000010bd135ff wgpu::Queue::write_buffer::h503124c68151fae3 + 111 (lib.rs:2609)
7   strategic                       0x000000010ba87256 renderer::renderer::Renderer::end_frame::hce81f0e7379b54f3 + 2102 (renderer.rs:1135)

The destination memory for the copy starts at 0x128536800 (4_971_522_048). The copy size is 18_720. The copy should finish at 4_971_540_768. The access is for 0x12853a000 (4_971_536_384). So it looks like it copied 14_336 bytes, with 4_384 bytes remaining, when it segfaulted.

lordnoriyuki on 23 Nov 2020

BTW. I tried to run "play" (super cool!), and the trace.ron was an invalid RON file as it's missing the closing ']'. I added that and then it panicked with thread 'main' panicked at 'internal error: entered unreachable code', player/src/bin/play.rs:148:29 (and I turned winit feature on).

It looks like the closing bracket is written in Trace::drop, so maybe that's expected that it's missing. It looks like it isn't a buffered write, so it should be complete aside from Trace::drop not getting called. However, I'd expect that, given where it crashes, the last call would be a WriteBuffer, not a Submit?

lordnoriyuki on 23 Nov 2020

@lordnoriyuki in addition to the closing ']' you'd also need to run with winit feature:

cargo run --features winit

kvark on 23 Nov 2020

I did add winit, and got:

thread 'main' panicked at 'internal error: entered unreachable code', player/src/bin/play.rs:148:29
stack backtrace:
...
   0: std::panicking::begin_panic
             at /Users/ln/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:519:12
   1: play::main::{{closure}}
             at ./src/bin/play.rs:148:29
   2: <alloc::boxed::Box<F,A> as core::ops::function::FnMut<Args>>::call_mut
             at /Users/ln/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1325:9
   3: <winit::platform_impl::platform::app_state::EventLoopHandler<T> as winit::platform_impl::platform::app_state::EventHandler>::handle_nonuser_event
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/app_state.rs:71:9
   4: winit::platform_impl::platform::app_state::Handler::handle_nonuser_event
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/app_state.rs:172:21
   5: winit::platform_impl::platform::app_state::AppState::cleared
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/app_state.rs:330:17
   6: winit::platform_impl::platform::observer::control_flow_end_handler
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/observer.rs:135:13
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <() as objc::message::MessageArguments>::invoke
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:128:17
  18: objc::message::platform::send_unverified
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/apple/mod.rs:27:9
  19: objc::message::send_message
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:178:5
  20: winit::platform_impl::platform::event_loop::EventLoop<T>::run_return
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/event_loop.rs:102:25
  21: winit::platform_impl::platform::event_loop::EventLoop<T>::run
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/platform_impl/macos/event_loop.rs:89:9
  22: winit::event_loop::EventLoop<T>::run
             at /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.22.2/src/event_loop.rs:149:9
  23: play::main
             at ./src/bin/play.rs:124:9
  24: core::ops::function::FnOnce::call_once
             at /Users/ln/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5

lordnoriyuki on 23 Nov 2020

With info logging:

[0.010523 INFO]()(no module): Loading trace '"../../trace"'
[2.767758 INFO]()(no module): Found 22101 actions
[2.767859 INFO]()(no module): Creating a window
[2.851047 INFO]()(no module): Initializing the device for backend: Metal
[2.856255 INFO](Instance::pick_adapter)(wgpu_core::instance): Adapter Metal AdapterInfo { name: "AMD Radeon Pro Vega 56", vendor: 0, device: 0, device_type: DiscreteGpu }
[2.856390 INFO]()(no module): Picked 'AMD Radeon Pro Vega 56'
[2.856415 WARN](Adapter::request_device)(wgpu_core::instance): Feature MAPPABLE_PRIMARY_BUFFERS enabled on a discrete gpu. This is a massive performance footgun and likely not what you wanted
[2.856440 WARN](Adapter::request_device)(wgpu_core::instance): Missing internal features: ROBUST_BUFFER_ACCESS
[2.856613 WARN](Adapter::request_device)(wgpu_core::instance): Shader validation is disabled
[2.856672 INFO](Adapter::request_device)(wgpu_core::command::allocator): Starting on (internal) thread ThreadId(1)
[2.857033 INFO]()(no module): Executing actions

lordnoriyuki on 23 Nov 2020

OK, I had to change wgt::Backend::Empty in one place to wgt::Backend::Metal to get it to not fail as above. Post that change, I can reliably reproduce the segfault using the uploaded trace.

lordnoriyuki on 23 Nov 2020

Weird, I was able to replay your trace without entered unreachable code, and I don't see any Empty used in it.
Anyway, machine just totally went nuts upon replaying, unfortunately. The driver doesn't catch writing outside of bounds, and we are stomping on some driver userland data.

kvark on 24 Nov 2020

👍1

Interestingly, seeing the following Metal validation errors when replaying the trace:

 -[MTLDebugRenderCommandEncoder validateFramebufferWithRenderPipelineState:], line 1288: error 'Framebuffer With Render Pipeline State Validation
For color attachment 0, the texture sample count (1) does not match the renderPipelineState colorSampleCount (4).
The sample count (1) does not match the renderPipelineState's sample count (4)
'
-[MTLDebugRenderCommandEncoder validateFramebufferWithRenderPipelineState:]:1288: failed assertion `Framebuffer With Render Pipeline State Validation
For color attachment 0, the texture sample count (1) does not match the renderPipelineState colorSampleCount (4).
The sample count (1) does not match the renderPipelineState's sample count (4)
'

kvark on 24 Nov 2020

That's odd. I'm doing:

RenderPassColorAttachmentDescriptor {
 attachment: multisample_framebuffer,
 resolve_target: resolved_framebuffer,
 ...
}

where multisample_framebuffer is created with sample count 4, and resolved_framebuffer is created with sample count 1. That should be ok, right?

lordnoriyuki on 24 Nov 2020

Is it possible to replay trace on any other backend?

zakarumych on 24 Nov 2020

@zakarumych yes, all you need is to open RON file in an editor and search+replace the backend type. Hopefully, we can make this process smoother in the future.

kvark on 24 Nov 2020

That's odd. I'm doing:
RenderPassColorAttachmentDescriptor {
 attachment: multisample_framebuffer,
 resolve_target: resolved_framebuffer,
 ...
}
where multisample_framebuffer is created with sample count 4, and resolved_framebuffer is created with sample count 1. That should be ok, right?

if that's what you are doing, then it's our bug. It's strange however that we'd not see it on our msaa-line example.

kvark on 24 Nov 2020

@zakarumych yes, all you need is to open RON file in an editor and search+replace the backend type. Hopefully, we can make this process smoother in the future.

Unfortunately it doesn't work

zakarumych on 24 Nov 2020

Talked over chat, and the problems @zakarumych sees appear to be unrelated to API tracing. It's the annoying surfman on Linux.

kvark on 24 Nov 2020

FYI. Seeing a new issue after updating for the fix of this one:

https://github.com/gfx-rs/wgpu-rs/issues/638#issue-749960912

lordnoriyuki on 24 Nov 2020

@zakarumych yes, all you need is to open RON file in an editor and search+replace the backend type. Hopefully, we can make this process smoother in the future.

Unfortunately it doesn't work

fixed by #1050

kvark on 26 Nov 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Standalone pass recording

kvark · 11Comments

Garbled framebuffer output on Nvidia

fintelia · 23Comments

Convert `tracing` to `log` + `profiling`

cwfitzgerald · 20Comments

Every 676 frames, Queue::write_buffer takes ~25ms

Aeledfyr · 23Comments

Potential memory leak

YuuriMomo · 18Comments