wgpu: Leaking buffers on macOS?

Created on 4 Apr 2020 · 17 comments · Source: gfx-rs/wgpu

I don't have a small reproducible case yet, but I appear to be seeing a memory leak when using wgpu-rs on macOS. Adapter: AMD Radeon Pro Vega 56, DiscreteGpu, Metal.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: AllocationError(OutOfMemory(Device))', /Users/ln/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/05ba7a5/wgpu-core/src/device/mod.rs:328:22
stack backtrace:
   0:        0x107737575 - backtrace::backtrace::libunwind::trace::h5c84db184dbe55ed
                               at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:        0x107737575 - backtrace::backtrace::trace_unsynchronized::hfabf504f184a4062
                               at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:        0x107737575 - std::sys_common::backtrace::_print_fmt::hb18545a457444b58
                               at src/libstd/sys_common/backtrace.rs:77
   3:        0x107737575 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hff7732c2e44ef8b9
                               at src/libstd/sys_common/backtrace.rs:59
   4:        0x107754bad - core::fmt::write::hd42cb3dea57bae40
                               at src/libcore/fmt/mod.rs:1052
   5:        0x107734f3b - std::io::Write::write_fmt::ha39f6009af02b1d2
                               at src/libstd/io/mod.rs:1426
   6:        0x10773938a - std::sys_common::backtrace::_print::h5cfb8cdd320f1e64
                               at src/libstd/sys_common/backtrace.rs:62
   7:        0x10773938a - std::sys_common::backtrace::print::hef683e3bc77ce269
                               at src/libstd/sys_common/backtrace.rs:49
   8:        0x10773938a - std::panicking::default_hook::{{closure}}::h389f076017b5df43
                               at src/libstd/panicking.rs:204
   9:        0x10773908a - std::panicking::default_hook::h04b06ec20c41bf02
                               at src/libstd/panicking.rs:224
  10:        0x1077399dd - std::panicking::rust_panic_with_hook::hccde7faed9a5c398
                               at src/libstd/panicking.rs:472
  11:        0x1077395a2 - rust_begin_unwind
                               at src/libstd/panicking.rs:380
  12:        0x107767ebf - std::panicking::begin_panic::hc520a8a43176ea4c
  13:        0x107767dc5 - std::panicking::begin_panic::hc520a8a43176ea4c
  14:        0x106eff899 - core::result::Result<T,E>::unwrap::hfdd5819c5946a227
                               at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/result.rs:963
  15:        0x106e3fd8c - wgpu_core::device::Device<B>::create_buffer::h40f8e83dda14e0f2
                               at /Users/ln/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/05ba7a5/wgpu-core/src/device/mod.rs:328
  16:        0x106da5b15 - wgpu_core::device::<impl wgpu_core::hub::Global<G>>::device_create_buffer_mapped::h93275aab33da2756
                               at /Users/ln/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/05ba7a5/wgpu-core/src/device/mod.rs:554
  17:        0x106e43bbc - wgpu_device_create_buffer_mapped
                               at /Users/ln/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/05ba7a5/wgpu-native/src/device.rs:215
  18:        0x106d39d19 - wgpu::Device::create_buffer_mapped::h9cd4b0ad4b3c5391
                               at /Users/ln/.cargo/git/checkouts/wgpu-rs-40ea39809c03c5d8/e198d63/src/lib.rs:946
  19:        0x106d39e75 - wgpu::Device::create_buffer_with_data::h2645df23584f5aa9
                               at /Users/ln/.cargo/git/checkouts/wgpu-rs-40ea39809c03c5d8/e198d63/src/lib.rs:964

bug

lordnoriyuki

Most helpful comment

Thanks to your test case, it was not too difficult to track down.
Now fixed upstream in https://github.com/gfx-rs/gfx-extras/pull/8
Please pick up the new version by doing cargo update -p gfx-memory.

kvark on 5 Apr 2020

👍2 🎉1

All 17 comments

In Instruments, "All Heap Allocations" looks OK. Memory grows consistently frame by frame but is eventually freed. The largest contributor is allocations by [GFX9_MtlCmdBuffer blitCommandEncoder] via wgpu::CommandEncoder::copy_buffer_to_buffer, which grow to more than 150k total allocations over several minutes and are then released. So that looks like a red herring.

The one that looks like the root cause is under "All VM Regions": VM: IOAccelerator. It doesn't leak all the time; my app ran for 11 minutes with no incremental allocations from IOAccelerator. But once it started leaking, it grew to 1 GB and then the app panicked as above. I'm not sure whether anything in my code triggered it. I'll dig some more and see if I can find something.

Here's the allocation call trace of the call that is definitely problematic:

   0 IOAccelerator IOAccelResourceCreate
   1 Metal -[MTLIOAccelResource initWithDevice:remoteStorageResource:options:args:argsSize:]
   2 Metal -[MTLIOAccelResource initWithDevice:options:args:argsSize:]
   3 Metal -[MTLIOAccelBuffer initWithDevice:pointer:length:options:sysMemSize:vidMemSize:args:argsSize:deallocator:]
   4 AMDRadeonX5000MTLDriver -[AMD_MtlBuffer initWithDevice:pointer:length:options:sysMemSize:vidMemSize:args:argsSize:deallocator:]
   5 AMDRadeonX5000MTLDriver -[AMD_MtlBuffer initInternalWithDevice:pointer:length:options:deallocator:]
   6 AMDRadeonX5000MTLDriver -[AMD_MtlDevice newBufferWithLength:options:]
   7 myapp _$LT$$LP$A$C$$u20$B$RP$$u20$as$u20$objc..message..MessageArguments$GT$::invoke::hfd269ed3df516a3b /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:128
   8 myapp objc::message::platform::send_unverified::ha3bb8f66d92d4c47 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/apple/mod.rs:27
   9 myapp objc::message::send_message::he77867efdfe3ea45 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:178
  10 myapp metal::device::DeviceRef::new_buffer::h610be1f6ca0bf17b /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/metal-0.18.0/src/device.rs:1653
  11 myapp _$LT$gfx_backend_metal..device..Device$u20$as$u20$gfx_hal..device..Device$LT$gfx_backend_metal..Backend$GT$$GT$::allocate_memory::hf145286829816ae2 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-backend-metal-0.5.1/src/device.rs:2275
  12 myapp gfx_memory::allocator::allocate_memory_helper::h8514d362e5d363d4 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/allocator/mod.rs:61
  13 myapp gfx_memory::allocator::general::GeneralAllocator$LT$B$GT$::alloc_chunk_from_device::h3f5e412513142799 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/allocator/general.rs:236
  14 myapp gfx_memory::allocator::general::GeneralAllocator$LT$B$GT$::alloc_chunk::hfb5f92c151ec8ce1 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/allocator/general.rs:271
  15 myapp gfx_memory::allocator::general::GeneralAllocator$LT$B$GT$::alloc_from_entry::h1d010a782d4c4b3f /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/allocator/general.rs:364
  16 myapp gfx_memory::allocator::general::GeneralAllocator$LT$B$GT$::alloc_block::h5349f67da9b9b5f4 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/allocator/general.rs:418
  17 myapp _$LT$gfx_memory..allocator..general..GeneralAllocator$LT$B$GT$$u20$as$u20$gfx_memory..allocator..Allocator$LT$B$GT$$GT$::alloc::hbeda516620781f02 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/allocator/general.rs:498
  18 myapp gfx_memory::heaps::memory_type::MemoryType$LT$B$GT$::alloc::h0228881f8dde543d /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/heaps/memory_type.rs:92
  19 myapp gfx_memory::heaps::Heaps$LT$B$GT$::allocate_from::h6008f56da3692568 /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/heaps/mod.rs:168
  20 myapp gfx_memory::heaps::Heaps$LT$B$GT$::allocate::h87c66e912681c07a /Users/ln/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-memory-0.1.1/src/heaps/mod.rs:137
  21 myapp wgpu_core::device::Device$LT$B$GT$::create_buffer::h40f8e83dda14e0f2 /Users/ln/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/05ba7a5/wgpu-core/src/device/mod.rs:328
  22 myapp wgpu_core::device::_$LT$impl$u20$wgpu_core..hub..Global$LT$G$GT$$GT$::device_create_buffer_mapped::h93275aab33da2756 /Users/ln/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/05ba7a5/wgpu-core/src/device/mod.rs:554
  23 myapp wgpu_device_create_buffer_mapped /Users/ln/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/05ba7a5/wgpu-native/src/device.rs:215
  24 myapp wgpu::Device::create_buffer_mapped::h9cd4b0ad4b3c5391 /Users/ln/.cargo/git/checkouts/wgpu-rs-40ea39809c03c5d8/e198d63/src/lib.rs:946
  25 myapp wgpu::Device::create_buffer_with_data::h2645df23584f5aa9 /Users/ln/.cargo/git/checkouts/wgpu-rs-40ea39809c03c5d8/e198d63/src/lib.rs:964

lordnoriyuki on 4 Apr 2020

One more piece of data: the problematic call from my code is the one creating the vertex buffer. I build a Vec of vertex data every frame, then create the wgpu buffer from it. The byte array passed to create_buffer_with_data is consistently in the range of 1.16-1.24 MB every frame. That is still true of the last call into create_buffer_with_data, the one that panics (so it's not my code suddenly generating some crazy buffer size).

lordnoriyuki on 4 Apr 2020

Thank you for the info!
Sounds like we are simply not destroying buffers created with mapping properly.
Could be a regression from https://github.com/gfx-rs/wgpu/pull/547

kvark on 5 Apr 2020

I added more logging and tested our skybox example on macOS. So far it appears that all the tracking and freeing works as expected (this fragment repeats every frame):

[2020-04-04T23:31:48Z TRACE gfx_memory::heaps] Allocate memory block: type '1', kind  'Linear', size: '256', align: '256'
[2020-04-04T23:31:48Z TRACE gfx_memory::heaps] Free memory block: type '1', size: '256'

Could you share your test case with us to test?

kvark on 5 Apr 2020

It's not easily isolated as a test case right now. I'm trying to figure out when it triggers. It's definitely not every frame (more like one in thousands of frames), even when I can get it to trigger. It's somehow linked to exactly what is being rendered on each frame, and I think it needs significant buffer sizes (> 1 MB from my testing so far). When I render nothing but still create the vertex buffer, with its size clipped to under a megabyte, I can't reproduce it. Even so, it's not yet obvious to me when the problem occurs. I'll turn on trace logging for gfx_memory and see what shows up in the log.

lordnoriyuki on 5 Apr 2020

If it's not happening consistently, are you sure this is a leak rather than, say, memory fragmentation?

kvark on 5 Apr 2020

It's consistent in that I can get it to fail every time; only the timing varies. The Instruments log certainly shows allocations that are not getting freed. There are very few of them, but each is 10 MB, so after about 100 of them (roughly 1 GB) I'm hitting a memory cap somewhere.

I've got a VERY large trace log from gfx_memory, but finding any problematic entries is difficult.

Right near the beginning I get:
[2020-04-04T23:51:55Z WARN gfx_memory::heaps] Unable to allocate 4194304 with Linear: TooManyObjects

Not sure whether that's a red herring or not (it occurs way before anything starts not getting deallocated).
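One way to sift a large gfx_memory trace is to tally the "Allocate memory block" and "Free memory block" lines and report whatever is left over. The sketch below assumes the log format quoted earlier in this thread; the helper names (quoted_after, outstanding) and the 'Dedicated' sample line are made up for illustration. Because the log lines carry no block address, matching by (memory type, size) is only a heuristic.

```rust
use std::collections::HashMap;

/// Returns the text between `prefix` and the next single quote, if present.
fn quoted_after<'a>(line: &'a str, prefix: &str) -> Option<&'a str> {
    let start = line.find(prefix)? + prefix.len();
    let rest = &line[start..];
    let end = rest.find('\'')?;
    Some(&rest[..end])
}

/// Counts allocations without a matching free, keyed by (memory type, size).
/// This is a heuristic: the trace lines do not identify individual blocks.
fn outstanding(log: &str) -> HashMap<(String, u64), i64> {
    let mut counts: HashMap<(String, u64), i64> = HashMap::new();
    for line in log.lines() {
        let delta = if line.contains("Allocate memory block") {
            1
        } else if line.contains("Free memory block") {
            -1
        } else {
            continue; // warnings and unrelated lines are skipped
        };
        let ty = match quoted_after(line, "type '") {
            Some(t) => t.to_string(),
            None => continue,
        };
        let size = match quoted_after(line, "size: '").and_then(|s| s.parse::<u64>().ok()) {
            Some(s) => s,
            None => continue,
        };
        *counts.entry((ty, size)).or_insert(0) += delta;
    }
    counts.retain(|_, n| *n != 0);
    counts
}

fn main() {
    // The first two lines follow the format seen in this thread;
    // the third is an invented sample of a block that is never freed.
    let log = [
        "[2020-04-04T23:31:48Z TRACE gfx_memory::heaps] Allocate memory block: type '1', kind  'Linear', size: '256', align: '256'",
        "[2020-04-04T23:31:48Z TRACE gfx_memory::heaps] Free memory block: type '1', size: '256'",
        "[2020-04-04T23:31:49Z TRACE gfx_memory::heaps] Allocate memory block: type '1', kind  'Dedicated', size: '10485760', align: '256'",
    ]
    .join("\n");

    for ((ty, size), n) in &outstanding(&log) {
        println!("type {} size {}: {} unmatched", ty, size, n);
    }
}
```

Note that this only covers block-level allocate/free pairs. If the leak sits at the chunk level, the tally can come out balanced while device memory still grows.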

lordnoriyuki on 5 Apr 2020

No, that should be OK. The linear allocator doesn't handle arbitrary sizes, and we are just falling back to a dedicated allocation.
Please try to share your work in some way that I could test it.
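The fallback described here can be pictured with a toy model. This is not gfx-memory's code: the types, the try_linear and allocate functions, and the 1 MiB limit are all hypothetical, chosen only so that the 4194304-byte request from the warning takes the fallback path.

```rust
#[derive(Debug, PartialEq)]
enum Strategy {
    Linear,
    Dedicated,
}

#[derive(Debug, PartialEq)]
enum AllocError {
    TooManyObjects,
}

/// Hypothetical upper bound on what the toy linear allocator accepts.
const LINEAR_LIMIT: u64 = 1 << 20;

/// The toy linear allocator refuses requests above its limit.
fn try_linear(size: u64) -> Result<Strategy, AllocError> {
    if size > LINEAR_LIMIT {
        Err(AllocError::TooManyObjects)
    } else {
        Ok(Strategy::Linear)
    }
}

/// Tries the linear allocator first; on failure, logs a warning and
/// falls back to a dedicated allocation instead of reporting an error.
fn allocate(size: u64) -> Strategy {
    match try_linear(size) {
        Ok(strategy) => strategy,
        Err(err) => {
            eprintln!("Unable to allocate {} with Linear: {:?}", size, err);
            Strategy::Dedicated
        }
    }
}

fn main() {
    println!("{:?}", allocate(256));
    println!("{:?}", allocate(4_194_304));
}
```

In this model the warning is informational: the request still succeeds, just through a different path.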

kvark on 5 Apr 2020

Sorry, I really appreciate the attempt to help here. Let me see if I can minimize the reproducible case. I'm trying to avoid giving you 6k lines of code.

lordnoriyuki on 5 Apr 2020

OK. I think I found reproducible code for my problem.

Am I doing something dumb here?

This panics in under 5 seconds on my machine.

Note that the variable size of the vertex buffer seems to matter here:

let vertices = vec![0u8; 2_000_000 + (count % 500) * 1000];

If I do a constant allocation for vertices, it doesn't reproduce the problem:

let vertices = vec![0u8; 2_000_000];
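To make the difference between the two variants concrete, here is a small standalone sketch of the sizes the first expression produces. The function name vertex_bytes is made up for illustration; the formula is the one from the snippet above.

```rust
/// Size in bytes of the vertex data for a given frame counter,
/// using the same formula as the variable-size repro line.
fn vertex_bytes(count: usize) -> usize {
    2_000_000 + (count % 500) * 1000
}

fn main() {
    // One full cycle of the counter covers every size the formula can produce.
    let sizes: Vec<usize> = (0..500).map(vertex_bytes).collect();
    let min = sizes.iter().min().unwrap();
    let max = sizes.iter().max().unwrap();
    println!("min = {} bytes, max = {} bytes, sizes per cycle = {}", min, max, sizes.len());
}
```

So the failing variant requests a different size on each of 500 consecutive frames, stepping by 1000 bytes from 2,000,000 up to 2,499,000 before wrapping around, while the non-failing variant requests the same 2,000,000 bytes every frame.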

Cargo.toml

[package]
name = "memleak"
version = "0.1.0"
authors = ["lordnoriyuki <[email protected]>"]
edition = "2018"

[dependencies]
winit = "0.22"
wgpu = { git = "https://github.com/gfx-rs/wgpu-rs" }
futures = "0.3"

main.rs

use winit;

fn main() {
    let winit_event_loop = winit::event_loop::EventLoop::new();
    let logical_size = winit::dpi::LogicalSize::new(400, 400);
    let window = winit::window::WindowBuilder::new()
            .with_inner_size(logical_size)
            .build(&winit_event_loop).unwrap();

    let surface = wgpu::Surface::create(&window);

    let (adapter, device, queue) = futures::executor::block_on(create_device());

    let mut swap_chain = device.create_swap_chain(
        &surface,
        &wgpu::SwapChainDescriptor {
            usage: wgpu::TextureUsage::OUTPUT_ATTACHMENT,
            format: wgpu::TextureFormat::Bgra8Unorm,
            width: 400,
            height: 400,
            present_mode: wgpu::PresentMode::Immediate,
        },
    );

    let mut count = 0;

    winit_event_loop.run(move |event, _, control_flow| {
        *control_flow = winit::event_loop::ControlFlow::Poll;
        match event {
            winit::event::Event::MainEventsCleared => {
                *control_flow = winit::event_loop::ControlFlow::Poll;
                window.request_redraw();
            },
            winit::event::Event::RedrawRequested(_) => {
                let frame = swap_chain.get_next_texture().unwrap();
                let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });

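                // The size changes every frame; with a constant size the problem does not reproduce.
                let vertices = vec![0u8; 2_000_000 + (count % 500) * 1000];
                // The returned buffer is dropped immediately, so its memory should be freed again.
                device.create_buffer_with_data(&vertices, wgpu::BufferUsage::VERTEX);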
                encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
                    color_attachments: &[wgpu::RenderPassColorAttachmentDescriptor {
                      attachment: &frame.view,
                      resolve_target: None,
                      load_op: wgpu::LoadOp::Clear,
                      store_op: wgpu::StoreOp::Store,
                      clear_color: wgpu::Color { r: 0.0, g: 0.0, b: 0.5, a: 1.0 },
                    }],
                    depth_stencil_attachment: None,
                });

                queue.submit(&[encoder.finish()]);

                count += 1;
            },
            winit::event::Event::WindowEvent { event, .. } => match event {
                winit::event::WindowEvent::CloseRequested => {
                    *control_flow = winit::event_loop::ControlFlow::Exit
                }
                _ => {}
            },
            _ => {}
        };
    })
}

async fn create_device() -> (wgpu::Adapter, wgpu::Device, wgpu::Queue) {
    let adapter = wgpu::Adapter::request(
        &wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            compatible_surface: None,
        },
        wgpu::BackendBit::PRIMARY,
    )
    .await
    .unwrap();

    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor {
            extensions: wgpu::Extensions {
                anisotropic_filtering: false,
            },
            limits: wgpu::Limits::default(),
        })
        .await;

    (adapter, device, queue)
}

lordnoriyuki on 5 Apr 2020

Oh interesting, you are just creating a buffer that is being dropped! I can test this for sure, thank you 👍

kvark on 5 Apr 2020

👍1

I tried creating 16 MB buffers in the cube example this way, and it appears that we are freeing all of them correctly. Could you push your repro code to a branch for me to check out?

kvark on 5 Apr 2020

Code is here:

https://github.com/lordnoriyuki/memleak

lordnoriyuki on 5 Apr 2020

It was this commit to wgpu-rs:

https://github.com/gfx-rs/wgpu-rs/commit/31e80d99b3f06a711a8ee7cc9474016a77c64590

Which pulled in latest wgpu. The commit before this doesn't panic on my machine.

Looking at that, the wgpu commit where it still worked was https://github.com/gfx-rs/wgpu/commit/08e8d406c175579da5ef18c1abf4d6c00e2a9726. Between then and the current wgpu commit (https://github.com/gfx-rs/wgpu/commit/306554600ab7479ec3e54d0c076c71f02474237a) used in wgpu-rs, my guess is that the culprit is the "Port to gfx-extras and gfx-hal-0.5" changes. The only other change that looks like it could be related is "Recycled identity management (#533)".

lordnoriyuki on 5 Apr 2020

❤1

I was running your test case, and I think I have a lead: we never end up calling free_chunk. So the problem is definitely in gfx-memory.
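To illustrate why a missing free_chunk call would look like the leak reported here, consider a toy model in which every block lives in its own chunk of device memory. This is not gfx-memory's implementation: ToyAllocator, its fields, and the release_empty_chunks flag are hypothetical, and only the name free_chunk is borrowed from the comment above.

```rust
/// Toy model: the allocator requests chunks from the device and hands out blocks.
struct ToyAllocator {
    /// Live block count per chunk; `None` means the chunk went back to the device.
    chunks: Vec<Option<u32>>,
    /// When false, empty chunks are kept forever (free_chunk is never called).
    release_empty_chunks: bool,
}

impl ToyAllocator {
    fn new(release_empty_chunks: bool) -> Self {
        ToyAllocator { chunks: Vec::new(), release_empty_chunks }
    }

    /// Allocates one block in a fresh chunk and returns the chunk index.
    fn alloc_block(&mut self) -> usize {
        self.chunks.push(Some(1));
        self.chunks.len() - 1
    }

    /// Frees one block; the chunk is released only if the flag allows it.
    fn free_block(&mut self, chunk: usize) {
        let now_empty = match self.chunks[chunk].as_mut() {
            Some(live) => {
                *live = live.saturating_sub(1);
                *live == 0
            }
            None => false,
        };
        if now_empty && self.release_empty_chunks {
            self.free_chunk(chunk);
        }
    }

    /// Returns the chunk's memory to the device.
    fn free_chunk(&mut self, chunk: usize) {
        self.chunks[chunk] = None;
    }

    /// Number of chunks still held from the device.
    fn chunks_held(&self) -> usize {
        self.chunks.iter().filter(|c| c.is_some()).count()
    }
}

/// Allocates and immediately frees one block per frame, as the repro does.
fn run(frames: usize, release_empty_chunks: bool) -> usize {
    let mut allocator = ToyAllocator::new(release_empty_chunks);
    for _ in 0..frames {
        let chunk = allocator.alloc_block();
        allocator.free_block(chunk);
    }
    allocator.chunks_held()
}

fn main() {
    println!("chunks held with free_chunk:    {}", run(100, true));
    println!("chunks held without free_chunk: {}", run(100, false));
}
```

In this model every block is freed correctly at the block level, yet the chunks pile up at the device level, which would be consistent with block-level logs looking balanced while IOAccelerator memory keeps growing.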

kvark on 5 Apr 2020

❤1

kvark on 5 Apr 2020

👍2 🎉1

Confirmed: this fixes it in the test case and in my original code. Thank you very much!

lordnoriyuki on 5 Apr 2020

🎉1
