Wgpu: Garbled framebuffer output on Nvidia

Created on 26 Jan 2021 · 23 Comments · Source: gfx-rs/wgpu

Description
Running Terra on Nvidia GPUs produces garbled framebuffer output and some Vulkan validation errors.

This was first reported as https://github.com/fintelia/terra/issues/13.

Repro steps
Either clone and run Terra or just use the trace file attached below.

Extra materials

trace.zip

Validation errors:

[UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout] Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x7fb69a0af848, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | Submitted command buffer expects VkImage 0x22d000000022d[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_GENERAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
[VUID-VkRenderPassBeginInfo-framebuffer-03210] Validation Error: [ VUID-VkRenderPassBeginInfo-framebuffer-03210 ] Object 0: handle = 0x8970000000897, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xcc018073 | VkRenderPassBeginInfo: Image view #0 created from an image with usage set as 0x11, but image info #0 used to create the framebuffer had usage set as 0x10 The Vulkan spec states: If framebuffer was created with a VkFramebufferCreateInfo::flags value that included VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT, each element of the pAttachments member of a VkRenderPassAttachmentBeginInfo structure included in the pNext chain must be a VkImageView of an image created with a value of VkImageCreateInfo::usage equal to the usage member of the corresponding element of VkFramebufferAttachmentsCreateInfo::pAttachments used to create framebuffer (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-VkRenderPassBeginInfo-framebuffer-03210)
[VUID-VkRenderPassBeginInfo-framebuffer-03210] Validation Error: [ VUID-VkRenderPassBeginInfo-framebuffer-03210 ] Object 0: handle = 0x8970000000897, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xcc018073 | VkRenderPassBeginInfo: Image view #1 created from an image with usage set as 0x27, but image info #1 used to create the framebuffer had usage set as 0x20 The Vulkan spec states: If framebuffer was created with a VkFramebufferCreateInfo::flags value that included VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT, each element of the pAttachments member of a VkRenderPassAttachmentBeginInfo structure included in the pNext chain must be a VkImageView of an image created with a value of VkImageCreateInfo::usage equal to the usage member of the corresponding element of VkFramebufferAttachmentsCreateInfo::pAttachments used to create framebuffer (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-VkRenderPassBeginInfo-framebuffer-03210)
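
The usage masks in these messages (0x11 vs 0x10, 0x27 vs 0x20) are easier to compare once decoded. A minimal sketch, assuming the standard VkImageUsageFlagBits values from the Vulkan headers; `decode_usage` is a hypothetical helper written for illustration and is not part of wgpu:

```rust
/// Vulkan image usage bits (VkImageUsageFlagBits), without the
/// VK_IMAGE_USAGE_ prefix and _BIT suffix.
const USAGE_BITS: [(u32, &str); 8] = [
    (0x01, "TRANSFER_SRC"),
    (0x02, "TRANSFER_DST"),
    (0x04, "SAMPLED"),
    (0x08, "STORAGE"),
    (0x10, "COLOR_ATTACHMENT"),
    (0x20, "DEPTH_STENCIL_ATTACHMENT"),
    (0x40, "TRANSIENT_ATTACHMENT"),
    (0x80, "INPUT_ATTACHMENT"),
];

/// Hypothetical helper: returns the names of all usage bits set in `mask`,
/// lowest bit first.
fn decode_usage(mask: u32) -> Vec<&'static str> {
    USAGE_BITS
        .iter()
        .filter(|(bit, _)| (mask & *bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Under that assumption, `decode_usage(0x11)` yields TRANSFER_SRC and COLOR_ATTACHMENT, while `decode_usage(0x10)` yields COLOR_ATTACHMENT alone, so the layers are complaining about extra transfer and sampling bits on the image relative to the framebuffer attachment info.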

Platform
Ubuntu 20.10
nvidia-driver-460
latest wgpu

driver-bug help wanted

All 23 comments

Looks very similar to #1172

Running terra on Linux/NV with VVL 1.2.162.0, I'm seeing no validation errors.
I hooked up env_logger to it, and still the only thing I'm seeing is regular output, no validation issues.

Given the story of #1172, I wonder if the validation layers are just buggy? Alternatively, we are doing something unsafe and broken, but it somehow works.

As for your trace, it tries to create a swapchain of size 800x600, and then resizes to 1920x1080. This is impossible to replay on my machine atm. Perhaps, a trace without resizing could help, in case there are issues with debug output on terra for some reason.

Here is a trace without the resize: trace.zip

On my machine, this results in the screenshot below. Notice that most 8x8 blocks have seemingly random noise, but the bottom row and a handful scattered throughout the rest of the image seem normal. The other image is what the output should look like (I use lower res textures so the trace file isn't 400MB...)

Image from Nvidia GPU:
image

Correct image (via swiftshader):
image

Also to clarify, are you using the proprietary Nvidia drivers?

I'm using the proprietary driver, yes, and I can see the garbled output. I just don't see the validation errors.
It could be that the validation checks are broken in latest VVL, but the garbled output itself is concerning, obviously. It's just not clear if we are doing anything wrong.

@fintelia could you run the wgpu-rs examples on that machine and see if any validation errors show up?

I get these validation errors from the hello-triangle example, although the rendered output seems fine:

RDOC 036810: [11:17:45]          vk_core.cpp(3563) - Warning - [VUID-VkRenderPassBeginInfo-framebuffer-03210] Validation Error: [ VUID-VkRenderPassBeginInfo-framebuffer-03210 ] Object 0: handle = 0x1e300000001e3, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xcc018073 | VkRenderPassBeginInfo: Image view #0 created from an image with usage set as 0x17, but image info #0 used to create the framebuffer had usage set as 0x10 The Vulkan spec states: If framebuffer was created with a VkFramebufferCreateInfo::flags value that included VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT, each element of the pAttachments member of a VkRenderPassAttachmentBeginInfo structure included in the pNext chain must be a VkImageView of an image created with a value of VkImageCreateInfo::usage equal to the usage member of the corresponding element of VkFramebufferAttachmentsCreateInfo::pAttachments used to create framebuffer (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-VkRenderPassBeginInfo-framebuffer-03210)
RDOC 036810: [11:17:45]          vk_core.cpp(3563) - Warning - [UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout] Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x7f01f8a80798, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | Submitted command buffer expects VkImage 0x1d700000001d7[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_GENERAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
RDOC 036810: [11:17:45]          vk_core.cpp(3563) - Warning - [VUID-VkRenderPassBeginInfo-framebuffer-03210] Validation Error: [ VUID-VkRenderPassBeginInfo-framebuffer-03210 ] Object 0: handle = 0x1e300000001e3, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xcc018073 | VkRenderPassBeginInfo: Image view #0 created from an image with usage set as 0x17, but image info #0 used to create the framebuffer had usage set as 0x10 The Vulkan spec states: If framebuffer was created with a VkFramebufferCreateInfo::flags value that included VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT, each element of the pAttachments member of a VkRenderPassAttachmentBeginInfo structure included in the pNext chain must be a VkImageView of an image created with a value of VkImageCreateInfo::usage equal to the usage member of the corresponding element of VkFramebufferAttachmentsCreateInfo::pAttachments used to create framebuffer (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-VkRenderPassBeginInfo-framebuffer-03210)
RDOC 036810: [11:17:45]          vk_core.cpp(3563) - Warning - [UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout] Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x7f01f8a9a038, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | Submitted command buffer expects VkImage 0x1d700000001d7[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_GENERAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
RDOC 036810: [11:17:45]          vk_core.cpp(3563) - Warning - [VUID-VkRenderPassBeginInfo-framebuffer-03210] Validation Error: [ VUID-VkRenderPassBeginInfo-framebuffer-03210 ] Object 0: handle = 0x1e300000001e3, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xcc018073 | VkRenderPassBeginInfo: Image view #0 created from an image with usage set as 0x17, but image info #0 used to create the framebuffer had usage set as 0x10 The Vulkan spec states: If framebuffer was created with a VkFramebufferCreateInfo::flags value that included VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT, each element of the pAttachments member of a VkRenderPassAttachmentBeginInfo structure included in the pNext chain must be a VkImageView of an image created with a value of VkImageCreateInfo::usage equal to the usage member of the corresponding element of VkFramebufferAttachmentsCreateInfo::pAttachments used to create framebuffer (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-VkRenderPassBeginInfo-framebuffer-03210)
RDOC 036810: [11:17:45]          vk_core.cpp(3563) - Warning - [UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout] Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x7f01f93a9618, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | Submitted command buffer expects VkImage 0x1d700000001d7[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_GENERAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.

I should add that I'm getting these logs by capturing a frame in RenderDoc and seeing what validation issues it reports.

@fintelia could you please investigate a tiny bit further: check the Vulkan objects in that RenderDoc capture, and see if the descriptor values for image usage indeed match what validation reports. I'm not seeing us doing anything wrong here in the code, so I'm a bit puzzled.

Perhaps, you could share the capture here (zipped) as well, so that I can look at it and see if anything is different.

Here's the RenderDoc capture: hello-triangle-capture.zip.

I'm not 100% confident, but as best I can tell the swapchain usages seem to be right.

Thank you! Here is what I see in the capture.
The begin_render_pass is called with an image view created from the swapchain with:

imageUsage VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT

The framebuffer there was created with:

usage VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT

So everything is good.

I think I know what's the real issue here. It's the way we create the framebuffer:

        info.attachment_count = renderpass.attachment_count as u32;
        info.p_attachments = ptr::null();

We leave pAttachments as NULL because we don't have them - it's the KHR_imageless_framebuffer in action. But the VVL still requires us to assign the attachment count.

pAttachments is a pointer to an array of VkImageView handles, each of which will be used as the corresponding attachment in a render pass instance. If flags includes VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT, this parameter is ignored.

So what's happening here is VVL erroneously accessing pAttachments without proper checks for the extension. So it tries to read VkImageView and the associated usage info... of course it gets garbage, and it complains about garbage.
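
To make the rule in the spec text quoted above concrete, here is a toy model of the check. This is only a sketch under stated assumptions: `FramebufferInfo` and `views_to_validate` are hypothetical stand-ins, not the gfx-rs types or the actual VVL code, and `None` models the NULL pAttachments pointer.

```rust
/// Value of VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT in the Vulkan headers.
const FRAMEBUFFER_CREATE_IMAGELESS_BIT: u32 = 0x1;

/// Hypothetical stand-in for VkFramebufferCreateInfo.
struct FramebufferInfo {
    flags: u32,
    /// Still set for imageless framebuffers, as in the snippet above.
    attachment_count: u32,
    /// `None` models a NULL pAttachments pointer.
    attachments: Option<Vec<u64>>,
}

/// Returns the image view handles that may be inspected at creation time.
/// With the imageless flag set, pAttachments is ignored entirely, no matter
/// what attachment_count says.
fn views_to_validate(info: &FramebufferInfo) -> &[u64] {
    if (info.flags & FRAMEBUFFER_CREATE_IMAGELESS_BIT) != 0 {
        return &[];
    }
    info.attachments.as_deref().unwrap_or(&[])
}
```

In this model an imageless framebuffer with `attachment_count` of 2 and no attachment array yields an empty slice, which is the behaviour the spec wording describes. A checker that instead read `attachment_count` entries through the pointer would be looking at whatever memory happens to be there.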

TL;DR: the reported error is a Vulkan validation layers bug. Still don't know what causes the rendering issue, but it could be Nvidia's bug :/

Any idea on the "Submitted command buffer expects VkImage [...] to be in layout VK_IMAGE_LAYOUT_GENERAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL." validation error? Is that also caused by those same VVL bugs?

Good question! I'm not sure about that one. It may be a legitimate issue on our side, but I'm not seeing this error in either the API trace (replayed), RenderDoc, or running terra. If we find a way to capture it, I'll be happy to investigate!

Wait, you don't see it in the hello-triangle RenderDoc capture above? When I open that file, I see two issues listed in the "Errors and Warnings" window:

image

Nope, I only see one, the second one in your list. Are you on "1.2.162.0" as well?

I'm using the version in the ubuntu repositories "1.9+dfsg-2"

Sorry, I was asking about the vulkan validation layers version, not RenderDoc.
My RenderDoc is "1.12" fwiw, which is a bit newer.

Oh, I also have the version from the ubuntu repositories for the vulkan validation layers: 1.2.141.0-1

Now, let's figure out what to do with the rendering issue. I wonder why the examples work fine. Are all of them good?
There must be a difference in something you are doing in terra.

This was caused by a bug in Terra where it was binding the output framebuffer attachment with store disabled. The fact that this didn't trigger any errors on AMD might, however, point to missing validation in wgpu.

Great that you figured this out!
The "store: false" should work as a clear in WebGPU. We just haven't implemented this yet. It's definitely on the radar, will come right after https://github.com/gfx-rs/wgpu/pull/1159 (cc @Wumpf ).
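
A toy model of the difference being described, as a sketch only. It assumes that `store: false` currently ends up as a Vulkan "don't care" store op, whose result is undefined; that mapping is an assumption about wgpu and not something confirmed in this thread. `Contents` and `after_pass` are hypothetical names for illustration.

```rust
/// What an attachment holds once a render pass has ended.
#[derive(Debug, PartialEq)]
enum Contents {
    /// The pass results were written back to the attachment.
    Rendered,
    /// The attachment reads back as cleared (the intended WebGPU behaviour
    /// for `store: false`, per the comment above).
    Cleared,
    /// Unspecified: the driver may leave anything in memory.
    Undefined,
}

/// Hypothetical model: `clear_on_discard` selects between the intended
/// WebGPU behaviour and a plain "don't care" store op.
fn after_pass(store: bool, clear_on_discard: bool) -> Contents {
    match (store, clear_on_discard) {
        (true, _) => Contents::Rendered,
        (false, true) => Contents::Cleared,
        (false, false) => Contents::Undefined,
    }
}
```

Under this model the Terra bug corresponds to `after_pass(false, false)`: the result is undefined, so one driver may happen to keep the rendered pixels while another shows noise, which would be consistent with the AMD and Nvidia observations above.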
