Dxvk: Squad freezes Nvidia driver: "NVRM: Xid 31 (...) ACCESS_TYPE_VIRT_WRITE"

Created on 20 Aug 2019 · 26 Comments · Source: doitsujin/dxvk

Squad causes random but frequent freezes where the Nvidia kernel driver spews one specific error message, and Squad's (i.e., UE4's) GPU thread becomes unresponsive until the game's own watchdog kills the process 2-3 minutes later. During this time the WM is practically unusable, suffering from a lag of 1+ minutes per keystroke.

The driver error is exactly the same line every time, except of course for the pid:

NVRM: Xid (PCI:0000:2d:00): 31, pid=970, Ch 00000023, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_RAST faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRITE
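For anyone collecting these messages from the kernel log, the line above can be split into its fields with a small script. This is only a sketch: the regular expression is modelled on the single line quoted in this report, and other driver versions may format the message differently. The names `XID_RE`, `SAMPLE` and `parse_xid` are made up for illustration.

```python
import re

# Pattern modelled on the one Xid line quoted in this report (assumption:
# other driver releases may print the PCI address or pid differently).
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), (?P<detail>.*)"
)

# The line from this report, copied verbatim.
SAMPLE = (
    "NVRM: Xid (PCI:0000:2d:00): 31, pid=970, Ch 00000023, intr 00000000. "
    "MMU Fault: ENGINE GRAPHICS GPCCLIENT_RAST faulted @ 0x0_00000000. "
    "Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRITE"
)


def parse_xid(line):
    """Return the PCI address, Xid number, pid and remaining text, or None."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("xid")),
        "pid": int(match.group("pid")),
        "detail": match.group("detail"),
    }


if __name__ == "__main__":
    print(parse_xid(SAMPLE))
```

Feeding it the output of `dmesg` line by line and keeping the non-None results would give a quick count of how often the fault occurs and for which process.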

If I set the game's internal frame limit slider to 60-ish to match my monitor's refresh rate the freezes become rare enough that I may be able to play a few hours without one. But if I set the limit to the max (which is 240 i.e. practically unlimited in my case) it's guaranteed to freeze within 10 minutes or so.

I've found at least two situations where the freeze is significantly more likely to occur than otherwise. One is when the game is changing maps; it seems to be the loading of a new map that often causes it. The other situation is more interesting because it's brought on by user action: simply clicking on the role loadout tab in the deployment menu is practically guaranteed to freeze the game then and there.

The freeze will occasionally happen with no obvious trigger, too.

I'll try to upload logs here as soon as I have a moment.

Software information

  • Squad a-15.4.5.19500-4.21.2-SHIPPING.
  • Steam Proton. Tested with all recent upstream releases as well as own builds with updated Wine versions.
  • Debian unstable + i3, no compositor in use.
  • Linux 5.2.9-ck1 with fsync patches.

System information

  • GPU: Nvidia RTX 2080
  • Driver: 418.52.20. Tested several recent Vulkan beta drivers in the 418.52.x series.
  • Wine version: Tested with Protonified builds of 4.2, 4.11, 4.12.1, and 4.13.
  • DXVK version: Everything from at least v1.3 up to master exhibits the behavior. Older versions not tested.

Apitrace file(s)

  • N/A

Log files

  • Will upload these fairly soon.
nvidia

All 26 comments

I don't have the game, so I'll need an apitrace to debug this.

Does disabling "Allow Flipping" in nvidia-settings fix your issue?

> I don't have the game

What's your Steam ID?

> so I'll need an apitrace to debug this.

I'll see what I can do.

> Does disabling "Allow Flipping" in nvidia-settings fix your issue?

No, I have applied all the known tricks, such as the flipping settings, and tested them countless times at this point.

To be clear, unless it's obvious that DXVK does something invalid, I cannot really debug GPU hangs on Nvidia drivers at all, so there's probably not much I could do even if I had the game.

Can you run the game with VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation and post the full console output? This requires you to have a recent version of the Vulkan SDK/validation layers installed.
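One way to capture the requested output is to launch the game through a small wrapper that sets the variable and redirects everything to a file. The following is a minimal sketch, not a Proton-specific recipe: `validation_env` and `run_with_validation` are hypothetical helper names, and the command and log path are whatever applies on your system.

```python
import os
import subprocess


def validation_env(base_env=None):
    """Copy the environment and enable the Khronos validation layer."""
    env = dict(os.environ if base_env is None else base_env)
    env["VK_INSTANCE_LAYERS"] = "VK_LAYER_KHRONOS_validation"
    return env


def run_with_validation(command, log_path):
    """Run command (a list of arguments) and write stdout and stderr to
    log_path. Returns the exit code of the process."""
    with open(log_path, "wb") as log:
        completed = subprocess.run(
            command,
            env=validation_env(),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return completed.returncode
```

The layer only produces output if the validation layers are actually installed, as noted above; otherwise the log will simply contain the game's normal console output.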

> To be clear, unless it's obvious that DXVK does something invalid, I cannot really debug GPU hangs on Nvidia drivers at all, so there's probably not much I could do even if I had the game.

Understood.

> Can you run the game with VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation and post the full console output? This requires you to have a recent version of the Vulkan SDK/validation layers installed.

Alright, I'll do this probably tomorrow, I hope.

@imaami Assuming that you have an IOMMU, does it work if you disable your IOMMU?

@imaami
I bought this game a couple of days ago, but have not been able to get this to crash. I have not played for "hours on end", but mostly various practice fields and a few maps online.

I did however notice that DXVK had allocated just over 8GB at times, but it mostly hovers around 7-7.2GB when every setting is "Epic +". Not sure if the game could have some sort of memory leak or whatnot, because it tends to start off around 5GB and grow. It did not crash though.

Any particular setting you use when this happens? My RTX2070 card is not able to push more than around 50-55fps at those settings (and 8GB allocated/6-7GB used).

I think it depends a lot on the map? Huge maps at high settings = VRAM spending like crazy. I was watching the DXVK HUD when I crashed, and it had 9.6GB allocated and 8.8GB VRAM used. nvidia-smi said the game used just above 7GB (total 7902/7949).
I think it would be somewhat expected to crash in a situation like that? (It happened just as the game switched map.)

Caused no XID error, so i guess it was just a vram allocation crash (and rightfully so).

Setting the option "Fully load textures" to OFF (recommended) does help a bit, as does disabling supersampling and MSAA. I started out below 5GB VRAM, but still ended up above 6GB after just a few minutes of playing, so I guess there is no way around using lower settings in that game to stay within reasonable VRAM levels. Especially since it seems the usage just goes up throughout the map.
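To watch how close the card is to full while playing, the used/total figures that nvidia-smi reports (7902/7949 in the comment above) can be polled from a script. This is a rough sketch: it assumes `nvidia-smi` is on the PATH and uses its CSV query mode; `parse_memory_csv`, `vram_fraction` and `query_vram` are illustrative names.

```python
import subprocess

# nvidia-smi's CSV query mode; prints one "used, total" line (MiB) per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse lines such as '7902, 7949' into (used_mib, total_mib) tuples."""
    pairs = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def vram_fraction(used_mib, total_mib):
    """Fraction of video memory in use, between 0.0 and 1.0."""
    return used_mib / total_mib


def query_vram():
    """Ask nvidia-smi for the current figures (requires the Nvidia driver)."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return parse_memory_csv(output)
```

With the numbers quoted above, `vram_fraction(7902, 7949)` is a little over 0.99, i.e. the card was essentially full when the crash happened.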

@SveSop, video memory being full is not an abnormal situation. This shouldn't be producing crashes. Video memory being full is, however, definitely a factor in reproducing this readily.

imaami captured an apitrace with which I have managed to reproduce at least one bug consistently. This is not the bug being reported, but I have reason to believe they are related.

> imaami captured an apitrace with which I have managed to reproduce at least one bug consistently. This is not the bug being reported, but I have reason to believe they are related.

Proof that I am, in fact, not as lazy as it seems. ;) Just bad at updating tickets.

Try disabling IOMMU support in your kernel (or BIOS in case it has such a setting)... It may be on by default. When it's causing problems, it's likely during concurrent IO across a variety of devices in your system. @ryao already suggested to try disabling it.

I do not seem to be using IOMMU. (Intel vt-d disabled in the bios)

dmesg|grep -i iommu
[    4.987552] vboxpci: IOMMU not found (not registered)

sudo find /sys |grep dmar does not find anything either.
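As a complement to the dmesg and /sys searches above, one can also look at the IOMMU groups the kernel exposes in sysfs. The sketch below assumes the conventional location `/sys/kernel/iommu_groups`; an empty or missing directory is taken to mean that no IOMMU groups are registered. The helper name `iommu_groups` is made up for this example.

```python
import os


def iommu_groups(path="/sys/kernel/iommu_groups"):
    """Return the sorted names of the IOMMU groups found under path.

    An empty list means the directory is missing or has no entries, which
    is treated here as 'no active IOMMU' (an assumption of this sketch).
    """
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    groups = iommu_groups()
    if groups:
        print(f"IOMMU appears active: {len(groups)} group(s)")
    else:
        print("No IOMMU groups found")
```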

> Try disabling IOMMU support in your kernel (or BIOS in case it has such a setting)... It may be on by default. When it's causing problems, it's likely during concurrent IO across a variety of devices in your system. @ryao already suggested to try disabling it.

I did try disabling IOMMU, no effect.

There is a bug in our Vulkan driver where, when video memory is full, certain internal allocations can fail in video memory without a fallback in system memory, which at least triggers one kind of error and could conceivably be the cause of the error seen by @imaami. Fix is in progress, I will update this bug when a release carrying the fix is available.

@ahuillet Would that be fixed by 435.19.03?

    Fixes:
        Fixed a bug which caused corruption in the following DXVK titles:
            Saints Row IV
            Saints Row: The Third
        Fall back to system memory when video memory is full for some driver-internal allocations.
            This can help fix Xid 13 and Xid 31 cases when video memory is full.

https://developer.nvidia.com/vulkan-driver
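To check whether an installed driver is at least the release named in those notes, the version strings can be compared numerically. This is a sketch under a loud assumption: a plain field-by-field comparison says nothing about whether a given branch actually carries the fix (the Vulkan beta and mainline series are numbered separately), so treat the result as a hint only. `parse_version` and `at_least_fix_release` are hypothetical names.

```python
def parse_version(text):
    """Turn a driver version such as '435.19.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# The release whose notes mention the system-memory fallback.
FIX_RELEASE = parse_version("435.19.03")


def at_least_fix_release(driver_version):
    """True if driver_version compares >= 435.19.03 field by field.

    Assumption: numeric ordering is used as a rough proxy; it does not
    prove the fix is present in that particular driver branch.
    """
    return parse_version(driver_version) >= FIX_RELEASE
```

For example, the 418.52.20 driver from the original report compares lower than the fix release, while 435.19.03 itself compares equal.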

@pchome @imaami @foresto @buscher @MinIsMin All of you seem to have hit an issue that looks like this one. You might want to try out the new driver to see if it improves things for you.

I see the fix was noticed before I could announce it here. Please test this new driver release to confirm it fixes the problem.

Unfortunately the system I used doesn't work anymore due to a hardware error, so I can't test it. Sorry.

I just had the most awesome Squad match in a long time. Ended 1-0. No freezes or crashes during these past 2-3 hours with the new driver.

This isn't confirmation yet because the bug has been probabilistic from the beginning. I'd say if I don't get freezes in the next 3 days from now it's likely that it's fixed.

I tested too using the tutorial. There were no discernible issues.

I have spent 4 hours in total testing Squad with the new driver. I cannot reproduce the issue with it on my GTX 1070. This seems to be fixed. :)

Thanks for testing, I'll consider this resolved then.

@imaami Feel free to reopen if the issue persists on your end.

Thank you imaami for providing the apitrace that let us diagnose the problem!

I have this issue with version 455.28 in Borderlands 3. I didn't have these hangs before the Nvidia driver update, so perhaps the bug has come back? Can someone confirm?

(Sorry for posting to a closed issue that's most likely unrelated to DXVK but :shrug:)

> I have this issue with version 455.28 in Borderlands 3. I didn't have these hangs before the Nvidia driver update, so perhaps the bug has come back? Can someone confirm?

Two other people have recently reported crashes with a Xid 31 error in Squad (https://github.com/ValveSoftware/Proton/issues/938#issuecomment-703769378, https://github.com/ValveSoftware/Proton/issues/938#issuecomment-706672103). That would make yours the third one.

@ahuillet might want to keep an eye out for a regression in 455.y memory management code. Very similar symptoms, it seems.

Confirming on 455.28 with multiple games (both dx9 and dx11).
This may be somehow related to gnome-shell - unchecking "Allow the WM to control the windows" in winecfg seems to remove the hang / Xid 31 for me.

It turns out that my problem went away with the new proton-experimental, so I guess it was something window-management related. I only experienced it on gnome-shell/mutter; other WMs without compositing were fine.
