ZFS: Some userland software (e.g. Steam) cannot handle >2TB of free space

Created on 19 Sep 2018  ·  16 Comments  ·  Source: openzfs/zfs

Another Gentoo developer reported an issue to me where the Steam client will mess up if the volume contains more than 2TB of free space. We very quickly determined that it was a bug in the Steam client and found out that this is a known issue:

https://github.com/ValveSoftware/steam-for-linux/issues/4982

People are currently working around this by setting quotas. In the issue, I suggested that an LD_PRELOAD shim could work around it. However, it occurs to me that we could add a specific workaround to ZFS to truncate free space (and maybe also used space) to make software that does such stupid things happy. I am not sure how I feel about that. I'd be perfectly happy to say "use quotas" and let that be the end of it, but I thought it merited some minor discussion.

Does anyone think we should add a workaround for compatibility with broken software? I am leaning toward just telling users to use quotas. A fair number of users appear to be affected, so maybe we could document this in a FAQ entry.

good first issue

All 16 comments

@ryao

I'd be perfectly happy to say "use quotas" and let that be the end of it.

:+1:

If the Steam client is built as a 32-bit binary this _might_ be caused by issue #7122. Setting a quota sounds like a clever immediate workaround, but let's understand the root cause before considering adding any workarounds.

@tchebb published an LD_PRELOAD workaround for this in ValveSoftware/steam-for-linux#3226, but I don't see how it addresses the issue (because it seems like it should make things worse), so I made some changes to it and reposted it:

https://github.com/ValveSoftware/steam-for-linux/issues/3226#issuecomment-422869718

Unfortunately, I am not in a position to test this right now.

Looking at my desktop PC, it looks like Steam is using a mix of 32-bit and 64-bit components.

If a quota does the trick as a workaround and the problem is limited to that one product:
I see no point in adding special cases to work around bugs in commercial userland software.

Should it be more widespread (as the linked issue suggests) and affect all 32-bit builds, that is a different thing.
One question, though, if this is indeed a 32-bit issue: what else breaks when using 32-bit calls?

what else breaks when using 32bit calls

The only thing I'm aware of is this statfs() issue. We added compatibility code long ago to handle the case of a 32-bit user space and 64-bit kernel. That said, it's not a configuration which gets a ton of testing or is represented by any of the bots.

@GregorKopka My doctor told me to use VR to get more exercise. If I hit this issue while redoing my VR setup to eliminate Windows and other methods are insufficient to handle it, I suspect that I will feel very differently about implementing a workaround. ;)

That said, I suspect that the LD_PRELOAD hack probably would be sufficient, but I won't know for a while. I am rebuilding my VR system gradually in my spare time.

@ryao You should have a free path to your exercise, as clamping down _free space_ to <2T using a quota seems to do the trick.
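To make the quota workaround concrete, here is a minimal Python sketch. It assumes the 2TB threshold from the issue title means 2 TiB, and it uses a simplified model in which the free space seen by applications is roughly the quota minus the space already used. The function names and the 1 GiB safety margin are hypothetical choices for illustration, not anything taken from ZFS or Steam.

```python
import os

TIB = 1024 ** 4
GIB = 1024 ** 3

# Assumed threshold: the "2TB" in the issue title read as 2 TiB.
FREE_SPACE_LIMIT = 2 * TIB


def free_bytes(path):
    """Free space available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def exceeds_limit(free, limit=FREE_SPACE_LIMIT):
    """True if the reported free space is at or above the assumed limit."""
    return free >= limit


def quota_for(used_bytes, max_free_bytes=FREE_SPACE_LIMIT - GIB):
    """Quota in bytes that leaves at most max_free_bytes free.

    Simplified model: free space is taken to be quota minus used space.
    """
    return used_bytes + max_free_bytes
```

Under this model the quota has to be raised again as the library grows, since the free space shrinks with every installed game. The resulting value would then be applied with `zfs set quota=<bytes> <dataset>`, where the dataset name depends on the local pool layout.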

I just wanted to discourage going down the path that GPU drivers on Windows took. These days they seem to consist mainly of a big pile of game-specific driver-side optimisations and fixes (their size can't really be explained otherwise, unless they started tacking 4K fullscreen videos onto the back of their settings dialogs). Such a bag of workarounds would only get bigger over time, as others would likely start to rely on it.

@GregorKopka I suspect that could be enough, but I felt like we should track this until it is completely understood because someone reported it to me as a ZFS issue. If one person could misidentify it, others could. I also want to see how things go when I have a few terabytes worth of things installed after I have finished rebuilding my VR system (my previous development system, not my current one) to run this on Linux.

I don’t want to add workarounds to the codebase to fix broken userland software unnecessarily because we would end up supporting them forever. There needs to be a strong demonstrated need for additional workarounds beyond quota support before they are considered for inclusion.

Also, it would be really lousy to add a workaround and then have Valve fix the issue immediately afterward, eliminating the need. I suspect that I will likely try to go through a few indirect back channels to Valve to try to get them to fix the issue at some point (next week?). I do not have any direct contact with them, but there are a couple of “I know a guy who knows a guy...” cases.

As for graphics drivers, I think I can explain their size:

  1. They support many generations of hardware.
  2. They have built in compilers.
  3. They are basically mini operating system kernels in how they manage the graphics hardware. (e.g. thread scheduling, memory management of physical memory, hardware initialization, providing services via API calls, etcetera)
  4. They are expected to support numerous different things (e.g. 2D, frame buffer, Vulkan, OpenGL, OpenCL) with the specifications being something like 1000 pages long.

If you look at the i915 driver alone without considering userspace, it is well over 1 million lines and comfortably 10x ZFS in terms of size. Workarounds for games tend to be in userspace as far as I know because that is where the compiler lives.

@ryao @GregorKopka based on this https://github.com/ValveSoftware/steam-for-linux/issues/4982#issuecomment-392240990 in the Steam tracker, I'm reasonably certain this is caused by the issue I referenced above, #7122. Mixing 32-bit user space binaries (Steam) with a 64-bit kernel isn't the most heavily tested configuration. I've opened #7937 with a proposed fix for review.

@behlendorf That is a good observation, but the only way that EOVERFLOW could occur is if the affected person had set a low recordsize or had a truly prodigious amount of free space, because the default 128KB recordsize would require at least 512TB of free space (2^32 blocks of 128KB each) to trigger it. Another Gentoo developer who reproduced this did it with only 3TB of free space and a 128KB recordsize. There are other comments where people report the issue occurring without EOVERFLOW.

I believe that the EOVERFLOW is a secondary issue and that the root cause is that userland is scaling the free block count reported by statvfs64 to a 512-byte fragment/block size. That means that we cannot address this by scaling it up to a larger fragment/block size. We would instead need to restrict the reported free blocks to fit the 2TB constraint. :/
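The arithmetic behind this claim can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that the application converts the reported free block count into 512-byte units and stores the result in an unsigned 32-bit integer. Steam's actual code is not public in this thread, and the function names here are hypothetical.

```python
UINT32_MAX = 2 ** 32 - 1
TIB = 1024 ** 4


def to_512_byte_units(free_blocks, block_size):
    """Rescale a free block count from block_size units to 512-byte units."""
    return free_blocks * block_size // 512


def fits_in_uint32(value):
    """True if value can be stored in an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX


def app_can_represent(free_bytes, block_size=128 * 1024):
    """Model of the assumed userland behaviour for a given amount of free space."""
    free_blocks = free_bytes // block_size
    return fits_in_uint32(to_512_byte_units(free_blocks, block_size))


# 2**32 units of 512 bytes is exactly 2 TiB, which matches the limit in the title.
LIMIT_BYTES = (UINT32_MAX + 1) * 512
```

With these assumptions, `app_can_represent(1 * TIB)` is true while `app_can_represent(3 * TIB)` is false, regardless of the block size the filesystem reports, because the conversion to 512-byte units cancels it out.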

I think I found the cause of the EOVERFLOW issue. It is this:

https://patchwork.kernel.org/patch/9987759/

That explains why the reports were inconsistent with the kernel code when I looked. I looked at versions that had that patch.

Commit e897a23 has been merged as a partial fix for this issue and queued up for the next point release. There may still be outstanding kernel or application bugs, but from the filesystem perspective this should be resolved.

@behlendorf That patch will have no effect on people affected by this because EOVERFLOW is not the root cause. Steam seems to be doing the exact opposite of that patch: it scales things down to a 512-byte block size, and if that overflows, Steam will catch the overflow and claim that there is no space. Not everyone affected by this encountered EOVERFLOW, and if they did, that mainline patch that I linked would probably address it, but this issue will occur in either case.
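The argument can be illustrated with a small Python sketch. It is not the code from the merged commit; it only models the general idea of scaling the reported block size up until the block count fits in 32 bits, followed by the assumed application behaviour of scaling back down to 512-byte units. The 16 MiB cap and all function names are hypothetical.

```python
UINT32_MAX = 2 ** 32 - 1
TIB = 1024 ** 4


def scale_up(free_blocks, block_size, max_block_size=16 * 1024 * 1024):
    """Double the block size and halve the count until the count fits in 32 bits."""
    while free_blocks > UINT32_MAX and block_size < max_block_size:
        block_size <<= 1
        free_blocks >>= 1
    return free_blocks, block_size


def scale_down_to_512(free_blocks, block_size):
    """Assumed application behaviour: convert the count back to 512-byte units."""
    return free_blocks * (block_size // 512)


# 3 TiB of free space expressed in 512-byte blocks does not fit in 32 bits.
raw_blocks = 3 * TIB // 512

# After scaling up, the count reported by the filesystem fits.
scaled_blocks, scaled_size = scale_up(raw_blocks, 512)

# The application undoes the scaling and ends up with the original count.
app_units = scale_down_to_512(scaled_blocks, scaled_size)
```

In this model `scaled_blocks` fits in 32 bits, so a 32-bit statfs() call would no longer fail, but `app_units` is the same oversized value as `raw_blocks`. That is why only limiting the reported free space, for example with a quota, helps here.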

@ryao So is this something that ZFS can create a workaround for that will work for everyone affected by the issue? It is really sad to see that Valve have ignored the issue for so long that now other projects are going to have to implement patches and even that might not be enough...

Steam has possibly fixed it in one of their latest releases and might appreciate some testing love:
ValveSoftware/steam-for-linux#4982
