Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
bug
Description
Rootless Podman fails to delete BTRFS subvolumes when building an image or deleting a container. This causes a cascade of errors, such as container name re-use errors: podman ps -a shows the container as removed, but re-running the same podman run command fails with a name re-use error.
With sudo this works fine.
I assume this is a configuration issue on my part, since without sudo I also cannot delete the subvolume by hand using btrfs su delete <path to subvolume>.
Thanks in advance for your help.
Steps to reproduce the issue:
Build a rootless image using the BTRFS driver.
Get error message listed below.
Describe the results you received:
ERRO[4128] error deleting build container "de8886e87ab7c7e667426e84a096695f4d434fe8ed42149fb157e7b9a398b906": Failed to destroy btrfs snapshot /home/sbrady/.local/share/containers/storage/btrfs/subvolumes for 561da8272542ab2a71977655a51b5d20c20627fd3a917165d1fe89b0370f4f93: operation not permitted
Error: Failed to destroy btrfs snapshot /home/sbrady/.local/share/containers/storage/btrfs/subvolumes for 561da8272542ab2a71977655a51b5d20c20627fd3a917165d1fe89b0370f4f93: operation not permitted
Describe the results you expected:
BTRFS subvolumes to be deleted on container deletion.
Additional information you deem important (e.g. issue happens only occasionally):
Consistent regardless of the image.
Output of podman version:
podman version 1.5.1
Output of podman info --debug:
debug:
  compiler: gc
  git commit: ""
  go version: go1.12.9
  podman version: 1.5.1
host:
  BuildahVersion: 1.10.1
  Conmon:
    package: conmon-2.0.0-2.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.0, commit: unknown'
  Distribution:
    distribution: '"opensuse-tumbleweed"'
    version: "20190904"
  MemFree: 1173884928
  MemTotal: 8254943232
  OCIRuntime:
    package: runc-1.0.0~rc8-1.4.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8
      spec: 1.0.1-dev
  SwapFree: 2141966336
  SwapTotal: 2147483648
  arch: amd64
  cpus: 4
  eventlogger: file
  hostname: rocinante
  kernel: 5.2.11-1-default
  os: linux
  rootless: true
  uptime: 2h 31m 17.3s (Approximately 0.08 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
store:
  ConfigFile: /home/sbrady/.config/containers/storage.conf
  ContainerStore:
    number: 18
  GraphDriverName: btrfs
  GraphOptions: null
  GraphRoot: /home/sbrady/.local/share/containers/storage
  GraphStatus:
    Build Version: 'Btrfs v5.2.1 '
    Library Version: "102"
  ImageStore:
    number: 16
  RunRoot: /var/run/user/1000/containers
  VolumePath: /home/sbrady/.local/share/containers/storage/volumes
Package info (e.g. output of rpm -q podman or apt list podman):
podman-1.5.1-1.1.x86_64
Additional environment details (AWS, VirtualBox, physical, etc.):
Bare metal install on Intel i7 and spinning rust HDD.
Pastes of Storage.conf and libpod.conf
$ la ~/.config/containers
total 40K
-rw-r--r-- 1 sbrady users 4.4K Sep 6 14:23 libpod.conf
-rw-r--r-- 1 sbrady users 205 Aug 9 13:30 mounts.conf
drwxr-xr-x 1 sbrady users 14 Aug 9 13:30 oci/
-rw-r--r-- 1 sbrady users 256 Aug 9 13:30 policy.json
-rw-r--r-- 1 sbrady users 1.1K Aug 9 13:30 registries.conf
drwxr-xr-x 1 sbrady users 0 Aug 9 13:30 registries.d/
-rw-r--r-- 1 sbrady users 12K Aug 9 13:30 seccomp.json
-rw-r--r-- 1 sbrady users 5.0K Sep 6 13:21 storage.conf
/home/sbrady/.local/share/containers
├── cache
│   └── blob-info-cache-v1.boltdb
└── storage
    ├── btrfs
    │   └── subvolumes
    ├── btrfs-containers
    │   ├── 1e96b0289a1d5ac651f5c521d325dd2911f73b2a6dffbcbfb2e982310ffbaf05
    │   ├── 30e2a4ea384aa36e4a9d5313a89a47efca248f89a6e134f71f0aa952b536b51c
    │   ├── 30f3976b8ab272bd229a770b0f0e9807ad8b00798178a6732909da3899308935
    │   ├── 3886c1a160034a4c7cae0c59b1a3ee93e0371b837bae4072336b2df16cd2a4cc
    │   ├── 3dd0a9edede5dd4fa3a5333665fcb69f45235e5456bb781863f39e41fe0047b7
    │   ├── 4f8e7486fbc706acaa2675f3cb8afa32a5e2b742810fea560c740ff7ddadfb86
    │   ├── 6e0b92b86a79fb1e8cc6d6c68f7c8e82d9eb3ef96b758c5d3b0a850d5ecdd30a
    │   ├── 7051dd05a2a3c0712b92a3d5277b17beedb221c29c7859f7279e778300cc0239
    │   ├── 812bfe158ec304e077e6d2e05eb7d3f9f01631e3bc7c6fc49c01d205117bba8f
    │   ├── 9e8c3f9cb6b6346f808a89e417d3b365f737cd849ae4e8c41b388f6740640be3
    │   ├── b10003959017ff33e909df830fb7acbd87c081bfcf53f997aba6c7afe5040ded
    │   ├── c20462222981918a27e00267ba8ba0118c1064b9d2728b529c1ff0c9d75cb238
    │   ├── c8bf5970068ed878adadacc10a8e04fd3471f16fa5f95750f501bc9c50cf596b
    │   ├── ca09086e22391e95198dd7aa5abc34f168cd64132488e74042c3aaa7860f162c
    │   ├── containers.json
    │   ├── containers.lock
    │   ├── d2878a44fb5ceed65a97f2a3c98f50cc7ec1f4ba02b672d8708278f9c9d1c2b9
    │   ├── de8886e87ab7c7e667426e84a096695f4d434fe8ed42149fb157e7b9a398b906
    │   ├── e0780ebf9f35bb28407986c191257acdb529d9aa198ca2e5132f06390ac3bf0c
    │   ├── e7315b2231cfaefb7991b403ffb18d60d26e25eb1950a090155b3c3be776ea19
    │   └── fea6040e4ffbb6fcf85570aee5475fb2a03853a540b8aa89be2d945c6af64290
    ├── btrfs-images
    │   ├── 0868e92e943cba2ce2ed3b5705d9dcd4adeec9da9088ab69b3c44af199072b3c
    │   ├── 0ed5811d6d9c68658a20eb354b1917bbf5af162c773eb4edf6168fae00ff09f6
    │   ├── 172b73ede26844c52a903bb4e905f18ecbdc1227a6e32b86eb66b60ce999224f
    │   ├── 18ffcb379eccf2d03f71066d45bf3d9c6078c8dc2de843eb41361f22aa8e8430
    │   ├── 23b52ed766eb03c4151be9e41a0eb2fdce003c910982665b7b92a468bca1c3b3
    │   ├── 337ab92e6b8b755823c3363b11948f13441bcb2811988617793642dd1c5c0ac9
    │   ├── 5e09dd17175ecdff0b478333a6d5f444ea2314e5f9114a5443d7fc9fec86b834
    │   ├── 69c54a0cfa733e6fdb478b5612127feec51153ce066cff4a32f1fdce84bb8af6
    │   ├── 6d038d18f5017765cfcbb2262ae3933429e5be0c64f5d70130de8c788791e1d7
    │   ├── a8c54eebc7056cd3dfccee64c28c12d7653fbe2b1fe61159f2c08fcfb15110b0
    │   ├── b14710b9d573f363bbbad56f0ff69e79b5f229b83daaecaf25d9856a84308df7
    │   ├── b151cdb91db489ee8ab7ba84839dd420164692b3031020c8ded00436421715b6
    │   ├── cac4ae0c405ea55bed5402512d66abadac6ea01c31e650ffd86af31560ad98bc
    │   ├── ced8a8fe165881fdef10a647838751d98e0a3d7aba06316f44bdf28f86e23d25
    │   ├── e6da4025fb017e4e79e7339c5953f4c1aa247c11fe7862920d53ef8879341cfa
    │   ├── fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e
    │   ├── images.json
    │   └── images.lock
    ├── btrfs-layers
    │   ├── 06b85792928655f9c05298c2104b509396a9ae72fe22f739bb004a2ff87f99d0.tar-split.gz
    │   ├── 30fa407a2912badb23e133597472bec5d233e439a497d96751f5dcfb8617894e.tar-split.gz
    │   ├── 3f194c10a561e3b694e9602b088a4e516eb19ee468992bb5cc6845c717b06b49.tar-split.gz
    │   ├── 43316bbb040759a859d32501f20dd865db10b6ee62659fd09aead1a920097982.tar-split.gz
    │   ├── 46e841fc16afde233e84ec806cb528e3d0ae0c8ead03f074aa7699f67b8f1b4c.tar-split.gz
    │   ├── 51899d997ab4c1f790759deeeaaebb03d3742c34236e5a85499eb302cea6fb7b.tar-split.gz
    │   ├── 8297fa3a5e5f1eb097422189d1f7d9046dfcc658008fe395d8b27e93a5954691.tar-split.gz
    │   ├── 866d3ed87bbbc6beb540cc52cb79f8efaec0f3a8b7cf554f65c5df8fce524ec0.tar-split.gz
    │   ├── 9dae2a8870ddfae25701156bc993448175a4cf5bbc0f537e8563e286a5a37385.tar-split.gz
    │   ├── a17440e364015841337f4b80d2a42cf79936cba443ab941525785a7e35f0c0c6.tar-split.gz
    │   ├── a2f24cc38b696cc363f1270684da6d5ae40c5b086f6f6978462e7c7f551e4341.tar-split.gz
    │   ├── af0b15c8625bb1938f1d7b17081031f649fd14e6b233688eea3c5483994a66a3.tar-split.gz
    │   ├── b6b761c5afcb8b69e34a26df1ce16be5d16b3088c838060dcc2528cc3d4cdd5f.tar-split.gz
    │   ├── d548e5ae588bc66a2250ed8312145394cb9704ecdf50153fe608b93ec3c15a0f.tar-split.gz
    │   ├── layers.json
    │   └── layers.lock
    ├── cache
    │   └── blob-info-cache-v1.boltdb
    ├── libpod
    │   └── bolt_state.db
    ├── mounts
    ├── storage.lock
    ├── tmp
    └── volumes
        ├── brave-storage-volume
        └── test
Sounds like c/storage cleanup - @nalind Agree?
Actually, wait - how are we using the btrfs driver with rootless?
@giuseppe PTAL
It's a bit confusing at the moment. For historical reasons, btrfs subvolume create is permitted unprivileged, but btrfs subvolume delete requires root privilege unless the file system was mounted with option user_subvol_rm_allowed.
However, recently, rmdir(2) will delete a btrfs subvolume, without privilege escalation required, so long as the user otherwise has privilege for the subvolume and contents. I haven't benchmarked a fully populated set of subvolumes to compare btrfs subvolume delete vs rm -rf - there could be a difference if the latter causes recursive rm of files/dirs and then rmdir on the subvolume, whereas btrfs sub del calls BTRFS_IOC_SNAP_DESTROY ioctl which generally exits quickly, and cleanup happens in the background.
Ok, I don't truly understand what is going on here. But are we doing something wrong in the storage driver that we can fix to make rootless podman with btrfs work? Or should we simply block rootless podman with BTRFS because it will never work correctly and force users to use fuse-overlay?
On kernel 4.18+, use rm -rf.
That will work as if the subvolume were a directory. Just like on any other file system, it will check every file and directory for the proper permissions, and unlinkat() each one in turn.
Optimization 1: If root, use btrfs subvolume delete.
Optimization 2: If rootless, check whether the btrfs volume is mounted with option user_subvol_rm_allowed and, if so, use btrfs subvolume delete.
btrfs subvolume delete avoids all the traversal and recursive unlinkat(), so it's way faster.
For kernels 4.17 and older, you could check for the user_subvol_rm_allowed mount option and permit rootless podman only when it is set; otherwise disallow it.
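As a rough illustration of the decision order proposed above, here is a minimal Go sketch. The names Strategy and ChooseStrategy are hypothetical and do not exist in containers/storage; the inputs (whether the process is root, whether the mount option is set, and the kernel version) are assumed to be detected elsewhere.

```go
package main

import "fmt"

// Strategy names a way of removing a btrfs subvolume.
// These names are illustrative only, not part of containers/storage.
type Strategy int

const (
	SubvolumeDelete Strategy = iota // fast path: btrfs subvolume delete
	RecursiveRemove                 // slow path: unlink everything, like rm -rf
	Unsupported                     // no unprivileged way to remove the subvolume
)

func (s Strategy) String() string {
	switch s {
	case SubvolumeDelete:
		return "subvolume-delete"
	case RecursiveRemove:
		return "recursive-remove"
	default:
		return "unsupported"
	}
}

// kernelAtLeast reports whether major.minor is at least wantMajor.wantMinor.
func kernelAtLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

// ChooseStrategy applies the order from the comment above: prefer the fast
// delete when it is permitted (root, or user_subvol_rm_allowed is set),
// otherwise fall back to a recursive remove on kernel 4.18 or newer.
func ChooseStrategy(isRoot, userSubvolRmAllowed bool, kernelMajor, kernelMinor int) Strategy {
	if isRoot || userSubvolRmAllowed {
		return SubvolumeDelete
	}
	if kernelAtLeast(kernelMajor, kernelMinor, 4, 18) {
		return RecursiveRemove
	}
	return Unsupported
}

func main() {
	// The reporter's setup: rootless, mount option not set, kernel 5.2.
	fmt.Println(ChooseStrategy(false, false, 5, 2))
}
```

Keeping the choice in a pure function like this makes the policy easy to test without touching a real btrfs file system.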
We've never actually tested the btrfs driver on rootless before, and it was never written with rootless support in mind. However, it seems like everything except cleanup is already working. Given that we seem to have a way forward there (at least on newer kernels), this sounds like a reasonable fix.
FWIW, user_subvol_rm_allowed has existed since 2010, circa kernel 2.6.38. When enabled, there is only a check of subvolume owner/perm, not contents. If the user has the proper privilege for just the subvolume, btrfs subvolume delete will exit 0 and everything inside instantly goes bye bye.
I'm not sure what use case would exist where a user owns the subvolume but doesn't own the contents, but...
I've never tried the btrfs driver and I am surprised it works with rootless.
@cmurf would it be enough if we attempt to rm -rf if the volume deletion fails with EPERM?
Would you like to work on a patch for that?
@giuseppe actually it's a good idea to try the optimized case first, since it's way faster, and then use rm -rf as the fallback. It'll take me longer to find the file to work on than it'll take an actual competent person to just patch it.
For users experiencing this issue, what storage driver should those running Podman rootless use before BTRFS is supported?
We'd recommend fuse-overlayfs
I'm good with closing this as "unsupported" if you'd like to track this as a pending feature elsewhere, but I'll leave that up to the dev team. Thanks for your quick responses.
Apologies if I'm stating the obvious, but as Podman is creating the BTRFS subvolume it could easily tag the subvolume as USER_SUBVOL_RM_ALLOWED on creation, right? I'm not a Go programmer but I can read it, and it appears that support for this option could be added consistently with the other BTRFS options used on subvolume creation in func parseOptions in c/storage/drivers/btrfs/btrfs.go. This would allow the non-root user to delete BTRFS subvolumes on supported kernels. This should cover most cases, and a verbose error message could cover the edge cases with a simple addition to the code.
I think that's definitely possible. The biggest issue right now is probably finding someone to work on it - most people are on fuse-overlayfs, so fixing the btrfs driver is lower priority on the storage side.
user_subvol_rm_allowed is a Btrfs mount option, not a subvolume property. It's also not a per mount point option - once you use it, it applies to all other mounts for this file system. e.g.
[chris@fmac ~]$ mount | grep btrfs
/dev/sda6 on / type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=496,subvol=/root)
/dev/sda6 on /boot type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=581,subvol=/boot)
/dev/sda6 on /home type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=468,subvol=/home)
[chris@fmac ~]$ sudo mount -o remount,user_subvol_rm_allowed /home
[chris@fmac ~]$ mount | grep btrfs
/dev/sda6 on / type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=496,subvol=/root)
/dev/sda6 on /boot type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=581,subvol=/boot)
/dev/sda6 on /home type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=468,subvol=/home)
[chris@fmac ~]$
With this mount option set, the user must have privilege for just the subvolume they want to delete, contents aren't checked for privileges. That lack of checking content ownership is why user_subvol_rm_allowed isn't the default behavior, but you can certainly add it to your fstab and make it a persistent mount option.
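For anyone wanting to detect this option programmatically, here is a minimal Go sketch. It assumes input in the /proc/mounts format (device, mount point, file system type, comma-separated options), which differs from the human-readable mount output pasted above. HasMountOption is a hypothetical helper name, not an existing function in podman or containers/storage.

```go
package main

import (
	"fmt"
	"strings"
)

// HasMountOption scans text in /proc/mounts format and reports whether a
// btrfs mount at mountPoint carries the given option.
// Hypothetical helper for illustration only.
func HasMountOption(mounts, mountPoint, option string) bool {
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		// Expect at least: device, mount point, fs type, options.
		if len(fields) < 4 || fields[1] != mountPoint || fields[2] != "btrfs" {
			continue
		}
		for _, opt := range strings.Split(fields[3], ",") {
			if opt == option {
				return true
			}
		}
	}
	return false
}

func main() {
	// Sample line modelled on the /home mount shown above, rewritten in
	// /proc/mounts format.
	sample := "/dev/sda6 /home btrfs rw,noatime,user_subvol_rm_allowed,subvolid=468,subvol=/home 0 0\n"
	fmt.Println(HasMountOption(sample, "/home", "user_subvol_rm_allowed"))
}
```

On a real system the first argument would be the contents of /proc/mounts. The sketch matches on the exact mount point only; finding which mount actually backs a given storage directory is left out.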
@cmurf Yep, makes total sense now... I just found all this out while troubleshooting. I can confirm Podman works as expected with BTRFS as a storage driver when running in rootless mode.
For troubleshooters, add the user_subvol_rm_allowed flag to /etc/fstab if you'd like to use podman rootless with a BTRFS file system. Be aware of the potential security implications of this option, and do your homework before enabling it.
The root line of my /etc/fstab looks like this:
UUID=466f96bf-0e8f-44cb-a42b-c46cedca5803 / btrfs defaults,user_subvol_rm_allowed 0 0
@mheon Something to consider though: the overlay option is not supported for people running BTRFS, so people with BTRFS as their root filesystem effectively cannot run Podman rootless without issues deleting containers. Just an FYI on that.
I was wrong here and had a different configuration issue. overlay is the default and preferred option here.
I think the second part needs an issue of its own. I know btrfs and overlayfs can co-exist in production environments with tens of thousands of snapshots. I'm not sure what the nature of this lack of support could be about. There are nuances that can be workload specific where one works better than the other, and even where overlayfs copy-up operation can be made more efficient using cloning (Btrfs has had reflinks since forever, and XFS enables them in the most recent xfsprogs at mkfs time).
Hmm, on Fedora 31, by default it appears podman is using fuse-overlayfs on Btrfs.
store:
  ConfigFile: /home/chris/.config/containers/storage.conf
  ContainerStore:
    number: 2
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /home/chris/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
@SwitchedToGitlab We would welcome a PR on container/storage to better support BTRFS for rootless containers.
Or at least add some of this information on how to set up btrfs for use in rootless containers to some of the troubleshooting rootless.md files on GitHub.
Hello.
I noticed this issue on the btrfs ML, and if nobody is working on it, I'm willing to help (though I'm not familiar with podman, so it may take some time).
That would be great. Changes might be required in github.com/containers/storage though, since most of the BTRFS code is in there.
Thanks, let me try.
Honestly, using overlay is greatly preferred. Not sure if this is something that really needs a fix in code. It works great with BTRFS once you enable user_subvol_rm_allowed on the root filesystem. I'll see if I can get some time to update the rootless tutorial on using overlay as the storage driver, ensuring fuse-overlayfs is installed, and the advantages of doing so. I'm also planning on adding that for BTRFS support to work you need to set the user_subvol_rm_allowed flag in /etc/fstab.
That said, @t-msn I think we run into trouble at or about line 308 in containers/storage. The elegant solution IMHO would be to trap errors from this unix.Syscall and, if it fails, retry deleting the subvolume using the Go os standard library package. It's also my understanding that you will need to walk the subvolume, deleting any nested subvolumes as you go (if that makes sense). Not a Go programmer, but that's my understanding.
> It works great with BTRFS once you enable user_subvol_rm_allowed on the root filesystem.
Right. The needed fix is when that mount option is not set, the subvolume remove fails with an error, in which case the fallback should be to 'rm -rf' the subvolume. It's slower than subvolume delete, but at least it won't fail, unless the user doesn't actually own what they're deleting.
Are either of you at All Systems Go this weekend?
> Honestly using overlay is greatly preferred. Not sure if this is something that really needs a fix in code.
Well, I think the fix is trivial and won't hurt anyone (please see below).
> It's also my understanding that you will need to walk the subvolume, deleting any subvolumes as you go
We cannot call the IOC_SNAP_DESTROY ioctl on a subvolume that contains other subvolumes. This is the reason subvolDelete() performs a path walk to remove subvolumes bottom-up, but that is not necessary for "rm -r".
I checked the code and noticed that system.EnsureRemoveAll() is called after subvolDelete() and that it uses os.RemoveAll() to ensure the target path is removed.
Therefore the simplest solution would be to just ignore the error from subvolDelete() and fall back to system.EnsureRemoveAll(): https://github.com/t-msn/storage/commit/41c2a90841cfc42f1373dac0de240773b405f536
I followed the tutorials (https://github.com/containers/libpod/blob/master/docs/tutorials/rootless_tutorial.md and https://github.com/containers/libpod/blob/master/docs/tutorials/podman_tutorial.md) and with the above fix I can do "podman rm".
(BTW, subvol quota operation needs privilege.)
@t-msn - Would you be willing to submit your change (https://github.com/t-msn/storage/commit/41c2a90841cfc42f1373dac0de240773b405f536) as a Pull Request?