Podman: Podman failed to destroy BTRFS snapshot on container delete

Created on 7 Sep 2019 · 29 Comments · Source: containers/podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

bug

Description

ROOTLESS Podman fails to delete BTRFS subvolumes when building an image or deleting a container. This causes a cascade of errors, such as container name re-use errors: podman ps -a shows the container as removed, but re-running the same podman run command fails with a name re-use error.

Using SUDO this works just fine.

I would very much assume that this is a configuration issue on my part somewhere, because without privilege elevation using sudo I cannot delete the specified BTRFS subvolume by hand using btrfs su delete <path to subvolume> either.

Thanks in advance for your help.

Steps to reproduce the issue:

  1. Build a rootless image using the BTRFS driver.

  2. Get error message listed below.

Describe the results you received:

ERRO[4128] error deleting build container "de8886e87ab7c7e667426e84a096695f4d434fe8ed42149fb157e7b9a398b906": Failed to destroy btrfs snapshot /home/sbrady/.local/share/containers/storage/btrfs/subvolumes for 561da8272542ab2a71977655a51b5d20c20627fd3a917165d1fe89b0370f4f93: operation not permitted 
Error: Failed to destroy btrfs snapshot /home/sbrady/.local/share/containers/storage/btrfs/subvolumes for 561da8272542ab2a71977655a51b5d20c20627fd3a917165d1fe89b0370f4f93: operation not permitted

Describe the results you expected:

BTRFS subvolumes to be deleted on container deletion.

Additional information you deem important (e.g. issue happens only occasionally):

Consistent regardless of the image.

Output of podman version:

podman version 1.5.1

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.9
  podman version: 1.5.1
host:
  BuildahVersion: 1.10.1
  Conmon:
    package: conmon-2.0.0-2.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.0, commit: unknown'
  Distribution:
    distribution: '"opensuse-tumbleweed"'
    version: "20190904"
  MemFree: 1173884928
  MemTotal: 8254943232
  OCIRuntime:
    package: runc-1.0.0~rc8-1.4.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8
      spec: 1.0.1-dev
  SwapFree: 2141966336
  SwapTotal: 2147483648
  arch: amd64
  cpus: 4
  eventlogger: file
  hostname: rocinante
  kernel: 5.2.11-1-default
  os: linux
  rootless: true
  uptime: 2h 31m 17.3s (Approximately 0.08 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
store:
  ConfigFile: /home/sbrady/.config/containers/storage.conf
  ContainerStore:
    number: 18
  GraphDriverName: btrfs
  GraphOptions: null
  GraphRoot: /home/sbrady/.local/share/containers/storage
  GraphStatus:
    Build Version: 'Btrfs v5.2.1 '
    Library Version: "102"
  ImageStore:
    number: 16
  RunRoot: /var/run/user/1000/containers
  VolumePath: /home/sbrady/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.5.1-1.1.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
Bare metal install on Intel i7 and spinning rust HDD.

Pastes of Storage.conf and libpod.conf

➜ la ~/.config/containers
total 40K
-rw-r--r-- 1 sbrady users 4.4K Sep  6 14:23 libpod.conf
-rw-r--r-- 1 sbrady users  205 Aug  9 13:30 mounts.conf
drwxr-xr-x 1 sbrady users   14 Aug  9 13:30 oci/
-rw-r--r-- 1 sbrady users  256 Aug  9 13:30 policy.json
-rw-r--r-- 1 sbrady users 1.1K Aug  9 13:30 registries.conf
drwxr-xr-x 1 sbrady users    0 Aug  9 13:30 registries.d/
-rw-r--r-- 1 sbrady users  12K Aug  9 13:30 seccomp.json
-rw-r--r-- 1 sbrady users 5.0K Sep  6 13:21 storage.conf
/home/sbrady/.local/share/containers
├── cache
│   └── blob-info-cache-v1.boltdb
└── storage
    ├── btrfs
    │   └── subvolumes
    ├── btrfs-containers
    │   ├── 1e96b0289a1d5ac651f5c521d325dd2911f73b2a6dffbcbfb2e982310ffbaf05
    │   ├── 30e2a4ea384aa36e4a9d5313a89a47efca248f89a6e134f71f0aa952b536b51c
    │   ├── 30f3976b8ab272bd229a770b0f0e9807ad8b00798178a6732909da3899308935
    │   ├── 3886c1a160034a4c7cae0c59b1a3ee93e0371b837bae4072336b2df16cd2a4cc
    │   ├── 3dd0a9edede5dd4fa3a5333665fcb69f45235e5456bb781863f39e41fe0047b7
    │   ├── 4f8e7486fbc706acaa2675f3cb8afa32a5e2b742810fea560c740ff7ddadfb86
    │   ├── 6e0b92b86a79fb1e8cc6d6c68f7c8e82d9eb3ef96b758c5d3b0a850d5ecdd30a
    │   ├── 7051dd05a2a3c0712b92a3d5277b17beedb221c29c7859f7279e778300cc0239
    │   ├── 812bfe158ec304e077e6d2e05eb7d3f9f01631e3bc7c6fc49c01d205117bba8f
    │   ├── 9e8c3f9cb6b6346f808a89e417d3b365f737cd849ae4e8c41b388f6740640be3
    │   ├── b10003959017ff33e909df830fb7acbd87c081bfcf53f997aba6c7afe5040ded
    │   ├── c20462222981918a27e00267ba8ba0118c1064b9d2728b529c1ff0c9d75cb238
    │   ├── c8bf5970068ed878adadacc10a8e04fd3471f16fa5f95750f501bc9c50cf596b
    │   ├── ca09086e22391e95198dd7aa5abc34f168cd64132488e74042c3aaa7860f162c
    │   ├── containers.json
    │   ├── containers.lock
    │   ├── d2878a44fb5ceed65a97f2a3c98f50cc7ec1f4ba02b672d8708278f9c9d1c2b9
    │   ├── de8886e87ab7c7e667426e84a096695f4d434fe8ed42149fb157e7b9a398b906
    │   ├── e0780ebf9f35bb28407986c191257acdb529d9aa198ca2e5132f06390ac3bf0c
    │   ├── e7315b2231cfaefb7991b403ffb18d60d26e25eb1950a090155b3c3be776ea19
    │   └── fea6040e4ffbb6fcf85570aee5475fb2a03853a540b8aa89be2d945c6af64290
    ├── btrfs-images
    │   ├── 0868e92e943cba2ce2ed3b5705d9dcd4adeec9da9088ab69b3c44af199072b3c
    │   ├── 0ed5811d6d9c68658a20eb354b1917bbf5af162c773eb4edf6168fae00ff09f6
    │   ├── 172b73ede26844c52a903bb4e905f18ecbdc1227a6e32b86eb66b60ce999224f
    │   ├── 18ffcb379eccf2d03f71066d45bf3d9c6078c8dc2de843eb41361f22aa8e8430
    │   ├── 23b52ed766eb03c4151be9e41a0eb2fdce003c910982665b7b92a468bca1c3b3
    │   ├── 337ab92e6b8b755823c3363b11948f13441bcb2811988617793642dd1c5c0ac9
    │   ├── 5e09dd17175ecdff0b478333a6d5f444ea2314e5f9114a5443d7fc9fec86b834
    │   ├── 69c54a0cfa733e6fdb478b5612127feec51153ce066cff4a32f1fdce84bb8af6
    │   ├── 6d038d18f5017765cfcbb2262ae3933429e5be0c64f5d70130de8c788791e1d7
    │   ├── a8c54eebc7056cd3dfccee64c28c12d7653fbe2b1fe61159f2c08fcfb15110b0
    │   ├── b14710b9d573f363bbbad56f0ff69e79b5f229b83daaecaf25d9856a84308df7
    │   ├── b151cdb91db489ee8ab7ba84839dd420164692b3031020c8ded00436421715b6
    │   ├── cac4ae0c405ea55bed5402512d66abadac6ea01c31e650ffd86af31560ad98bc
    │   ├── ced8a8fe165881fdef10a647838751d98e0a3d7aba06316f44bdf28f86e23d25
    │   ├── e6da4025fb017e4e79e7339c5953f4c1aa247c11fe7862920d53ef8879341cfa
    │   ├── fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e
    │   ├── images.json
    │   └── images.lock
    ├── btrfs-layers
    │   ├── 06b85792928655f9c05298c2104b509396a9ae72fe22f739bb004a2ff87f99d0.tar-split.gz
    │   ├── 30fa407a2912badb23e133597472bec5d233e439a497d96751f5dcfb8617894e.tar-split.gz
    │   ├── 3f194c10a561e3b694e9602b088a4e516eb19ee468992bb5cc6845c717b06b49.tar-split.gz
    │   ├── 43316bbb040759a859d32501f20dd865db10b6ee62659fd09aead1a920097982.tar-split.gz
    │   ├── 46e841fc16afde233e84ec806cb528e3d0ae0c8ead03f074aa7699f67b8f1b4c.tar-split.gz
    │   ├── 51899d997ab4c1f790759deeeaaebb03d3742c34236e5a85499eb302cea6fb7b.tar-split.gz
    │   ├── 8297fa3a5e5f1eb097422189d1f7d9046dfcc658008fe395d8b27e93a5954691.tar-split.gz
    │   ├── 866d3ed87bbbc6beb540cc52cb79f8efaec0f3a8b7cf554f65c5df8fce524ec0.tar-split.gz
    │   ├── 9dae2a8870ddfae25701156bc993448175a4cf5bbc0f537e8563e286a5a37385.tar-split.gz
    │   ├── a17440e364015841337f4b80d2a42cf79936cba443ab941525785a7e35f0c0c6.tar-split.gz
    │   ├── a2f24cc38b696cc363f1270684da6d5ae40c5b086f6f6978462e7c7f551e4341.tar-split.gz
    │   ├── af0b15c8625bb1938f1d7b17081031f649fd14e6b233688eea3c5483994a66a3.tar-split.gz
    │   ├── b6b761c5afcb8b69e34a26df1ce16be5d16b3088c838060dcc2528cc3d4cdd5f.tar-split.gz
    │   ├── d548e5ae588bc66a2250ed8312145394cb9704ecdf50153fe608b93ec3c15a0f.tar-split.gz
    │   ├── layers.json
    │   └── layers.lock
    ├── cache
    │   └── blob-info-cache-v1.boltdb
    ├── libpod
    │   └── bolt_state.db
    ├── mounts
    ├── storage.lock
    ├── tmp
    └── volumes
        ├── brave-storage-volume
        └── test
Labels: kind/bug, stale-issue

All 29 comments

Sounds like c/storage cleanup - @nalind Agree?

Actually, wait - how are we using the btrfs driver with rootless?

@giuseppe PTAL

It's a bit confusing at the moment. For historical reasons, btrfs subvolume create is permitted unprivileged, but btrfs subvolume delete requires root privilege unless the file system was mounted with option user_subvol_rm_allowed.

However, recently, rmdir(2) will delete a btrfs subvolume, without privilege escalation required, so long as the user otherwise has privilege for the subvolume and contents. I haven't benchmarked a fully populated set of subvolumes to compare btrfs subvolume delete vs rm -rf - there could be a difference if the latter causes recursive rm of files/dirs and then rmdir on the subvolume, whereas btrfs sub del calls BTRFS_IOC_SNAP_DESTROY ioctl which generally exits quickly, and cleanup happens in the background.

Ok, I don't truly understand what is going on here. But are we doing something wrong in the storage driver that we can fix to make rootless podman with btrfs work? Or should we simply block rootless podman with BTRFS because it will never work correctly and force users to use fuse-overlay?

If kernel 4.18+ use rm -rf
That will work as if the subvolume is a directory. Just like on any other file system, it will check every file and directory for the proper permissions, and unlinkat() each one in turn.

Optimization 1: If root, use btrfs subvolume delete.
Optimization 2: If rootless, check if the btrfs volume is mounted with option user_subvol_rm_allowed and if so then use btrfs subvolume delete

btrfs subvolume delete avoids all the traversal and recursive unlinkat(), so it's way faster.

For kernels 4.17 and older, you could check for the user_subvol_rm_allowed mount option and permit rootless podman only if it is set; otherwise disallow it.
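The decision order described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the c/storage driver code (which is Go); the function names and the returned strategy labels are hypothetical, and the 4.18 threshold is taken from the comment above rather than verified independently.

```python
import re


def kernel_at_least(release, major, minor):
    """Compare a `uname -r` style string such as '5.2.11-1-default' to major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def choose_removal_strategy(is_root, user_subvol_rm_allowed, kernel_release):
    """Pick how to remove a subvolume (hypothetical labels, for illustration only)."""
    # Fast path: root, or any user when the filesystem is mounted
    # with user_subvol_rm_allowed, may use `btrfs subvolume delete`.
    if is_root or user_subvol_rm_allowed:
        return "subvolume-delete"
    # Per the comment above, kernels 4.18+ let an unprivileged user remove
    # a subvolume like an ordinary directory tree (rm -rf).
    if kernel_at_least(kernel_release, 4, 18):
        return "recursive-remove"
    # Older kernel, no mount option, not root: nothing is expected to work.
    return "unsupported"
```

For the reporter's setup (rootless, kernel 5.2.11-1-default, mount option not set) this sketch would select the recursive remove.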

We've never actually tested the btrfs driver on rootless before, and it was never written with rootless support in mind. However, it seems like everything except cleanup is already working. Given that we seem to have a way forward there (at least on newer kernels), this sounds like a reasonable fix.

FWIW, user_subvol_rm_allowed has existed since 2010, circa kernel 2.6.38. When enabled, there is only a check of subvolume owner/perm, not contents. If the user has the proper privilege for just the subvolume, btrfs subvolume delete will exit 0, everything inside instantly goes bye bye.

I'm not sure what use case would exist where a user owns the subvolume but doesn't own the contents, but...

I've never tried the btrfs driver and I am surprised it works with rootless.

@cmurf would it be enough if we attempt to rm -rf if the volume deletion fails with EPERM?

Would you like to work on a patch for that?

@giuseppe actually it's a good idea to try the optimized case first, since it's way faster, and then use rm -rf as the fallback. It'll take me longer to find the file to work on than it'll take an actual competent person to just patch it.

For users experiencing this issue, what storage driver should those running Podman rootless use before BTRFS is supported?

We'd recommend fuse-overlayfs

I'm good with closing this as "unsupported" if you'd like to track this as a pending feature elsewhere, but I'll leave that up to the dev team. Thanks for your quick responses.

Apologies if I'm stating the obvious, but as Podman is creating the BTRFS subvolume it could easily tag the subvolume as USER_SUBVOL_RM_ALLOWED on creation, right? I'm not a Go programmer but I can read it, and it appears that support for this option could be added consistently with the other BTRFS options used on subvolume creation in func parseoptions in c/storage/drivers/btrfs/btrfs.go. This would allow the non-root user to delete BTRFS subvolumes on supported kernels. This should cover most cases, and a verbose error message may cover the edge cases with a simple addition to code.

I think that's definitely possible. The biggest issue right now is probably finding someone to work on it - most people are on fuse-overlayfs, so fixing the btrfs driver is lower priority on the storage side.

user_subvol_rm_allowed is a Btrfs mount option, not a subvolume property. It's also not a per mount point option - once you use it, it applies to all other mounts for this file system. e.g.

[chris@fmac ~]$ mount | grep btrfs
/dev/sda6 on / type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=496,subvol=/root)
/dev/sda6 on /boot type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=581,subvol=/boot)
/dev/sda6 on /home type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=468,subvol=/home)
[chris@fmac ~]$ sudo mount -o remount,user_subvol_rm_allowed /home
[chris@fmac ~]$ mount | grep btrfs
/dev/sda6 on / type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=496,subvol=/root)
/dev/sda6 on /boot type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=581,subvol=/boot)
/dev/sda6 on /home type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=468,subvol=/home)
[chris@fmac ~]$ 

With this mount option set, the user must have privilege for just the subvolume they want to delete, contents aren't checked for privileges. That lack of checking content ownership is why user_subvol_rm_allowed isn't the default behavior, but you can certainly add it to your fstab and make it a persistent mount option.
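A tool could detect this mount option before deciding how to delete a subvolume. The following is a minimal Python sketch under some assumptions: it parses text in the /proc/self/mounts layout (device, mount point, type, options, ...), which differs from the `mount` output pasted above; the function names and the sample data are hypothetical; and the caller is expected to already know which mount point backs the storage directory.

```python
# Sample text in /proc/self/mounts layout, modelled on the mounts shown above.
SAMPLE_MOUNTS = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda6 / btrfs rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=496,subvol=/root 0 0
/dev/sda6 /home btrfs rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=468,subvol=/home 0 0
"""


def btrfs_mount_options(mounts_text, mount_point):
    """Return the option set of a btrfs mount point, or an empty set if absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, comma-separated options, ...
        if len(fields) >= 4 and fields[1] == mount_point and fields[2] == "btrfs":
            return set(fields[3].split(","))
    return set()


def user_subvol_rm_allowed(mounts_text, mount_point):
    """True if the btrfs filesystem at mount_point carries user_subvol_rm_allowed."""
    return "user_subvol_rm_allowed" in btrfs_mount_options(mounts_text, mount_point)
```

On a live system the text would come from reading /proc/self/mounts instead of the sample string.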

@cmurf Yep, makes total sense now... I just found all this out while troubleshooting. I can confirm Podman works as expected with BTRFS as a storage driver when running in rootless mode.

For troubleshooters, add the user_subvol_rm_allowed flag to /etc/fstab if you'd like to use podman rootless with a BTRFS file system. Be aware of potential security implications to this option, and do your homework before enabling.

The root line of my /etc/fstab looks like this:

UUID=466f96bf-0e8f-44cb-a42b-c46cedca5803  /                       btrfs  defaults,user_subvol_rm_allowed  0  0

@mheon Something to consider though, the overlay option is not supported for people running BTRFS, so in effect people with BTRFS filesystems as their root filesystem effectively cannot run Podman rootless without issue deleting containers. Just an FYI on that.

I was wrong here and had a different configuration issue. overlay is the default and preferred option here.

I think the second part needs an issue of its own. I know btrfs and overlayfs can co-exist in production environments with tens of thousands of snapshots. I'm not sure what the nature of this lack of support could be about. There are nuances that can be workload specific where one works better than the other, and even where overlayfs copy-up operation can be made more efficient using cloning (Btrfs has had reflinks since forever, and XFS enables them in the most recent xfsprogs at mkfs time).

Hmm, on Fedora 31, by default it appears podman is using fuse-overlayfs on Btrfs.

store:
  ConfigFile: /home/chris/.config/containers/storage.conf
  ContainerStore:
    number: 2
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /home/chris/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"

@SwitchedToGitlab We would welcome a PR on container/storage to better support BTRFS for rootless containers.

Or at least add some of this information on how to set up btrfs for use in rootless containers to some of the troubleshooting/rootless .md files on GitHub.

Hello.

I noticed this issue on the btrfs ML, and if nobody is working on it, I'm willing to help (though I'm not familiar with podman and it may take some time).

That would be great. Changes might be required in github.com/containers/storage though, since most of the BTRFS code is in there.

Thanks, let me try.

Honestly using overlay is greatly preferred. Not sure if this is something that really needs a fix in code. It works great with BTRFS once you enable user_subvol_rm_allowed on the root filesystem. I'll see if I can get some time to update the rootless tutorial on using overlay as the storage driver, ensuring fuse-overlayfs is installed, and the advantages of doing so. I'm also planning on adding that for BTRFS support to work you need to set the user_subvol_rm_allowed flag in /etc/fstab.

That said, @t-msn I think we run into trouble at or about line 308 in containers/storage. The elegant solution IMHO would be to trap for errors on this unix.Syscall then re-try deleting the subvolume using the Golang OS standard library if it fails. It's also my understanding that you will need to walk the subvolume, deleting any subvolumes as you go (if that makes sense). Not a Go programmer, but that's my understanding.

It works great with BTRFS once you enable user_subvol_rm_allowed on the root filesystem.

Right. The needed fix is when that mount option is not set, the subvolume remove fails with an error, in which case the fallback should be to 'rm -rf' the subvolume. It's slower than subvolume delete, but at least it won't fail, unless the user doesn't actually own what they're deleting.

Are either of you guys at All systems go this weekend?

Honestly using overlay is greatly preferred. Not sure if this is something that really needs a fix in code.

well, I think the fix is trivial and won't hurt anyone (please see below).

It's also my understanding that you will need to walk the subvolume, deleting any subvolumes as you go

We cannot call the IOC_SNAP_DESTROY ioctl on a subvolume if it contains other subvolumes. This is the reason subvolDelete() performs a path walk to remove subvolumes bottom-up, but that is not necessary for "rm -r".

I checked the code and noticed that system.EnsureRemoveAll() is called after subvolDelete() and it uses os.RemoveAll() to ensure the target path is removed.

Therefore the simplest solution would be just ignoring the error of subvolDelete() and fallback to system.EnsureRemoveAll(): https://github.com/t-msn/storage/commit/41c2a90841cfc42f1373dac0de240773b405f536
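The try-fast-then-fall-back idea discussed in this thread can be illustrated with a small sketch. It is written in Python for brevity and is not the linked Go commit; `remove_subvolume` and its `fast_delete` parameter are hypothetical names, with `fast_delete` standing in for whatever performs the privileged-style delete (for example a wrapper around btrfs subvolume delete). Unlike the commit, which ignores the error from subvolDelete(), this sketch falls back only on a permission error.

```python
import shutil


def remove_subvolume(path, fast_delete):
    """Try fast_delete(path); on a permission error, remove the tree recursively.

    Returns "fast" or "fallback" to show which route succeeded.
    """
    try:
        # Optimized route: expected to return quickly when it is permitted.
        fast_delete(path)
        return "fast"
    except PermissionError:
        # Covers EPERM/EACCES. The slower route unlinks every entry in turn,
        # so it still fails if the user does not own the contents.
        shutil.rmtree(path)
        return "fallback"
```

Trying the fast route first keeps the common privileged case cheap, and the recursive route is only paid for when the delete is refused.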

I followed the tutorials (https://github.com/containers/libpod/blob/master/docs/tutorials/rootless_tutorial.md and https://github.com/containers/libpod/blob/master/docs/tutorials/podman_tutorial.md) and with the above fix I can do "podman rm".

(BTW, subvol quota operation needs privilege.)

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

@t-msn - Would you be willing to submit your change (https://github.com/t-msn/storage/commit/41c2a90841cfc42f1373dac0de240773b405f536) as a Pull Request?
