Singularity: sandbox directory cannot be removed if original container has non-writeable directories

Created on 24 Sep 2019 · 14 Comments · Source: hpcng/singularity

Version of Singularity:

singularity-3.4.0-1.2.el7.x86_64

Expected behavior

singularity -s build --sandbox /home/aforti/docker_centos_7/image docker://centos:7

I expect Singularity, as in 3.2.1 or even 2.6.1, to build the sandbox with the permissions of the caller.

Actual behavior

If I leave the cache enabled, the command fails with:

2019-09-24 22:37:58,127 | ERROR    | Container execution failed with errors. Error code: 255
2019-09-24 22:37:58,127 | ERROR    | FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

If I disable the cache, it prints some warnings and then builds the sandbox with root permissions, whether user namespaces are enabled or not.

warnings

2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:49  info unpack layer: sha256:d8d02d45731499028db01b6fa35475f91d230628b4e25fab8e3c015594dc3261
2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:49  warn rootless{usr/bin/ping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:52  warn rootless{usr/sbin/arping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:52  warn rootless{usr/sbin/clockdiff} ignoring (usually) harmless EPERM on setxattr "security.capability"

The resulting permissions do not allow the sandbox to be deleted:

-bash-4.2$ rm -rf docker_centos_7/
rm: cannot remove ‘docker_centos_7/image/root/.cshrc’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/anaconda-ks.cfg’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.bashrc’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.bash_logout’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.bash_profile’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.tcshrc’: Permission denied
[......]

Steps to reproduce this behavior

with the cache

mkdir docker_centos_7
export SINGULARITY_TMPDIR=docker_centos_7
export SINGULARITY_CACHEDIR=$SINGULAIRTY_TMPDIR/cache
singularity -s build --sandbox /home/aforti/docker_centos_7/image docker://centos:7

without the cache

export SINGULARITY_DISABLE_CACHE=1
mkdir docker_centos_7
export SINGULARITY_TMPDIR=docker_centos_7
singularity -s build --sandbox /home/aforti/docker_centos_7/image docker://centos:7

What OS/distro are you running

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

How did you install Singularity

rpm from EPEL repository

Bug Regression Release 3.4

All 14 comments

I believe the removal failure could be a consequence of the unpack failure; if it is like another docker unpacker I looked at lately, it fixes up permissions only after successful completion. To work around the removal failure you can run:

find path_to_image -type d ! -perm -200 | xargs chmod u+w
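A slightly more defensive form of that workaround, as a sketch only. It uses find's -exec ... + instead of xargs so that directory names containing spaces are handled. The function name make_dirs_writable is hypothetical and not part of Singularity.

```shell
#!/bin/sh
# Hypothetical helper: give the owner write permission on every directory
# under the given path that lacks it, so that rm -rf can unlink the contents.
make_dirs_writable() {
    find "$1" -type d ! -perm -200 -exec chmod u+w {} +
}

# Usage (path_to_image is a placeholder for the sandbox path):
#   make_dirs_writable path_to_image && rm -rf path_to_image
```

Only directories are touched, which matches the original one-liner; file modes inside the sandbox are left as unpacked.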

I see different error messages when building both centos:7 and centos:6, with version 3.4.0-1.2 and 3.4.1-1.1. This is what I see when building with privileged singularity:

FATAL:   While performing build: sandbox assemble failed: exit status 1: mv: cannot open '/tmp/sbuild-811365620/fs/etc/gshadow' for reading: Permission denied
mv: cannot open '/tmp/sbuild-811365620/fs/etc/shadow' for reading: Permission denied
mv: cannot open '/tmp/sbuild-811365620/fs/etc/shadow-' for reading: Permission denied
mv: cannot open '/tmp/sbuild-811365620/fs/etc/gshadow-' for reading: Permission denied

and this is what I see with the -u option:

INFO:    Starting build...
INFO:    Building into existing container: /cloud/login/dwd/scratch/centos7-sandbox
FATAL:   While performing build: failed to retrieve path for /cloud/login/dwd/scratch/centos7-sandbox: lstat /cloud/login/dwd/scratch/centos7-sandbox: no such file or directory

I don't understand how nobody caught this before now since this is a pretty common task. I know I have been building sandboxes in the last month. @cclerget @dctrud can you take a look?

Thanks @DrDaveD, but I don't want to use workarounds; they make things progressively more fragile. I already had to rename the default I/O directory because of this 3.2.1 issue https://github.com/sylabs/singularity/issues/4498 and now I also have to work around this one. Note that building the sandbox is already an attempt to work around https://github.com/sylabs/singularity/issues/2588, which is also a problem for others https://github.com/sylabs/singularity/issues/3886, and those were never really answered.

This problem is impacting us (unpacked.cern.ch) as well.

We are not able to delete images that are created from a docker container, which is quite problematic.
Also we don't have root access.

I can confirm with the docker://centos:7 latest container pulled onto a Debian buster machine. I can remove the failed sandbox with...

chmod -R +rw test_sandbox/
rm -rf test_sandbox/

I'm going to guess something is going on here with the umoci based code for unpacking which was brought in to fix some different issues. @ikaneshiro - can you advise on this at all?

@dctrud as written in the ticket, if the cache is not disabled it doesn't even create the image; it exits with:

While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

Likewise, using -u fails, with a different error:

singularity build -u --sandbox /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image docker://centos:7
INFO:    Starting build...
INFO:    Building into existing container: /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image
FATAL:   While performing build: failed to retrieve path for /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image: lstat /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image: no such file or directory

So the new title doesn't reflect the whole story and may lead you to fix only one of what look like three problems, unless you think they all have the same root cause.

thanks

Hi @afortiorama - the caching issue is a separate thing, which neither @DrDaveD nor we have been able to replicate yet. It's noted, but we're concentrating on the sandbox permission problem first, which we can replicate with caching enabled. I'll split the caching thing into a new issue in a bit.

PR #4522 should return the non-root OCI/docker origin sandboxes to the previous state from <3.4.0. Any :eyes: on it much appreciated.

@DrDaveD

"I don't understand how nobody caught this before now since this is a pretty common task. I know I have been building sandboxes in the last month"

I swore I tested this... and looking back I had... but in VMs where I was building the sandbox on the same device as /tmp. That means no inter-device move of the sandbox (which is a cp + delete), so the permissions error doesn't occur.
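To see whether a given setup would exercise the inter-device path, one can compare device numbers, as in this sketch. It assumes GNU coreutils stat (where -c %d prints the device number); the function name same_device is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if both paths live on the same filesystem.
# Assumes GNU coreutils stat, where -c %d prints the device number.
# Returns 2 if either path cannot be inspected.
same_device() {
    a=$(stat -c %d "$1") || return 2
    b=$(stat -c %d "$2") || return 2
    [ "$a" = "$b" ]
}

# Usage: compare /tmp with the current directory.
if same_device /tmp .; then
    echo "same device: mv is a rename"
else
    echo "different devices (or stat failed): mv may fall back to copy + delete"
fi
```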

The regression test being added to the PR is looking at actual permissions on files in the container, in order to not be affected by this inter vs intra-device move causing success/failure.
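As an illustration of checking permissions directly rather than relying on whether a move succeeds, here is a shell sketch. It is not the test from the PR; the function name check_sandbox_dirs_writable is hypothetical, and it only looks at owner write permission on directories.

```shell
#!/bin/sh
# Hypothetical check: list directories under a sandbox that the owner cannot
# write to, and fail if any exist.
check_sandbox_dirs_writable() {
    bad=$(find "$1" -type d ! -perm -200 -print)
    if [ -n "$bad" ]; then
        printf 'directories without owner write permission:\n%s\n' "$bad" >&2
        return 1
    fi
}

# Usage (the sandbox path is a placeholder):
#   check_sandbox_dirs_writable /path/to/sandbox
```

Because it inspects mode bits, the result is the same whether the sandbox was moved within one device or across devices.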

To try to separate this into the different problems being reported: this part

2019-09-24 22:37:58,127 | ERROR    | FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

is because of this typo:

export SINGULARITY_TMPDIR=docker_centos_7
export SINGULARITY_CACHEDIR=$SINGULAIRTY_TMPDIR/cache

note it says "SINGULAIRTY_TMPDIR" instead of "SINGULARITY_TMPDIR".
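The effect of the misspelled name can be shown without Singularity at all. The sketch below uses a plain variable named cachedir (hypothetical) to show the expansion, and then shows that set -u would have turned the typo into an error.

```shell
#!/bin/sh
# Default shell behaviour: an unset variable silently expands to nothing,
# so "$SINGULAIRTY_TMPDIR/cache" becomes "/cache".
cachedir=$(set +u; unset SINGULAIRTY_TMPDIR; echo "$SINGULAIRTY_TMPDIR/cache")
echo "$cachedir"    # prints /cache

# With set -u, expanding an unset variable is an error, so the typo is
# caught before any build starts.
if (set -u; unset SINGULAIRTY_TMPDIR; : "$SINGULAIRTY_TMPDIR") 2>/dev/null; then
    caught=no
else
    caught=yes
fi
echo "typo caught by set -u: $caught"    # prints: typo caught by set -u: yes
```

A per-expansion alternative is the ${NAME:?} form, which aborts with a message when NAME is unset or empty.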

Because of that the cache directory is being set to /cache and that's what causes the error:

$ SINGULARITY_CACHEDIR=/cache SINGULARITY_TMPDIR=issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
FATAL:   Unable to create build: could not create temp dir in "issue-4517": stat issue-4517: no such file or directory

$ mkdir issue-4517

$ SINGULARITY_CACHEDIR=/cache SINGULARITY_TMPDIR=issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
INFO:    Starting build...
FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

note this is the original issue as reported.

Then there's a second different issue:

$ SINGULARITY_CACHEDIR=$PWD/issue-4517/cache SINGULARITY_TMPDIR=$PWD/issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
INFO:    Starting build...
FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

$ mkdir issue-4517/cache

$ SINGULARITY_CACHEDIR=$PWD/issue-4517/cache SINGULARITY_TMPDIR=$PWD/issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
INFO:    Starting build...
Getting image source signatures
Copying blob sha256:d8d02d45731499028db01b6fa35475f91d230628b4e25fab8e3c015594dc3261
 71.92 MiB / 71.92 MiB [===================================================] 12s
Copying config sha256:acab94af64effb1f7481666a37788e7a59465e723f0b0fe0a0f458f3f4856638
 1.05 KiB / 1.05 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
2019/09/25 18:13:48  info unpack layer: sha256:d8d02d45731499028db01b6fa35475f91d230628b4e25fab8e3c015594dc3261
2019/09/25 18:13:49  warn rootless{usr/bin/ping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019/09/25 18:13:50  warn rootless{usr/sbin/arping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019/09/25 18:13:50  warn rootless{usr/sbin/clockdiff} ignoring (usually) harmless EPERM on setxattr "security.capability"
INFO:    Creating sandbox directory...
INFO:    Build complete: /home/mem/devel/sylabs/singularity/src/github.com/sylabs/singularity/issue-4517/image

It seems that a cache directory that does not exist yet, even when it could be created under a writable parent, will cause this error.
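Until that is addressed, creating both directories before the build avoids the error, as the mkdir steps above show. A sketch, with a hypothetical helper name prepare_build_dirs and the assumption that absolute paths are wanted, as in the commands above:

```shell
#!/bin/sh
# Hypothetical helper: create the temp and cache directories up front and
# export absolute paths, so the build never sees a missing cache directory.
prepare_build_dirs() {
    base=$1
    mkdir -p "$base/cache" || return 1
    base=$(cd "$base" && pwd) || return 1
    export SINGULARITY_TMPDIR="$base"
    export SINGULARITY_CACHEDIR="$base/cache"
}

# Usage, with the directory name from this thread:
#   prepare_build_dirs issue-4517
#   singularity build --sandbox "$SINGULARITY_TMPDIR/image" docker://centos:7
```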

The third issue is that once the cache is set to something that does exist, as in the last command of the previous example, the resulting sandbox cannot be deleted:

$ rm -rf issue-4517/image/
rm: cannot remove 'issue-4517/image/root/.bash_logout': Permission denied
rm: cannot remove 'issue-4517/image/root/anaconda-ks.cfg': Permission denied
rm: cannot remove 'issue-4517/image/root/.tcshrc': Permission denied
rm: cannot remove 'issue-4517/image/root/.bashrc': Permission denied
rm: cannot remove 'issue-4517/image/root/.cshrc': Permission denied
rm: cannot remove 'issue-4517/image/root/.bash_profile': Permission denied
...

Examining these files:

$ ls -l issue-4517/image/root/.bash_logout
-rw-r--r-- 1 mem mem 18 Dec 28  2013 issue-4517/image/root/.bash_logout

$ ls -ld issue-4517/image/root/
dr-xr-x--- 2 mem mem 4096 Jul 31 19:10 issue-4517/image/root/

The file itself is OK, but the directory containing it does not have write permission, and removing a file requires write permission on its parent directory.

This is fixed by:

$ chmod -R +w issue-4517/image

$ rm -rf issue-4517/image

@mem - not quite... your third issue isn't quite the same as reported in the thread above - there's a build failure that can happen from the same cause.

This is becoming very confusing to track, so I'm going to close this issue, and open multiple new ones which are more granular right now.

@dctrud I think the difference is I'm looking at master, not 3.4.

I believe a change was added in master that's effectively fixing the build failure. This problem seems specific to 3.4 right now.

Same problem with singularity version 3.0.3-1.
