Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Running a script that resides on a mounted volume fails
Steps to reproduce the issue:
cat > ./example <<EOF
#!/bin/bash
echo hello
EOF
chmod +x ./example
podman run -v "$PWD:$PWD" -w "$PWD" fedora:30 ./example
podman run -v "$PWD:$PWD" -w "$PWD" fedora:30 "$PWD"/example
Describe the results you received:
Error: executable file not found in $PATH: No such file or directory: OCI runtime command not found error
/bin/bash: /home/avi/example: Permission denied
Describe the results you expected:
hello
hello
Additional information you deem important (e.g. issue happens only occasionally):
Looks like there are two bugs: first, relative paths don't work, and second, permission is denied when running a script from a mounted volume.
Output of podman version:
podman version 1.6.1
Output of podman info --debug:
debug:
compiler: gc
git commit: ""
go version: go1.13
podman version: 1.6.1
host:
BuildahVersion: 1.11.2
CgroupVersion: v2
Conmon:
package: conmon-2.0.1-1.fc31.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.1, commit: 5e0eadedda9508810235ab878174dca1183f4013'
Distribution:
distribution: fedora
version: "31"
MemFree: 17672916992
MemTotal: 33549914112
OCIRuntime:
package: crun-0.10.2-1.fc31.x86_64
path: /usr/bin/crun
version: |-
crun version 0.10.2
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
SwapFree: 16869486592
SwapTotal: 16869486592
arch: amd64
cpus: 8
eventlogger: file
hostname: tmp.scylladb.com
kernel: 5.3.4-300.fc31.x86_64
os: linux
rootless: true
slirp4netns:
Executable: /usr/bin/slirp4netns
Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
Version: |-
slirp4netns version 0.4.0-beta.3+dev
commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
uptime: 45h 11m 12.94s (Approximately 1.88 days)
registries:
blocked: null
insecure: null
search:
- docker.io
- registry.fedoraproject.org
- quay.io
- registry.access.redhat.com
- registry.centos.org
store:
ConfigFile: /home/avi/.config/containers/storage.conf
ContainerStore:
number: 32
GraphDriverName: overlay
GraphOptions:
overlay.mount_program:
Executable: /usr/bin/fuse-overlayfs
Package: fuse-overlayfs-0.6.4-2.fc31.x86_64
Version: |-
fusermount3 version: 3.6.2
fuse-overlayfs: version 0.6.4
FUSE library version 3.6.2
using FUSE kernel interface version 7.29
GraphRoot: /home/avi/.local/share/containers/storage
GraphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "false"
ImageStore:
number: 4
RunRoot: /run/user/1000
VolumePath: /home/avi/.local/share/containers/storage/volumes
Package info (e.g. output of rpm -q podman or apt list podman):
podman-1.6.1-2.fc31.x86_64
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical
Is the volume mounted with the "noexec" flag?
No.
SELinux?
The first one - I think that's a known issue with that crun version not supporting relative paths, fixed upstream. Second one is interesting, though. Can you do a podman inspect on the container and look up the $PWD:$PWD volume in the Mounts section? What mount options are listed?
SELinux?
I was taught to disable SELinux on any machine I touch.
The first one - I think that's a known issue with that crun version not supporting relative paths, fixed upstream. Second one is interesting, though. Can you do a podman inspect on the container and look up the $PWD:$PWD volume in the Mounts section? What mount options are listed?
Here are the storage-related sections:
"GraphDriver": {
"Name": "overlay",
"Data": {
"LowerDir": "/home/avi/.local/share/containers/storage/overlay/fe17f2ed86269a12ee4f30df6a2f385cd1d20d15d6f993062cf084e4f92d125c/diff",
"UpperDir": "/home/avi/.local/share/containers/storage/overlay/d1f203cdf6280fa55d9598a197c26404381537b41c5a6465bf6f6eeb0d1e4890/diff",
"WorkDir": "/home/avi/.local/share/containers/storage/overlay/d1f203cdf6280fa55d9598a197c26404381537b41c5a6465bf6f6eeb0d1e4890/work"
}
},
"Mounts": [
{
"Type": "bind",
"Name": "",
"Source": "/home/avi",
"Destination": "/home/avi",
"Driver": "",
"Mode": "",
"Options": [
"rbind"
],
"RW": true,
"Propagation": "rprivate"
}
],
Well, you should relearn that lesson for any system running containers: SELinux is the best protection against breakout.
Just run a bash shell in the container and check whether you can see example.
If I put selinux into permissive mode this works for me.
podman run -ti -v "$PWD:$PWD" -w "$PWD" fedora:30 "$PWD"/example
hello
This does not:
podman run -ti -v "$PWD:$PWD" -w "$PWD" fedora:30 ./example
Error: executable file not found in $PATH: No such file or directory: OCI runtime command not found error
Does the second example work with Docker?
@giuseppe Any idea why the ./example fails?
Both examples work with docker, and in fact with podman on my Fedora 30 machine (but it's likely due to a different configuration).
Could cgroups2 be responsible?
@rhatdan the relative path one doesn't work because of crun. It is already fixed upstream, I'll cut a new release soon
@avikivity Could you confirm that the second version does not work for you? It works for me with podman 1.6.1 on f31 (using crun and cgroup v2).
Confirmed. It does not work.
I can attach an strace -f if it will help.
Probably would not help.
If you run
podman run -ti -v "$PWD:$PWD" -w "$PWD" fedora:30 sh
And then
# ls -l example
# ./example
What happens?
$ podman run -ti -v "$PWD:$PWD" -w "$PWD" fedora:30 sh
sh-5.0#
sh-5.0# ls -l example
ls: cannot access 'example': Permission denied
sh-5.0# ./example
sh: ./example: Permission denied
from the host:
$ ls -l example
-rwxrwxr-x. 1 avi avi 23 Oct 21 20:43 example
Oh, actually selinux is enabled (was confused by 0 exit code of selinuxenabled)
After setenforce 0 it works.
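For anyone retracing this step, here is a minimal sketch of checking the SELinux mode before debugging a "Permission denied" on a bind mount. It assumes the standard `getenforce` utility, which prints `Enforcing`, `Permissive`, or `Disabled`. Note that `selinuxenabled` reports "enabled" through exit status 0, which is easy to misread. The function name `selinux_mode` and the `unknown` fallback are made up for illustration.

```shell
# Print the current SELinux mode, or "unknown" when getenforce is not
# installed (hypothetical fallback so the sketch runs on any host).
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo unknown
    fi
}

selinux_mode
```

If this prints `Enforcing`, a denial like the one above is most likely a labeling problem rather than a file-mode problem.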
Can you look for AVCs in the system logs? Grepping for AVC should find it. An example:
Jun 01 20:12:44 bellerophon.lldp.net audit[1]: AVC avc: denied { create } for pid=1 comm="systemd" name="code" scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=dir permissive=0
Something like that, but for Podman?
Oct 23 00:25:24 tmp.scylladb.com audit[66029]: AVC avc: denied { execute } for pid=66029 comm="sh" name="example" dev="dm-1" ino=1053690 scontext=system_u:system_r:container_t:s0:c606,c1012 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=0
Oct 21 21:55:33 tmp.scylladb.com audit[62812]: AVC avc: denied { read } for pid=62812 comm="example" name="example" dev="dm-1" ino=1053690 scontext=system_u:system_r:container_t:s0:c826,c913 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=0
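Denial lines like these are easier to compare once reduced to the three fields that matter: the denied permission, the type of the process (scontext), and the type of the file (tcontext). The sketch below does that with `sed`. It assumes the audit line layout shown in the two lines above, and the helper name `avc_summary` is hypothetical.

```shell
# Reduce each AVC denial read from stdin to:
#   <permission> <source type> <target type>
# Lines that do not look like the AVC denials above are dropped.
avc_summary() {
    sed -n 's/.*avc: *denied *{ *\([a-z_ ]*[a-z_]\) *}.*scontext=[^:]*:[^:]*:\([^: ]*\).*tcontext=[^:]*:[^:]*:\([^: ]*\).*/\1 \2 \3/p'
}

# Example using the first denial quoted in this thread.
printf '%s\n' 'Oct 23 00:25:24 tmp.scylladb.com audit[66029]: AVC avc: denied { execute } for pid=66029 comm="sh" name="example" dev="dm-1" ino=1053690 scontext=system_u:system_r:container_t:s0:c606,c1012 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=0' | avc_summary
# prints: execute container_t user_home_t
```

Read this way, both denials say the same thing: a `container_t` process was refused access to a file labeled `user_home_t`.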
Looks like you'd need to :Z or :z to relabel the volume, or run with --security-opt label=disable - @rhatdan agree?
Yes :Z will fix the issue. But that will relabel the entire directory, which is not really needed.
chcon -t container_file_t example
Will fix the issue without screwing up the entire labeling of the $PWD
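To keep the three workarounds from this thread side by side, here is a dry-run sketch that only prints the command for each one and executes nothing. The helper `workaround_cmd` and its mode names are hypothetical. The commands themselves are the ones suggested above, applied to the `fedora:30` reproducer.

```shell
# Print (do not run) the command for a given workaround.
# Output is for display only; paths containing spaces are not quoted.
workaround_cmd() {
    mode=$1
    dir=$2
    case "$mode" in
        relabel)
            # :Z relabels the whole mounted directory for this container
            printf 'podman run -v %s:%s:Z -w %s fedora:30 %s/example\n' \
                "$dir" "$dir" "$dir" "$dir" ;;
        label-disable)
            # turns off SELinux label separation for this one container
            printf 'podman run --security-opt label=disable -v %s:%s -w %s fedora:30 %s/example\n' \
                "$dir" "$dir" "$dir" "$dir" ;;
        chcon)
            # relabels only the script, leaving the rest of the directory alone
            printf 'chcon -t container_file_t %s/example\n' "$dir" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

workaround_cmd chcon "$PWD"
```

The trade-off is scope: `relabel` changes labels on everything under the mount, `chcon` touches a single file, and `label-disable` changes no labels but removes the SELinux separation for that container.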
Of course I asked https://github.com/containers/libpod/issues/4306#issuecomment-544489840
And I gave the wrong answer :(
I guess this can be closed? But it's sad that the default invocation fails like this.
@avikivity Did not mean any offence in my statement.
From an SELinux point of view, this looks like a container process has escaped and is attempting to read/execute content in your home dir. Therefore it is blocked. If you do not believe this is good behaviour, then you can disable SELinux container separation, or you can label the content as something the container can use.
Of course, offense was self-assigned for being a bad bug reporter.
I am using podman to build a program with a frozen toolchain. The source has a file with the name of a container image that contains a toolchain, and then the build executes in that container. As part of the build it has to read, execute, and write files in the home directory (and various caches like /var/cache/ccache).
So I guess disabling selinux container separation is the right thing. But this should be documented prominently, it is surprising to someone coming from docker.
Well the same issues have always existed for Docker. I added the SELinux support for Docker. I think the surprise comes from people not used to dealing with SELinux.
SELinux is the only thing that really protects the file system from container breakout, so when you have intentional container breakout, you need to deal with it.