Podman: Images that use `gosu` don't work in my rootless environment

Created on 30 Jun 2020 · 7 comments · Source: containers/podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

A while ago, I had some issues with a container image I was sending changes to, which the maintainer wasn't able to reproduce on his machine. We were both running rootless podman, but I would consistently get EPERM errors when starting the container. At the time I was on Fedora 31. I was able to mostly work around the issue with bizarre combinations of --cap-add=DAC_OVERRIDE and edits to the entrypoint (which I unfortunately can't find anymore).

Today, I pulled down the redis and rabbitmq Docker images and hit similar-seeming issues. When I dug into it, it appears that their use of gosu to setuid/setgid to an unprivileged user inside the container is being denied by something. I figured it might be SELinux denying some transition, but there don't appear to be any denials in my audit.log. Adding the SETUID and SETGID caps doesn't help either (in fact, even adding ALL doesn't help).

My understanding is that Docker images occasionally not working under podman isn't unheard of, but most of those issues are documented as being related to cgroups v2, and they manifest differently. This feels more like seccomp or SELinux is getting in the way.

Steps to reproduce the issue:

Run redis containers and see them fail (hopefully)

$ podman run -t redis
error: exec: "/usr/local/bin/docker-entrypoint.sh": stat /usr/local/bin/docker-entrypoint.sh: permission denied
$ podman run -t --cap-add=SETUID,SETGID redis
error: exec: "/usr/local/bin/docker-entrypoint.sh": stat /usr/local/bin/docker-entrypoint.sh: permission denied

$ cat >Dockerfile <<EOF
FROM redis
RUN sed -i 's/exec gosu/##/' /usr/local/bin/docker-entrypoint.sh
EOF
$ podman build -t redis:debug .

podman run -t redis:debug should now work normally.
podman run -ti --user redis --cap-add=DAC_OVERRIDE redis also works, but this trick doesn't work for rabbitmq.

Describe the results you received:
The unmodified redis image fails to run.

Describe the results you expected:
It would be nice if these containers Just Worked (TM)

Additional information you deem important (e.g. issue happens only occasionally):

SELinux labels on my graphroot (~/.local/share/containers/storage):

$ ls -lZ ~/.local/share/containers/storage
total 152
drwx------+  2 user group unconfined_u:object_r:container_var_lib_t:s0  4096 Jan 20 19:30 cache
drwx------+  2 user group unconfined_u:object_r:container_var_lib_t:s0  4096 Jan 20 19:30 libpod
drwx------+  2 user group unconfined_u:object_r:container_var_lib_t:s0  4096 Jan 20 19:30 mounts
drwx--x--x+ 57 user group unconfined_u:object_r:container_ro_file_t:s0 28672 Jun 30 10:19 overlay
drwx--x--x+ 19 user group unconfined_u:object_r:container_var_lib_t:s0 12288 Jun 30 10:19 overlay-containers
drwx------+ 19 user group unconfined_u:object_r:container_ro_file_t:s0 12288 Jun 30 10:16 overlay-images
drwx------+  2 user group unconfined_u:object_r:container_ro_file_t:s0 24576 Jun 30 10:19 overlay-layers
-rw-------.  1 user group unconfined_u:object_r:container_var_lib_t:s0    64 Jun 30 10:24 storage.lock
drwx------+  2 user group unconfined_u:object_r:container_var_lib_t:s0  4096 Jan 20 19:30 tmp
-rw-------.  1 user group unconfined_u:object_r:container_var_lib_t:s0     0 May  4 14:41 userns.lock
drwx--x--x+ 91 user group unconfined_u:object_r:container_var_lib_t:s0 12288 Jun 30 10:19 volumes
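The trailing `+` on most of the mode strings above (for example `drwx------+`) means those directories carry extended POSIX ACLs, which is where this thread eventually lands. A minimal sketch for picking such entries out of `ls -l`-style output, assuming the usual GNU `ls` layout where the ACL marker is the 11th character of the mode field; `acl_entries` is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# acl_entries (hypothetical helper): read `ls -l` / `ls -lZ` output on stdin
# and print the name of every entry whose mode string has a trailing '+',
# i.e. an extended ACL is attached. Names containing spaces are not handled.
acl_entries() {
    awk 'NF >= 9 && substr($1, 11, 1) == "+" { print $NF }'
}

# Example with two lines copied from the listing above; prints "cache".
printf '%s\n' \
    'drwx------+  2 user group unconfined_u:object_r:container_var_lib_t:s0  4096 Jan 20 19:30 cache' \
    '-rw-------.  1 user group unconfined_u:object_r:container_var_lib_t:s0    64 Jun 30 10:24 storage.lock' |
    acl_entries

# On a real system, getfacl shows the actual entries; -s skips files that
# only have the base owner/group/other entries:
#   getfacl -R -s -p ~/.local/share/containers/storage
```

A `.` in the same position (as on `storage.lock`) indicates an SELinux context with no extended ACL.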

Trying to run rabbitmq with --user:

$ podman run -ti --user rabbitmq --cap-add=DAC_OVERRIDE rabbitmq
:eacces

00:28:38.131 [error]

00:28:38.132 [error] BOOT FAILED
BOOT FAILED
00:28:38.133 [error] ===========
===========
00:28:38.133 [error] Exception during startup:
Exception during startup:
00:28:38.133 [error]

00:28:38.133 [error]     supervisor:'-start_children/2-fun-0-'/3 line 355
    supervisor:'-start_children/2-fun-0-'/3 line 355
00:28:38.133 [error]     supervisor:do_start_child/2 line 371
00:28:38.133 [error]     supervisor:do_start_child_i/3 line 385
00:28:38.133 [error]     rabbit_prelaunch:run_prelaunch_first_phase/0 line 27
00:28:38.133 [error]     rabbit_prelaunch:do_run/0 line 111
00:28:38.133 [error]     rabbit_prelaunch_dist:setup/1 line 12
00:28:38.133 [error]     rabbit_nodes_common:do_ensure_epmd/2 line 93
00:28:38.133 [error]     erlang:open_port({spawn_executable,"/usr/local/lib/erlang/erts-11.0.2/bin/erl"}, [{args,["-boot","no_dot_erlang","-sname","epmd-starter-112824848","-noinput","-s","erlang","hal..."]},...])
    supervisor:do_start_child/2 line 371
    supervisor:do_start_child_i/3 line 385
    rabbit_prelaunch:run_prelaunch_first_phase/0 line 27
    rabbit_prelaunch:do_run/0 line 111
    rabbit_prelaunch_dist:setup/1 line 12
    rabbit_nodes_common:do_ensure_epmd/2 line 93
    erlang:open_port({spawn_executable,"/usr/local/lib/erlang/erts-11.0.2/bin/erl"}, [{args,["-boot","no_dot_erlang","-sname","epmd-starter-112824848","-noinput","-s","erlang","hal..."]},...])
00:28:38.133 [error] error:eacces
error:eacces
00:28:38.133 [error]

00:28:39.135 [error] Supervisor rabbit_prelaunch_sup had child prelaunch started with rabbit_prelaunch:run_prelaunch_first_phase() at undefined exit with reason eacces in context start_error
00:28:39.136 [error] CRASH REPORT Process <0.153.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,prelaunch,eacces}},{rabbit_prelaunch_app,start,[normal,[]]}} in application_master:init/4 line 138
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,eacces}},{rabbit_prelaunch_app,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,eacces}},{rabbit_prelaunch_app,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...done

Output of podman version:
Using my distro podman at the moment:

Version:            1.9.3
RemoteAPI Version:  1
Go Version:         go1.14.2
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  gitCommit: ""
  goVersion: go1.14.2
  podmanVersion: 1.9.3
host:
  arch: amd64
  buildahVersion: 1.14.9
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.18-1.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.18, commit: 6e8799f576f11f902cd8a8d8b45b2b2caf636a85'
  cpus: 8
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: file
  hostname: host
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 31337
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.6.14-300.fc32.x86_64
  memFree: 677883904
  memTotal: 8235126784
  ociRuntime:
    name: crun
    package: crun-0.13-2.fc32.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.13
      commit: e79e4de4ac16da0ce48777afb72c6241de870525
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.1-1.fc32.x86_64
    version: |-
      slirp4netns version 1.1.1
      commit: bbf27c5acd4356edb97fa639b4e15e0cd56a39d5
      libslirp: 4.2.0
      SLIRP_CONFIG_VERSION_MAX: 2
  swapFree: 7357460480
  swapTotal: 8392798208
  uptime: 169h 4m 51.29s (Approximately 7.04 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 17
    paused: 0
    running: 0
    stopped: 17
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.0.0-1.fc32.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.0.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 17
  runRoot: /run/user/1001/containers
  volumePath: /home/user/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.9.3-1.fc32.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical host

maybe-sybr on 30 Jun 2020


All 7 comments

With podman 2.0 redis works fine for me

$ podman run -t redis
Trying to pull registry.fedoraproject.org/redis...
  manifest unknown: manifest unknown
Trying to pull registry.access.redhat.com/redis...
  name unknown: Repo not found
Trying to pull registry.centos.org/redis...
  manifest unknown: manifest unknown
Trying to pull docker.io/library/redis...
Getting image source signatures
Copying blob 8559a31e96f4 skipped: already exists  
Copying blob 5ce7b314b19c done  
Copying blob b69876b7abed done  
Copying blob 85a6a5c53ff0 done  
Copying blob 04c4bfb0b023 done  
Copying blob a72d84b9df6a done  
Copying config 2355926154 done  
Writing manifest to image destination
Storing signatures
1:C 30 Jun 2020 12:14:04.007 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 30 Jun 2020 12:14:04.007 # Redis version=6.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 30 Jun 2020 12:14:04.007 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 30 Jun 2020 12:14:04.007 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 6.0.5 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1:M 30 Jun 2020 12:14:04.008 # Server initialized
1:M 30 Jun 2020 12:14:04.008 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 30 Jun 2020 12:14:04.008 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 30 Jun 2020 12:14:04.008 * Ready to accept connections

rhatdan on 30 Jun 2020

Both

$ podman run -ti --user rabbitmq --cap-add=DAC_OVERRIDE rabbitmq
and
$ podman run -ti --user rabbitmq rabbitmq

Worked fine on Fedora 32.

$ podman info
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.18-1.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.18, commit: 6e8799f576f11f902cd8a8d8b45b2b2caf636a85'
  cpus: 8
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: file
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 3267
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 3267
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.6.18-300.fc32.x86_64
  linkmode: dynamic
  memFree: 849010688
  memTotal: 16416161792
  ociRuntime:
    name: runc
    package: containerd.io-1.2.10-3.2.fc31.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/user/3267/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.1.1-1.fc32.x86_64
    version: |-
      slirp4netns version 1.1.1
      commit: bbf27c5acd4356edb97fa639b4e15e0cd56a39d5
      libslirp: 4.2.0
      SLIRP_CONFIG_VERSION_MAX: 2
  swapFree: 6074658816
  swapTotal: 8296329216
  uptime: 280h 29m 8.18s (Approximately 11.67 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/dwalsh/.config/containers/storage.conf
  containerStore:
    number: 108
    paused: 0
    running: 0
    stopped: 108
  graphDriverName: overlay
  graphOptions:
    overlay.ignore_chown_errors: "false"
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.1-1.fc32.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /home/dwalsh/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 35
  runRoot: /run/user/3267/containers
  volumePath: /home/dwalsh/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 1593512221
  BuiltTime: Tue Jun 30 06:17:01 2020
  GitCommit: b54a24499facb701ee76198ef6af17af1a172dfa
  GoVersion: go1.14.3
  OsArch: linux/amd64
  Version: 2.1.0-dev

rhatdan on 30 Jun 2020

Yesterday I tried one of your semanage incantations from a different issue, @rhatdan, since I noticed that the SELinux labels on my graphroot seemed wrong (a lot of user_home_t labels), but it didn't fix things :(

Any tips on what I might be able to do to debug further on my machine? It's been a nagging issue and my alternative is to blow away my laptop and see if a fresh install ends up being happier.

maybe-sybr on 1 Jul 2020

What AVCs are you seeing?
ausearch -m avc -ts recent

rhatdan on 1 Jul 2020

Looks like none, which is a bit surprising. I've got a few more observations which might be relevant:

- my main account (where I encountered the issue) has UID != GID, but changing it to a typical UID == GID setup did not fix anything. I tried new logins and reboots to confirm. I also changed the subuid and subgid maps to see if they were behaving oddly, with no change
- a pre-existing account with UID == GID works fine (I did hit something like #6084, but I believe that's because I su'ed, so there was no systemd user session set up)
- a new account with UID == GID or UID != GID works fine
- podman system reset, and even using a different $HOME to force it to have no state, doesn't make the issue go away
- purging podman from the system (all configs and shared data) and reinstalling doesn't fix it

More info on the path to now:

- originally a Fedora 30 install; I've used podman since around then to prepare for Docker going away
- I had never used podman on the pre-existing account, which isn't broken as noted above
- upgraded through 30 -> 31 -> 32

So it's obviously some bizarre user misconfiguration on my main account which I can't spot. I think my plan will be to just pivot to a new account and steal my homedir back. I'll keep the original account in case there's anything else you think I can check out.

maybe-sybr on 2 Jul 2020


Well, this was certainly a fun adventure. It turns out this is all due to my use of extended ACLs to restrict permissions in my home directory. Essentially, I mask everything with d:user:rwx, and this causes issues with the user namespace mapping that takes place when we run containers. For any namespaced ID which doesn't map to my real UID outside the namespace (i.e. anything other than root in the container), things wouldn't play nicely: any RWX access would get rejected, and that's why things like su/gosu would explode.
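To see why only root in the container was unaffected, it helps to translate container UIDs to host UIDs using the `idMappings` from the reporter's `podman info --debug` output (container 0 -> host 1001; container 1..65536 -> host 165536 onward). A minimal sketch, assuming the three-column `<container_id> <host_id> <size>` format of `/proc/self/uid_map`; `map_uid` is a hypothetical helper, and UID 999 is only an illustrative non-root container user:

```shell
#!/bin/sh
# map_uid (hypothetical helper): translate a container UID to a host UID.
# Reads uid_map-style lines "<container_id> <host_id> <size>" on stdin and
# exits non-zero if the UID falls outside every mapped range.
map_uid() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# The uidmap from the reporter's `podman info --debug` output.
uidmap='0 1001 1
1 165536 65536'

printf '%s\n' "$uidmap" | map_uid 0     # root in the container -> 1001, the user's own UID
printf '%s\n' "$uidmap" | map_uid 999   # a non-root container user -> 166534, a subordinate UID

# On a live system the same table is available via:
#   podman unshare cat /proc/self/uid_map
```

With ACLs that only grant the owning user, host UID 1001 is covered but 166534 is not, which matches the observation that everything except root in the container was denied.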

I just had to run setfacl -Rb ~/.local/share/containers (I probably should have done it on the storage/ subdir only), and suddenly everything works quite happily. I'm going to close this issue, since user-set ACLs are clearly not something libpod should have to deal with. It might be worth adding to the troubleshooting doc, though, since it's fairly esoteric and manifests as really opaque permission errors.
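Following the note above about scoping the fix to storage/, here is a minimal sketch of a narrower cleanup. It assumes the default rootless graphroot (`$XDG_DATA_HOME/containers/storage`, falling back to `~/.local/share/containers/storage`), and it runs `setfacl` through `podman unshare` on the assumption that files owned by subordinate UIDs cannot be modified by the unprivileged user directly. `clear_storage_acls` is a hypothetical helper name:

```shell
#!/bin/sh
# Default rootless graphroot; adjust if storage.conf sets a different graphroot.
STORAGE_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/containers/storage"

# clear_storage_acls (hypothetical helper): strip extended ACL entries from
# every file and directory below the given directory (default: STORAGE_DIR).
clear_storage_acls() {
    dir=${1:-$STORAGE_DIR}
    if [ ! -d "$dir" ]; then
        echo "clear_storage_acls: no such directory: $dir" >&2
        return 1
    fi
    # podman unshare runs the command inside the rootless user namespace,
    # where namespace root may change ACLs on files owned by any mapped UID.
    # setfacl -R recurses; -b removes all extended ACL entries and keeps
    # the base owner/group/other permissions.
    podman unshare setfacl -R -b "$dir"
}

# Usage:
#   clear_storage_acls
#   podman unshare getfacl -R -s -p "$STORAGE_DIR"   # expected to print nothing afterwards
```

If the ACLs come from a default ACL on a parent such as ~/.local/share, new files may inherit them again, so the parent's default ACL may also need attention.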

Worked for me. Thanks!