Podman: Always mount tmpfs with noexec

Created on 13 Aug 2019 · 8 comments · Source: containers/podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description


By default, podman run --tmpfs always applies the noexec,nosuid,nodev mount options, as the code shows.

However, there is a use case this breaks: containers that include s6-supervise, which needs to spawn children from files in /run: https://github.com/just-containers/s6-overlay/issues/248 (the program at /etc/services.d/run cannot be executed either).
So these options shouldn't be forced with no way to opt out.
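For reference, the defaults are easy to observe even without the full reproduction below; this is a minimal sketch (alpine is just a convenient small image, not part of the original report):

````shell
# Ask for a tmpfs with no options at all; per this report, podman still
# applies noexec,nosuid,nodev to it.
podman run --rm --tmpfs /mytmp alpine mount | grep /mytmp
# tmpfs on /mytmp type tmpfs (rw,nosuid,nodev,noexec,relatime,...)
````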

Steps to reproduce the issue:

  1. podman run -d -e S6_READ_ONLY_ROOT=1 --read-only=true --read-only-tmpfs=false --tmpfs=/run:exec --systemd=false --privileged linuxserver/jackett

  2. podman exec -i -l mount | grep /run

Describe the results you received:
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k,mode=777,uid=1000,gid=1000)

Describe the results you expected:
tmpfs on /run type tmpfs (rw,nodev,relatime,size=65536k,mode=777,uid=1000,gid=1000)

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:            1.5.0
RemoteAPI Version:  1
Go Version:         go1.10.4
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.10.4
  podman version: 1.5.0
host:
  BuildahVersion: 1.10.1
  Conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.0, commit: unknown'
  Distribution:
    distribution: ubuntu
    version: "18.04"
  MemFree: 83152896
  MemTotal: 2088439808
  OCIRuntime:
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8
      commit: 425e105d5a03fabd737a126ad93d62a9eeede87f
      spec: 1.0.1-dev
  SwapFree: 176951296
  SwapTotal: 1073733632
  arch: amd64
  cpus: 1
  eventlogger: journald
  hostname: ip-172-26-31-172
  kernel: 4.18.0-1012-aws
  os: linux
  rootless: true
  uptime: 485h 13m 40.86s (Approximately 20.21 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/ubuntu/.config/containers/storage.conf
  ContainerStore:
    number: 15
  GraphDriverName: vfs
  GraphOptions: null
  GraphRoot: /home/ubuntu/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 6
  RunRoot: /tmp/1000
  VolumePath: /home/ubuntu/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):
Description: Ubuntu 18.04.3 LTS Release: 18.04



All 8 comments

I'll take this one - I was thinking about this as part of the ro=false work.

Also, to be clear, we don't intend to strip security-related options by default. You'll probably have to pass noexec=false or something similar in the options for the tmpfs mount.

> Also, to be clear, we don't intend to strip security-related options by default. You'll probably have to pass noexec=false or something similar in the options for the tmpfs mount.

Passing an argument is acceptable.
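To make that concrete, a sketch of such an invocation (whether Podman ends up honoring `exec` in the option string, as the reproduction above already tries, or the `noexec=false` suggested above, is exactly what's open here):

````shell
# Hypothetical once the override is honored: request an exec-able tmpfs.
podman run -d --read-only --tmpfs /run:rw,exec,size=65536k linuxserver/jackett
# Expected mount line afterwards: (rw,nodev,relatime,...) with no noexec.
````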

Besides, this inelegant argument is only needed to work around a problem that affects every image based on s6-overlay.
For instance, all linuxserver.io images hit this error:
````shell
➜ ~ sudo podman run --net=host linuxserver/jackett
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 10-adduser: executing...
usermod: no changes


      _         ()
     | |  ___   _    __
     | | / __| | |  /  \ 
     | | \__ \ | | | () |
     |_| |___/ |_|  \__/

Brought to you by linuxserver.io
We gratefully accept donations at:

https://www.linuxserver.io/donate/

GID/UID

User uid: 911

User gid: 911

[cont-init.d] 10-adduser: exited 0.
[cont-init.d] 30-config: executing...
[cont-init.d] 30-config: exited 0.
[cont-init.d] 99-custom-scripts: executing...
[custom-init] no custom files found exiting...
[cont-init.d] 99-custom-scripts: exited 0.
[cont-init.d] done.
[services.d] starting services
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jackett: warning: unable to spawn ./run - waiting 10 seconds
[services.d] done.
s6-supervise jackett: warning: unable to spawn ./run - waiting 10 seconds
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jackett: warning: unable to spawn ./run - waiting 10 seconds
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jackett: warning: unable to spawn ./run - waiting 10 seconds
````

https://github.com/just-containers/s6-overlay/issues/248 says this can be worked around that way since the failure is caused by noexec, but the cause here is more complicated.

https://github.com/just-containers/s6-overlay/issues/158 says Docker ran into this problem before, but it is fixed now (from my testing). It was caused by a Red Hat patch that mounts /run as tmpfs by default. It looks like Podman does this as well. Is this intended behavior?

I hope there is a way to fix the problem completely, as it makes a large number of interesting containers unusable.
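One way to compare the two engines on this point, mirroring the reproduction steps above (a sketch; the expected outcomes are taken from this thread, not re-verified):

````shell
# Start the image, then look at /run from inside.
podman run -d --name jackett linuxserver/jackett
podman exec jackett grep " /run " /proc/mounts   # tmpfs line => engine-injected
# Per the thread, current Docker no longer does this:
docker run -d --name jackett-docker linuxserver/jackett
docker exec jackett-docker grep " /run " /proc/mounts || echo "no tmpfs on /run"
````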

Try running with the --systemd=false flag.

man podman run
...
       --systemd=true|false

       Run container in systemd mode. The default is true.

       If the command you are running inside of the container is systemd or init, podman will set up tmpfs mount points in the following directories:

       /run, /run/lock, /tmp, /sys/fs/cgroup/systemd, /var/lib/journal

       It will also set the default stop signal to SIGRTMIN+3.

       This allows systemd to run in a confined container without any modifications.

       Note: On SELinux systems, systemd attempts to write to the cgroup file system.  Containers writing to the cgroup file system are denied by default.  The container_manage_cgroup boolean must be enabled for this to be allowed
       on an SELinux separated system.

       setsebool -P container_manage_cgroup true

Great, it works!

Is there a way to control this behavior at packaging time? Or maybe I'll ask the image producer to add some documentation.

What triggers it is the command name ending in "init". If the first program run in the image were not named like an init, it would not happen.


````go
if c.Systemd && (strings.HasSuffix(c.Command[0], "init") ||
    strings.HasSuffix(c.Command[0], "systemd")) {
    options = append(options, libpod.WithSystemd())
}
````
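A quick way to see that suffix check fire, if you want to test it yourself (a sketch; the script name is arbitrary, it just has to end in "init"):

````shell
# A trivial "init" whose only job is to report whether /run is a tmpfs.
cat > /tmp/fakeinit <<'EOF'
#!/bin/sh
grep " /run " /proc/mounts || echo "no tmpfs on /run"
EOF
chmod +x /tmp/fakeinit
# Command name ends in "init" => systemd mode => tmpfs expected on /run:
podman run --rm -v /tmp/fakeinit:/fakeinit alpine /fakeinit
# Same script under a different name => no systemd mode:
cp /tmp/fakeinit /tmp/fakerun
podman run --rm -v /tmp/fakerun:/fakerun alpine /fakerun
````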

Dan - defaulting --systemd to true/false sounds like a good use case for containers.conf, if we add one. Some people will rarely use systemd containers and might want it to be opt-in.
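For a reader landing here later: this thread predates containers.conf, but a per-user default along those lines could look something like this (the key name is an assumption here, not something decided in this thread):

````shell
# Hypothetical containers.conf snippet making systemd mode opt-in per user.
mkdir -p ~/.config/containers
cat >> ~/.config/containers/containers.conf <<'EOF'
[containers]
# Assumed key: disable automatic systemd mode; pass --systemd=true to opt in.
systemd = "false"
EOF
````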

