Podman: unable to run rootless container: error with setrlimit RLIMIT_NPROC

Created on 26 May 2020  ·  50 Comments  ·  Source: containers/podman

/kind bug

Description

I'm unable to run a rootless container, podman returns the following error:

$ podman run --rm golang:1.14-alpine go version
Error: setrlimit `RLIMIT_NPROC`: Invalid argument: OCI runtime error

I'm on Fedora 32. I'm sure running a rootless container worked on Fedora 31, but I'm not sure whether the problem appeared as soon as I migrated to Fedora 32 or only later, since I don't use podman that often.

Steps to reproduce the issue:

  1. podman run --rm golang:1.14-alpine go version (or any image really)

Describe the results you received:

Error: setrlimit `RLIMIT_NPROC`: Invalid argument: OCI runtime error

Describe the results you expected:

I expect the container to run correctly.

Additional information you deem important (e.g. issue happens only occasionally):

I tried:

  • uninstalling podman, libpod, conmon, container-selinux, containers-common
  • removing the directories $HOME/.local/share/containers and $HOME/.config/containers
  • reinstalling podman

but the error remains.

Output of podman version:

Version:            1.9.2
RemoteAPI Version:  1
Go Version:         go1.14.2
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  gitCommit: ""
  goVersion: go1.14.2
  podmanVersion: 1.9.2
host:
  arch: amd64
  buildahVersion: 1.14.8
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.16-2.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.16, commit: 1044176f7dd177c100779d1c63931d6022e419bd'
  cpus: 8
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: file
  hostname: thor
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.6.14-300.fc32.x86_64+debug
  memFree: 3684970496
  memTotal: 16685846528
  ociRuntime:
    name: crun
    package: crun-0.13-2.fc32.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.13
      commit: e79e4de4ac16da0ce48777afb72c6241de870525
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.0.0-1.fc32.x86_64
    version: |-
      slirp4netns version 1.0.0
      commit: a3be729152a33e692cd28b52f664defbf2e7810a
      libslirp: 4.2.0
  swapFree: 2147479552
  swapTotal: 2147479552
  uptime: 1h 11m 1.23s (Approximately 0.04 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/vincent/.config/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.0.0-1.fc32.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.0.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /home/vincent/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 8
  runRoot: /run/user/1000/containers
  volumePath: /home/vincent/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.9.2-1.fc32.x86_64

It's a desktop PC.


All 50 comments

Are you using cgroup v1?
Do you have a libpod.conf in your homedir? If so remove it.
rm ~/.config/containers/libpod.conf

Also what does this command show?

$ podman run --help | grep pids-limit
--pids-limit int Tune container pids limit (set 0 for unlimited, -1 for server defaults)

As far as I can tell I'm not using cgroups v1, I don't see systemd.unified_cgroup_hierarchy=0 in my /proc/cmdline and I'm pretty sure I never changed it.

I don't have a libpod.conf file, in fact right now I don't even have the directory $HOME/.config/containers.

The output from your command:

$ podman run --help | grep pids-limit
--pids-limit int                           Tune container pids limit (set 0 for unlimited) (default 2048)
$ ls -l /usr/share/containers/containers.conf /etc/containers/containers.conf
$ rpm -q podman
podman-1.9.2-1.fc32.x86_64
$ grep pids_limit /etc/containers/containers.conf 
$ grep pids_limit /usr/share/containers/containers.conf 
# pids_limit = 2048
# cat /proc/self/cgroup 
11:perf_event:/
10:cpu,cpuacct:/
9:pids:/user.slice/user-3267.slice/[email protected]
8:cpuset:/
7:devices:/user.slice
6:freezer:/
5:memory:/user.slice/user-3267.slice/[email protected]
4:blkio:/
3:hugetlb:/
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-3267.slice/[email protected]/apps.slice/apps-org.gnome.Terminal.slice/vte-spawn-960193fe-002b-412d-909f-e1ee7bdde126.scope
0::/user.slice/user-3267.slice/[email protected]/apps.slice/apps-org.gnome.Terminal.slice/vte-spawn-960193fe-002b-412d-909f-e1ee7bdde126.scope

$ podman info | grep cgroup
cgroupVersion: v1

Actually, looking at your podman info, I see you are on cgroup v2? That makes this even stranger. @giuseppe thoughts?

I'm assuming you want me to run the command you posted above ?

$ ls -l /usr/share/containers/containers.conf /etc/containers/containers.conf
ls: cannot access '/etc/containers/containers.conf': No such file or directory
-rw-r--r--. 1 root root 12725 Apr  9 22:11 /usr/share/containers/containers.conf
$ rpm -q podman
podman-1.9.2-1.fc32.x86_64
$ grep pids_limit /usr/share/containers/containers.conf
# pids_limit = 2048
[root@thor vincent]# cat /proc/self/cgroup
0::/user.slice/user-1000.slice/[email protected]/gnome-launched-Alacritty.desktop-12748.scope

$ podman info | grep cgroup
cgroupVersion: v2

Can you show me the output of $ cat /proc/self/limits?

Also, do you have any override for default_ulimits? You can easily find it out with something like grep -A 10 default_ulimits /etc/containers/* /usr/share/containers/* ~/.config/containers/*.

$ cat /proc/self/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             262144               524288               processes
Max open files            1024                 2097152              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       63491                63491                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I only have the file /usr/share/containers/containers.conf with this:

# default_ulimits = [
#  "nofile"="1280:2560",
# ]

Everything looks fine, and it works for me, but not for you.

My user account has:

Max processes 62461 62461 processes
Max open files 1024 524288 files

These are smaller than yours.

Running with strace I'm getting this:

$ strace -e setrlimit podman run --rm -ti golang:1.14-alpine go version
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=88170, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
setrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=88171, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=88160, si_uid=0} ---
Error: setrlimit `RLIMIT_NPROC`: Invalid argument: OCI runtime error

The man page says this about EPERM:

EPERM

An unprivileged process tried to raise the hard limit; the CAP_SYS_RESOURCE capability is required to do this. Or, the caller tried to increase the hard RLIMIT_NOFILE limit above the current kernel maximum (NR_OPEN). Or, the calling process did not have permission to set limits for the process specified by pid. 

I checked with getcap /usr/bin/podman but it doesn't return anything, which I assume means there are no capabilities set. However, I don't know if I'm on the right path here; are any of the binaries from podman/conmon/runc supposed to have CAP_SYS_RESOURCE?
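A minimal Go sketch (an illustration, not podman's code; it assumes golang.org/x/sys/unix) that reproduces this EPERM in isolation: an unprivileged process may lower its limits freely, but raising the hard RLIMIT_NPROC requires CAP_SYS_RESOURCE.

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    var cur unix.Rlimit
    if err := unix.Getrlimit(unix.RLIMIT_NPROC, &cur); err != nil {
        panic(err)
    }
    fmt.Printf("current nproc: soft=%d hard=%d\n", cur.Cur, cur.Max)
    if cur.Max == unix.RLIM_INFINITY {
        fmt.Println("hard limit is unlimited; raising it cannot fail here")
        return
    }
    // Raising the hard limit above its current value should fail with
    // EPERM for a process without CAP_SYS_RESOURCE.
    raised := unix.Rlimit{Cur: cur.Max + 1, Max: cur.Max + 1}
    fmt.Println(unix.Setrlimit(unix.RLIMIT_NPROC, &raised))
}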

Ok so I figured out a workaround.

Based on the setrlimit calls above, it's trying to set the soft/hard nproc limit to 4194304, but my hard limit was set to 524288. I changed the hard limit to 8388608 in /etc/security/limits.conf and now it works.
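For reference, that workaround is a single pam_limits entry; a sketch of what was added to /etc/security/limits.conf (the username is this machine's user, and the value is the one mentioned above; a new login session is needed for pam_limits to apply it):

vincent hard nproc 8388608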

The code is not supposed to do this, though. It looks like we have a bug: the code is supposed to cap the limit at the user's existing limit.

Before the settings change, did podman unshare ulimits show anything interesting?

I reverted my change and ran podman unshare ulimit -a (ulimits doesn't seem to exist):

$ podman unshare ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63491
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 262144
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I think the big number must be coming from here?

cat /proc/sys/kernel/pid_max

import (
    "fmt"
    "io/ioutil"
    "strconv"
    "strings"

    "golang.org/x/sys/unix"
)

const (
    oldMaxSize = uint64(1048576)
)

// getDefaultProcessLimits returns the nproc for the current process in ulimits format
// Note that nfile sometimes cannot be set to unlimited, and the limit is hardcoded
// to (oldMaxSize) 1048576 (2^20), see: http://stackoverflow.com/a/1213069/1811501
// In rootless containers this will fail, and the process will just use its current limits
func getDefaultProcessLimits() []string {
    rlim := unix.Rlimit{Cur: oldMaxSize, Max: oldMaxSize}
    oldrlim := rlim
    // Attempt to set file limit and process limit to pid_max in OS
    dat, err := ioutil.ReadFile("/proc/sys/kernel/pid_max")
    if err == nil {
        val := strings.TrimSuffix(string(dat), "\n")
        max, err := strconv.ParseUint(val, 10, 64)
        if err == nil {
            rlim = unix.Rlimit{Cur: uint64(max), Max: uint64(max)}
        }
    }
    defaultLimits := []string{}
    if err := unix.Setrlimit(unix.RLIMIT_NPROC, &rlim); err == nil {
        defaultLimits = append(defaultLimits, fmt.Sprintf("nproc=%d:%d", rlim.Cur, rlim.Max))
    } else {
        if err := unix.Setrlimit(unix.RLIMIT_NPROC, &oldrlim); err == nil {
            defaultLimits = append(defaultLimits, fmt.Sprintf("nproc=%d:%d", oldrlim.Cur, oldrlim.Max))
        }
    }
    return defaultLimits
}

But the first call should fail, and then we should fall back to the second.

This is indeed what I have:

$ cat /proc/sys/kernel/pid_max
4194304

but looking at the strace:

setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)

it tries to set 4096*1024 (pid_max) and then 1024*1024, which seems to be oldMaxSize in your code. But given that my hard limit is 524288, both will fail.

Right, if they both fail, the code should be returning an empty
defaultLimits := []string{}
which should tell the system not to set the rlimits at all.

Oh right.

Right now I can confirm that the following workarounds work:

  • increasing the nproc hard limit in /etc/security/limits.conf
  • creating $HOME/.config/containers/containers.conf with this:
default_ulimits = [
 "nproc=200000:400000",
]
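(Presumably the explicit default_ulimits entry takes precedence over the pid_max-derived default computed in getDefaultProcessLimits above, and since the requested values stay below the user's hard limit, the setrlimit calls succeed.)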

@rhatdan What's the verdict here - does this look like a Podman bug?

I want to try to get it to happen locally but have not had time. Perhaps an issue for an intern.

@sujil02 Could you see if you can get this to fail?

@sujil02 Could you see if you can get this to fail?

Sure thing, will have a look.

FWIW, I started getting this error on Arch Linux as well, once Linux 5.7 hit the core repositories a few days ago. I'd previously been running rootless containers on this system just fine for months.

I am also using cgroups v1:

$ podman info | grep cgroup
  cgroupVersion: v1

Edit: Oops, I just downgraded my kernel to 5.6.15 on Arch and I still can't run any containers. Something else must have broken it. For reference:

$ podman version
Version:            1.9.3
RemoteAPI Version:  1
Go Version:         go1.14.3
Git Commit:         5d44534fff6877b1cb15b760242279ae6293154c
Built:              Mon May 25 22:25:50 2020
OS/Arch:            linux/amd64

I've definitely run containers in the past week, so it must have broken sometime around then.

this error/bug has just started for me as well... none of my containers will start :/

@sujil02 Were you able to check this out?

@sujil02 Were you able to check this out?

Could not reproduce it. I used Podman 2.0 dev on Fedora 32 with cgroups v1.

Just to note, I have the problem with cgroup v2. I didn’t test with cgroup v1.

I'm also on cgroup v2... the only fix was to do what @vrischmann suggested: https://github.com/containers/libpod/issues/6389#issuecomment-634258120

I had this issue as well. I found I had an old file in /etc/security/limits.d/ which set nproc. I deleted that file and the issue went away for me.

The way the code is supposed to work is to examine your current settings and then set the limit no higher than the user already has.
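A rough Go sketch of that intent (illustrative only, not the shipped code; the target value is hypothetical and golang.org/x/sys/unix is assumed):

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    var cur unix.Rlimit
    if err := unix.Getrlimit(unix.RLIMIT_NPROC, &cur); err != nil {
        panic(err)
    }
    want := uint64(4194304) // hypothetical target, e.g. the pid_max value above
    if want > cur.Max {
        want = cur.Max // an unprivileged process cannot raise its own hard limit
    }
    fmt.Printf("would request nproc=%d:%d\n", want, want)
}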

@rhatdan I hear ya... I am not certain, but I think it happened with a kernel update. Both my home and work machines were affected.

I keep updating everything on my desktop weekly, and this phenomenon (existing containers start, but new ones can't be created) surfaced right after updating libpod to 2.0; downgrading to 1.9.3 immediately made it work again. (I'm running Gentoo and kernel version 5.6.19 with cgroups v1 - I've been using 5.6 for a while now.)

I have the same issue with podman 1.9.3 and podman 2.0.0 on Arch Linux, kernels 5.6.15, 5.7.2, and 5.7.4. In short, the error is:

$ strace -e setrlimit podman start postgres10
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=535583, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=535583, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=535583, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=535583, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=535583, si_uid=0} ---
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=535583, si_uid=0} ---
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=535583, si_uid=0} ---
...
setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024}) = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_NPROC, {rlim_cur=1024*1024, rlim_max=1024*1024}) = -1 EPERM (Operation not permitted)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=535591, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=535615, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=535636, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Error: unable to start container "085b86b34400b94915889b1175af2c1a40f06aba73d0ff004f8b741d4cea107f": container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:378: setting rlimits for ready process caused \\\"error setting rlimit type 6: operation not permitted\\\"\"": OCI runtime permission denied error
+++ exited with 125 +++

Full strace here: podman-setrlimit.txt. Other information:

$ podman info | grep cgroup
  cgroupVersion: v1
$ cat /proc/self/limits | grep -E "Max (open files|processes)"
Max processes             62972                62972                processes 
Max open files            1024                 524288               files

Could you try this with crun instead of runc?

setrlimit(RLIMIT_NPROC, {rlim_cur=4096*1024, rlim_max=4096*1024})
Why is there a multiplier of 1024 on these?

OK, with some of my older containers, I am seeing these fields being set.

$ podman inspect charming_ride --format '{{ .HostConfig.Ulimits }}'
[{RLIMIT_NOFILE 300 300} {RLIMIT_NPROC 50 50}]

But newer containers created with podman 2.0 do not get the Ulimits:

$ podman create -ti alpine sh
fc7abf0fddfa30e1c375e44f0c70f180ff63be8a19f816cbbcb46d292a76f750
$ podman inspect -l --format '{{ .HostConfig.Ulimits }}'
[]

@rhatdan yes, you're right. I see the same thing here: old containers have ulimits set, new containers created with podman 2.0 don't. I re-created my old containers that had been refusing to start and now they start properly in user mode. 🥳

Since 2.0 is being released, I am going to close this as fixed in 2.0.

@rhatdan to be clear, we have to re-create our containers, though. Perhaps mention that in some release notes or a tweet or something?

I believe this is only affecting people with /etc/security/limits.conf settings for their rootless users.

So I just got back to my computer where this bug is happening and there's nothing set in /etc/security/limits.conf.

With your inspect command, @rhatdan, I think I found the bug:

$ podman inspect --format '{{ printf "%+v" .HostConfig.Ulimits }}' determined_boyd
[{Name:RLIMIT_NOFILE Soft:1048576 Hard:1048576} {Name:RLIMIT_NPROC Soft:524288 Hard:262144}]

but ulimit says this:

$ ulimit -u --hard
524288
$ ulimit -u --soft
262144

so basically it looks like the container was created with the soft and hard limits swapped.
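That would also explain the exact error text: setrlimit(2) rejects any call where rlim_cur is greater than rlim_max with EINVAL, which crun surfaces as "Invalid argument". A minimal Go sketch using the values from the inspect output above (golang.org/x/sys/unix assumed):

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    // Soft (Cur) greater than hard (Max), as stored in the broken container.
    bad := unix.Rlimit{Cur: 524288, Max: 262144}
    fmt.Println(unix.Setrlimit(unix.RLIMIT_NPROC, &bad)) // invalid argument
}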

for anyone out there still having this issue, you have to recreate the affected containers

for anyone out there still having this issue, you have to recreate the affected containers

Sorry, what does that mean? Can I recreate the containers keeping all their data?

I got my existing toolbox container working by setting the nproc limit for my user in /etc/security/limits.conf to the value reported for the existing container by podman inspect --format '{{ printf "%+v" .HostConfig.Ulimits }}'.

@llunved please share your limits.conf
thanks

I got the same error message when I try to restart the running pod. It started happening just today; it was working totally fine a few days ago. No changes in limits.conf.

for anyone out there still having this issue, you have to recreate the affected containers

Sorry, what does that mean? Can I recreate the containers keeping all their data?

FWIW, I managed to backup my containers in F32, then rebase on F33 and restore the backups using this tutorial: https://fedoramagazine.org/backup-and-restore-toolboxes-with-podman/

@llunved please share your limits.conf
thanks
The container in question had NPROC set to 62509, while the system default was 62508:

# podman inspect --format '{{ printf "%+v" .HostConfig.Ulimits }}' fedora-toolbox-32
[{Name:RLIMIT_NOFILE Soft:524288 Hard:524288} {Name:RLIMIT_NPROC Soft:62509 Hard:62509}]

So I added the following line to limits.conf (my user is in the group wheel, but you could use another group or the user itself):

@wheel hard nproc 62509
