/kind bug
Description
I use Jenkins to build rootless containers using Podman. However, every time I reboot, I get the following message when I try to interact with Podman as the Jenkins user through the command line:
-bash-4.2$ podman ps --all
ERRO[0000] cannot join pause process. You may need to remove /tmp/run-996/libpod/pause.pid and stop all containers
ERRO[0000] you can use `system migrate` to recreate the pause process
ERRO[0000] open /proc/3959/ns/user: no such file or directory
Steps to reproduce the issue:
1. Get Podman in a working state
2. Create a Jenkins job which performs certain interactions with Podman through a SHELL step
3. Try interacting with Podman as the Jenkins user through the command line
Describe the results you received:
Podman complains about not being able to join the pause process.
Running `podman system migrate` usually seems to solve the problem, but it isn't very convenient.
Describe the results you expected:
Interacting with Podman as the Jenkins user within the Java process should be the same as interacting with Podman as the Jenkins user through the command line.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
-bash-4.2$ podman version
Version: 1.4.4
RemoteAPI Version: 1
Go Version: go1.10.3
OS/Arch: linux/amd64
Output of podman info --debug:
ERRO[0000] cannot join pause process. You may need to remove /tmp/run-996/libpod/pause.pid and stop all containers
ERRO[0000] you can use `system migrate` to recreate the pause process
ERRO[0000] open /proc/3959/ns/user: no such file or directory
After getting it back to work by deleting the contents of /tmp/run-996/libpod:
```
debug:
  compiler: gc
  git commit: ""
  go version: go1.10.3
  podman version: 1.4.4
host:
  BuildahVersion: 1.9.0
  Conmon:
    package: podman-1.4.4-4.el7.centos.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 0.3.0, commit: unknown'
  Distribution:
    distribution: '"centos"'
    version: "7"
  MemFree: 750735360
  MemTotal: 1927163904
  OCIRuntime:
    package: runc-1.0.0-65.rc8.el7.centos.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 2147479552
  SwapTotal: 2147479552
  arch: amd64
  cpus: 2
  hostname: jenkins
  kernel: 3.10.0-1062.1.1.el7.x86_64
  os: linux
  rootless: true
  uptime: 10m 17.93s
registries:
  blocked: null
  insecure:
```
**Package info (e.g. output of `rpm -q podman` or `apt list podman`):**
podman-1.4.4-4.el7.centos.x86_64
**Additional environment details (AWS, VirtualBox, physical, etc.):**
```
-bash-4.2$ rpm -q slirp4netns
slirp4netns-0.3.0-1.el7.x86_64
-bash-4.2$ cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
-bash-4.2$ cat /etc/subuid
jenkins:110000:655360000
-bash-4.2$ cat /etc/subgid
jenkins:110000:655360000
sysctl -w user.max_user_namespaces=15076
```
is /tmp really a tmpfs? I'd suggest to ensure /tmp is really cleaned up after each reboot, as there are other things that rely on that behaviour.
Aha, you're right. /tmp is just directly part of /, and hence not cleaned up.
How come it relies on being clean though? Wouldn't the things being dependent on it just pickup where they left?
/dev/sda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
[root@jenkins ~]# mount | grep tmp
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=933556k,nr_inodes=233389,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=188200k,mode=700,uid=1000,gid=1000)
I used the following commands on CentOS (7.7) to enable tmpfs for /tmp and then rebooted.
That seems to have solved my issue, thanks for the pointer @giuseppe!
systemctl enable tmp.mount
systemctl start tmp.mount
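To confirm after the reboot that /tmp really ended up on a tmpfs, something along these lines could be used. This is only a sketch: `mount_fstype` is a made-up helper name, and it assumes the mount table is readable in the usual `/proc/mounts` format (device, mount point, type, options).

```shell
# Print the filesystem type of the exact mount point $1.
# $2 is the mount table to read; defaults to /proc/mounts.
# Prints nothing when $1 is not a mount point of its own.
mount_fstype() {
    awk -v mp="$1" '
        $2 == mp { t = $3 }            # last matching entry wins
        END { if (t != "") print t }
    ' "${2:-/proc/mounts}"
}

# Example use:
#   if [ "$(mount_fstype /tmp)" = "tmpfs" ]; then
#       echo "/tmp is a tmpfs"
#   else
#       echo "/tmp is not a separate tmpfs mount"
#   fi
```

An empty result means /tmp is not mounted separately, which matches the situation described earlier where /tmp was simply part of /.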
> How come it relies on being clean though?

In this particular case, we store the PID of the pause process there, so after a reboot that PID might not exist (or, worse, might belong to a different process).
Other state that is not supposed to be persistent can be stored there as well.
`XDG_RUNTIME_DIR` is usually under `/run`. Since you have it and it is a tmpfs, why not force `XDG_RUNTIME_DIR` to be under `/run`?
That's indeed another solution. Thank you for the explanation.
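For the `XDG_RUNTIME_DIR` suggestion, a rough sketch of how a Jenkins shell step might pick the directory. It assumes the per-user directory follows the `/run/user/<uid>` layout seen in the mount output above; that directory may not exist for a service account such as jenkins, in which case the function fails and the caller keeps whatever default Podman chooses. `pick_runtime_dir` is a hypothetical name, not something Podman provides.

```shell
# Print a usable runtime directory for uid $1 (default: current user)
# under base directory $2 (default: /run/user). Returns non-zero when
# the directory is missing or not writable.
pick_runtime_dir() {
    uid=${1:-$(id -u)}
    base=${2:-/run/user}
    dir="$base/$uid"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        printf '%s\n' "$dir"
    else
        return 1
    fi
}

# Example use at the top of the job's shell step:
#   if dir=$(pick_runtime_dir); then
#       export XDG_RUNTIME_DIR="$dir"
#   fi
```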
Just manually re-running a jenkins project with podman commands in it produces the error.
@delenius Please check out issue #4655. Are we encountering the same problem?
Yes, same problem. I ended up removing my comment because I am running an older version of podman (1.4.4, same as in #4655, on RHEL 7.7), and I figured it might have been fixed since then. I also found a workaround, which is to just add
rm /tmp/run-`id -u`/libpod/pause.pid
before the podman command, in the jenkins shell script. Not sure if this has some inherent dangers, mind you ;)
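A slightly more cautious variant of that workaround would delete the file only when the PID recorded in it no longer has a `/proc` entry, which is the condition behind the `open /proc/<pid>/ns/user: no such file or directory` error. This is a sketch with made-up function names, and it has a known gap: if the PID has been reused by an unrelated process after a reboot, the check below will not notice.

```shell
# Return 0 when pid file $1 exists and the PID it records has no
# /proc/<pid>/ns/user entry; return 1 when the file is missing or
# the entry is still there.
pause_pid_is_stale() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ ! -e "/proc/$pid/ns/user" ]
}

# Remove the pid file only if it looks stale. The default path is the
# one from the error message in this thread; pass another path if
# XDG_RUNTIME_DIR points elsewhere.
remove_stale_pause_pid() {
    pidfile=${1:-/tmp/run-$(id -u)/libpod/pause.pid}
    if pause_pid_is_stale "$pidfile"; then
        rm -f "$pidfile"
    fi
}

# Example use before the podman command in the Jenkins shell step:
#   remove_stale_pause_pid
```

Unlike a bare `rm`, this does not fail the shell step when the file is absent, and it leaves the file alone while the recorded PID is still present under `/proc`.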
I was having this issue on Fedora 31 after updating from 30:
$ podman container ls
Error: could not get runtime: open /proc/1445/ns/user: no such file or directory
I fixed it with:
mv /run/user/$(id -u)/libpod{,-000}
I had to do podman system reset -f after reboot.
@wishachu:
That worked well for me. Thumbs up!