Podman: Rootless podman error: could not get runtime: open /proc/31678/ns/user: no such file or directory

Created on 24 Feb 2020  ·  65 Comments  ·  Source: containers/podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Running any podman command (for example podman info or podman version) as non-root only gives me the following output:

% podman info
Error: could not get runtime: open /proc/31678/ns/user: no such file or directory

It works when running as root.

Steps to reproduce the issue:

  1. Run podman info

Describe the results you received:

Error: could not get runtime: open /proc/31678/ns/user: no such file or directory

Describe the results you expected:
The actual command output...

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

% sudo podman version
Version:            1.8.0
RemoteAPI Version:  1
Go Version:         go1.13.7
OS/Arch:            linux/amd64

Output of podman info --debug:

UNOBTAINABLE

Package info (e.g. output of rpm -q podman or apt list podman):

% xbps-query podman
architecture: x86_64
filename-sha256: 514f725a15bc57ec717606103eaedd77f603d62c3b29007166ef9d235b308ac2
filename-size: 18MB
homepage: https://podman.io/
install-date: 2020-02-24 09:11 CET
installed_size: 54MB
license: Apache-2.0
maintainer: Cameron Nemo <[email protected]>
metafile-sha256: 347e1471b2966a2350df5661216db7a1f60103544ebbe406d8e4a48dde169782
pkgver: podman-1.8.0_1
repository: https://alpha.de.repo.voidlinux.org/current
shlib-requires:
    libpthread.so.0
    libgpgme.so.11
    libassuan.so.0
    libgpg-error.so.0
    libseccomp.so.2
    librt.so.1
    libdevmapper.so.1.02
    libc.so.6
short_desc: Simple management tool for containers and images
source-revisions: podman:4351b38206
state: installed

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical box running Void Linux.

Labels: kind/bug, rootless

Most helpful comment

Just FYI, I hit the same on RHEL 8.2 using podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64 after the system got unexpectedly rebooted. Removing /tmp/run-1001/libpod/pause.pid, where /tmp is not on a tmpfs, fixed it.

All 65 comments

Oh, I just noticed one interesting and probably important thing: the PID in the path (/proc/...) is always the same value: 31678. Of course that PID does not exist; that is probably why the thing is failing.

Could you check if you have a podman process running as the user? If yes, could you kill it and then see if the problem goes away?

No podman running, also tried restarting the box.

What OS are you running podman on? Could it be that your kernel does not support user namespaces?
ls -l /proc/self/ns

@rhatdan I'm running Void Linux. The kernel supports user namespaces. The issue is probably not the namespace itself but podman coming up with some bogus PID when building the path.

% uname -srvmpio
Linux 5.4.21_1 #1 SMP PREEMPT Thu Feb 20 09:09:55 UTC 2020 x86_64 unknown unknown GNU/Linux

% cat /etc/os-release 
NAME="void"
ID="void"
DISTRIB_ID="void"
PRETTY_NAME="void"

% ls -l /proc/self/ns 
total 0
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 net -> 'net:[4026531992]'
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 user -> 'user:[4026531837]'
lrwxrwxrwx 1 socke socke 0 Feb 24 13:42 uts -> 'uts:[4026531838]'

Could this issue be ulimits related?

@giuseppe Nope. Installed 1.8.1-rc1 and the error is still there.

I have downgraded a bit: 1.7.0 is also affected, but 1.6.5 works. So this got broken somewhere between 1.6.5 and 1.7.0.

Okay, this is odd. After I downgraded it yesterday, I've now upgraded it back to 1.8.0. Then it worked. Then I rebooted to make sure no processes from the old version were running, and it still works. Don't know what went wrong there.

thanks for the update. Let's close the issue for now.

If it still happens, feel free to re-open it with more details

I have this issue too after each reboot of the host system. Only a podman system migrate resolves it temporarily (until the next boot). I can't downgrade/upgrade the package, as this is a new Void Linux installation and only the podman 1.8.0 package is available.

Is the runtime directory /run not on a tmpfs? We rely on the /run directory being cleaned on a reboot.

/run is on tmpfs

devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G   84K  3.9G   1% /dev/shm
tmpfs           3.9G  448K  3.9G   1% /run
/dev/sda1        64G  2.9G   61G   5% /
cgroup          3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs           3.9G     0  3.9G   0% /tmp
% podman --log-level debug info                                                                                                                                               :(
WARN[0000] The cgroups manager is set to systemd but there is no systemd user session available 
WARN[0000] For using systemd, you may need to login using an user session 
WARN[0000] Alternatively, you can enable lingering with: `loginctl enable-linger 2001` (possibly as root) 
WARN[0000] Falling back to --cgroup-manager=cgroupfs and --events-backend=file 
DEBU[0000] Using conmon: "/usr/libexec/podman/conmon"   
DEBU[0000] Initializing boltdb state at /home/void/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /home/void/.local/share/containers/storage 
DEBU[0000] Using run root /var/tmp/run-2001/containers  
DEBU[0000] Using static dir /home/void/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /var/tmp/run-2001/libpod/tmp   
DEBU[0000] Using volume path /home/void/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] Not configuring container store              
DEBU[0000] Initializing event backend file              
DEBU[0000] using runtime "/usr/bin/runc"                
DEBU[0000] using runtime "/usr/bin/crun"                
DEBU[0000] Failed to add pause process to systemd sandbox cgroup: <nil> 
ERRO[0000] open /proc/30771/ns/user: no such file or directory

Removed the .config and .local directories and started from scratch. Now it also works after a reboot.

Great, happy to hear that.

I ran into this again on another machine. This time the PID is 7577, but I guess that doesn't matter. Where the hell does podman get these PIDs from?

Also, I can't run podman system migrate or podman system reset; podman is completely inoperable for this user, and I'm not sure how to reset it. The issue occurred after a reboot and, of course, no podman process is running in the background.

@faulesocke Are you on Podman 1.8.1? That should have the fix.

If not, can you try removing /run/user/$UID/libpod/pause.pid and see if that fixes it (replace $UID with the UID of the user running Podman)?
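For anyone hitting this: a minimal sketch of that cleanup, assuming the default systemd user runtime dir (adjust the path if your runtime dir is elsewhere, e.g. /tmp/run-<uid> or /var/tmp/run-<uid> as mentioned later in this thread):

# Remove the stale pause-process PID file so rootless podman stops trying
# to join a user namespace via a PID that no longer exists.
rm -f "/run/user/$(id -u)/libpod/pause.pid"
podman system migrate   # optional: re-initializes the rootless setup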

I've just hit this on podman 1.6.2; /run is a tmpfs.
Since an strace hasn't been posted yet, I'm including one below before I try upgrading podman to a more recent version. Paths are redacted as /home/<user>/src/<project>.

$ podman build --tag yelgeb/githooks:latest --squash-all --file "${PWD}/launch/oci/githooks/Dockerfile" --authfile ${PWD}/config/oci/begleybrothers-auth.json
ERRO[0000] open /proc/12992/ns/user: no such file or directory 
, "/home/<user>/src/<project>/launc"..., "--authfile", "/home/<user>/src/<project>/confi"...], 0x7fffc2ac1810 /* 97 vars */) = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=144976, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff852c22000
mmap(NULL, 2221184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff8527f4000
mmap(0x7ff8527f1000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x44000) = 0x7ff8527f1000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libseccomp.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200%\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=288976, ...}) = 0
mmap(0x7ff852595000, 98304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2f000) = 0x7ff852595000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\"\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=31680, ...}) = 0
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14560, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff851f5a000
mprotect(0x7ff851f5d000, 2093056, PROT_NONE) = 0
mmap(0x7ff85215c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7ff8
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff851b69000
mprotect(0x7ff851d50000, 2097152, PROT_NONE) = 0
mmap(0x7ff851f50000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7
56000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libassuan.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p3\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=75928, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff852c20000
mmap(NULL, 2171232, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff851956000
mprotect(0x7ff851967000, 2097152, PROT_NONE) = 0
mmap(0x7ff851b67000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0x7ff851b67000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgpg-error.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340+\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=84032, ...}) = 0
mmap(NULL, 2179304, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff851741000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff852c1d000
arch_prctl(ARCH_SET_FS, 0x7ff852c1d740) = 0
mprotect(0x7ff851f50000, 16384, PROT_READ) = 0
mprotect(0x7ff851954000, 4096, PROT_READ) = 0
mprotect(0x7ff851b67000, 4096, PROT_READ) = 0
mprotect(0x7ff85215c000, 4096, PROT_READ) = 0
mprotect(0x7ff852a0d000, 4096, PROT_READ) = 0
mprotect(0x7ff852364000, 4096, PROT_READ) = 0
mprotect(0x7ff852595000, 94208, PROT_READ) = 0
mprotect(0x7ff8527f1000, 4096, PROT_READ) = 0
mprotect(0x28f7000, 4096, PROT_READ)    = 0
mprotect(0x7ff852c3a000, 4096, PROT_READ) = 0
munmap(0x7ff852c24000, 89326)           = 0
set_tid_address(0x7ff852c1da10)         = 18682
set_robust_list(0x7ff852c1da20, 24)     = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7ff8527f9cb0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7ff852806890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7ff8527f9d50, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7ff852806890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL)                               = 0x3e58000
brk(0x3e79000)                          = 0x3e79000
openat(AT_FDCWD, "/proc/self/fd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
getdents(3, /* 6 entries */, 32768)     = 144
geteuid()                               = 1000
openat(AT_FDCWD, "/proc/self/cmdline", O_RDONLY) = 3
close(3)                                = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff852bdd000
rt_sigaction(SIGSEGV, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SI
rt_sigaction(SIGUSR2, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SI
rt_sigaction(SIGALRM, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTERM, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
SIGINFO, sa_restorer=0x7ff852806890}, NULL, 8) = 0
GINFO, sa_restorer=0x7ff852806890}, NULL, 8) = 0
INFO, sa_restorer=0x7ff852806890}, NULL, 8) = 0
rt_sigaction(SIGIO, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGI
rt_sigaction(SIGPWR, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIG
rt_sigaction(SIGSYS, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIG
restorer=0x7ff852806890}, 8) = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7ff8527f9cb0, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_SIGINFO
GINFO, sa_restorer=0x7ff852806890}, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7ff8527f9d50, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7ff852806890}, NULL, 8) = 0
rt_sigaction(SIGRT_3, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7ff852806890}, NULL, 8) = 0
rt_sigaction(SIGRT_5, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SI
rt_sigaction(SIGRT_6, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SI
rt_sigaction(SIGRT_7, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SI
rt_sigaction(SIGRT_8, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SI
rt_sigaction(SIGRT_9, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SI
rt_sigaction(SIGRT_10, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_S
rt_sigaction(SIGRT_11, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_S
rt_sigaction(SIGRT_12, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_S
rt_sigaction(SIGRT_13, {sa_handler=0x4621e0, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_S
getuid()                                = 1000
read(6, "", 1499)                       = 0
futex(0xc42025b948, FUTEX_WAKE, 1)      = 1
write(5, "l\1\0\1+\0\0\0\2\0\0\0\220\0\0\0\1\1o\0\"\0\0\0/org/fre"..., 203) = 203
futex(0xc42025b948, FUTEX_WAKE, 1)      = 1
futex(0x2984008, FUTEX_WAIT, 0, NULL)   = -1 EAGAIN (Resource temporarily unavailable)
epoll_pwait(4, [], 128, 0, NULL, 0)     = 0
epoll_pwait(4, [{EPOLLOUT, {u32=1387863392, u64=140704516480352}}], 128, -1, NULL, 43481408) = 1
epoll_pwait(4, [{EPOLLIN, {u32=1387863600, u64=140704516480560}}], 128, -1, NULL, 43481408) = 1
read(6, "conmon version 2.0.3\ncommit: unk"..., 512) = 37
read(6, 0xc4204b0025, 1499)             = -1 EAGAIN (Resource temporarily unavailable)
epoll_pwait(4, [], 128, 0, NULL, 0)     = 0
epoll_pwait(4, [{EPOLLHUP, {u32=1387863600, u64=140704516480560}}], 128, -1, NULL, 43481408) = 1
read(6, "", 1499)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc420063ddc) = 0
close(6)                                = 0
epoll_pwait(4, [], 128, 0, NULL, 842351135704) = 0
epoll_pwait(4, [{EPOLLOUT, {u32=1387863392, u64=140704516480352}}], 128, -1, NULL, 43481408) = 1
epoll_pwait(4, [{EPOLLHUP, {u32=1387863600, u64=140704516480560}}], 128, -1, NULL, 43481408) = 1
read(10, "", 512)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 10, 0xc420063ddc) = 0
close(10)                               = 0
epoll_pwait(4, [], 128, 0, NULL, 842351135704) = 0
epoll_pwait(4, [{EPOLLOUT, {u32=1387863392, u64=140704516480352}}], 128, -1, NULL, 43481408) = 1
epoll_pwait(4, [{EPOLLHUP, {u32=1387863600, u64=140704516480560}}], 128, -1, NULL, 43481408) = 1
read(10, "", 512)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 10, 0xc4203f55dc) = 0
close(10)                               = 0
futex(0xc42025b948, FUTEX_WAKE, 1)      = 1
futex(0xc420546148, FUTEX_WAKE, 1)      = 1
futex(0x2984008, FUTEX_WAIT, 0, NULL)   = ?
ERRO[0000] open /proc/12992/ns/user: no such file or directory
+++ exited with 1 +++

The issue occurred after a reboot and, of course, no podman process is running in the background.

/run/ should not survive a reboot. Is /run on a tmpfs?

Since an strace hasn't been posted yet, I'm including one below before I try upgrading podman to a more recent version. Paths are redacted as /home/<user>/src/<project>.

Have you used strace -f? As @mheon pointed out, the issue should be fixed in 1.8.1.

Thanks for looking at this @giuseppe.
I think you've conflated two comments?

I posted the strace in case it helped. Initially I couldn't get a podman more recent than 1.6, but switching to the SUSE repos allowed me to get 1.8.1; the RH container-bot-managed builds have been broken for some time, see issue #5502.

Happy for this to stay closed; as mentioned, I posted mainly as an FYI.

@mheon No, this is 1.8.0. /run is on a tmpfs, but /run/user/$UID does not exist. Might that be the root cause of the problem?

Is this a systemd based system? Is $XDG_RUNTIME_DIR defined?

@rhatdan In the case of @faulesocke and me: no, we do not use a systemd-based system.

Which probably means XDG_RUNTIME_DIR is not set?

@rhatdan Exactly, no systemd and no XDG_RUNTIME_DIR. Of course I can set it and configure the system to create /run/user/$UID for me, but I guess if it's required, the error message should at least indicate this.
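For reference, a minimal sketch of that manual setup on a non-systemd box; the exact paths and where you hook this in (boot script, shell profile) are assumptions, not something podman mandates:

#!/bin/sh
# Create a per-user runtime dir on the /run tmpfs and point XDG_RUNTIME_DIR
# at it, roughly mimicking what pam_systemd does on systemd systems.
uid=$(id -u)
dir="/run/user/$uid"
sudo mkdir -p "$dir"          # /run is root-owned, so the dir must be created as root
sudo chown "$uid" "$dir"
sudo chmod 700 "$dir"
export XDG_RUNTIME_DIR="$dir" # also add this export to the user's shell profile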

@rhatdan:

Is this a systemd based system? Is $XDG_RUNTIME_DIR defined?

Yes to both:

$ echo $XDG_RUNTIME_DIR
/run/user/1000

OK, if this is a new issue, we should open a new issue and not just add comments to this closed one.

No, keep closed. We were able to get more recent packages and haven't seen this since.

I'm seeing this behavior on CentOS 8.1.1911 with podman-1.8.2-81.1.el8.x86_64.

Using systemd but XDG_RUNTIME_DIR is not set for the user running podman commands.

I had initially tried removing ~/.cache and ~/.local and then running podman system migrate, but it did not resolve the issue. Now I've installed the latest podman packages available in the kubic repo and am still hitting this problem.

Rebooting the system is not feasible as it's a CI builder. Should XDG_RUNTIME_DIR be set for the user running podman commands? The directory does not exist on the system where I can reliably reproduce this issue.

v1.8.2 should definitely have the latest code for handling missing pause processes. The XDG_RUNTIME_DIR thing does seem suspicious, but I believe we have patches in for handling that.

Can you try a podman system reset to completely wipe Podman's state, and see if that resolves things?

[jenkins-build@braggi05 ~]$ podman system reset
Error: could not get runtime: open /proc/13238/ns/user: no such file or directory

[jenkins-build@braggi05 ~]$ sudo podman system reset --force

[jenkins-build@braggi05 ~]$ docker ps
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Error: could not get runtime: open /proc/13238/ns/user: no such file or directory

Can you see if you have a /run/user/$UID/libpod/pause.pid (substitute your UID in there) and if so, what are the contents of that file?

Can you give the output of podman info --log-level=debug?

@rhatdan @giuseppe It looks like podman system reset is being blocked by this. We should investigate making system reset more durable: if a userns issue can prevent it from being run, it's a lot less useful.

[jenkins-build@braggi05 ~]$ stat /run/user/$UID/libpod/pause.pid
stat: cannot stat '/run/user/1108/libpod/pause.pid': No such file or directory

[jenkins-build@braggi05 ~]$ podman info --log-level=debug
WARN[0000] The cgroup manager or the events logger is set to use systemd but there is no systemd user session available 
WARN[0000] For using systemd, you may need to login using an user session 
WARN[0000] Alternatively, you can enable lingering with: `loginctl enable-linger 1108` (possibly as root) 
WARN[0000] Falling back to --cgroup-manager=cgroupfs and --events-backend=file 
DEBU[0000] Using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /home/jenkins-build/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /home/jenkins-build/.local/share/containers/storage 
DEBU[0000] Using run root /tmp/run-1108                 
DEBU[0000] Using static dir /home/jenkins-build/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /tmp/run-1108/libpod/tmp       
DEBU[0000] Using volume path /home/jenkins-build/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] Not configuring container store              
DEBU[0000] Initializing event backend file              
DEBU[0000] using runtime "/usr/bin/runc"                
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument 
ERRO[0000] could not get runtime: open /proc/13238/ns/user: no such file or directory 

I think this means that podman system reset is attempting to enter the user namespace.

Same issue here after a restart of the machine. podman only works as root, not as a rootless user; I get the same error with a different PID. Here are answers to some of the questions asked in this issue:

  • there is no podman process running as the rootless user which I could kill

  • os:

$ cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.1"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.1"
PRETTY_NAME="openSUSE Leap 15.1"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.1"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
  • because the default Leap 15.1 kernel does not work with fuse-overlayfs, I installed a newer kernel:
$ uname -r
5.6.3-1.ge840c7b-default
  • namespaces:
$ ls -l /proc/self/ns
lrwxrwxrwx 1 root root 0 20. Apr 10:23 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 net -> net:[4026531992]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 pid_for_children -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 time -> time:[4026531834]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 time_for_children -> time_for_children:[4026531834]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 20. Apr 10:23 uts -> uts:[4026531838]
  • /run is on tmpfs: tmpfs 16G 756K 16G 1% /run

  • podman info as the rootless user

$ podman info
host:
  BuildahVersion: 1.13.1
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.10-lp151.2.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.10, commit: unknown'
  Distribution:
    distribution: '"opensuse-leap"'
    version: "15.1"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 2662
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2662
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  MemFree: 32803192832
  MemTotal: 33667706880
  OCIRuntime:
    name: runc
    package: runc-1.0.0~rc6-lp151.1.2.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc6
      spec: 1.0.1-dev
  SwapFree: 1076883456
  SwapTotal: 1076883456
  arch: amd64
  cpus: 8
  eventlogger: file
  hostname: <my-hostname>
  kernel: 5.6.3-1.ge840c7b-default
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.4-lp151.2.6.1.x86_64
    Version: |-
      slirp4netns version 0.4.4
      commit: unknown
      libslirp: 4.2.0
  uptime: 15m 41.43s
registries:
  search:
  - docker.io
store:
  ConfigFile: /home/<rootlessuser>/.config/containers/storage.conf
  ContainerStore:
    number: 16
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.6-lp151.5.1.x86_64
      Version: |-
        fusermount3 version: 3.6.1
        fuse-overlayfs: version 0.7.6
        FUSE library version 3.6.1
        using FUSE kernel interface version 7.29
  GraphRoot: <another-path-for-storage>/containers/storage/root
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 679
  RunRoot: <another-path-for-storag>/containers/storage/run
  VolumePath: <another-path-for-storage>/containers/storage/root/volumes

  • `/run/user/2662/` does not exist, while 2662 is the uid of the rootless user
  • `$XDG_RUNTIME_DIR` is not set

As you can see, I have a specific storage configuration because I store the containers/images on a different partition. One workaround that solves this problem for me is to do the following:

mv ~/.config/containers/storage.conf ~/.config/containers/storage.conf.bak
podman ps
mv ~/.config/containers/storage.conf.bak ~/.config/containers/storage.conf

This creates the dir under `/run/user/2662/`, and a second `podman ps` after the last `mv` shows the containers I created before. Because I run into the same issue after every reboot, I created a systemd oneshot unit that performs this workaround.
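A minimal sketch of what such a oneshot unit could look like; the unit name, the inline shell command, and the user name are illustrative assumptions, not the commenter's actual files:

# Hypothetical unit performing the storage.conf mv / podman ps / mv dance at boot (run as root).
cat > /etc/systemd/system/podman-rootless-workaround.service <<'EOF'
[Unit]
Description=Recreate rootless podman runtime dir after boot (workaround)
After=local-fs.target

[Service]
Type=oneshot
User=rootlessuser
ExecStart=/bin/sh -c 'mv ~/.config/containers/storage.conf ~/.config/containers/storage.conf.bak; podman ps; mv ~/.config/containers/storage.conf.bak ~/.config/containers/storage.conf'

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable podman-rootless-workaround.service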

After applying this workaround, I observe the following. This is a systemd system and I have service files for a pod; the pod contains various containers. I created the systemd files for the pod and the containers with `podman generate systemd -f`. The pod should start when the machine starts, but it does not. I don't know why yet, because I haven't investigated it much, but starting the pod service hangs and takes a long time without any result, so the containers do not start either. Another thing I see after the reboot and the workaround, when I list the containers:

$ podman ps -a
ERRO[0000] error joining network namespace for container 44c145e925aaa83363cb966b2945cfd7456a659c90a4ed2ef974b92530bb3830: error retrieving network namespace at /run/user/2662/netns/cni-f721a062-b591-4151-35a3-15663fb38fa7: failed to Statfs "/run/user/2662/netns/cni-f721a062-b591-4151-35a3-15663fb38fa7": no such file or directory
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[...] some container information here [...]
44c145e925aa k8s.gcr.io/pause:3.1 10 days ago Up 44 minutes ago 0.0.0.0:8005->8005/tcp a6d566eb2bc0-infra
So there is another issue/bug with pods in podman, but I am not sure whether it is a result of this issue or something else. To get my environment working, I will try running my containers without a pod. Even before this issue, I was not too happy with the use and behaviour of pods.

What Podman version?

What Podman version?

$ podman version
Version:            1.8.0
RemoteAPI Version:  1
Go Version:         go1.12.12
OS/Arch:            linux/amd64

OK, forget the workaround from my last comment: https://github.com/containers/libpod/issues/5306#issuecomment-616411969

I figured out that this error occurs if I have a running rootless container and reboot the machine. It fails with the standard storage.conf under .config/containers/ as well as with a modified configuration. It works for me if I delete /var/tmp/run-<uid>; afterwards, a podman command creates this dir again and all podman commands work normally. I tested it by running podman run -dt --rm -P nginx with a default storage.conf as well as with a modified one. If I only have the image, no container is running, and I reboot, everything works; it occurs only with running containers. If I run the nginx container as root (with a modified storage.conf under /etc/containers/) and reboot, I don't have any problem.

Another point I don't understand: I changed the runroot dir in storage.conf (the default is /var/tmp/run-<uid>), but podman still uses that dir to look for the user namespace, which ends in the error from this issue.
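A minimal sketch of the cleanup described above, using the default path from this comment (adjust it if you changed runroot in storage.conf):

findmnt -T /var/tmp              # if this is not a tmpfs, the dir survives reboots
rm -rf "/var/tmp/run-$(id -u)"   # drop the stale per-user runtime dir
podman info                      # any podman command recreates the dir afterwards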

Is /var/tmp not a tmpfs on your system?

Also, I would strongly recommend updating to at least 1.8.1, there are a very large number of bugfixes in that release.

No, it is not. OK, for that reason the run-<uid> dir is not removed by a reboot, which results in this error.
OK, thanks for the advice to update :)

I updated to 1.9.0 and it works really well! Thanks for your help :)

Just FYI, I hit the same on RHEL 8.2 using podman-1.6.4-11.module+el8.2.0+6368+cf16aa14.x86_64 after the system got unexpectedly rebooted. Removing /tmp/run-1001/libpod/pause.pid, where /tmp is not on a tmpfs, fixed it.

That is definitely not a situation that we consider safe - Podman's rundir absolutely needs to be on a tmpfs for safety across reboots.

I have this issue with 1.9.1~1 on Ubuntu; 1.8.2~140 works.

Can you open a fresh issue? This one is getting rather long, and this definitely should be fixed

have this issue with 1.9.1~1

It is fixed with commit 89d4940a3787ccc871c92950a79347efc0d5c58c.

I resolved this issue with podman 1.6.4 on CentOS 8.1.1911 by adding the following to the end of /usr/lib/tmpfiles.d/tmp.conf:

R! /tmp/run-992

This way the directory is removed on reboot.

992 is the userid
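A variant of the same fix as a sketch, using a drop-in under /etc/tmpfiles.d/ (the file name here is an assumption) so a package update doesn't overwrite the change:

# Adjust the uid (992 here) to the user running rootless podman.
cat > /etc/tmpfiles.d/podman-rootless-tmp.conf <<'EOF'
# Remove the stale rootless podman runtime dir; "!" restricts the rule to boot time.
R! /tmp/run-992
EOF
# The rule is applied by systemd-tmpfiles at boot; to run it once by hand:
systemd-tmpfiles --remove --boot /etc/tmpfiles.d/podman-rootless-tmp.conf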

Confirmed: this happens when the tmp dir is not on a tmpfs (the tmp.mount systemd unit is masked).

$ podman version
Version:            1.6.4
RemoteAPI Version:  1
Go Version:         go1.13.4
OS/Arch:            linux/amd64

$ cat /etc/redhat-release 
CentOS Linux release 8.2.2004 (Core) 

$ systemctl status tmp.mount
● tmp.mount
   Loaded: masked (Reason: Unit tmp.mount is masked.)
   Active: inactive (dead)

rm -rf /tmp/run-$(id -u)/ resolves the issue without a reboot.

@rhatdan: Just to summarize my impression from this conversation: podman stores runtime data somewhere and relies "on the ... directory being cleaned on a reboot". But if containers are run as an unprivileged user (via sudo/su), the crucial $XDG_RUNTIME_DIR pointing to /run is not defined. Podman then most likely falls back to /tmp, which, for example in the CentOS 8 cloud images, is by default on the persistent rootfs (not a tmpfs) and is not cleaned on reboot. Moreover, even podman 2.0+ (referring to the current version 2.0.5 in CentOS 8 Stream) still doesn't deal with this situation, and the user is left in a broken state after a reboot. The fact that they need to clean something in /tmp manually, or ensure /tmp is mounted on a tmpfs, or explicitly export $XDG_RUNTIME_DIR ...

... results in a terrible experience when trying to run rootless containers, and this approach is (sadly) quickly abandoned in favour of the relatively less problematic Docker UX. I don't think you CAN "rely" on a thing which is obviously not guaranteed in so many cases.
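To see where a given installation actually ends up, a quick check (a sketch; the debug wording differs slightly between podman versions, as the outputs earlier in this thread show):

# Print the run root / tmp dir podman resolved for this user, plus the env var.
podman info --log-level=debug 2>&1 | grep -iE 'run root|tmp dir|runroot'
echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR:-<unset>}"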

I'm still very confused as to why cloud images believe it's acceptable for /tmp not to be a tmpfs. Are there any directories that are tmpfs? Can we assume that /run is safe to use? (And what happens if we encounter another cloud image that has /tmp as a tmpfs but not /run? Since it seems like logic has escaped this particular part of the world...)

After pulling up the FHS standard, it seems that /tmp is not guaranteed to be a tmpfs, but /run is guaranteed to at least have its files removed on every reboot, which is what we need. So my anger is a little misplaced, because CentOS is technically following the FHS (instead, it's the FHS's fault for not specifying that the temporary-files directory should be, well, temporary).

@rhatdan It seems like /run is a lot safer to use given this.

Can we assume that /run is safe to use? (And what happens if we encounter another cloud image that has /tmp as a tmpfs but not /run? Since it seems like logic has escaped this particular part of the world...)

The issue is that /run is not writeable for rootless users.

Well that's just lovely... I'm officially out of ideas for usable directories, then.

I suppose we could potentially look into having the rootless user namespace make and mount our own tmpfs, but that would cause bad things to happen if the pause process was ever killed: we'd detect a restart and assume all containers and pods were now stopped and unmounted, even though that might not be true. This generally results in Podman completely losing track of all running containers.

I would say that this should be the responsibility of the user: set up a systemd oneshot boot script that cleans out /tmp/podman at boot time.

Why should the user have to compensate for a design flaw? And if so, how are they supposed to know that they need to clear something, and what?

If it's crucial for that data to be non-persistent, I guess the Podman packaging should introduce some mechanism to clean it on reboot automatically. Or introduce a dedicated /run/podman/ location with permissions similar to /tmp/...

The issue is that /run is not writeable for rootless users.

And what about /run/user/$(id -u)/podman/ ?

it might be useful https://www.freedesktop.org/software/systemd/man/pam_systemd.html

... user runtime directory /run/user/$UID is either created or mounted as new "tmpfs" file system ...

/run/user/$uid is created by pam_systemd and used for storing files used by running processes for that user.

We default to /run/user, so for most installations using systemd this is not an issue. It is an issue if the system does not use systemd, or the user running Podman does not have a proper login session (sudo and su sessions do not cause systemd to create the /run/user/$UID directory).
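As the warnings earlier in this thread suggest, on systemd systems lingering makes /run/user/$UID available without an interactive login; a minimal sketch, using the jenkins-build user from this thread as the example:

# Run as root: with lingering enabled, systemd creates /run/user/<uid> at boot
# and keeps it (and the user's services) around after logout.
loginctl enable-linger jenkins-build
loginctl show-user jenkins-build | grep -i linger
ls -ld "/run/user/$(id -u jenkins-build)"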

If the files are under /tmp/podman/*, then it is easy to add a tmpfiles.d rule that removes these files on reboot.

@mheon One more reason why XDG_RUNTIME_DIR may not be set properly is when podman is called through a third process like gitlab-runner. This is exactly what we do (using rootless podman from gitlab-runner through a custom executor), and on each call podman receives a new runtime directory (/tmp/custom-executor${RANDOMNUMBER}/run-${UID}) by default (which makes this setup leak _podman pause_ processes for each job).

I do not know if you considered this in the design, but please also note that /run/user/$UID is supposed to be deleted when the user logs out of their last session. From man pam_systemd:

$XDG_RUNTIME_DIR
Path to a user-private user-writable directory that is bound to the user login time on the machine. It is automatically created the first time a user logs in and removed on the user's final logout.

Also, it would be nice to handle cases like sudo -i -u and su -l the same way as normal logins... Probably this is just a dream, as pam_systemd is not called and as such no one would handle the directory/variables, but who knows :)
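One way to keep the runtime directory stable per user rather than per job, sketched as a hypothetical wrapper the custom executor could invoke instead of podman directly (the wrapper and its paths are assumptions, not a gitlab-runner or podman feature):

#!/bin/sh
# Pin XDG_RUNTIME_DIR to one stable per-user location instead of the random
# per-job directory, so jobs reuse the same pause process and runtime state.
uid=$(id -u)
export XDG_RUNTIME_DIR="/run/user/$uid"   # assumes the dir exists, e.g. via lingering (see above)
exec podman "$@"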

We'd really like for sudo and su to be handled correctly, but we really need systemd to handle that on their end; not much we can do on the Podman side of things.

On being removed at logout: that can be annoying for services that run after the user logs out (or systemd services using the User directive), but we can disable that behavior by enabling lingering.

The best you or we can do is handle this on reboot by using tmpfiles.d, if /tmp is not on a tmpfs.
Podman has no concept of a user logging in or out, and really should not grow this ability.

@rhatdan IMO that sounds good, but only if podman does not rely on the XDG_RUNTIME_DIR variable. As we can see, that is what is causing most of the issues for users of rootless podman: it is not set for sudo -i or su -l, and it is also problematic when podman is started by systemd or by another process like gitlab-runner (which sets this environment variable to a random directory for each job).
If podman used the tmpfiles.d interface of systemd directly, then we could have a fixed path which works in every case mentioned above, and the directory cleanup across reboots would be handled by systemd, as you said.

Well, if the content is always placed under /tmp, then I believe we can set up a tmpfiles.d rule to look for the podman directory anywhere under /tmp and remove its content. We just need a fixed path for the tmpfiles.d rule to look for.
