Podman: Unable to restart Toolbox containers stopped by podman (must reboot)

Created on 4 Oct 2019 · 17 comments · Source: containers/podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I am unable to re-enter toolbox containers that were stopped with podman stop <container>.

In order to re-enter the container with toolbox enter <container> (or with podman start <container>), I need to reboot the system, after which I can re-enter the container and its state is maintained.

Steps to reproduce the issue:

  1. toolbox create

  2. toolbox enter

  3. podman stop fedora-toolbox-31

  4. toolbox enter (errors out, see below)
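
For reference, the same reproduction as a single shell session (the container name assumes the default that toolbox create picks on Fedora 31):

toolbox create                   # creates fedora-toolbox-31 by default
toolbox enter                    # works; exit the container shell afterwards
podman stop fedora-toolbox-31    # stop via podman instead of from inside
toolbox enter                    # now fails with the OCI "already exists" error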

Describe the results you received:

toolbox -v enter error:

Error: unable to start container "fedora-toolbox-31": container '7dbef4079c4e61754d26135c9fab554b9130bf4e1bc7a2d484aace38a7468eca' already exists: OCI runtime error
toolbox: failed to start container fedora-toolbox-31

journalctl log snippet:

Oct 04 18:35:26 rauros.figura.io conmon[12671]: conmon 7dbef4079c4e61754d26 <ndebug>: failed to write to /proc/self/oom_score_adj: Permission denied
Oct 04 18:35:26 rauros.figura.io conmon[12672]: conmon 7dbef4079c4e61754d26 <ninfo>: attach sock path: /run/user/1000/libpod/tmp/socket/7dbef4079c4e61754d26135c9fab554b9130bf4e1bc7a2d484aace38a7468eca/attach
Oct 04 18:35:26 rauros.figura.io conmon[12672]: conmon 7dbef4079c4e61754d26 <ninfo>: addr{sun_family=AF_UNIX, sun_path=/run/user/1000/libpod/tmp/socket/7dbef4079c4e61754d26135c9fab554b9130bf4e1bc7a2d484aace38a7468eca/attach}
Oct 04 18:35:26 rauros.figura.io conmon[12672]: conmon 7dbef4079c4e61754d26 <ninfo>: ctl fifo path: /var/home/returntrip/.local/share/containers/storage/overlay-containers/7dbef4079c4e61754d26135c9fab554b9130bf4e1bc7a2d484aace38a7468eca/userdata/ctl
Oct 04 18:35:26 rauros.figura.io conmon[12672]: conmon 7dbef4079c4e61754d26 <ninfo>: terminal_ctrl_fd: 12
Oct 04 18:35:26 rauros.figura.io conmon[12672]: conmon 7dbef4079c4e61754d26 <error>: Failed to create container: exit status 1
Oct 04 18:35:27 rauros.figura.io podman[12675]: 2019-10-04 18:35:27.029460577 +0200 CEST m=+0.050979420 container cleanup 7dbef4079c4e61754d26135c9fab554b9130bf4e1bc7a2d484aace38a7468eca (image=registry.fedoraproject.org/f31/fedora-toolbox:31, name=fedora-toolbox-31)

Describe the results you expected:
I should be able to access the container without rebooting
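
A workaround I could test instead of rebooting, sketched on the assumption that stale runtime state is what blocks the restart (podman container cleanup runs the cleanup step that normally fires when a container exits; whether it clears this particular failure is untested):

podman container cleanup fedora-toolbox-31    # manually run the per-container cleanup
podman start fedora-toolbox-31                # then retry the start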

Additional information you deem important (e.g. issue happens only occasionally):
I noticed this issue about 15 days ago while testing this: https://github.com/containers/libpod/issues?q=is%3Aissue+is%3Aclosed

I cleared ~/.local/share/containers before testing.
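
Roughly, that clearing step was the following (destructive: it wipes all rootless images and containers):

podman stop --all
rm -rf ~/.local/share/containers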

Output of software versions:

podman-1.6.1-2.fc31.x86_64
toolbox-0.0.15-1.fc31.noarch
conmon-2.0.1-1.fc31.x86_64
fuse-overlayfs-0.6.4-2.fc31.x86_64
crun-0.10.1-1.fc31.x86_64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.13
  podman version: 1.6.1
host:
  BuildahVersion: 1.11.2
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.1-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.1, commit: 5e0eadedda9508810235ab878174dca1183f4013'
  Distribution:
    distribution: fedora
    version: "31"
  MemFree: 9118236672
  MemTotal: 16778067968
  OCIRuntime:
    package: crun-0.10.1-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.1
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 7985950720
  SwapTotal: 7985950720
  arch: amd64
  cpus: 16
  eventlogger: journald
  hostname: rauros.figura.io
  kernel: 5.3.1-300.fc31.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
    Version: |-
      slirp4netns version 0.4.0-beta.3+dev
      commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
  uptime: 17m 1.5s
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/returntrip/.config/containers/storage.conf
  ContainerStore:
    number: 1
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.6.4-2.fc31.x86_64
      Version: |-
        fusermount3 version: 3.6.2
        fuse-overlayfs: version 0.6.4
        FUSE library version 3.6.2
        using FUSE kernel interface version 7.29
  GraphRoot: /var/home/returntrip/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 1
  RunRoot: /run/user/1000
  VolumePath: /var/home/returntrip/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical Silverblue 31

Labels: kind/bug, stale-issue

All 17 comments

It looks like delete isn't firing... I'm assuming toolbox enter has a podman start under the hood.

@giuseppe @rhatdan I'm starting to suspect that cleanup processes aren't working on F31 right now - seeing a lot of reports like this, where containers aren't properly cleaned up when they exit.

$ podman create --name dan -t fedora echo hello
99796ba4220e2aab88ef7b66a89142dbb1d4612d622ca39b14b0adcb806a7097
$ ./bin/podman start dan
$ ./bin/podman stop dan
99796ba4220e2aab88ef7b66a89142dbb1d4612d622ca39b14b0adcb806a7097
$ ./bin/podman start dan
Error: unable to start container "dan": container '99796ba4220e2aab88ef7b66a89142dbb1d4612d622ca39b14b0adcb806a7097' already exists: OCI runtime error

Shouldn't this work?

@giuseppe Ideas?

Works fine for me on F30 (with both runc and crun). Probably something with cgroups v2?

I don't see it on Fedora 30 with cgroups v2; I'll debug further.
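
For anyone comparing setups, a quick way to check which cgroup version a host is running:

stat -fc %T /sys/fs/cgroup    # prints "cgroup2fs" on a cgroups v2 host, "tmpfs" on v1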

I'm assuming toolbox enter has a podman start under the hood.

Yes. toolbox enter is podman start ... followed by podman exec ....
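
So the failing path is roughly the following (the exact flags are an assumption; the real invocation lives in the toolbox sources):

podman start fedora-toolbox-31    # this is the step that errors out
podman exec --interactive --tty fedora-toolbox-31 /bin/bash -l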

I'm also having this kind of issue on Fedora 30 with podman version 1.6.1. It seems that something is wrong with Podman's cleanup process right now.

[root@prod systemd]# journalctl --no-pager -u cnginx
Oct 09 02:14:09 prod.moe.ph systemd[1]: Started nginx-proxy container service.
Oct 09 02:14:09 prod.moe.ph podman[14497]: Error: error creating container storage: the container name "nginx-proxy" is already in use by "145ac5bc3662d3d45bcc53756ced1fef61a87c3fc6e954c66f899e46e6071f5e". You have to remove that container to be able to reuse that name.: that name is already in use
Oct 09 02:14:09 prod.moe.ph systemd[1]: cnginx.service: Main process exited, code=exited, status=125/n/a
Oct 09 02:14:09 prod.moe.ph systemd[1]: cnginx.service: Failed with result 'exit-code'.
[root@prod systemd]# podman ps -a | grep -i 145ac5bc3662d3d45b
[root@prod systemd]# 
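
If the container is gone from Podman's database but containers/storage still holds its name, podman rm has a flag for removing the storage entry directly (whether that is the state here is an assumption; the ID is the one reported in the error above):

podman rm --storage 145ac5bc3662d3d45bcc53756ced1fef61a87c3fc6e954c66f899e46e6071f5e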

This is also causing issues when starting containers with a specific IP address, since these zombie containers seem to still be holding their CNI resources.

[root@prod systemd]# journalctl --no-pager -u cblog
Oct 09 01:16:28 prod.moe.ph systemd[1]: Started blog container service.
Oct 09 01:16:28 prod.moe.ph podman[2495]: 2019-10-09 01:16:28.916887857 +0800 PST m=+0.093316762 volume create 657eaab17a67b103eac6ff4bddd7c5220d900e570375c148c66e0f95169f783d
Oct 09 01:16:28 prod.moe.ph podman[2495]: 2019-10-09 01:16:28.933363667 +0800 PST m=+0.109792592 container create ee1bc4953cb67c9f696ccc9c2042b9d60c976f85ec60feed5a228060a419401c (image=localhost/blog:v002, name=blog)
Oct 09 01:16:28 prod.moe.ph podman[2495]: time="2019-10-09T01:16:28+08:00" level=error msg="Error adding network: failed to allocate for range 0: requested IP address 10.88.0.82 is not available in range set 10.88.0.1-10.88.255.254"
Oct 09 01:16:28 prod.moe.ph podman[2495]: time="2019-10-09T01:16:28+08:00" level=error msg="Error while adding pod to CNI network \"podman\": failed to allocate for range 0: requested IP address 10.88.0.82 is not available in range set 10.88.0.1-10.88.255.254"
Oct 09 01:16:29 prod.moe.ph podman[2495]: 2019-10-09 01:16:29.040742207 +0800 PST m=+0.217171157 container remove ee1bc4953cb67c9f696ccc9c2042b9d60c976f85ec60feed5a228060a419401c (image=localhost/blog:v002, name=blog)
Oct 09 01:16:29 prod.moe.ph podman[2495]: Error: error configuring network namespace for container ee1bc4953cb67c9f696ccc9c2042b9d60c976f85ec60feed5a228060a419401c: failed to allocate for range 0: requested IP address 10.88.0.82 is not available in range set 10.88.0.1-10.88.255.254
Oct 09 01:16:29 prod.moe.ph systemd[1]: cblog.service: Main process exited, code=exited, status=127/n/a
Oct 09 01:16:29 prod.moe.ph systemd[1]: cblog.service: Failed with result 'exit-code'.

If you need debugging logs, I can provide them. I'm still experiencing this issue.

Based on these logs I don't believe this is the same issue. This is likely https://github.com/containers/libpod/issues/3906

You are likely running podman run --rm as part of a systemd service with KillMode set to something other than none. systemd is hitting Podman with a SIGKILL after the container exits as it attempts to remove the container.
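
A sketch of that workaround as a systemd drop-in, using the unit name from the logs above (whether KillMode=none is acceptable for this setup is an assumption to verify):

# Create a drop-in that stops systemd from killing Podman's cleanup process.
sudo mkdir -p /etc/systemd/system/cnginx.service.d
sudo tee /etc/systemd/system/cnginx.service.d/killmode.conf >/dev/null <<'EOF'
[Service]
KillMode=none
EOF
sudo systemctl daemon-reload
sudo systemctl restart cnginx.service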

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

No comment from the bot that the issue was due to close? That's strange.

It looks like you answered this a month ago, @mheon; is there any reason to keep it open?

I tried today to stop a toolbox container via podman and re-enter it with toolbox enter, and it worked. It also starts if I issue podman start <toolbox>. So I guess this can be closed.

Tested on latest Silverblue 31 (no overrides)

Hi.
I ran into a very similar problem today. I stopped the container with podman stop and tried to re-enter it with toolbox:

toolbox -v enter --container mongodb
toolbox: running as real user ID 1000
toolbox: resolved absolute path for /usr/bin/toolbox to /usr/bin/toolbox
toolbox: checking if /etc/subgid and /etc/subuid have entries for user lukasz
toolbox: TOOLBOX_PATH is /usr/bin/toolbox
toolbox: running on a cgroups v2 host
toolbox: current Podman version is 1.7.0
toolbox: migration not needed: Podman version 1.7.0 is unchanged
toolbox: Fedora generational core is f31
toolbox: base image is fedora-toolbox:31
toolbox: container is mongodb
toolbox: checking if container mongodb exists
toolbox: calling org.freedesktop.Flatpak.SessionHelper.RequestSession
toolbox: starting container mongodb
toolbox: /etc/profile.d/toolbox.sh already mounted in container mongodb
Error: unable to start container "mongodb": container '07c1fcae8ebdee7aa3815544aeac13e94abcae64794171e729ef27397e79e9dc' already exists: OCI runtime error
toolbox: failed to start container mongodb

Same error as above.
SB 31
Podman 1.7.0

Still can't repro - Fedora 31, latest toolbox, Podman, and crun.

I'd be happy to work on this one if someone can provide a reliable reproducer. A --log-level=debug trace of podman start on the toolbox container (and any associated conmon logs printed to syslog) would also help.
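
For anyone who can still reproduce it, something like this should capture the requested trace (the container name assumes the default toolbox name):

podman --log-level=debug start fedora-toolbox-31 2>&1 | tee podman-start-debug.log
journalctl -t conmon --since "10 minutes ago" | tee conmon.log    # conmon logs under the "conmon" syslog identifier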

Updated to podman 1.8.0, rebooted, and the problem is gone. I cannot reproduce it again. I believe it could have been related to permissions in ~/.local/share/containers.
