Podman: "Address already in use" when restarting container

Created on 2 Aug 2019 · 24 comments · Source: containers/podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Steps to reproduce the issue:

  1. sudo podman run --name=test_port -d -p 9999:9999 ubuntu:18.04 sleep infinity
  2. sudo podman --log-level debug restart test_port

Describe the results you received:
When restarting, it fails with "address already in use".
However, if I stop and then start the container, it works.
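
A quick way to confirm what is still holding the published port while this happens (a sketch, not part of the original report; assumes ss is available and the 9999 mapping from the reproduction above):

# check which process still has the host port bound
sudo ss -ltnp | grep ':9999'
# if the old conmon has not exited yet, it shows up as the owner of the listening socket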

DEBU[0000] Initializing boltdb state at /home/docker-data/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /home/docker-data
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /home/docker-data/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /home/docker-data/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] cached value indicated that metacopy is not being used
DEBU[0000] cached value indicated that native-diff is usable
DEBU[0000] backingFs=xfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false
DEBU[0000] Initializing event backend journald
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
DEBU[0000] Setting maximum workers to 2
DEBU[0000] Stopping ctr 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 (timeout 10)
DEBU[0000] Stopping container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 (PID 1253)
DEBU[0000] Sending signal 15 to container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767
DEBU[0010] container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 did not die within timeout 10000000000
WARN[0010] Timed out stopping container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767, resorting to SIGKILL
DEBU[0010] Created root filesystem for container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 at /home/docker-data/overlay/f1399b213c963b07f07ccd091f8b3ce133d4aaacec12b859c99fd1b106c2a024/merged
DEBU[0010] Recreating container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 in OCI runtime
DEBU[0010] Successfully cleaned up container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767
DEBU[0010] /etc/system-fips does not exist on host, not mounting FIPS mode secret
DEBU[0010] Setting CGroups for container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 to machine.slice:libpod:5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767
DEBU[0010] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0010] reading hooks from /etc/containers/oci/hooks.d
DEBU[0010] Created OCI spec for container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 at /home/docker-data/overlay-containers/5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767/userdata/config.json
DEBU[0010] /usr/libexec/podman/conmon messages will be logged to syslog
DEBU[0010] running conmon: /usr/libexec/podman/conmon    args="[-s -c 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 -u 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 -n test_port -r /usr/sbin/runc -b /home/docker-data/overlay-containers/5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767/userdata -p /var/run/containers/storage/overlay-containers/5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767/userdata/pidfile --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/docker-data --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 --socket-dir-path /var/run/libpod/socket -l k8s-file:/home/docker-data/overlay-containers/5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767/userdata/ctr.log --log-level debug --syslog]"
DEBU[0010] Cleaning up container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767
DEBU[0010] Tearing down network namespace at /var/run/netns/cni-a36cd478-3857-e576-6e7d-face72a879d1 for container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767
INFO[0010] Got pod network &{Name:test_port Namespace:test_port ID:5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 NetNS:/var/run/netns/cni-a36cd478-3857-e576-6e7d-face72a879d1 PortMappings:[{HostPort:9999 ContainerPort:9999 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0010] About to del CNI network podman (type=bridge)
DEBU[0010] unmounted container "5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767"
DEBU[0010] Failed to restart container 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767: cannot listen on the TCP port: listen tcp4 :9999: bind: address already in use
DEBU[0010] Worker#0 finished job [(*LocalRuntime) Restart func1]/5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767 (cannot listen on the TCP port: listen tcp4 :9999: bind: address already in use)
DEBU[0010] Pool[restart, 5871e240b73da769154be80cc039e9fdbd0bbe8167f0f3d02e3649cd0e7d5767: cannot listen on the TCP port: listen tcp4 :9999: bind: address already in use]
ERRO[0010] cannot listen on the TCP port: listen tcp4 :9999: bind: address already in use

Describe the results you expected:
Should be able to restart successfully.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version 1.4.4

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.10.8
  podman version: 1.4.4
host:
  BuildahVersion: 1.9.0
  Conmon:
    package: podman-1.4.4-2.el7.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 0.3.0, commit: unknown'
  Distribution:
    distribution: '"rhel"'
    version: "7.6"
  MemFree: 176705536
  MemTotal: 1920000000
  OCIRuntime:
    package: containerd.io-1.2.5-3.1.el7.x86_64
    path: /usr/sbin/runc
    version: |-
      runc version 1.0.0-rc6+dev
      commit: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
      spec: 1.0.1-dev
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 1
  hostname: MASKED
  kernel: 3.10.0-957.21.3.el7.MASKED.20190617.34.x86_64
  os: linux
  rootless: false
  uptime: 114h 57m 18.7s (Approximately 4.75 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.centos.org
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 8
  GraphDriverName: overlay
  GraphOptions: null
  GraphRoot: /home/docker-data
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 15
  RunRoot: /var/run/containers/storage
  VolumePath: /home/docker-data/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):
OpenStack VM

Labels: kind/bug, stale-issue

All 24 comments

@mheon Looks like a race condition, where the container is not fully stopped before it is started again?

Looks like it. We're hitting the timeout, sending SIGKILL, and then trying to start.

Perhaps in restart we should wait for the container cleanup to complete before starting?

@rhatdan, we're getting closer. The issue is that the network isn't cleaned up when stopping the container during restart.

Ah, there's more behind it. Still tracking it down. Maybe @mheon or @baude know where to look?

Looks like we need to find a way to avoid binding the ports (and store them in the container).

The ports are held open by Conmon. We don't want to avoid binding them entirely - the problem here is likely the old container's Conmon not dying in time, and if we didn't try to reserve the ports again, they wouldn't be held open at all.

Per the logs, we timed out killing with the normal stop signal, and resorted to SIGKILL - but the container did die. Restart should be waiting for the exit file to be created before transitioning to the 'start' part, so we should have a guarantee that conmon is dead (or very close to it) by the time we're getting to restarting the container. However, seemingly, the old conmon is managing to stick around.
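
A rough way to check that on a live host (a sketch, assuming the exit dir and container ID from the debug log above):

# any conmon still alive for this container?
pgrep -af conmon | grep 5871e240
# has the exit file been written yet?
ls -l /var/run/libpod/exits/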

@mheon I'm on exactly that track at the moment. Playing around with killing conmon. Looks like we're dealing with a race.

It's a bug in conmon, preparing a fix.

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

We fixed this one by forcibly killing Conmon on restart, I believe

I seem to be running into this issue: when starting a (root) container again after stopping it, I get

unable to start container "dcn": cannot listen on the TCP port: listen tcp4 :39123: bind: address already in use

EDIT: What conmon version was this fixed in? Maybe I don't have the fix in the latest podman release yet.

EDIT2:

Logs:

DEBU[0000] using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /var/lib/containers/storage 
DEBU[0000] Using run root /var/run/containers/storage   
DEBU[0000] Using static dir /var/lib/containers/storage/libpod 
DEBU[0000] Using tmp dir /var/run/libpod                
DEBU[0000] Using volume path /var/lib/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] cached value indicated that overlay is supported 
DEBU[0000] cached value indicated that metacopy is not being used 
DEBU[0000] cached value indicated that native-diff is usable 
DEBU[0000] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false 
DEBU[0000] Initializing event backend journald          
DEBU[0000] using runtime "/usr/sbin/runc"               
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument 
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
DEBU[0000] Made network namespace at /var/run/netns/cni-ab20fd67-055b-0cf3-89cc-180ad7222413 for container 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 
INFO[0000] Got pod network &{Name:dcn Namespace:dcn ID:0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 NetNS:/var/run/netns/cni-ab20fd67-055b-0cf3-89cc-180ad7222413 Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[{HostPort:39123 ContainerPort:22 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}]} 
INFO[0000] About to add CNI network cni-loopback (type=loopback) 
DEBU[0000] overlay: mount_data=lowerdir=/var/lib/containers/storage/overlay/l/CNS7KFQZNRKEVVKUEZQUCEYZ4Y:/var/lib/containers/storage/overlay/l/U74DVPD2TW4QCUCVGFYDMDEKDQ:/var/lib/containers/storage/overlay/l/QV6WF7LAWVMLMGUYHHQ2GW4QYC:/var/lib/containers/storage/overlay/l/UJY6PXZTMY37QO7KIRSEFLIQVB:/var/lib/containers/storage/overlay/l/XOTIGD4LKSS4DHRWWWA4OYPLCW:/var/lib/containers/storage/overlay/l/UHY7O5NDAP4OWEJV4ICTJLDHCN:/var/lib/containers/storage/overlay/l/KD27AID45U7QBLDLACU23VKYSB:/var/lib/containers/storage/overlay/l/VOGUIGMYO74L23BF4CP3U74PHC:/var/lib/containers/storage/overlay/l/QZVOALXN67FMJJEOL7H4BXIOWF:/var/lib/containers/storage/overlay/l/AOQO6WE5LKDGQ2HJVSHJ3YBKQY:/var/lib/containers/storage/overlay/l/TMLF45SQGBYNUWEG3ETXOYPXSM:/var/lib/containers/storage/overlay/l/PQ43PKUKJNFTUJ3536QRVMB4JD:/var/lib/containers/storage/overlay/l/Y7ZJFNE2K2BANCPJGX2M6RYV6X,upperdir=/var/lib/containers/storage/overlay/44553b727e9b7fdf13f9a5e4168867b9618a60f81d72ac96d6a7f9556b62738b/diff,workdir=/var/lib/containers/storage/overlay/44553b727e9b7fdf13f9a5e4168867b9618a60f81d72ac96d6a7f9556b62738b/work 
DEBU[0000] mounted container "0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8" at "/var/lib/containers/storage/overlay/44553b727e9b7fdf13f9a5e4168867b9618a60f81d72ac96d6a7f9556b62738b/merged" 
DEBU[0000] Created root filesystem for container 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 at /var/lib/containers/storage/overlay/44553b727e9b7fdf13f9a5e4168867b9618a60f81d72ac96d6a7f9556b62738b/merged 
INFO[0000] Got pod network &{Name:dcn Namespace:dcn ID:0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 NetNS:/var/run/netns/cni-ab20fd67-055b-0cf3-89cc-180ad7222413 Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[{HostPort:39123 ContainerPort:22 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}]} 
INFO[0000] About to add CNI network podman (type=bridge) 
DEBU[0000] [0] CNI result: Interfaces:[{Name:cni-podman0 Mac:ee:9a:95:0b:cb:ef Sandbox:} {Name:veth18b0249d Mac:1e:35:fe:9c:04:01 Sandbox:} {Name:eth0 Mac:ea:07:a4:52:05:95 Sandbox:/var/run/netns/cni-ab20fd67-055b-0cf3-89cc-180ad7222413}], IP:[{Version:4 Interface:0xc4203cb408 Address:{IP:10.88.0.22 Mask:ffff0000} Gateway:10.88.0.1}], Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:<nil>}], DNS:{Nameservers:[] Domain: Search:[] Options:[]} 
INFO[0000] No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4] 
INFO[0000] IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844] 
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret 
DEBU[0000] Setting CGroups for container 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 to machine.slice:libpod:0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d 
DEBU[0000] reading hooks from /etc/containers/oci/hooks.d 
DEBU[0000] Created OCI spec for container 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 at /var/lib/containers/storage/overlay-containers/0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8/userdata/config.json 
DEBU[0000] /usr/bin/conmon messages will be logged to syslog 
DEBU[0000] running conmon: /usr/bin/conmon               args="[--api-version 1 -s -c 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 -u 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8/userdata -p /var/run/containers/storage/overlay-containers/0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8/userdata/pidfile -l k8s-file:/var/lib/containers/storage/overlay-containers/0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8/userdata/ctr.log --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket --log-level debug --syslog --conmon-pidfile /var/run/containers/storage/overlay-containers/0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8]"
DEBU[0000] Cleaning up container 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 
DEBU[0000] Tearing down network namespace at /var/run/netns/cni-ab20fd67-055b-0cf3-89cc-180ad7222413 for container 0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 
INFO[0000] Got pod network &{Name:dcn Namespace:dcn ID:0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8 NetNS:/var/run/netns/cni-ab20fd67-055b-0cf3-89cc-180ad7222413 Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[{HostPort:39123 ContainerPort:22 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}]} 
INFO[0000] About to del CNI network podman (type=bridge) 
DEBU[0000] unmounted container "0338c8ce9ab4aa02a433018157ae8f255a2510a3c1d3868faae436ba0e831da8" 
ERRO[0000] unable to start container "dcn": cannot listen on the TCP port: listen tcp4 :39123: bind: address already in use

Netstat:

tcp        1      0 localhost:39123         localhost:40300         CLOSE_WAIT 
tcp        0      0 localhost:40352         localhost:39123         FIN_WAIT2  
tcp        1      0 localhost:39123         localhost:40282         CLOSE_WAIT 
tcp        1      0 localhost:39123         localhost:40274         CLOSE_WAIT 
tcp        1      0 localhost:39123         localhost:40374         CLOSE_WAIT 
tcp        1      0 localhost:39123         localhost:40316         CLOSE_WAIT 
tcp        1      0 localhost:39123         localhost:40278         CLOSE_WAIT 
tcp        0      0 localhost:40374         localhost:39123         FIN_WAIT2  
tcp        1      0 localhost:39123         localhost:40270         CLOSE_WAIT 
tcp        1      0 localhost:39123         localhost:40352         CLOSE_WAIT

Can you get the podman info output so we can see what version of Conmon that is?

host:
  BuildahVersion: 1.11.3
  CgroupVersion: v1
  Conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.3, commit: unknown'
  Distribution:
    distribution: ubuntu
    version: "18.04"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 29040357376
  MemTotal: 33498562560
  OCIRuntime:
    name: runc
    package: 'runc: /usr/sbin/runc'
    path: /usr/sbin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 2147479552
  SwapTotal: 2147479552
  arch: amd64
  cpus: 12
  eventlogger: journald
  hostname: pcl001477a
  kernel: 5.0.0-37-generic
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: 'slirp4netns: /usr/bin/slirp4netns'
    Version: |-
      slirp4netns version 0.4.2
      commit: unknown
  uptime: 1h 50m 53.42s (Approximately 0.04 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - quay.io
store:
  ConfigFile: /home/daan_dm/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: vfs
  GraphOptions: {}
  GraphRoot: /home/daan_dm/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 0
  RunRoot: /run/user/1000
  VolumePath: /home/daan_dm/.local/share/containers/storage/volumes

I'm pretty sure my issue is caused by the conmon issue I just created (it's a side effect of the winsz bug).

This issue is affecting my system as well; it affects several containers.
It can be worked around by killing the stale conmon processes between the stop and the start:

podman stop containername
for i in `netstat -pnl|sed '/8112/!d; /conmon/!d' -|awk '{split($NF,a,"/"); print a[1]}'`; do kill -9 $i; done
podman start containername

Where 8112 is the port in question. This is not ideal, as it complicates restarting services with systemd:
ExecStartPre=bash -c 'for i in `netstat -pnl|sed '/8112/!d; /conmon/!d' -|awk '{split($NF,a,"/"); print a[1]}'`; do kill -9 $i; done'
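
A simpler sketch of the same idea, assuming fuser from psmisc is available (it is just as blunt - it kills whatever owns the port):

# kill whatever process is still bound to TCP port 8112 (here: the stale conmon)
sudo fuser -k 8112/tcp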

conmon --version
conmon version 2.0.2
commit: 186a550ba0866ce799d74006dab97969a2107979

podman info
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.2-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
  Distribution:
    distribution: fedora
    version: "31"
  MemFree: 616427520
  MemTotal: 12562640896
  OCIRuntime:
    name: crun
    package: crun-0.10.6-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 1455120384
  SwapTotal: 1719660544
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: svhqdownload01.0xcbf.net
  kernel: 5.3.14-300.fc31.x86_64
  os: linux
  rootless: false
  uptime: 17h 23m 19.14s (Approximately 0.71 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 8
  GraphDriverName: overlay
  GraphOptions:
    overlay.mountopt: nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 11
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

cat /etc/redhat-release
Fedora release 31 (Thirty One)

I cannot replicate this on master on Fedora...

$ sudo podman info
host:
  BuildahVersion: 1.11.5
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.2-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
  Distribution:
    distribution: fedora
    version: "31"
  MemFree: 14543925248
  MemTotal: 33021677568
  OCIRuntime:
    name: crun
    package: crun-0.10.6-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: DESKTOP-SH5EG3J.localdomain
  kernel: 5.3.15-300.fc31.x86_64
  os: linux
  rootless: false
  uptime: 10h 24m 23.57s (Approximately 0.42 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
    overlay.mountopt: nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 3
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

From the podman info output, it looks like @ACiDGRiM has a more than new enough version to have both the Conmon fix and the Podman patches from @vrothberg to manually kill Conmon.

I wonder if we aren't talking about a different issue here. I see mention of systemd unit files in his example; I wonder if we aren't seeing an improperly-formatted one that's not working as advertised, causing Podman to die but not killing Conmon.

@ACiDGRiM Can you provide the systemd unit files you're using? Were they generated by podman generate systemd?
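
For reference, such units are typically produced with something along these lines (a sketch; the container name and output path are just examples, and available flags vary between Podman versions):

sudo podman generate systemd --name containername > /etc/systemd/system/container-containername.service
sudo systemctl daemon-reload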

I've regenerated the systemd units with podman generate systemd.

They used to look approximately like this:

[Unit]
Description=Daemon
After=network.target mnt-content.mount mnt-data.mount
Requires=mnt-content.mount mnt-data.mount

[Service]
Restart=always
ExecStart=/usr/bin/podman start -a container
ExecStop=/usr/bin/podman stop -t 10 container

[Install]
WantedBy=multi-user.target

I'm testing with the newly generated systemd units now; the noticeable difference is that the type is forking instead of a standard unit.

Getting the service files right is _very_ tough. We will publish a blog post next Monday going into some details and presenting a new best practice.

If you want to create new containers (in contrast to using existing ones as podman generate systemd currently does), we recommend the following pattern:

[Unit]
Description=Podman in Systemd

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid

[Install]
WantedBy=multi-user.target
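
Assuming the unit above is saved as, say, /etc/systemd/system/podman-example.service (a made-up name), it would be used the usual way; %t expands to the runtime directory (/run for root) and %n to the unit name, so the pid and cid files end up under /run:

sudo systemctl daemon-reload
sudo systemctl enable --now podman-example.service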

I've created the services with podman generate systemd and reliability is improved; however, if the container dies without cleaning up, conmon still hangs around and must be killed via ExecStartPre or manually.
Furthermore, I also run into this issue after releasing the port by killing conmon:
Error: unable to start container "plex": sd-bus call: File exists: OCI runtime error

Also worth noting: podman generate systemd doesn't include the ExecStartPre line that your template shows.

# container-ac522937d6e06fd0500b75e2e94608dae452f3a5b75500dad38a9edeefa60b5a.service
# autogenerated by Podman 1.6.2
# Wed Dec 18 18:48:49 MST 2019

[Unit]
Description=Podman container-ac522937d6e06fd0500b75e2e94608dae452f3a5b75500dad38a9edeefa60b5a.service
Documentation=man:podman-generate-systemd(1)

[Service]
Restart=on-failure
ExecStart=/usr/bin/podman start ac522937d6e06fd0500b75e2e94608dae452f3a5b75500dad38a9edeefa60b5a
ExecStop=/usr/bin/podman stop -t 10 ac522937d6e06fd0500b75e2e94608dae452f3a5b75500dad38a9edeefa60b5a
KillMode=none
Type=forking
PIDFile=/var/run/containers/storage/overlay-containers/ac522937d6e06fd0500b75e2e94608dae452f3a5b75500dad38a9edeefa60b5a/userdata/conmon.pid

[Install]
WantedBy=multi-user.target
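
If a stale conmon from a previous run is still holding the port, one way to clear it before starting the unit again (a sketch based on the PIDFile path in the unit above) is:

sudo kill -9 "$(cat /var/run/containers/storage/overlay-containers/ac522937d6e06fd0500b75e2e94608dae452f3a5b75500dad38a9edeefa60b5a/userdata/conmon.pid)"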

Oh... it's closed.
