Podman: Rebooted, dangling file in /var/lib/cni/networks/podman prevents container starting

Created on 8 Aug 2019 · 32 comments · Source: containers/podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description


I did a systemctl reboot and encountered an orphan /var/lib/cni/networks/podman/10.88.0.20 file,
which prevented the container from starting.

ERRO[0000] Error adding network: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
Error: unable to start container "freeswitch-init_1": error configuring network namespace for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254

Steps to reproduce the issue:

  1. Upgraded Fedora 30 host, rebooted, tried to start a container


Describe the results you received:

INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
DEBU[0000] Made network namespace at /var/run/netns/cni-76dce0ef-599f-b625-7710-a7eeef4159e5 for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 
INFO[0000] Got pod network &{Name:freeswitch-init_1 Namespace:freeswitch-init_1 ID:7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 NetNS:/var/run/netns/cni-76dce0ef-599f-b625-7710-a7eeef4159e5 PortMappings:[] Networks:[podman] NetworkConfig:map[podman:{IP:10.88.0.20}]} 
INFO[0000] About to add CNI network cni-loopback (type=loopback) 
DEBU[0000] overlay: mount_data=nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/Z6S3A3PN6PO5DCD4E3FZZOFLO6:/var/lib/containers/storage/overlay/l/II26Q5SUSAWGOVHLKY3MKC7GNU,upperdir=/var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/diff,workdir=/var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/work,context="system_u:object_r:container_file_t:s0:c362,c945" 
DEBU[0000] mounted container "7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7" at "/var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/merged" 
DEBU[0000] Created root filesystem for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 at /var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/merged 
INFO[0000] Got pod network &{Name:freeswitch-init_1 Namespace:freeswitch-init_1 ID:7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 NetNS:/var/run/netns/cni-76dce0ef-599f-b625-7710-a7eeef4159e5 PortMappings:[] Networks:[podman] NetworkConfig:map[podman:{IP:10.88.0.20}]} 
INFO[0000] About to add CNI network podman (type=bridge) 
ERRO[0000] Error adding network: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
DEBU[0000] Network is already cleaned up, skipping...   
DEBU[0000] unmounted container "7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7" 
DEBU[0000] Cleaning up container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 
DEBU[0000] Network is already cleaned up, skipping...   
DEBU[0000] Container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 storage is already unmounted, skipping... 
ERRO[0000] unable to start container "freeswitch-init_1": error configuring network namespace for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 

Describe the results you expected:
Container starts

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:            1.4.4
RemoteAPI Version:  4
Go Version:         go1.12.7
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.7
  podman version: 1.4.4
host:
  BuildahVersion: 1.9.0
  Conmon:
    package: podman-1.4.4-4.fc30.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.0.0-dev, commit: 164df8af4e62dc759c312eab4b97ea9fb6b5f1fc'
  Distribution:
    distribution: fedora
    version: "30"
  MemFree: 7664599040
  MemTotal: 8340746240
  OCIRuntime:
    package: runc-1.0.0-93.dev.gitb9b6cc6.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: e3b4c1108f7d1bf0d09ab612ea09927d9b59b4e3
      spec: 1.0.1-dev
  SwapFree: 4294963200
  SwapTotal: 4294963200
  arch: amd64
  cpus: 4
  hostname: containers.localdomain
  kernel: 5.2.5-200.fc30.x86_64
  os: linux
  rootless: false
  uptime: 7m 27.03s
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mountopt=nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 6
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):
In /var/lib/cni/networks/podman I have:

  1. 10.88.0.20: contains the ID of the container
  2. last_reserved_ip.0: contains 10.88.0.20
  3. lock

The guard file 10.88.0.20 must have been left behind from the reboot. After deleting the file, the container could start.

kind/bug

Most helpful comment

This problem still exists in podman 2.0.5

All 32 comments

This usually happens when another container has taken the IP that container
was using. Do you have any other containers running using that IP?


No; this was after a graceful reboot.

@mccv1r0 Can you take a look at this? It looks like CNI didn't clean up address reservations after a clean reboot.

@mheon the runtime is solely responsible for calling CNI DEL for every container that is no longer running but has not had DEL called on it. I know that podman/CRIO keep a cache of containers somewhere on-disk. When that cache is reconciled with what is actually running when CRIO/podman start, they need to call CNI DEL on every container in that cache list that is not currently running to allow the network plugin to clean up.

@dcbw Is this something we could configure to put these files on a tmpfs, so that we don't have to cleanup after a reboot?

@rhatdan you have no idea what kind of information the network plugin has to clean up, so you have to tell the network plugin to clean up. Which is CNI DEL.

If your container is no longer running, and it wasn't given a CNI DEL, you must call CNI DEL to clean up.

@mheon @rhatdan you have a container database:

// save container state to the database
func (c *Container) save() error {
    if err := c.runtime.state.SaveContainer(c); err != nil {
        return errors.Wrapf(err, "error saving container %s state", c.ID())
    }
    return nil
}

presumably that doesn't get blown away on restart. So after the next time podman runs (for any pod) it'll need to reconcile that database list with what's actually running and prune out the old stuff and call CNI DEL on those that aren't running. Or something like that.
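The reconciliation described here could be sketched roughly as follows; `NetRunner` and the ID lists are invented stand-ins for illustration, not podman's actual types:

```go
package main

// NetRunner is a hypothetical interface over whatever issues CNI DEL
// for a container (e.g. ocicni's TeardownPod in podman's case).
type NetRunner interface {
	Del(containerID string) error // issue CNI DEL for this container
}

// reconcile walks the cached container list from the on-disk database
// and calls CNI DEL for every container that is no longer running, so
// the network plugin can free its state (the IP reservation files
// under /var/lib/cni/networks). Errors are collected rather than
// aborting, so one failed teardown doesn't block the rest.
func reconcile(cached []string, running map[string]bool, nr NetRunner) []error {
	var errs []error
	for _, id := range cached {
		if running[id] {
			continue
		}
		if err := nr.Del(id); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}
```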

We do actually blow away most database state on reboot, on the assumption that none of it survived the reboot - what was running no longer is, what was mounted no longer is. It may be possible to work a CNI DEL into the process of refreshing the state post-reboot - I'll investigate.

Alright, this bit promises to be complicated.

We wipe container state very early in the refresh process, because we can't retrieve containers otherwise - the state is not sane because of the restart, so all of our validation fails. The runtime never touches a state with the CNI result available after a reboot - it's gone by the time we have the ability to actually get containers.

We can change this to preserve the cached CNI result, but some bits of what we pass to OCICNI as part of TeardownPod are not going to survive - network namespace path, for example, is definitely gone, and I don't see a way of preserving it.

Hmmm. If the network namespace path isn't strictly mandatory - I think we can probably call OCICNI's TeardownPod without preserving the old CNI result.

Should be fixed by #4086

When two or more containers are started at the same time, the later ones always fail:

ERRO[0000] Error adding network: failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed
Error: unable to start container "gitlab": error configuring network namespace for container 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c: failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed

Can you provide a reproducer for this? What Podman version are you on?

CentOS 7, same error: after a graceful reboot, some containers don't start.

-- Logs begin at Fri 2020-07-24 17:09:33 UTC, end at Fri 2020-07-24 17:10:11 UTC. --
Jul 24 17:09:44 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: Starting Podman container-pdns-rec-02.service...
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Reading configuration file \"/usr/share/containers/libpod.conf\""
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Merged system config \"/usr/share/containers/libpod.conf\": &{{false false false false false true} 0 {   [] [] []}  docker://  runc map[crun:[/usr/bin/crun /usr/local/bin/crun] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] [crun runc] [crun] [] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] systemd   /var/run/libpod -1 false /etc/cni/net.d/ [/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman []   k8s.gcr.io/pause:3.1 /pause false false  2048 shm    false}"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using conmon: \"/usr/bin/conmon\""
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using graph driver overlay"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using graph root /var/lib/containers/storage"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using run root /var/run/containers/storage"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using static dir /var/lib/containers/storage/libpod"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using tmp dir /var/run/libpod"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using volume path /var/lib/containers/storage/volumes"
Jul 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Set libpod namespace to \"\""
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="[graphdriver] trying provided driver \"overlay\""
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="cached value indicated that overlay is supported"
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="cached value indicated that metacopy is not being used"
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="cached value indicated that native-diff is usable"
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="backingFs=extfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false"
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="Initializing event backend journald"
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="using runtime \"/usr/bin/runc\""
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=warning msg="Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument"
Jul 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=info msg="Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist"
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="Made network namespace at /var/run/netns/cni-1c3935ab-eaf9-7c6c-6255-b35aca30a60f for container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805"
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="Got pod network &{Name:pdns-rec-02 Namespace:pdns-rec-02 ID:217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 NetNS:/var/run/netns/cni-1c3935ab-eaf9-7c6c-6255-b35aca30a60f Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[{HostPort:5354 ContainerPort:53 Protocol:udp HostIP:} {HostPort:5354 ContainerPort:53 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}]}"
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="About to add CNI network cni-loopback (type=loopback)"
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="Got pod network &{Name:pdns-rec-02 Namespace:pdns-rec-02 ID:217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 NetNS:/var/run/netns/cni-1c3935ab-eaf9-7c6c-6255-b35aca30a60f Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[{HostPort:5354 ContainerPort:53 Protocol:udp HostIP:} {HostPort:5354 ContainerPort:53 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}]}"
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="About to add CNI network podman (type=bridge)"
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="overlay: mount_data=lowerdir=/var/lib/containers/storage/overlay/l/YGHO4H3I3TNXW65T6OCIN3DC4R:/var/lib/containers/storage/overlay/l/W23YXYEJN45PNRY7CYYCC3KIHG,upperdir=/var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/diff,workdir=/var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/work"
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="mounted container \"217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805\" at \"/var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/merged\""
Jul 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="Created root filesystem for container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 at /var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/merged"
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=error msg="Error adding network: unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=error msg="Error while adding pod to CNI network \"podman\": unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="unmounted container \"217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805\""
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Network is already cleaned up, skipping..."
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Cleaning up container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805"
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Network is already cleaned up, skipping..."
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 storage is already unmounted, skipping..."
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=error msg="unable to start container \"pdns-rec-02\": error configuring network namespace for container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805: unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: container-pdns-rec-02.service: control process exited, code=exited status=125
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: Failed to start Podman container-pdns-rec-02.service.
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: Unit container-pdns-rec-02.service entered failed state.
Jul 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: container-pdns-rec-02.service failed.

Steps to reproduce:

podman run --name test-01 -d centos:7
podman run --name test-02 -d centos:7
podman run --name test-03 -d centos:7
podman generate systemd -t 5 test-01 > /etc/systemd/system/container-test-01.service
podman generate systemd -t 5 test-02 > /etc/systemd/system/container-test-02.service
podman generate systemd -t 5 test-03 > /etc/systemd/system/container-test-03.service
systemctl daemon-reload
systemctl enable --now container-test-01.service container-test-02.service container-test-03.service
systemctl stop container-test-01.service container-test-02.service container-test-03.service
rm -rf /var/lib/cni/networks/podman
ip link delete cni-podman0
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
iptables -X -t nat
iptables -L -n
iptables -L -n -t nat
systemctl start container-test-01.service & systemctl start container-test-02.service & systemctl start container-test-03.service

I think your log trace cut off some longer lines - in particular, the actual error message from CNI is missing. Can you try again?

@mheon Updated the log and steps to reproduce without a reboot. Podman version 1.6.4.

podman version

# podman version
Version:            1.6.4
RemoteAPI Version:  1
Go Version:         go1.12.12
OS/Arch:            linux/amd64

podman info

# podman info
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.8-1.el7.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.8, commit: f85c8b1ce77b73bcd48b2d802396321217008762'
  Distribution:
    distribution: '"centos"'
    version: "7"
  MemFree: 584450048
  MemTotal: 1039032320
  OCIRuntime:
    name: runc
    package: runc-1.0.0-67.rc10.el7_8.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 4294963200
  SwapTotal: 4294963200
  arch: amd64
  cpus: 1
  eventlogger: journald
  hostname: epp-dev-03-apache-02-int.12.voskhod.local
  kernel: 3.10.0-1127.13.1.el7.x86_64
  os: linux
  rootless: false
  uptime: 4m 0.05s
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 4
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 6
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Basically, I thought that when systemd starts all the containers together, CNI network initialization creates a race condition. But now I think the runtime environment sometimes isn't processed correctly. In any case, repeatedly creating the same chain in the iptables nat table is a race.

msg="Error adding network: unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
msg="Error while adding pod to CNI network \"podman\": unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"

For a quick fix, I hacked the CNI init.

/etc/systemd/system/podman-init.service

[Unit]
Description=Podman hack for init CNI before start containers
Documentation=man:podman-run
[Service]
ExecStartPre=/usr/bin/podman run --rm --name podman-init -d -p 65500:65500/udp centos:centos7 sleep 30
ExecStart=/usr/bin/sleep 10
Type=simple
[Install]
WantedBy=multi-user.target

/etc/systemd/system/container-test-01.service.d/after.conf

[Unit]
After=podman-init.service

I need to verify if this still occurs on 2.0; if it does, we can add a lock
for network initialization to prevent us from running CNI concurrently.


My colleague says he cannot reproduce it; the bug is intermittent.

This problem still exists in podman 2.0.5

@mheon Could you please reopen this since it's not fixed yet?

Can you clarify what exactly you're seeing? From my perspective, we have two bugs here. The original, ERRO[0000] Error adding network: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 - that should be fixed, at least as root. Are you still seeing it as root? If not, please open a fresh issue for whatever you're seeing.

I am seeing that exact error message as root. The problem occurs after a reboot.

My CI labs just got 2.0.5 a couple of days ago but according to @sunyitao this won't fix it, unfortunately.

I have, however, implemented the workaround suggested by @AlexGluck and created a small podman-init.service, which seems to have solved the issue for good in some of my Ansible roles.

@rhatdan This seems like a good candidate for systemd-tmpfiles? Directory not on a tmpfs that we want wiped after each reboot.

Sure, what is the path?

All subfolders of /var/lib/cni/networks

I think we also need to prevent files named last_reserved_ip.0 and lock from being removed.

Won't CNI recreate them if they are removed?

# rm -rf  /var/lib/cni/networks/*
# podman run fedora echo hello
hello
# ls /var/lib/cni/networks/*
last_reserved_ip.0  lock

Nice. OK, safe to remove everything in those directories then.
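The systemd-tmpfiles approach could then be a drop-in like the following (a sketch: /etc/tmpfiles.d/ is the standard drop-in directory, but the file name is arbitrary; `R` removes recursively, and the `!` suffix restricts the line to boot time, when systemd-tmpfiles runs with --boot):

```
# /etc/tmpfiles.d/podman-cni.conf (hypothetical name)
# Wipe stale host-local IPAM reservations before any container starts.
# CNI recreates lock and last_reserved_ip.0 on first use.
R! /var/lib/cni/networks/*/*
```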
