Kind: network kind is ambiguous

Created on 15 May 2020 · 11 comments · Source: kubernetes-sigs/kind

What happened:
Kicked off multiple builds in our CI environment; some tests use KIND to spin up clusters. Saw a bunch of failures:

  ✗ Preparing nodes 📦 
 ERROR: failed to create cluster: docker run error: command "docker run --hostname ci-a510791-control-plane --name ci-a510791-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --detach --tty --label io.x-k8s.kind.cluster=ci-a510791 --net kind --restart=on-failure:1 --volume=/workspace/pr-113/e2e/etc/rootca1.crt:/usr/local/share/ca-certificates/rootca1.crt:ro --volume=/workspace/pr-113/e2e/etc/rootca2.crt:/usr/local/share/ca-certificates/rootca2.crt:ro --publish=127.0.0.1:40759:6443/TCP kindest/node:v1.17.5" failed with error: exit status 125
 Command Output: 4614c0b36ac6a3e641b0a300d07b6b0bc7317132fab3d494d21a3e4777aa5d5a
 docker: Error response from daemon: network kind is ambiguous (2 matches found on name).

What you expected to happen:
No interference between tests, as was the case with KIND 0.7.x.

How to reproduce it (as minimally and precisely as possible):

  • Create many KIND clusters at the same time, starting from a clean box (without a kind network already created).

Anything else we need to know?:

# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
98fdc5e5af26        bridge              bridge              local
6f8de42fadad        host                host                local
de87cb7dd35e        kind                bridge              local
158648a47e91        kind                bridge              local
75afe274f8f8        none                null                local

Environment:

  • kind version: (use kind version): kind v0.8.1 go1.13.9 linux/amd64
  • Kubernetes version: (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5", GitCommit:"e0fccafd69541e3750d460ba0f9743b90336f24f", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:03Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 122
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.3.0-46-generic
 Operating System: Ubuntu 19.10
 OSType: linux
 Architecture: x86_64
 CPUs: 104
 Total Memory: 754.6GiB
 Name: 7959d5c46f-m9c7p
 ID: UMPE:ZM2Z:POMD:VQDB:7JAM:7OV5:LNJE:XP5W:EX4Z:CA5N:GO35:IBFY
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="19.10 (Eoan Ermine)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.10"
VERSION_ID="19.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=eoan
UBUNTU_CODENAME=eoan
kind/bug lifecycle/active priority/important-soon

All 11 comments

For now you're going to need to serialize creating the first cluster. I'm not sure if there's a non-racy way to do this in docker.

xref: https://github.com/moby/moby/issues/20648 docker-compose has this same issue :/

I don't think docker gives us sufficient tools to avoid a race condition here when coordinating a docker network between multiple processes, unless we do our own out-of-band multi-process locking.

... that's not something I'm super excited to add right now, and it's full of potential problems. Would it be acceptable instead if we developed sufficient tooling to allow kind to natively create multiple clusters in one command? I've been sketching out a design for that functionality anyhow.

Thanks, but that would not solve my problem. Our CI jobs are independent and don't coordinate with each other.


er, but they're using the same docker instance?

are they on the same filesystem even ...?

EDIT: I ask because if not (and it sounds like perhaps not), even something like agreeing on our own file lock path would not work.

Currently workarounds are:

  • Pre-create a kind docker network (and try to get the options right / reasonable)
  • Serialize creating one cluster yourself first (and solve the ordering issue yourself)
  • Retry cluster creation once (and accept that we may flake very early on due to another concurrent attempt causing this race)
  • Use the experimental KIND_EXPERIMENTAL_DOCKER_NETWORK env to set the network to be unique per cluster, knowing that you'll have to deal with cleanup or have potentially infinite networks, and that we may not choose to support this long term (a sketch of this approach follows this list).
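
For that last option, here is a minimal sketch of what a CI wrapper could look like: each cluster gets its own docker network via the experimental env var, and the wrapper removes that network after deleting the cluster. The per-build naming scheme and the helper functions are illustrative assumptions, not something kind provides:

package main

import (
    "fmt"
    "os"
    "os/exec"
)

// createClusterOnOwnNetwork runs `kind create cluster` with
// KIND_EXPERIMENTAL_DOCKER_NETWORK pointing at a per-cluster network name,
// so concurrent CI jobs never race on the shared "kind" network.
func createClusterOnOwnNetwork(cluster string) error {
    cmd := exec.Command("kind", "create", "cluster", "--name", cluster)
    cmd.Env = append(os.Environ(), "KIND_EXPERIMENTAL_DOCKER_NETWORK="+cluster)
    if out, err := cmd.CombinedOutput(); err != nil {
        return fmt.Errorf("create cluster: %v: %s", err, out)
    }
    return nil
}

// deleteClusterAndNetwork tears the cluster down and removes its dedicated
// network so that networks don't accumulate across CI runs.
func deleteClusterAndNetwork(cluster string) error {
    if out, err := exec.Command("kind", "delete", "cluster", "--name", cluster).CombinedOutput(); err != nil {
        return fmt.Errorf("delete cluster: %v: %s", err, out)
    }
    if out, err := exec.Command("docker", "network", "rm", cluster).CombinedOutput(); err != nil {
        return fmt.Errorf("remove network: %v: %s", err, out)
    }
    return nil
}

func main() {
    cluster := "ci-" + os.Getenv("BUILD_ID") // hypothetical per-build name
    if err := createClusterOnOwnNetwork(cluster); err != nil {
        panic(err)
    }
    defer func() {
        if err := deleteClusterAndNetwork(cluster); err != nil {
            panic(err)
        }
    }()
    // ... run tests against the cluster here ...
}

The trade-off is the one noted above: the wrapper has to reliably remove the per-cluster network, or networks will pile up across CI runs, and the env var itself may not be supported long term.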

I built a test bed using CheckDuplicate from https://godoc.org/github.com/docker/docker/api/types#NetworkCreate, and it is reliably insufficient.

package main

import (
    "context"
    "fmt"
    "sync"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/client"
)

func main() {
    cli, err := client.NewClientWithOpts(client.FromEnv)
    if err != nil {
        panic(err)
    }

    networkName := "test"

    // Both goroutines below try to create a network with the same name and
    // CheckDuplicate set; both creates can still succeed.
    createNetwork := func() {
        r, e := cli.NetworkCreate(context.Background(), networkName, types.NetworkCreate{
            CheckDuplicate: true,
            Driver:         "bridge",
        })
        fmt.Println(r, e)
    }
    // Removing by name then fails because two networks share it.
    deleteNetwork := func() {
        fmt.Println(cli.NetworkRemove(context.Background(), networkName))
    }

    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        createNetwork()
        wg.Done()
    }()
    go func() {
        createNetwork()
        wg.Done()
    }()
    wg.Wait()
    deleteNetwork()
}

results (always the same, except the random IDs):

$ go run .
{8d6b80658e72d596f19c35bd90226171056dc9f93610aec3c2b55b20ad55ff4e } <nil>
{ad09baf925e2a213132c1b9072ec54bc70aaaa0e558a771cc3de2b509d72e948 } <nil>
Error response from daemon: network test is ambiguous (2 matches found based on name)

I've got a pretty good idea how we can hack a working solution but it's going to be ... a hack.

Wrote up a detailed outline of the hack I'm considering
https://docs.google.com/document/d/1Q7Njyco2mAz66lS44pVV7ixT22RAkqBrmVMetG1zuT4

(Shared with [email protected], our standard SIG Testing group. I can't open documents to the entire internet by automated policy, but I can share with groups. This group is open to join, this is common for Kubernetes documents)
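
For reference, here is one possible shape such an approach could take, sketched with the same docker client API as the test bed above. It is only an illustration of the general idea (create optimistically, then deterministically converge on a single network); the ensureNetwork helper and its lowest-ID rule are assumptions for the sketch, not necessarily what the doc proposes or what kind ships:

package main

import (
    "context"
    "fmt"
    "sort"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/api/types/filters"
    "github.com/docker/docker/client"
)

// ensureNetwork creates networkName if needed, then resolves duplicates left
// behind by concurrent callers: every caller lists the matching networks,
// treats the one with the lowest ID as the winner, and deletes the duplicate
// it created itself if it lost the race.
func ensureNetwork(ctx context.Context, cli *client.Client, networkName string) error {
    created, createErr := cli.NetworkCreate(ctx, networkName, types.NetworkCreate{
        CheckDuplicate: true,
        Driver:         "bridge",
    })
    // The create error is not fatal by itself: another caller may have won
    // the race, or we may have created a duplicate that we resolve below.
    _ = createErr

    // The name filter is a substring match, so keep only exact matches.
    resources, err := cli.NetworkList(ctx, types.NetworkListOptions{
        Filters: filters.NewArgs(filters.Arg("name", networkName)),
    })
    if err != nil {
        return err
    }
    var ids []string
    for _, r := range resources {
        if r.Name == networkName {
            ids = append(ids, r.ID)
        }
    }
    if len(ids) <= 1 {
        return nil
    }

    // More than one network with this name: pick the lowest ID as the winner
    // and remove our own network if it is not the winner, so all concurrent
    // callers converge on the same single network.
    sort.Strings(ids)
    if created.ID != "" && created.ID != ids[0] {
        return cli.NetworkRemove(ctx, created.ID)
    }
    return nil
}

func main() {
    cli, err := client.NewClientWithOpts(client.FromEnv)
    if err != nil {
        panic(err)
    }
    fmt.Println(ensureNetwork(context.Background(), cli, "kind"))
}

Even with a scheme like this there is a window where a container could already be attached to a losing network before it is removed, which is part of why this is "a hack" rather than a clean fix.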

This should be mitigated in 0.9.0 (just released; this was the last blocking issue). Please let us know if you still encounter issues.

FYI @howardjohn @JeremyOT it _should_ be safe to do concurrent multi-cluster bringup in CI in v0.9.0 without any workarounds.
