Kind: Create cluster fails without apparent information outside debug mode

Created on 16 Mar 2020 · 14 comments · Source: kubernetes-sigs/kind

What happened:

I was trying to spin up a kind cluster with kind create cluster which kept failing over and over with the following error:

$ kind create cluster
Creating cluster "kind" ...                        
 ✓ Ensuring node image (kindest/node:v1.17.0) 🖼
 ✗ Preparing nodes 📦
docker run error: command "docker run --hostname kind-control-plane --name kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --detach --tty --label io.x-k8s.kind.cluster=kind -e '<redacted>' --publish=127.0.0.1:0:6443/TCP kindest/node:v1.17.0@sha256:9512edae126da271b66b990b6fff768fbb7cd786c7d39e86bdf55906352fdf62" failed with error: exit status 125  

It was only when I ran with -v 1 that I noticed I had forgotten to remove the previous cluster which I had spun up.

Output:
docker: Error response from daemon: Conflict. The container name "/kind-control-plane" is already in use by container "9df1a7a82171e3d329be23b8f63a1bd24f075e54f7a7d1591f94f4610293df3b". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.

What you expected to happen:

I'm not sure whether this has been discussed before, but I feel like this specific error (or others similar in nature) could be caught and bubbled up during normal execution rather than only being exposed in debug mode.

How to reproduce it (as minimally and precisely as possible):

  • kind-0.6.1 create cluster
  • let it spin up completely
  • leave it alone running
  • kind-0.7.0 create cluster
  • the error above, which is only visible with -v 1 (see the command below)
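For reference, the re-run that surfaces the docker error is just the normal create command with the verbosity flag mentioned above (assuming the same versioned binary naming as in the steps):

$ kind-0.7.0 create cluster -v 1   # prints the docker daemon error shown under Output: above, instead of just exit status 125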

Anything else we need to know?:

Note that after a little bit of investigation while writing up this issue, I've figured out that this only happens if the existing container was created by a different kind version than the one you're trying to spin up now. For example, spin up with v0.6.1 and then try again later with v0.7.0. Now I'm not sure whether it makes more sense for this to be fixed in the code or covered by specific documentation (if there isn't any already).
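A manual way to confirm and clean up the leftover node in this situation is to go by the container name rather than through kind, since the newer kind no longer lists the old container (sketch of a workaround only; the name comes straight from the docker error above):

$ docker ps -a --filter name=kind-control-plane
$ docker rm -f kind-control-plane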

Environment:

  • kind version: (use kind version): kind v0.7.0 go1.13.6 linux/amd64
  • Kubernetes version: (use kubectl version): v1.16.4
  • Docker version: (use docker info): 19.03.6
  • OS (e.g. from /etc/os-release): Ubuntu 19.04
kind/bug lifecycle/active priority/important-soon

All 14 comments

There were some breaking changes documented in the release notes that may be causing this behavior:
https://github.com/kubernetes-sigs/kind/releases/tag/v0.7.0

I agree. I've been thinking more about how to redo the debug information; I'm leaning towards bumping many things up one level and making v=1 the default (including dumping at least some of that debug information, though maybe not the stack trace).

/assign
/priority important-soon

Note that after a little bit of investigation while writing up this issue, I've figured out that this only happens if the existing container was created by a different kind version than the one you're trying to spin up now. For example, spin up with v0.6.1 and then try again later with v0.7.0. Now I'm not sure whether it makes more sense for this to be fixed in the code or covered by specific documentation (if there isn't any already).

this is related to a breaking change in labeling the containers; we had a migration period for the k8s.io namespacing of everything, and had guidance that we needed to move to x-k8s.io as an unofficial set of APIs that have not been reviewed and approved by sig-arch.

(there is more on that change in the release notes I believe)
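Concretely, the effect of that label change is that kind v0.7.0 only filters containers by the new x-k8s label, so a node created by v0.6.x is invisible to it even though the container name still collides. A rough illustration (the old label name below is an assumption based on the k8s.io namespacing described above, not something taken from this thread):

$ docker ps -a --filter label=io.x-k8s.kind.cluster=kind      # what kind v0.7.0 looks for
$ docker ps -a --filter label=io.k8s.sigs.kind.cluster=kind   # assumed older-style label applied by v0.6.x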

this is related to a breaking change in labeling the containers; we had a migration period for the k8s.io namespacing of everything, and had guidance that we needed to move to x-k8s.io as an unofficial set of APIs that have not been reviewed and approved by sig-arch.

Thanks for the explanation.

I'm leaning towards bumping many things up one level and making v=1 the default

I'm generally behind this, but in this particular case I made a mistake: if I had made sure there were no leftover containers before running a new version of kind, I wouldn't have hit this error. So I'm not sure that making v=1 the default is necessarily the main enhancement here. Yes, I would have seen the error message on the first try (which is absolutely good), but in this exact case I think the code that produces the ERROR: node(s) already exist for a cluster with the name "kind" error could be refactored to detect existing clusters of different versions (TBH not sure if that's possible, but I think it'd be nice).

Also if you think this refactoring to check cross-versions makes sense I'll be able to come up with a fix PR.

but in this exact case I think the code that produces the ERROR: node(s) already exist for a cluster with the name "kind" error could be refactored to detect existing clusters of different versions (TBH not sure if that's possible, but I think it'd be nice)

er, we already had this logic: during the migration we applied both the old k8s.io labels and the new x-k8s.io labels, using the old one to list. Then we switched to only listing the new one in a future release.

We do NOT currently support old releases actively; we don't have the bandwidth to staff that at this alpha stage. We are continuing to roll forward towards 1.0 first.

The "nodes already exist" check only looks for known kind nodes; you could similarly have a container with the same name for some other reason, so I think more generally surfacing an error around the container name collision is the path forward here (or rather, better results for all errors).
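One way such a collision could be detected up front, independent of kind's own labels, is a plain name lookup before running the node container (a sketch of the idea only, not kind's actual implementation; note the name filter is a substring match):

$ docker ps -a --filter name=kind-control-plane --format '{{.Names}}'   # non-empty output means the name is already taken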

It seems I'm running into this error as well, trying to use pj-on-kind.sh:

$ kind get clusters
No kind clusters found.
$ kind version
kind v0.7.0 go1.13.4 darwin/amd64
Creating cluster "mkpod" ...
 ✓ Ensuring node image (kindest/node:v1.17.0) 🖼
 ✗ Preparing nodes 📦  
docker run error: command "docker run --hostname mkpod-control-plane --name mkpod-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --detach --tty --label io.x-k8s.kind.cluster=mkpod --volume=/mnt/disks/prowjob-out:/mnt/disks/prowjob-out --volume=/mnt/disks/kind-node:/mnt/disks/kind-node --publish=127.0.0.1:0:6443/TCP kindest/node:v1.17.0@sha256:9512edae126da271b66b990b6fff768fbb7cd786c7d39e86bdf55906352fdf62" failed with error: exit status 125
ERROR: failed to create cluster: docker run error: command "docker run --hostname mkpod-control-plane --name mkpod-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --detach --tty --label io.x-k8s.kind.cluster=mkpod --volume=/mnt/disks/prowjob-out:/mnt/disks/prowjob-out --volume=/mnt/disks/kind-node:/mnt/disks/kind-node --publish=127.0.0.1:0:6443/TCP kindest/node:v1.17.0@sha256:9512edae126da271b66b990b6fff768fbb7cd786c7d39e86bdf55906352fdf62" failed with error: exit status 125

The kind config that it is using is the following:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
  - extraMounts:
      - containerPath: /mnt/disks/prowjob-out
        hostPath: /mnt/disks/prowjob-out
      # host <-> node mount for hostPath volumes in Pods. (All hostPaths should be under /mnt/disks/kind-node to reach the host.)
      - containerPath: /mnt/disks/kind-node
        hostPath: /mnt/disks/kind-node

@tehcyx this issue is tracking the amount of output now. for your specific failure please open a support issue with more details about your environment.

I'd appreciate cross checking https://kind.sigs.k8s.io/docs/user/known-issues/

Docker run failing generally means a problem with the host.

/lifecycle active

this should no longer be the case
