I created a kind cluster with the following YAML:
# a cluster with 3 control-plane nodes and 3 workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
tsunomur@VM:~$ kind create cluster --config kind-example-config.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.18.2) 🖼
✓ Preparing nodes 📦 📦 📦 📦 📦 📦
✓ Configuring the external load balancer ⚖️
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining more control-plane nodes 🎮
✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Thanks for using kind! 😊
tsunomur@VM:~$ kubectl cluster-info --context kind-kind
Kubernetes master is running at https://127.0.0.1:43185
KubeDNS is running at https://127.0.0.1:43185/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
tsunomur@VM:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
03dad9ed89f2 kindest/node:v1.18.2 "/usr/local/bin/entr…" 8 minutes ago Up 6 minutes kind-worker2
cbd3f2c279a8 kindest/node:v1.18.2 "/usr/local/bin/entr…" 8 minutes ago Up 6 minutes 127.0.0.1:44681->6443/tcp kind-control-plane3
1531621e9806 kindest/node:v1.18.2 "/usr/local/bin/entr…" 8 minutes ago Up 6 minutes kind-worker3
1ceaa76b5149 kindest/haproxy:2.1.1-alpine "/docker-entrypoint.…" 8 minutes ago Up 8 minutes 127.0.0.1:43185->6443/tcp kind-external-load-balancer
a8b8cc91893e kindest/node:v1.18.2 "/usr/local/bin/entr…" 8 minutes ago Up 6 minutes 127.0.0.1:43397->6443/tcp kind-control-plane
5076541a963d kindest/node:v1.18.2 "/usr/local/bin/entr…" 8 minutes ago Up 6 minutes kind-worker
e64b81636f9a kindest/node:v1.18.2 "/usr/local/bin/entr…" 8 minutes ago Up 6 minutes 127.0.0.1:33069->6443/tcp kind-control-plane2
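For reference, the kubeconfig server URL here is the haproxy container's published port (127.0.0.1:43185), not any single control-plane, so losing kind-external-load-balancer takes the whole cluster endpoint with it. A quick way to check which endpoint kubectl is using (a sketch, assuming the kind-kind context is active):
$ kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
# prints https://127.0.0.1:43185, matching the kind-external-load-balancer port mapping above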
And then I restarted Docker (same as rebooting the machine):
$ sudo systemctl stop docker
Result: kind-external-load-balancer disappears, and even if I force the cluster URL over to a control-plane's address (roughly as in the sketch after the docker ps output below), Pod deployments stay Pending forever.
tsunomur@VM:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
03dad9ed89f2 kindest/node:v1.18.2 "/usr/local/bin/entr…" 10 minutes ago Up 5 seconds kind-worker2
cbd3f2c279a8 kindest/node:v1.18.2 "/usr/local/bin/entr…" 10 minutes ago Up 5 seconds 127.0.0.1:44681->6443/tcp kind-control-plane3
1531621e9806 kindest/node:v1.18.2 "/usr/local/bin/entr…" 10 minutes ago Up 5 seconds kind-worker3
a8b8cc91893e kindest/node:v1.18.2 "/usr/local/bin/entr…" 10 minutes ago Up 4 seconds 127.0.0.1:43397->6443/tcp kind-control-plane
5076541a963d kindest/node:v1.18.2 "/usr/local/bin/entr…" 10 minutes ago Up 4 seconds kind-worker
e64b81636f9a kindest/node:v1.18.2 "/usr/local/bin/entr…" 10 minutes ago Up 5 seconds 127.0.0.1:33069->6443/tcp kind-control-plane2
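For reference, forcing the cluster URL over to a control-plane can be done roughly like this (a sketch, not necessarily the exact commands; the port is the kind-control-plane mapping shown above and will differ per machine), but as noted Pods still stay Pending:
$ docker port kind-control-plane 6443
# e.g. 127.0.0.1:43397
$ kubectl config set-cluster kind-kind --server=https://127.0.0.1:43397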
Does kind not support restarting the machine?
Ref:
we need to know more details, like what version you're using.
kind does restart them on the latest version.
it would also be helpful to know if this happens with a simple kind create cluster (no config, no flags), and if so, more about what your host environment is like
Thank you for your quick reply.
I use 0.8.1:
tsunomur@VM:~$ kind --version
kind version 0.8.1
When I created a simple cluster, I did not hit the same situation.
But there are Pods that end up in Error state.
tsunomur@VM:~$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.18.2) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a nice day! 👋
tsunomur@VM:~$ k run nginx --image nginx --restart=Never
pod/nginx created
tsunomur@VM:~$ k get po
NAME READY STATUS RESTARTS AGE
nginx 0/1 ContainerCreating 0 2s
tsunomur@VM:~$ k get po -w
NAME READY STATUS RESTARTS AGE
nginx 0/1 ContainerCreating 0 3s
nginx 1/1 Running 0 17s
^Ctsunomur@VM:~$ k cluster-info
Kubernetes master is running at https://127.0.0.1:38413
KubeDNS is running at https://127.0.0.1:38413/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
tsunomur@VM:~$ k get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
tsunomur@VM:~$
tsunomur@VM:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
be19fb44893d kindest/node:v1.18.2 "/usr/local/bin/entr…" 2 minutes ago Up About a minute 127.0.0.1:38413->6443/tcp kind-control-plane
tsunomur@VM:~$
tsunomur@VM:~$ sudo systemctl stop docker
tsunomur@VM:~$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Tue 2020-06-23 17:54:16 UTC; 7s ago
Docs: https://docs.docker.com
Process: 31153 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=0/SUCCESS)
Main PID: 31153 (code=exited, status=0/SUCCESS)
Jun 23 17:15:16 VM dockerd[31153]: time="2020-06-23T17:15:16.225369221Z" level=info msg="API listen on /var/run/docker.sock"
Jun 23 17:15:16 VM systemd[1]: Started Docker Application Container Engine.
Jun 23 17:16:43 VM dockerd[31153]: time="2020-06-23T17:16:43.583494164Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jun 23 17:54:02 VM systemd[1]: Stopping Docker Application Container Engine...
Jun 23 17:54:02 VM dockerd[31153]: time="2020-06-23T17:54:02.941309822Z" level=info msg="Processing signal 'terminated'"
Jun 23 17:54:12 VM dockerd[31153]: time="2020-06-23T17:54:12.957486326Z" level=info msg="Container be19fb44893d46e0e7800cd8af414b80fc5d4bccd0d050ce282a685dd93d3735 failed to exit within
Jun 23 17:54:15 VM dockerd[31153]: time="2020-06-23T17:54:15.089736440Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jun 23 17:54:16 VM dockerd[31153]: time="2020-06-23T17:54:16.254422134Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=m
Jun 23 17:54:16 VM dockerd[31153]: time="2020-06-23T17:54:16.254886136Z" level=info msg="Daemon shutdown complete"
Jun 23 17:54:16 VM systemd[1]: Stopped Docker Application Container Engine.
tsunomur@VM:~$ sudo systemctl start docker
tsunomur@VM:~$ k cluster-info
Kubernetes master is running at https://127.0.0.1:38413
KubeDNS is running at https://127.0.0.1:38413/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
tsunomur@VM:~$ k get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
tsunomur@VM:~$ k run nginx-after-restart --image nginx --restart=Never
pod/nginx-after-restart created
tsunomur@VM:~$ k get po
NAME READY STATUS RESTARTS AGE
nginx 0/1 Unknown 0 2m
nginx-after-restart 0/1 ContainerCreating 0 2s
But if it's only a bare Pod (not managed by a Deployment) whose status is Error, I will just recreate it.
I won't use a multi-node cluster yet.
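As an aside, a sketch of one way to sweep up such failed bare Pods after a restart (kubectl supports a field selector on status.phase for Pods; the exact phase left behind, Failed or Unknown, can vary):
$ kubectl get pods --field-selector=status.phase=Failed
$ kubectl delete pods --field-selector=status.phase=Failed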
yeah, some errored pods are expected; not all things handle the IP switch well, etc.
The cluster not coming back up with multi-node is not expected, though.
What happens if you use:
# a cluster with 3 control-plane nodes and 3 workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
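e.g. saved to a file and created with (filename and cluster name are just examples):
$ kind create cluster --name single-cp --config kind-single-cp.yaml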
It's possible we have a bug in the "HA" mode; it's not well tested or used for much currently.
I tried a cluster with only one control-plane and multiple workers, then restarted dockerd, and it seems to be in good shape.
I'll create clusters with only one control-plane from now on.
Thank you.
I think this issue should be re-opened. The problem occurs when more than one control-plane is used. I could reproduce it easily using this config (kind v0.8.1, docker 19.03.11-ce):
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
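The steps in between are essentially creating the cluster from that config and restarting Docker (shown here as an illustrative sketch; the filename is just an example):
$ kind create cluster --config two-control-planes.yaml
$ sudo systemctl restart docker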
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9086b0999d6a kindest/haproxy:2.1.1-alpine "/docker-entrypoint.…" 5 minutes ago Exited (0) 2 minutes ago kind-external-load-balancer
938b62548187 kindest/node:v1.18.2 "/usr/local/bin/entr…" 5 minutes ago Up About a minute 127.0.0.1:39575->6443/tcp kind-control-plane
d665bd9e5fe3 kindest/node:v1.18.2 "/usr/local/bin/entr…" 5 minutes ago Up About a minute 127.0.0.1:34927->6443/tcp kind-control-plane2
I don't think 2 control planes is valid in kubeadm, @Rolinh; only 3? I thought we validated this, but we must not.
That said, it does seem we have a bug here with multiple control planes.
I'm going to interject a brief note: I _highly_ recommend testing with a single node cluster unless you have strong evidence that multi-node is relevant, doubly so for multi-control plane.
@BenTheElder fwiw, the issue is the same with 3 control planes.
I'm going to interject a brief note: I highly recommend testing with a single node cluster unless you have strong evidence that multi-node is relevant, doubly so for multi-control plane.
Would you mind expanding on this? Why is this a problem? I've been testing things with clusters of up to 50 nodes without issues so far, except upon docker service restart (or machine reboot). As a single control-plane is sufficient, I'll stick to that, but I do need to test things in multi-node clusters.
50 nodes? Cool! That's actually the largest single kind cluster I've heard of so far :-)
Many (most?) apps are unlikely to gain anything testing-wise from multiple nodes, but running multi-node kind clusters overcommits the hardware (each node reports having the full host resources) while adding more overhead.
The "HA" mode is not actually HA due to etcd and due to running on top of one physical host ... it is somewhat useful for certain things where multiple api-servers matters.
Similarly multi-node is used for testing where multi-node rolling behavior matters (we test kubernetes itself with 1 control plane and 2 workers typically), outside of that it's just extra complexity and overhead.
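The overcommit is easy to see on a multi-node cluster, since every node advertises the full host capacity (a sketch):
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory
# each row reports the host's full CPU count and memory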
50 nodes? Cool! That's actually the largest single kind cluster I've heard of so far :-)
I've tried to push it further just out of curiosity, but a 100-node cluster attempt brought my machine down to its knees with a ridiculous 2500+ load average at some point :grin:
I work on Cilium (so I use kind with the Cilium CNI), and at the moment more specifically on Hubble Relay for cluster-wide observability, and being able to test things in a local multi-node cluster is just amazing. I used to have to run multiple VMs, but that process is much heavier. We're also able to test things like cluster mesh with kind. We also recently introduced kind as part of our CI to run smoke tests.
cool, that's definitely one of those apps that will benefit from multi-node :-)
we see a lot of people going a bit nuts with nodes to run web-app-like services that don't benefit from this 😅
tracking the HA restart issue with a bug here https://github.com/kubernetes-sigs/kind/issues/1689
closing this one, but will continue responding to comments 😅