Kind: Multi-node worker/control-plane setup fails

Created on 29 Oct 2019 · 4 comments · Source: kubernetes-sigs/kind

What happened:

[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1029 15:49:02.096728     575 round_trippers.go:438] POST https://172.17.0.3:6443/api/v1/namespaces/kube-system/configmaps  in 0 milliseconds
error execution phase control-plane-join/update-status: error uploading configuration: unable to create ConfigMap: Post https://172.17.0.3:6443/api/v1/namespaces/kube-system/configmaps: EOF 
 ✗ Joining more control-plane nodes 🎮
Error: failed to create cluster: failed to join node with kubeadm: exit status 1

What you expected to happen:
A basic multi-node cluster to be created successfully.

How to reproduce it (as minimally and precisely as possible):
kind create cluster --wait 300s --config config.yaml --name kind-test --retain --loglevel debug

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
- role: worker
- role: worker
- role: worker

Anything else we need to know?:
kind.log

Environment:

  • kind version: (use kind version): v0.5.1
  • Kubernetes version: (use kubectl version): v1.16.2
  • Docker version: (use docker info): 19.03.4
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS
Labels: kind/bug

All 4 comments

Can you export the logs with kind export logs?

It looks like the API server is unhealthy; that usually happens due to running out of resources.
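
One way to confirm whether the API server is reachable at all, independent of kind, is to probe its /healthz endpoint directly. A minimal Go sketch, assuming the control-plane address from the log above (172.17.0.3:6443) and skipping certificate verification since this is only a rough reachability check:

// healthcheck.go: quick reachability probe for the kind API server.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Address taken from the log output in this report; adjust as needed.
	const endpoint = "https://172.17.0.3:6443/healthz"

	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			// The kind API server presents a self-signed certificate;
			// skip verification because this is only a reachability check.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}

	resp, err := client.Get(endpoint)
	if err != nil {
		fmt.Println("API server unreachable:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("healthz:", resp.Status)
}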

It seems like an issue on the VPS side; cleaning up and testing again now raises disk space issues even though I have 60G to spare... sorry to have wasted your time.

I have been hitting this issue recently while testing some things with a kind patch that lowers the HAProxy timeouts.

The fix is merged now: https://github.com/kubernetes/kubernetes/pull/85763.

I could reproduce it almost every time with the linked kind patch, always with several control-plane instances: the problem showed up when creating and then updating the kubeadm-config ConfigMap while joining new control-plane nodes. The create action was never retried, and sometimes we got an EOF at exactly that point, when we expected either a valid response with a 201 HTTP status or a 409 HTTP status with an AlreadyExists error. With the patch, an unknown error now causes the operation to be retried.

With https://github.com/kubernetes/kubernetes/pull/85763 applied I have been unable to reproduce it again.

thanks @ereslibre !
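
For context, the fix described above amounts to wrapping the ConfigMap create in a retry loop so that a transient EOF no longer aborts the join phase. Below is a rough client-go sketch of that pattern (using the pre-1.18 signatures without context arguments); createOrUpdateConfigMap is a hypothetical helper for illustration, not the actual kubeadm code:

package config

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

// createOrUpdateConfigMap illustrates the retry idea: if the create hits an
// unknown error (such as an EOF while the API server is briefly unreachable
// behind the load balancer), the operation is retried instead of failing the
// whole join phase.
func createOrUpdateConfigMap(client clientset.Interface, cm *corev1.ConfigMap) error {
	return wait.PollImmediate(2*time.Second, 30*time.Second, func() (bool, error) {
		_, err := client.CoreV1().ConfigMaps(cm.Namespace).Create(cm)
		if err == nil {
			return true, nil // 201 Created
		}
		if apierrors.IsAlreadyExists(err) {
			// 409 Conflict: the ConfigMap already exists, so update it instead.
			_, err = client.CoreV1().ConfigMaps(cm.Namespace).Update(cm)
			return err == nil, nil
		}
		// Unknown error (e.g. EOF): retry on the next tick.
		return false, nil
	})
}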
