kubeadm 1.6.1 worker hangs on join

Created on 12 Apr 2017 · 12 Comments · Source: kubernetes/kubeadm

BUG REPORT:

Versions

kubeadm version (use `kubeadm version`): 1.6.1

Environment: Ubuntu

  • Kubernetes version (use `kubectl version`): 1.6.1
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

    ```
    root@phil-ubu:/home/ubuntu# lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 16.04.2 LTS
    Release:        16.04
    Codename:       xenial
    ```

  • Kernel (e.g. uname -a):
  • Others:
## What happened?
When joining the cluster via `kubeadm join --token d77f50.ccc501bafbaa4179 myip.118.240.130:6443` on a kube worker, I get this message and it hangs here:

```
[discovery] Created cluster-info discovery client, requesting info from "https://myip.118.240.130:6443"
```

Port 6443 should be open on the master; I telnetted to it from the worker and it connected.
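For reference, here is a quick way to double-check raw TCP reachability from the worker (a sketch; `nc` comes from the netcat package, and the address is the redacted master IP from above):

```bash
# Probe the API server port without speaking TLS.
nc -vz myip.118.240.130 6443

# Fallback without netcat: let bash open the TCP socket directly.
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/myip.118.240.130/6443' \
  && echo "port 6443 reachable" || echo "port 6443 NOT reachable"
```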

Looking at journalctl on the worker I see this:

```
Apr 11 22:26:23 phil-ubu-worker-1 kubelet[13231]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
Apr 11 22:26:23 phil-ubu-worker-1 kubelet[13231]: I0411 22:26:23.111145   13231 feature_gate.go:144] feature gates: map[]
Apr 11 22:26:22 phil-ubu-worker-1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Apr 11 22:26:22 phil-ubu-worker-1 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Apr 11 22:26:22 phil-ubu-worker-1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Apr 11 22:26:12 phil-ubu-worker-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Apr 11 22:26:12 phil-ubu-worker-1 systemd[1]: kubelet.service: Unit entered failed state.
Apr 11 22:26:12 phil-ubu-worker-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
```
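(Note: this kubelet crash loop is expected before a successful join; as far as I understand, kubeadm only writes /etc/kubernetes/kubelet.conf once the join's TLS bootstrap completes. A sketch of how to watch for that, assuming the standard systemd unit name and kubeadm file layout:)

```bash
# Follow the kubelet unit logs while the join runs.
journalctl -u kubelet -f

# The "invalid kubeconfig" restarts should stop once this file exists.
ls -l /etc/kubernetes/kubelet.conf
```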

On the master:

```
kubectl --kubeconfig ./admin.conf get nodes
NAME       STATUS     AGE
phil-ubu   NotReady   1h
```

and in the master's kubelet logs:

```
Apr 11 22:39:42 phil-ubu kubelet[10694]: E0411 22:39:42.727630   10694 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 11 22:39:42 phil-ubu kubelet[10694]: W0411 22:39:42.726912   10694 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
```
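(The `cni config uninitialized` warning matches an empty CNI config directory; a quick sanity check on the master, assuming the standard CNI path:)

```bash
# The kubelet looks here for a CNI network config; this directory stays
# empty until a pod network add-on (flannel, Weave Net, ...) is installed.
ls -l /etc/cni/net.d
```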

## What you expected to happen?

I expected the worker to join the cluster.

## How to reproduce it (as minimally and precisely as possible)?

I followed instructions here: https://kubernetes.io/docs/getting-started-guides/kubeadm/

## Anything else we need to know?


All 12 comments

@pswenson Couple of things to check: What pod networking provider and what version are you using?

flannel, via `kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml`, which uses the image `quay.io/coreos/flannel:v0.7.1-amd64`

Looks like kube-dns is not running. What is the output of `kubectl get pods -n kube-system --kubeconfig ./admin.conf`?

Well, kube-dns won't run until the network is ready. That flannel manifest is not the one you should use with 1.6. I don't know where the 1.6 manifest for flannel is, but you can get Weave Net with `kubectl apply -f https://git.io/weave-kube-1.6`.

Looks like you need a separate file to handle RBAC: https://github.com/tomdee/flannel/blob/743bafee48b69a3a3f79e37bc806d741715f1dd2/Documentation/kube-flannel-rbac.yml (see the sketch below).
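(Putting the two flannel pieces together on a 1.6 cluster would look roughly like this; a sketch using the URLs quoted above. Note that kubectl needs the raw file, not the GitHub HTML page, hence the raw.githubusercontent.com form of the RBAC link; apply the RBAC manifest first.)

```bash
# RBAC bindings that flannel needs on Kubernetes 1.6.
kubectl --kubeconfig ./admin.conf apply -f \
  https://raw.githubusercontent.com/tomdee/flannel/743bafee48b69a3a3f79e37bc806d741715f1dd2/Documentation/kube-flannel-rbac.yml

# The flannel ConfigMap and DaemonSet.
kubectl --kubeconfig ./admin.conf apply -f \
  https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```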

@pswenson Did that work for you?

@coeki Trying with Weave first... @errordeveloper I still have the same issue with Weave:

```
[discovery] Trying to connect to API Server "MYIP.118.240.157:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://MYIP.118.240.157:6443"
```

It hangs there.

This is a curl to the master from the worker:

```
curl -v --insecure https://MYIP.118.240.157:6443
* Rebuilt URL to: https://MYIP.118.240.157:6443/
*   Trying MYIP.118.240.157...
* Connected to MYIP.118.240.157 (MYIP.118.240.157) port 6443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 692 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*    server certificate verification SKIPPED
*    server certificate status verification SKIPPED
*    common name: kube-apiserver (matched)
*    server certificate expiration date OK
*    server certificate activation date OK
*    certificate public key: RSA
*    certificate version: #3
*    subject: CN=kube-apiserver
*    start date: Fri, 21 Apr 2017 19:38:41 GMT
*    expire date: Sat, 21 Apr 2018 19:38:41 GMT
*    issuer: CN=kubernetes
*    compression: NULL
* ALPN, server accepted to use http/1.1
> GET / HTTP/1.1
> Host: MYIP.118.240.157:6443
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Content-Type: text/plain
< X-Content-Type-Options: nosniff
< Date: Fri, 21 Apr 2017 19:56:25 GMT
< Content-Length: 57
<
* Connection #0 to host MYIP.118.240.157 left intact
```

So they can talk to each other, but something is going wrong in that call.

What is the workflow? What happens during this call below?

```
[discovery] Created cluster-info discovery client, requesting info from "https://myip.118.240.130:6443"
```
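(For later readers: at this step kubeadm fetches the `cluster-info` ConfigMap from the `kube-public` namespace of the API server and validates it with the token; the pod network is not involved yet. A rough way to reproduce the call by hand from the master, a sketch assuming the standard kubeadm setup:)

```bash
# The discovery step essentially reads this ConfigMap. If this works on
# the master while the worker's join hangs, suspect the network path
# between the machines rather than the cluster itself.
kubectl --kubeconfig ./admin.conf -n kube-public get configmap cluster-info -o yaml
```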

@coeki I had the same result with your suggestions... Is the pod network even needed to get past this step?

```
[discovery] Created cluster-info discovery client, requesting info from "https://myip.118.240.130:6443"
```

Update: I just tested kubeadm with Weave Net on our old OpenStack environment, which is configured with the exact same network configuration. It works fine.

So there is something non-obvious that is different about this new environment. My problem is that I don't see a way to debug this.
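(One generic way to debug a hang like this, a sketch assuming tcpdump is installed on both machines: capture port 6443 on each end and compare what the worker sends with what the master receives.)

```bash
# On the master: watch traffic arriving on the API server port.
tcpdump -ni any tcp port 6443

# On the worker, in a second terminal: re-run the join and check whether
# the TLS handshake packets show up on both captures in both directions.
kubeadm join --token d77f50.ccc501bafbaa4179 myip.118.240.130:6443
```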

Closing... it turned out the MTU was set up inconsistently in our OpenStack environment, so the API join packets were being dropped. kubeadm had nothing to do with the problem.
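(For future readers: a mismatched MTU typically drops only large packets, which is why a bare telnet or a small curl succeeds while the certificate-heavy join traffic stalls. A quick check, a sketch using standard tools; 1472 bytes of payload plus the 28-byte IP/ICMP header fills a 1500-byte frame:)

```bash
# Show the MTU configured on each interface.
ip link show

# From the worker, send a full-size ping with the Don't Fragment bit set.
# If this fails while smaller sizes (-s 1200, say) succeed, something on
# the path to the master has a smaller MTU.
ping -M do -s 1472 myip.118.240.130
```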

Hi guys, even though it's two years late, I still ran into this problem, but I fixed it. I think it's easy to hit when running on multiple machines: it's not about the network, it's about the time on the different machines. Clock skew can make the token appear expired. So try synchronizing the time across your cluster.
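(A minimal way to check and fix clock sync on Ubuntu, a sketch; any NTP client works:)

```bash
# Check whether the system clock is NTP-synchronized.
timedatectl status

# Enable ongoing synchronization via systemd-timesyncd.
timedatectl set-ntp true

# Or do a one-shot sync against a public NTP pool (package: ntpdate).
ntpdate pool.ntp.org
```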

Thanks @MichaleWong! That's the right answer for my failing case. Clocks need to be synchronized before joining.
