Kubespray: Nodes in NotReady status after reboot

Created on 26 Feb 2020 · 2 Comments · Source: kubernetes-sigs/kubespray

Hello

I found that after draining a worker node with:
kubectl drain sdbit-k8s-worker1 --ignore-daemonsets=true

the node is correctly drained.

After a reboot, if the node is uncordoned with:
kubectl uncordon sdbit-k8s-worker1

the node becomes NotReady and never becomes Ready again.
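The drain/reboot/uncordon cycle above can be sketched as two small bash functions that, unlike a bare uncordon, wait for the node to actually report Ready. This is a minimal sketch and not part of Kubespray: the function names and the KUBECTL variable (there only so the commands can be dry-run with KUBECTL=echo) are hypothetical, and the 300s timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical override: set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Before the reboot: evict workloads (same flags as in the report).
drain_node() {
  "$KUBECTL" drain "$1" --ignore-daemonsets=true
}

# After the reboot: make the node schedulable again, then wait up to
# 300s for its Ready condition; returns non-zero if it never turns Ready.
uncordon_and_wait() {
  "$KUBECTL" uncordon "$1" &&
    "$KUBECTL" wait --for=condition=Ready "node/$1" --timeout=300s
}
```

Usage: run drain_node sdbit-k8s-worker1 before the reboot and uncordon_and_wait sdbit-k8s-worker1 after it. With the fault described here, the wait would time out and fail loudly instead of leaving the node silently NotReady.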

All the Docker containers are missing (docker ps returns no running containers), and kubelet doesn't start because it complains that it can't reach the API server:

kubelet_node_status.go:94] Unable to register node "sdbit-k8s-worker1" with API server: Post https://localhost:6443/api/v1/nodes: dial tcp 127.0.0.1:6443: con
kubelet.go:2267] node "sdbit-k8s-worker1" not found
kubelet.go:2267] node "sdbit-k8s-worker1" not found
reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.RuntimeClass: Get https://localhost:6443/apis/node.k8s.io/v1beta1/runtime
kubelet.go:2267] node "sdbit-k8s-worker1" not found

The Nginx pod events show:

  Type     Reason                  Age                 From                        Message
  ----     ------                  ----                ----                        -------
  Warning  FailedCreatePodSandBox  62m (x12 over 64m)  kubelet, sdbit-k8s-worker1  Failed create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory
  Normal   SandboxChanged          61m (x13 over 64m)  kubelet, sdbit-k8s-worker1  Pod sandbox changed, it will be killed and re-created.

It looks like kubelet cannot start because the nginx proxy is unable to forward traffic to the API server.
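To test that hypothesis on the node, one can check whether anything accepts connections on the local endpoint kubelet dials (127.0.0.1:6443 in the log above). This is a minimal bash sketch that assumes bash's /dev/tcp support and the coreutils timeout command; the helper names are hypothetical. A refused connection is consistent with the local proxy not running, but does not by itself explain why.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds,
# using bash's /dev/tcp pseudo-device; all errors are silenced.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical helper: is anything listening on the endpoint kubelet uses?
check_local_apiserver() {
  if port_open 127.0.0.1 6443; then
    echo "127.0.0.1:6443 accepts connections"
  else
    echo "127.0.0.1:6443 is not reachable (is the local proxy running?)"
  fi
}
```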

Environment:

  • Cloud provider or hardware configuration:
    Baremetal cluster
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 4.15.0-88-generic x86_64
    NAME="Ubuntu"
    VERSION="18.04.4 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.4 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic

  • Version of Ansible (ansible --version):
    ansible 2.7.12
    config file = /etc/ansible/ansible.cfg
    configured module search path = ['/home/kubespray/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
    ansible python module location = /usr/local/lib/python3.6/dist-packages/ansible
    executable location = /usr/local/bin/ansible
    python version = 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]

  • Version of Python (python --version):
    Python 2.7.17

Kubespray version (commit) (git rev-parse --short HEAD):
34e883e6

Network plugin used:
Calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
https://gist.github.com/irizzant/53f34f02e8b857f1209bab102a67c565

Command used to invoke ansible:
ansible-playbook -b -v -i inventory/sample/hosts.yaml upgrade-cluster.yml

Output of ansible run:
https://gist.github.com/irizzant/9bfa9aec42fb1f85c4003a42b50d7c11

Anything else we need to know:

kind/bug

All 2 comments

This is caused by the systemd-resolved service, which had been disabled by mistake on our nodes.
This service generates the resolv.conf file that kubelet needs (here /run/systemd/resolve/resolv.conf, the path in the pod events above).

Kubelet refused to start on our nodes after a reboot, and the log showed that the resolv.conf file was missing.
Because kubelet did not restart the Docker containers it manages, Calico, Nginx and the like did not start either, so the node could not register with the cluster.
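A quick way to confirm this root cause on an affected node is to check both the service and the file named in the FailedCreatePodSandBox event. This is a minimal bash sketch: the function names and the RESOLV_CONF override are hypothetical, and the default path is simply the one from this report's pod events.

```shell
#!/usr/bin/env bash
# Default path taken from the pod events in this report; override if needed.
RESOLV_CONF="${RESOLV_CONF:-/run/systemd/resolve/resolv.conf}"

# Succeeds only if the given file (default: RESOLV_CONF) exists and is non-empty.
resolv_conf_ok() {
  [ -s "${1:-$RESOLV_CONF}" ]
}

# Print the state of systemd-resolved and of the resolv.conf file.
diagnose_resolv() {
  if systemctl is-active --quiet systemd-resolved; then
    echo "systemd-resolved: active"
  else
    echo "systemd-resolved: NOT active"
  fi
  if resolv_conf_ok "$RESOLV_CONF"; then
    echo "$RESOLV_CONF: present"
  else
    echo "$RESOLV_CONF: missing or empty"
  fi
}
```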

After changing the network configuration and enabling the systemd-resolved service, everything went back to normal.
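The systemd-resolved part of that fix could look roughly like the following. It is a sketch under assumptions: the node runs systemd, the required file is the one from the pod events above, and whatever network configuration change was made is not covered. The function name restore_resolved and the 10-second wait are hypothetical.

```shell
#!/usr/bin/env bash
# Enable systemd-resolved (now and at boot), wait up to ~10s for it to
# write the resolv.conf file, then restart kubelet so it retries.
restore_resolved() {
  local conf="${1:-/run/systemd/resolve/resolv.conf}"
  sudo systemctl enable --now systemd-resolved || return 1
  for _ in 1 2 3 4 5 6 7 8 9 10; do
    [ -s "$conf" ] && break
    sleep 1
  done
  [ -s "$conf" ] || return 1   # file never appeared: do not restart kubelet
  sudo systemctl restart kubelet
}
```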

Please execute the following commands after rebooting or after the node drops out of service:

1. Reset your node

$ sudo kubeadm reset

2. Turn off swap

$ sudo swapoff -a

3. Run kubeadm join

$ sudo kubeadm join YourAPIServerEndpoint --token --discovery-token-ca-cert-hash \
sha256...
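The three steps above can be wrapped in one function that stops at the first failing step. This is a hypothetical sketch: the endpoint, token and CA hash are placeholders (their values are elided in the comment above) and would come from the control plane, for example via kubeadm token create --print-join-command; kubeadm reset -f is used only to skip the confirmation prompt. Note that swapoff -a does not persist across reboots.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: ENDPOINT, TOKEN and CA_HASH ("sha256:...") are
# placeholders for the values printed by the control plane.
rejoin_node() {
  local endpoint="$1" token="$2" ca_hash="$3"
  if [ -z "$endpoint" ] || [ -z "$token" ] || [ -z "$ca_hash" ]; then
    echo "usage: rejoin_node ENDPOINT TOKEN sha256:HASH" >&2
    return 2
  fi
  sudo kubeadm reset -f || return 1   # 1. reset without the y/N prompt
  sudo swapoff -a || return 1         # 2. kubelet refuses to run with swap on by default
  sudo kubeadm join "$endpoint" --token "$token" \
    --discovery-token-ca-cert-hash "$ca_hash"   # 3. join the cluster again
}
```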