Kubespray: kubectl get nodes is not displaying the slaves

Created on 10 Aug 2017 · 11 comments · Source: kubernetes-sigs/kubespray

The Kubernetes cluster was installed successfully using kubespray on CentOS 7.3, but the master is not able to see the slaves. Please let me know if I am missing something here.

OS Version

[mlsm@openshift-master kubespray]$ cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
[mlsm@openshift-master kubespray]$

PLAY RECAP **************************************************************
kubemaster : ok=348 changed=24 unreachable=0 failed=0
kubeslave1 : ok=277 changed=11 unreachable=0 failed=0
kubeslave2 : ok=264 changed=9 unreachable=0 failed=0
localhost : ok=3 changed=0 unreachable=0 failed=0

Wednesday 09 August 2017 20:19:36 -0400 (0:00:00.023) 0:08:34.074 **

kubernetes/preinstall : Update package management cache (YUM) ---------- 12.71s
network_plugin/calico : Calico | Copy cni plugins from calico/cni container --- 9.12s
kubernetes/master : Master | wait for the apiserver to be running ------- 7.50s
kubernetes/master : Copy kubectl from hyperkube container --------------- 6.90s
kubernetes/preinstall : Install packages requirements ------------------- 4.98s
etcd : wait for etcd up ------------------------------------------------- 4.92s
network_plugin/calico : Calico | Copy cni plugins from hyperkube -------- 4.56s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down KubeDNS Template --- 2.64s
download : Register docker images info ---------------------------------- 2.21s
download : Register docker images info ---------------------------------- 2.06s
kubernetes/master : Install kubectl bash completion --------------------- 1.93s
etcd : Backup etcd v3 data ---------------------------------------------- 1.89s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------- 1.79s
download : Register docker images info ---------------------------------- 1.74s
download : Register docker images info ---------------------------------- 1.73s
download : Register docker images info ---------------------------------- 1.73s
download : Register docker images info ---------------------------------- 1.71s
download : Register docker images info ---------------------------------- 1.67s
download : Register docker images info ---------------------------------- 1.65s
download : Register docker images info ---------------------------------- 1.63s

On the master, I am not able to see the slaves:

[mlsm@10 ~]$ kubectl get nodes
NAME STATUS AGE VERSION
10 Ready 23h v1.6.7+coreos.0
[mlsm@10 ~]$

Inventory file

kubemaster ansible_ssh_host=10.10.5.14 #ip=10.3.0.1
kubeslave1 ansible_ssh_host=10.10.5.13 #ip=10.3.0.2
kubeslave2 ansible_ssh_host=10.10.5.11 #ip=10.3.0.3
[kube-master]
kubemaster
[etcd]
kubemaster
[kube-node]
kubeslave1
kubeslave2
[k8s-cluster:children]
kube-node
kube-master
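A quick pre-flight sanity check on an inventory like the one above can catch collisions early (a sketch; `inventory.ini` is just an example filename, and the host lines are copied from the inventory in the issue). Duplicate `ansible_ssh_host` values, like duplicate hostnames on the nodes themselves, make separate machines register as a single entry.

```shell
# Sketch: save the host lines from the inventory above to a file and check
# for duplicate ansible_ssh_host values (filename is hypothetical).
cat > inventory.ini <<'EOF'
kubemaster ansible_ssh_host=10.10.5.14
kubeslave1 ansible_ssh_host=10.10.5.13
kubeslave2 ansible_ssh_host=10.10.5.11
EOF

# Print any ansible_ssh_host value that appears more than once.
# No output means all addresses are unique.
grep -o 'ansible_ssh_host=[^ ]*' inventory.ini | sort | uniq -d
```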

Most helpful comment

In my setup, all the VMs were cloned to create the hosts for the k8s cluster, so they all had the same hostname in "/etc/hosts". As a result, "kubectl get nodes" showed all of the nodes as one single entry. After giving each node a different hostname in "/etc/hosts" and reinstalling k8s using kubespray, it worked fine for me :-)
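The fix described here can be sketched as follows. The names and IPs come from the inventory in the issue; `TARGET` defaults to a scratch file so the result can be inspected before copying over the real `/etc/hosts`.

```shell
# Sketch of the hostname fix: give every clone a distinct name and a
# matching /etc/hosts entry. TARGET is a scratch file by default so the
# result can be reviewed before touching the real /etc/hosts.
TARGET=${TARGET:-./hosts.new}

cat >> "$TARGET" <<'EOF'
10.10.5.14 kubemaster
10.10.5.13 kubeslave1
10.10.5.11 kubeslave2
EOF

# On each node, also set the matching unique hostname before re-running
# kubespray, e.g. on the first worker:
#   sudo hostnamectl set-hostname kubeslave1
cat "$TARGET"
```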

All 11 comments

Just ran into this issue today using the latest v2.2.0 release. Also happened with a somewhat recent master branch.

Inventory:

# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
node0 ansible_ssh_user=root ansible_ssh_host=infra00-lab  ip=172.31.134.110
node1 ansible_ssh_user=root ansible_ssh_host=infra01-lab  ip=172.31.134.111
node2 ansible_ssh_user=root ansible_ssh_host=infra02-lab  ip=172.31.134.112
node3 ansible_ssh_user=root ansible_ssh_host=infra03-lab  ip=172.31.134.91
node4 ansible_ssh_user=root ansible_ssh_host=infra04-lab  ip=172.31.134.92
#node5 ansible_ssh_user=root ansible_ssh_host=infra05-lab  ip=172.31.134.93

[kube-master]
node0
node1

[etcd]
node0
node1

[kube-node]
node2
node3
node4
#node5

[k8s-cluster:children]
kube-node
kube-master

Installation is successful, no apparent errors:

localhost                  : ok=3    changed=0    unreachable=0    failed=0   
node0                      : ok=403  changed=103  unreachable=0    failed=0   
node1                      : ok=369  changed=98   unreachable=0    failed=0   
node2                      : ok=309  changed=75   unreachable=0    failed=0   
node3                      : ok=284  changed=67   unreachable=0    failed=0   
node4                      : ok=285  changed=67   unreachable=0    failed=0   

Wednesday 13 September 2017  17:59:31 -0300 (0:00:01.752)       0:21:04.205 *** 
=============================================================================== 
download : Download containers if pull is required or told to always pull - 504.96s
bootstrap-os : Bootstrap | Install python 2.x and pip ------------------ 52.44s
kubernetes/master : Master | wait for the apiserver to be running ------ 39.50s
docker : ensure docker packages are installed -------------------------- 33.83s
download : Download containers if pull is required or told to always pull -- 20.91s
download : Download containers if pull is required or told to always pull -- 17.87s
docker : ensure docker repository is enabled --------------------------- 13.79s
download : Download containers if pull is required or told to always pull -- 12.96s
etcd : wait for etcd up ------------------------------------------------ 12.49s
kubernetes/preinstall : Install packages requirements ------------------ 12.20s
kubernetes-apps/network_plugin/weave : Weave | wait for weave to become available -- 11.56s
etcd : reload etcd ----------------------------------------------------- 10.68s
download : Download containers if pull is required or told to always pull -- 10.29s
docker : Docker | pause while Docker restarts -------------------------- 10.06s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------- 9.91s
bootstrap-os : Bootstrap | Check if bootstrap is needed ----------------- 9.86s
download : Download containers if pull is required or told to always pull --- 9.57s
download : Download containers if pull is required or told to always pull --- 9.44s
download : Download containers if pull is required or told to always pull --- 9.28s
download : Download containers if pull is required or told to always pull --- 9.27s

There are two masters in my installation, and kubectl seems to work well on both infra00-lab and infra01-lab.

root@infra00-lab:~# kubectl get nodes
NAME          STATUS    AGE       VERSION
infra00-lab   Ready     12m       v1.7.4+coreos.0
root@infra00-lab:~# 

root@infra01-lab:~# kubectl get nodes
NAME          STATUS    AGE       VERSION
infra00-lab   Ready     3h        v1.7.4+coreos.0
root@infra01-lab:~# 

However, only one node appears from a kubectl get nodes.

Oddly, all necessary pods seem to be running.

root@infra01-lab:~# kubectl get pods --all-namespaces -o wide
2017-09-13 18:13:27.295541 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-13 18:13:27.295742 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-13 18:13:27.295788 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAMESPACE     NAME                                  READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   kube-apiserver-infra00-lab            1/1       Running   0          3h        172.31.134.110   infra00-lab
kube-system   kube-apiserver-infra01-lab            1/1       Running   0          9s        172.31.134.111   infra01-lab
kube-system   kube-controller-manager-infra00-lab   1/1       Running   0          3h        172.31.134.110   infra00-lab
kube-system   kube-controller-manager-infra01-lab   1/1       Running   0          9s        172.31.134.111   infra01-lab
kube-system   kube-dns-3888408129-2njkj             3/3       Running   0          3h        10.233.112.1     infra00-lab
kube-system   kube-proxy-infra00-lab                1/1       Running   0          3h        172.31.134.110   infra00-lab
kube-system   kube-proxy-infra01-lab                1/1       Running   0          9s        172.31.134.111   infra01-lab
kube-system   kube-proxy-infra02-lab                1/1       Running   0          9s        172.31.134.112   infra02-lab
kube-system   kube-proxy-infra03-lab                1/1       Running   0          9s        172.31.134.91    infra03-lab
kube-system   kube-proxy-infra04-lab                1/1       Running   0          9s        172.31.134.92    infra04-lab
kube-system   kube-scheduler-infra00-lab            1/1       Running   0          3h        172.31.134.110   infra00-lab
kube-system   kube-scheduler-infra01-lab            1/1       Running   0          9s        172.31.134.111   infra01-lab
kube-system   nginx-proxy-infra02-lab               1/1       Running   0          9s        172.31.134.112   infra02-lab
kube-system   nginx-proxy-infra03-lab               1/1       Running   0          9s        172.31.134.91    infra03-lab
kube-system   nginx-proxy-infra04-lab               1/1       Running   0          9s        172.31.134.92    infra04-lab
kube-system   tiller-deploy-1046433508-p0b88        0/1       Pending   0          3h        <none>           <none>
kube-system   weave-net-35s3r                       2/2       Running   0          3h        172.31.134.110   infra00-lab

Does anyone know what could be causing this?

To be fair, I just noticed that pods are reporting an ambiguous status: sometimes Running, sometimes Pending, on and off at random.


After much testing on my side, I found out that my vSphere cloud configuration was causing the issue. It was misconfigured with a wrong working-dir parameter.

With vSphere incorrectly configured, Kubernetes is unable to keep cluster nodes registered by the kubelet. As far as I could tell, if the kube-controller-manager is unable to find the node VM through the vSphere API, it discards the node and removes it from the etcd database.

Sadly, I was never able to fix the broken cluster. Once installed incorrectly, I couldn't figure out a way to clean up this configuration (probably some garbage left in the etcd database). I had to revert snapshots for all nodes and reinstall using the correct vSphere parameters.

@juliohm when using a cloud provider, your hosts must match the instance name 100% against what vSphere api reports. Any nodes that do not match are considered unauthorized hosts.
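A first local check for this name-matching requirement is simply what each node calls itself (a sketch; the kubectl comparison assumes a working kubeconfig, so it stays as a comment):

```shell
# Sketch: with a cloud provider enabled, the name kubelet registers must
# match the VM name the provider API reports exactly. Start by checking
# what this node will register as:
NODE_NAME=$(hostname)
echo "this node will register as: $NODE_NAME"
# Compare against what the API server (and the vSphere inventory) holds:
#   kubectl get nodes -o name | grep -i "$NODE_NAME"
```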

@mattymo I think you meant @juliohm1978 ?

Sorry, yes

Solved: I was facing this issue because my docker cgroup driver was different from the kubernetes cgroup driver. I just updated it to cgroupfs using the following commands from the docs:

cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF

Restart the docker service: service docker restart.
Reset kubernetes on the slave node: kubeadm reset
Join the master again: kubeadm join <><>

The node was then visible on the master using kubectl get nodes.
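The steps above can be sketched as one sequence. Here daemon.json is written to a scratch path and validated as JSON before being copied into place; the privileged steps and the join arguments stay as comments because they are cluster-specific.

```shell
# Sketch of the cgroup-driver fix. OUT is a scratch path (hypothetical);
# the real target is /etc/docker/daemon.json.
OUT=${OUT:-./daemon.json}
cat > "$OUT" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF

# Validate the JSON before installing it -- docker will refuse to start
# on a malformed daemon.json:
python3 -c "import json, sys; json.load(open(sys.argv[1]))" "$OUT" && echo "daemon.json OK"

# Then, on the worker node:
#   sudo cp "$OUT" /etc/docker/daemon.json
#   sudo service docker restart
#   sudo kubeadm reset
#   sudo kubeadm join ...   # cluster-specific arguments, as in the comment above
```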

@AmreeshTyagi I added the following to daemon.json manually and restarted docker, which fails. Can you please let me know how you resolved it?
{ "exec-opts": ["native.cgroupdriver=cgroupfs"] }

[vagrant@kubeadm-worker0 ~]$ sudo service docker restart
Redirecting to /bin/systemctl restart docker.service
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[vagrant@kubeadm-worker0 ~]$ journalctl -xe
No journal files were found.
-- No entries --
[vagrant@kubeadm-worker0 ~]$

[vagrant@kubeadm-worker0 ~]$ !19
systemctl status docker.service
โ— docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2018-03-12 18:39:54 UTC; 1min 10s ago
Docs: http://docs.docker.com
Process: 23803 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY $REGISTRIES (code=exited, status=1/FAILURE)
Main PID: 23803 (code=exited, status=1/FAILURE)
[vagrant@kubeadm-worker0 ~]$
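One likely cause of this failure, judging from the status output above: the unit's ExecStart line already passes --exec-opt native.cgroupdriver=systemd, and dockerd refuses to start when the same option is set both as a flag and in daemon.json. A small check (a sketch; the excerpt is copied from the status output, and in practice you would grep the real unit file, usually /usr/lib/systemd/system/docker.service):

```shell
# Sketch: detect a cgroup-driver setting duplicated between the systemd
# unit's ExecStart line and daemon.json. The string below is an excerpt of
# the ExecStart shown in the status output above.
EXECSTART='/usr/bin/dockerd-current --exec-opt native.cgroupdriver=systemd ...'
if echo "$EXECSTART" | grep -q 'native.cgroupdriver'; then
  echo "cgroup driver is also set on the ExecStart line; remove it there or from daemon.json"
fi
```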
