[mlsm@openshift-master kubespray]$ cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
[mlsm@openshift-master kubespray]$
PLAY RECAP **************************************************************
kubemaster : ok=348 changed=24 unreachable=0 failed=0
kubeslave1 : ok=277 changed=11 unreachable=0 failed=0
kubeslave2 : ok=264 changed=9 unreachable=0 failed=0
localhost : ok=3 changed=0 unreachable=0 failed=0
kubernetes/preinstall : Update package management cache (YUM) ---------- 12.71s
network_plugin/calico : Calico | Copy cni plugins from calico/cni container --- 9.12s
kubernetes/master : Master | wait for the apiserver to be running ------- 7.50s
kubernetes/master : Copy kubectl from hyperkube container --------------- 6.90s
kubernetes/preinstall : Install packages requirements ------------------- 4.98s
etcd : wait for etcd up ------------------------------------------------- 4.92s
network_plugin/calico : Calico | Copy cni plugins from hyperkube -------- 4.56s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down KubeDNS Template --- 2.64s
download : Register docker images info ---------------------------------- 2.21s
download : Register docker images info ---------------------------------- 2.06s
kubernetes/master : Install kubectl bash completion --------------------- 1.93s
etcd : Backup etcd v3 data ---------------------------------------------- 1.89s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------- 1.79s
download : Register docker images info ---------------------------------- 1.74s
download : Register docker images info ---------------------------------- 1.73s
download : Register docker images info ---------------------------------- 1.73s
download : Register docker images info ---------------------------------- 1.71s
download : Register docker images info ---------------------------------- 1.67s
download : Register docker images info ---------------------------------- 1.65s
download : Register docker images info ---------------------------------- 1.63s
[mlsm@10 ~]$ kubectl get nodes
NAME STATUS AGE VERSION
10 Ready 23h v1.6.7+coreos.0
[mlsm@10 ~]$
kubemaster ansible_ssh_host=10.10.5.14 #ip=10.3.0.1
kubeslave1 ansible_ssh_host=10.10.5.13 #ip=10.3.0.2
kubeslave2 ansible_ssh_host=10.10.5.11 #ip=10.3.0.3
[kube-master]
kubemaster
[etcd]
kubemaster
[kube-node]
kubeslave1
kubeslave2
[k8s-cluster:children]
kube-node
kube-master
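One quick sanity check for a report like the one above (this check is my addition, not part of the original report; it is suggested by the node registering under the bare name "10"): compare the name the kubelet registered with each host's own hostname.
# On the master: the names the cluster actually knows about
kubectl get nodes -o wide
# On each host: the name the kubelet registers under by default
hostname
cat /etc/hosts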
Just ran into this issue today using the latest v2.2.0 release. It also happened with a fairly recent checkout of the master branch.
Inventory:
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
node0 ansible_ssh_user=root ansible_ssh_host=infra00-lab ip=172.31.134.110
node1 ansible_ssh_user=root ansible_ssh_host=infra01-lab ip=172.31.134.111
node2 ansible_ssh_user=root ansible_ssh_host=infra02-lab ip=172.31.134.112
node3 ansible_ssh_user=root ansible_ssh_host=infra03-lab ip=172.31.134.91
node4 ansible_ssh_user=root ansible_ssh_host=infra04-lab ip=172.31.134.92
#node5 ansible_ssh_user=root ansible_ssh_host=infra05-lab ip=172.31.134.93
[kube-master]
node0
node1
[etcd]
node0
node1
[kube-node]
node2
node3
node4
#node5
[k8s-cluster:children]
kube-node
kube-master
The installation completes successfully, with no apparent errors:
localhost : ok=3 changed=0 unreachable=0 failed=0
node0 : ok=403 changed=103 unreachable=0 failed=0
node1 : ok=369 changed=98 unreachable=0 failed=0
node2 : ok=309 changed=75 unreachable=0 failed=0
node3 : ok=284 changed=67 unreachable=0 failed=0
node4 : ok=285 changed=67 unreachable=0 failed=0
Wednesday 13 September 2017 17:59:31 -0300 (0:00:01.752) 0:21:04.205 ***
===============================================================================
download : Download containers if pull is required or told to always pull - 504.96s
bootstrap-os : Bootstrap | Install python 2.x and pip ------------------ 52.44s
kubernetes/master : Master | wait for the apiserver to be running ------ 39.50s
docker : ensure docker packages are installed -------------------------- 33.83s
download : Download containers if pull is required or told to always pull -- 20.91s
download : Download containers if pull is required or told to always pull -- 17.87s
docker : ensure docker repository is enabled --------------------------- 13.79s
download : Download containers if pull is required or told to always pull -- 12.96s
etcd : wait for etcd up ------------------------------------------------ 12.49s
kubernetes/preinstall : Install packages requirements ------------------ 12.20s
kubernetes-apps/network_plugin/weave : Weave | wait for weave to become available -- 11.56s
etcd : reload etcd ----------------------------------------------------- 10.68s
download : Download containers if pull is required or told to always pull -- 10.29s
docker : Docker | pause while Docker restarts -------------------------- 10.06s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------- 9.91s
bootstrap-os : Bootstrap | Check if bootstrap is needed ----------------- 9.86s
download : Download containers if pull is required or told to always pull --- 9.57s
download : Download containers if pull is required or told to always pull --- 9.44s
download : Download containers if pull is required or told to always pull --- 9.28s
download : Download containers if pull is required or told to always pull --- 9.27s
There are two masters in my installation, and kubectl seems to work well on both infra00-lab and infra01-lab.
root@infra00-lab:~# kubectl get nodes
NAME STATUS AGE VERSION
infra00-lab Ready 12m v1.7.4+coreos.0
root@infra00-lab:~#
root@infra01-lab:~# kubectl get nodes
NAME STATUS AGE VERSION
infra00-lab Ready 3h v1.7.4+coreos.0
root@infra01-lab:~#
However, only one node appears in the output of kubectl get nodes.
Oddly, all the necessary pods seem to be running:
root@infra01-lab:~# kubectl get pods --all-namespaces -o wide
2017-09-13 18:13:27.295541 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-13 18:13:27.295742 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-13 18:13:27.295788 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system kube-apiserver-infra00-lab 1/1 Running 0 3h 172.31.134.110 infra00-lab
kube-system kube-apiserver-infra01-lab 1/1 Running 0 9s 172.31.134.111 infra01-lab
kube-system kube-controller-manager-infra00-lab 1/1 Running 0 3h 172.31.134.110 infra00-lab
kube-system kube-controller-manager-infra01-lab 1/1 Running 0 9s 172.31.134.111 infra01-lab
kube-system kube-dns-3888408129-2njkj 3/3 Running 0 3h 10.233.112.1 infra00-lab
kube-system kube-proxy-infra00-lab 1/1 Running 0 3h 172.31.134.110 infra00-lab
kube-system kube-proxy-infra01-lab 1/1 Running 0 9s 172.31.134.111 infra01-lab
kube-system kube-proxy-infra02-lab 1/1 Running 0 9s 172.31.134.112 infra02-lab
kube-system kube-proxy-infra03-lab 1/1 Running 0 9s 172.31.134.91 infra03-lab
kube-system kube-proxy-infra04-lab 1/1 Running 0 9s 172.31.134.92 infra04-lab
kube-system kube-scheduler-infra00-lab 1/1 Running 0 3h 172.31.134.110 infra00-lab
kube-system kube-scheduler-infra01-lab 1/1 Running 0 9s 172.31.134.111 infra01-lab
kube-system nginx-proxy-infra02-lab 1/1 Running 0 9s 172.31.134.112 infra02-lab
kube-system nginx-proxy-infra03-lab 1/1 Running 0 9s 172.31.134.91 infra03-lab
kube-system nginx-proxy-infra04-lab 1/1 Running 0 9s 172.31.134.92 infra04-lab
kube-system tiller-deploy-1046433508-p0b88 0/1 Pending 0 3h <none> <none>
kube-system weave-net-35s3r 2/2 Running 0 3h 172.31.134.110 infra00-lab
Does anyone know what could be causing this?
To be fair, I just noticed that the pods are reporting an ambiguous status: sometimes Running, sometimes Pending, flipping on and off at random.
In my setup, since I had cloned all the VMs from a single image to create the hosts for the k8s cluster, every VM carried the same hostname in /etc/hosts. As a result, kubectl get nodes showed all of the nodes as one single entry. After giving each node a unique hostname in /etc/hosts and reinstalling k8s using Kubespray, it worked fine for me :-) (a sketch of the per-node change is below).
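For anyone hitting the same clone problem, here is a minimal sketch of the per-node fix. The node name and IP below are examples taken from the first inventory in this thread, not values to copy verbatim.
# Run on each node, giving every node its own unique name:
sudo hostnamectl set-hostname kubeslave1
# Map the node's own IP to that unique name instead of the cloned one:
echo "10.10.5.13 kubeslave1" | sudo tee -a /etc/hosts
# Verify before re-running Kubespray; the kubelet registers under this name:
hostname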
After much testing on my side, I found out that my vSphere cloud configuration was causing the issue. It was misconfigured with a wrong working-dir parameter.
With vSphere incorrectly configured, Kubernetes is unable to keep the cluster nodes registered by the kubelet. As far as I could figure out, if the kube-controller-manager cannot find the node's VM through the vSphere API, it discards the node and removes it from the etcd database.
Sadly, I was never able to fix the broken cluster. Once installed incorrectly, I couldn't find a way to clean up the configuration (probably some garbage left in the etcd database). I had to revert snapshots for all nodes and reinstall using the correct vSphere parameters.
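For reference, working-dir lives in the in-tree vSphere cloud provider config, which has roughly the shape sketched below. Every value here is a placeholder (none come from the thread), the config path varies by installer, and working-dir must name the VM folder that actually contains the node VMs.
cat << EOF > /etc/kubernetes/vsphere.conf
[Global]
user = "k8s-svc@vsphere.local"
password = "changeme"
server = "vcenter.example.com"
port = "443"
insecure-flag = "1"
datacenter = "DC1"
datastore = "DS1"
working-dir = "kubernetes"
EOF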
@juliohm when using a cloud provider, your hostnames must be a 100% match for the instance names the vSphere API reports. Any nodes that do not match are considered unauthorized hosts.
@mattymo I think you meant @juliohm1978 ?
Sorry, yes
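One way to verify that match (my addition, not from the thread): govc, the CLI from VMware's govmomi project, can list VM names exactly as the vSphere API reports them. The credentials below are placeholders.
# Point govc at vCenter:
export GOVC_URL=vcenter.example.com
export GOVC_USERNAME=administrator@vsphere.local
export GOVC_PASSWORD=changeme
export GOVC_INSECURE=1
# VM inventory paths as the vSphere API reports them:
govc find / -type m
# Node names as the kubelets registered them; the names must match the VMs:
kubectl get nodes -o name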
Solved: I was facing this issue because my Docker cgroup driver was different from the Kubernetes (kubelet) cgroup driver.
I just updated it to cgroupfs using the following commands mentioned in the docs.
cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
Restart the Docker service: service docker restart.
Reset Kubernetes on the slave node: kubeadm reset.
Join the master again: kubeadm join <><>.
The node was then visible on the master using kubectl get nodes.
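A quick way to confirm the two drivers actually agree after the restart (a sketch of my own, not from the comment above; the kubelet flag location varies by distro and kubeadm version):
# Driver Docker is actually using:
docker info --format '{{.CgroupDriver}}'
# Driver the kubelet was started with; kubeadm setups usually record it in
# /var/lib/kubelet/kubeadm-flags.env or a kubelet systemd drop-in:
grep -r cgroup-driver /var/lib/kubelet /etc/systemd/system/kubelet.service.d 2>/dev/null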
@AmreeshTyagi I added the following to daemon.json manually and restarted Docker, which fails. Can you please let me know how you resolved it?
{ "exec-opts": ["native.cgroupdriver=cgroupfs"] }
[vagrant@kubeadm-worker0 ~]$ sudo service docker restart
Redirecting to /bin/systemctl restart docker.service
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[vagrant@kubeadm-worker0 ~]$ journalctl -xe
No journal files were found.
-- No entries --
[vagrant@kubeadm-worker0 ~]$
[vagrant@kubeadm-worker0 ~]$ !19
systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2018-03-12 18:39:54 UTC; 1min 10s ago
Docs: http://docs.docker.com
Process: 23803 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY $REGISTRIES (code=exited, status=1/FAILURE)
Main PID: 23803 (code=exited, status=1/FAILURE)
[vagrant@kubeadm-worker0 ~]$
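For what it's worth, a likely cause is visible in the ExecStart line above (this is my reading, not a confirmed fix from the thread): dockerd is already launched with --exec-opt native.cgroupdriver=systemd on its command line, and Docker refuses to start when the same exec-opt is also set in /etc/docker/daemon.json. (As a side note, journalctl -xe found nothing above because it was run without sudo.) A sketch of how to track the conflict down on a CentOS-packaged Docker:
# Find where the conflicting flag is set (the unit file and/or OPTIONS in
# /etc/sysconfig/docker):
grep -rn native.cgroupdriver /usr/lib/systemd/system/docker.service /etc/sysconfig/docker 2>/dev/null
# Remove the setting from one of the two places, then:
sudo systemctl daemon-reload
sudo systemctl restart docker
docker info --format '{{.CgroupDriver}}'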