What happened:
The kube-scheduler and kube-controller-manager pods fail because their liveness checks do not work. The liveness checks fail because the insecure healthz endpoints used by these pods were removed in Kubernetes 1.16.13 (http://127.0.0.1:10251/healthz for the kube-scheduler pod and http://127.0.0.1:10252/healthz for the kube-controller-manager pod).
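The mismatch looks roughly like this in the kubeadm-generated static pod manifest (approximate excerpt for illustration, not the exact manifest from this cluster):

# /etc/kubernetes/manifests/kube-scheduler.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-scheduler
    - --port=0              # disables the insecure HTTP endpoint on 10251
    # (other flags omitted)
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251         # but the probe still targets the disabled insecure port
        scheme: HTTP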
What you expected to happen:
I expect that the Kubernetes pod manifests will not contain liveness checks if the containers don't expose endpoints for them.
How to reproduce it (as minimally and precisely as possible):
Deploy k8s using kubespray release-2.12 (https://github.com/kubernetes-sigs/kubespray/tree/release-2.12) with default k8s version.
Anything else we need to know?:
—
Environment:
cat /etc/os-release:
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Version of Ansible (ansible --version):
ansible 2.7.16
config file = None
configured module search path = ['/home/centos/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.6.8 (default, Apr 2 2020, 13:34:55) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
Version of Python (python --version):
[centos@ip-172-31-15-227 ~]$ python --version
Python 2.7.5
Kubespray version (commit) (git rev-parse --short HEAD):
2acc5a7
Network plugin used:
Tungsten Fabric, Calico
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
all:
  hosts:
    node1:
      ansible_host: 172.31.15.227
      ip: 172.31.15.227
      access_ip: 172.31.15.227
  children:
    kube-master:
      hosts:
        node1:
    kube-node:
      hosts:
        node1:
    etcd:
      hosts:
        node1:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
    calico-rr:
      hosts: {}
Command used to invoke ansible:
ansible-playbook -i inventory/mycluster/hosts.yml --become --become-user=root cluster.yml -e kube_pods_subnet=10.32.0.0/12 -e kube_service_addresses=10.96.0.0/12
Additionally: this is the report about this bug in the Kubernetes repo. They asked me to open the issue here instead: https://github.com/kubernetes/kubernetes/issues/93746
❯ kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
I have the same issue. The "workaround" is to delete that port flag from the Kubernetes manifests, but I would be happy to have a better fix. This happened after I upgraded to Kubernetes 1.17.9 and release 2.13 a few days back.
sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml
sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml
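After removing the flag, the kubelet restarts the static pods and the insecure endpoints answer again; a quick check on a master node (kubectl access to the cluster is assumed):

curl http://127.0.0.1:10251/healthz   # kube-scheduler, should return "ok"
curl http://127.0.0.1:10252/healthz   # kube-controller-manager, should return "ok"
kubectl get componentstatuses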
Same issue here after upgrading from v1.18.5 to v1.18.6.
Edit: Also reproduced on a clean install (v2.14.0).
Server Version: v1.18.8 on Debian 10
Output
$ kubectl get componentstatus
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
Cluster seems to work fine, though.
Hi, I'm having the same issue on the master; this worked for me:
sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml
sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml
But when running cluster.yml again, these changes are not persisted, since the playbook regenerates the static pod manifests.
Seems to be fixed in Kubernetes 1.16.14: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.16.md#changelog-since-v11613
Fixed a regression in kubeadm manifests for kube-scheduler and kube-controller-manager which caused continuous restarts because of failing health checks (#93208, @SataQiu) [SIG Cluster Lifecycle]
I will create a PR for using the fixed 1.16.14 version very soon.
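In the meantime, it should also be possible to pin the fixed version through the inventory group vars, for example (standard Kubespray inventory layout assumed; this only works if the Kubespray release in use ships checksums for that patch version):

# inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml
kube_version: v1.16.14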
Until the fixed version is in place, everybody should also be able to just fix the liveness probes instead of re-enabling the insecure liveness check ports, e.g. with a basic playbook like this:
- hosts: kube-master
  gather_facts: false
  tasks:
    - name: kube-controller-manager - Use secure port for liveness probe
      replace:
        path: /etc/kubernetes/manifests/kube-controller-manager.yaml
        regexp: '10252'
        replace: '10257'
    - name: kube-controller-manager - Use HTTPS for liveness probe
      replace:
        path: /etc/kubernetes/manifests/kube-controller-manager.yaml
        regexp: 'scheme: HTTP$'
        replace: 'scheme: HTTPS'
    - name: Wait a few seconds as too-fast updates don't tear down the previous version correctly
      pause:
        seconds: 10
    - name: kube-scheduler - Use secure port for liveness probe
      replace:
        path: /etc/kubernetes/manifests/kube-scheduler.yaml
        regexp: '10251'
        replace: '10259'
    - name: kube-scheduler - Use HTTPS for liveness probe
      replace:
        path: /etc/kubernetes/manifests/kube-scheduler.yaml
        regexp: 'scheme: HTTP$'
        replace: 'scheme: HTTPS'
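Saved as, say, fix-liveness-probes.yml (the file name is just an example), it can be run against the existing inventory:

ansible-playbook -i inventory/mycluster/hosts.yml --become fix-liveness-probes.yml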
Thanks!
Workaround works for me.