Version:
k3s version v1.0.0 (18bd921c)
Describe the bug
I have a cluster consisting of 1 master and 3 workers. After I unplugged the 3 workers, none of the running pods were reassigned from the workers to the master, and kubectl claims the pods are still alive:
➜ ~ kubectl get nodes
NAME STATUS ROLES AGE VERSION
worker2 NotReady node 15d v1.16.3-k3s.2
worker1 NotReady node 15d v1.16.3-k3s.2
worker3 NotReady node 15d v1.16.3-k3s.2
master Ready master 16d v1.16.3-k3s.2
➜ ~ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system metrics-server-6d684c7b5-8fzld 1/1 Running 29 16d 10.42.0.139 master <none> <none>
metallb-system speaker-lv7cq 1/1 Running 7 3d2h 192.168.0.201 master <none> <none>
default nginx-1-775985c86-4q5xq 1/1 Running 18 5d7h 10.42.0.142 master <none> <none>
kube-system coredns-d798c9dd-f2wrb 1/1 Running 28 16d 10.42.0.140 master <none> <none>
kube-system local-path-provisioner-58fb86bdfd-8sbzq 1/1 Running 4 32h 10.42.0.141 master <none> <none>
kubernetes-dashboard kubernetes-dashboard-5996555fd8-k684f 1/1 Running 23 15d 10.42.2.59 worker2 <none> <none>
metallb-system speaker-hdq7h 1/1 Running 5 3d2h 192.168.0.203 worker2 <none> <none>
kube-system nginx-nginx-ingress-controller-595c6b856c-m6997 1/1 Running 3 31h 10.42.2.58 worker2 <none> <none>
kube-system nginx-nginx-ingress-default-backend-6595d9d88b-vff2c 1/1 Running 2 30h 10.42.1.59 worker1 <none> <none>
metallb-system speaker-54h22 1/1 Running 5 3d2h 192.168.0.202 worker1 <none> <none>
metallb-system controller-57967b9448-mjgcb 1/1 Running 5 3d2h 10.42.1.60 worker1 <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-76585494d8-bzccd 1/1 Running 31 15d 10.42.1.58 worker1 <none> <none>
metallb-system speaker-grzfq 1/1 Running 6 3d2h 192.168.0.204 worker3 <none> <none>
pi@master:~ $ sudo journalctl -u k3s --since "20 minutes ago"
-- Logs begin at Wed 2020-01-01 22:17:01 CET, end at Thu 2020-01-02 21:30:46 CET. --
Jan 02 21:11:02 master k3s[550]: time="2020-01-02T21:11:02.232431780+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:11:02 master k3s[550]: E0102 21:11:02.273803 550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:11:07 master k3s[550]: I0102 21:11:07.220756 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:12:07 master k3s[550]: I0102 21:12:07.249238 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:12:24 master k3s[550]: time="2020-01-02T21:12:24.244186839+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:12:24 master k3s[550]: E0102 21:12:24.293070 550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:13:07 master k3s[550]: I0102 21:13:07.278093 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:14:07 master k3s[550]: I0102 21:14:07.292650 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.490492 550 controller.go:606] quota admission added evaluator for: replicasets.apps
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.505901 550 event.go:274] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"default", Name:"nginx-1", UID:"9f62d494-264e-43af
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.557512 550 event.go:274] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"nginx-1-775985c86", UID:"4984c18f
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.570165 550 event.go:274] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"nginx-1-775985c86", UID:"4984c18f
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.662089 550 event.go:274] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"default", Name:"nginx-1", UID:"55f3f3a9-ba7c-44de-
Jan 02 21:14:42 master k3s[550]: E0102 21:14:42.365014 550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:15:07 master k3s[550]: I0102 21:15:07.319919 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:15:08 master k3s[550]: time="2020-01-02T21:15:08.153599154+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:15:08 master k3s[550]: E0102 21:15:08.180379 550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:16:07 master k3s[550]: I0102 21:16:07.334593 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:17:04 master k3s[550]: time="2020-01-02T21:17:04.958137785+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.203:49916: i/o
Jan 02 21:17:07 master k3s[550]: I0102 21:17:07.362715 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:17:29 master k3s[550]: time="2020-01-02T21:17:29.298245766+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.202:35906: i/o
Jan 02 21:17:30 master k3s[550]: time="2020-01-02T21:17:30.580422869+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.201:59938: i/o
Jan 02 21:17:30 master k3s[550]: time="2020-01-02T21:17:30.580988438+01:00" level=error msg="Remotedialer proxy error" error="read tcp 192.168.0.201:59938->192.168.0.201:6443: i/o t
Jan 02 21:17:31 master k3s[550]: time="2020-01-02T21:17:31.605583377+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.204:52256: i/o
Jan 02 21:17:34 master k3s[550]: I0102 21:17:34.106828 550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2", UID:"b2288a80-db5b-43be-9c00-4330be9
Jan 02 21:17:35 master k3s[550]: time="2020-01-02T21:17:35.587707038+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:37 master k3s[550]: E0102 21:17:37.476561 550 pod_workers.go:191] Error syncing pod ed9ae66f-71eb-49b0-b0d3-05dedd447d5f ("local-path-provisioner-58fb86bdfd-8sbzq_k
Jan 02 21:17:38 master k3s[550]: time="2020-01-02T21:17:38.200829872+01:00" level=error msg="Failed to connect to proxy" error="dial tcp 192.168.0.201:6443: connect: no route to hos
Jan 02 21:17:38 master k3s[550]: time="2020-01-02T21:17:38.200932464+01:00" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.0.201:6443: connect: no route to host"
Jan 02 21:17:39 master k3s[550]: E0102 21:17:39.318397 550 resource_quota_controller.go:407] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the ser
Jan 02 21:17:39 master k3s[550]: E0102 21:17:39.823064 550 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.233.67
Jan 02 21:17:42 master k3s[550]: W0102 21:17:42.211175 550 garbagecollector.go:640] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to
Jan 02 21:17:43 master k3s[550]: time="2020-01-02T21:17:43.201207406+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:44 master k3s[550]: time="2020-01-02T21:17:44.521597124+01:00" level=error msg="Failed to connect to proxy" error="dial tcp 192.168.0.201:6443: connect: no route to hos
Jan 02 21:17:44 master k3s[550]: time="2020-01-02T21:17:44.521784474+01:00" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.0.201:6443: connect: no route to host"
Jan 02 21:17:44 master k3s[550]: E0102 21:17:44.827117 550 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.233.67
Jan 02 21:17:45 master k3s[550]: E0102 21:17:45.545554 550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:47 master k3s[550]: E0102 21:17:47.649744 550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:47 master k3s[550]: time="2020-01-02T21:17:47.843063318+01:00" level=info msg="Handling backend connection request [worker2]"
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252599 550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"nginx-nginx-ingress-controller-595c6
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252704 550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"helm-install-traefik-7k7qv", UID:"",
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252742 550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kubernetes-dashboard", Name:"kubernetes-dashboard-599655
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252776 550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"helm-install-traefik-vgn7b", UID:"",
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.522014337+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.557293978+01:00" level=info msg="Handling backend connection request [master]"
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.767827024+01:00" level=info msg="Handling backend connection request [worker1]"
Jan 02 21:17:52 master k3s[550]: time="2020-01-02T21:17:52.492071111+01:00" level=info msg="Handling backend connection request [worker3]"
Jan 02 21:17:56 master k3s[550]: E0102 21:17:56.271504 550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:58 master k3s[550]: E0102 21:17:58.666991 550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:18:02 master k3s[550]: time="2020-01-02T21:18:02.494515040+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.204:52364: i/o
Jan 02 21:18:04 master k3s[550]: time="2020-01-02T21:18:04.771525434+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.202:36488: i/o
Jan 02 21:18:07 master k3s[550]: I0102 21:18:07.390554 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:18:07 master k3s[550]: time="2020-01-02T21:18:07.844839055+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.203:51336: i/o
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.293129 550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2", UID:"b2288a80-db5b-43be-9c00-4330be9
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.442367 550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker1", UID:"f4226dde-8c79-473b-88c3-9d65ffa
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.782884 550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker3", UID:"23d267c0-0917-4852-82da-830b9e9
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.927947 550 node_lifecycle_controller.go:1058] Controller detected that all Nodes are not-Ready. Entering master disruption mode.
Jan 02 21:18:40 master k3s[550]: E0102 21:18:40.072909 550 daemon_controller.go:302] metallb-system/speaker failed with : error storing status for daemon set &v1.DaemonSet{TypeM
Jan 02 21:18:40 master k3s[550]: E0102 21:18:40.156328 550 daemon_controller.go:302] metallb-system/speaker failed with : error storing status for daemon set &v1.DaemonSet{TypeM
Jan 02 21:19:07 master k3s[550]: I0102 21:19:07.418583 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:19:42 master k3s[550]: E0102 21:19:42.370539 550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:20:07 master k3s[550]: I0102 21:20:07.434826 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:20:35 master k3s[550]: time="2020-01-02T21:20:35.907270451+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:20:35 master k3s[550]: E0102 21:20:35.950947 550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:21:07 master k3s[550]: I0102 21:21:07.463278 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:22:07 master k3s[550]: I0102 21:22:07.512476 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:23:07 master k3s[550]: I0102 21:23:07.549374 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:24:07 master k3s[550]: I0102 21:24:07.564454 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:24:42 master k3s[550]: E0102 21:24:42.365024 550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:25:07 master k3s[550]: I0102 21:25:07.579445 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:26:07 master k3s[550]: I0102 21:26:07.594475 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:27:07 master k3s[550]: I0102 21:27:07.623520 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:27:30 master k3s[550]: I0102 21:27:30.102377 550 node_lifecycle_controller.go:1085] Controller detected that some Nodes are Ready. Exiting master disruption mode.
Jan 02 21:27:30 master k3s[550]: time="2020-01-02T21:27:30.185484725+01:00" level=info msg="Handling backend connection request [worker2]"
Jan 02 21:28:07 master k3s[550]: I0102 21:28:07.664740 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:29:07 master k3s[550]: I0102 21:29:07.699334 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:29:42 master k3s[550]: E0102 21:29:42.365156 550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:30:07 master k3s[550]: I0102 21:30:07.713346 550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
I restarted everything and booted only the master, with the same result.
I waited 30 minutes but nothing happened. I managed to drain the nodes manually with this script:
#!/bin/bash
KUBECTL="/usr/local/bin/kubectl"

# Drain every node that is currently NotReady.
NOT_READY_NODES=$($KUBECTL get nodes | grep 'NotReady' | awk '{print $1}')
while IFS= read -r line; do
    if [[ ! $line =~ [^[:space:]] ]]; then
        continue  # skip blank lines
    fi
    echo "Found $line node to be dead, draining..."
    $KUBECTL drain --ignore-daemonsets --force "$line"
done <<< "$NOT_READY_NODES"

# Uncordon nodes that have come back: Ready but still cordoned.
READY_NODES=$($KUBECTL get nodes | grep '\sReady,SchedulingDisabled' | awk '{print $1}')
while IFS= read -r line; do
    if [[ ! $line =~ [^[:space:]] ]]; then
        continue  # skip blank lines
    fi
    echo "Found $line node to be online again, undraining..."
    $KUBECTL uncordon "$line"
done <<< "$READY_NODES"
This script should never be needed, though; the whole point of Kubernetes is its ability to self-heal.
I found that when you delete a NotReady node, its pods do get reassigned, but the worker is added back to the cluster only after the k3s-agent service is restarted.
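For reference, a minimal sketch of that manual recovery sequence (worker2 is just an example node name from this cluster):
# Delete the dead node object so the control plane reschedules its pods
kubectl delete node worker2
# Later, on the recovered worker itself, restart the agent so it rejoins
sudo systemctl restart k3s-agent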
I powered off a worker node (worker2) on my 3-node Raspberry Pi 4 cluster running Rook/Ceph some 3.5 hours ago, and my cluster still has not really recovered. If we overlook the Wordpress failure (the new instance cannot bind to the PVC because Kubernetes still thinks there is a claim from the terminating instance on the powered-off node), the k3s-provisioned Traefik LB instance is still listed as Terminating and hanging there.
The things that have recovered are the ones (mostly Rook) that do not have a PVC, so even though the instances on the failed node are still listed as Terminating, that does not stop the new instances from coming up.
Am I missing something here regarding Kubernetes node failure?
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/helm-install-traefik-2zd8t 0/1 Completed 0 11d 10.42.0.3 master <none> <none>
kubernetes-dashboard pod/kubernetes-dashboard-544f4d6b8c-4bmbm 1/1 Running 1 2d 10.42.1.127 worker1 <none> <none>
kubernetes-dashboard pod/dashboard-metrics-scraper-744c77948-n2z5w 1/1 Running 1 2d 10.42.1.126 worker1 <none> <none>
kube-system pod/svclb-traefik-zq5sw 3/3 Running 30 11d 10.42.1.128 worker1 <none> <none>
cert-manager pod/cert-manager-5c47f46f57-ww4ql 1/1 Running 1 45h 10.42.0.114 master <none> <none>
kube-system pod/metrics-server-6d684c7b5-pgmtf 1/1 Running 9 11d 10.42.0.117 master <none> <none>
kube-system pod/local-path-provisioner-58fb86bdfd-xxkr9 1/1 Running 9 11d 10.42.0.118 master <none> <none>
kube-system pod/svclb-traefik-q6tx6 3/3 Running 27 11d 10.42.0.115 master <none> <none>
cert-manager pod/cert-manager-webhook-547567b88f-4nhx9 1/1 Running 1 45h 10.42.0.112 master <none> <none>
kube-system pod/coredns-d798c9dd-b5h2l 1/1 Running 9 11d 10.42.0.119 master <none> <none>
kube-system pod/traefik-65bccdc4bd-2qglj 1/1 Running 9 11d 10.42.0.116 master <none> <none>
rook-ceph pod/rook-discover-dthqw 1/1 Running 0 17h 10.42.0.120 master <none> <none>
rook-ceph pod/rook-discover-jb5gm 1/1 Running 0 17h 10.42.1.129 worker1 <none> <none>
rook-ceph pod/rook-ceph-agent-fhct7 1/1 Running 0 17h 192.168.10.107 worker1 <none> <none>
rook-ceph pod/rook-ceph-agent-wkl5s 1/1 Running 0 17h 192.168.10.102 master <none> <none>
rook-ceph pod/rook-ceph-mon-a-7987b7749c-dqhv9 1/1 Running 0 17h 10.42.1.132 worker1 <none> <none>
rook-ceph pod/rook-ceph-mon-c-59d7b8fb4d-7sqjj 1/1 Running 0 17h 10.42.0.122 master <none> <none>
rook-ceph pod/rook-ceph-crashcollector-worker1-6bbbbf6696-zxzqc 1/1 Running 0 17h 10.42.1.133 worker1 <none> <none>
rook-ceph pod/rook-ceph-crashcollector-master-8cf749cdc-zw6ph 1/1 Running 0 17h 10.42.0.123 master <none> <none>
rook-ceph pod/rook-ceph-osd-1-dbb578859-6rv64 1/1 Running 0 17h 10.42.1.135 worker1 <none> <none>
rook-ceph pod/rook-ceph-osd-2-6c7d9966cd-56ggs 1/1 Running 0 17h 10.42.0.125 master <none> <none>
rook-ceph pod/rook-ceph-tools-57d8bd875b-nzmdh 1/1 Running 0 17h 192.168.10.107 worker1 <none> <none>
default pod/adminer-69bcfb4764-bngsb 1/1 Running 0 15h 10.42.0.129 master <none> <none>
rook-cockroachdb-system pod/rook-cockroachdb-operator-784f89dcc5-hgzq7 1/1 Running 0 5h59m 10.42.0.130 master <none> <none>
default pod/mariadb-0 1/1 Running 0 4h4m 10.42.1.143 worker1 <none> <none>
kube-system pod/svclb-traefik-lxfjb 3/3 Running 21 10d 10.42.2.117 worker2 <none> <none>
rook-ceph pod/rook-discover-grnvm 1/1 Running 0 17h 10.42.2.119 worker2 <none> <none>
rook-ceph pod/rook-ceph-agent-5nz5d 1/1 Running 0 17h 192.168.10.95 worker2 <none> <none>
rook-ceph pod/rook-ceph-mgr-a-7f65b8f79f-kqzvw 1/1 Terminating 2 17h 10.42.2.122 worker2 <none> <none>
rook-ceph pod/rook-ceph-mgr-a-7f65b8f79f-p7vrh 1/1 Running 0 3h32m 10.42.1.144 worker1 <none> <none>
default pod/wordpress-6c7c6fcccf-8hsvc 1/1 Terminating 0 4h8m 10.42.2.134 worker2 <none> <none>
rook-ceph pod/rook-ceph-osd-0-6786789854-6qzd5 1/1 Terminating 0 17h 10.42.2.125 worker2 <none> <none>
rook-ceph pod/rook-ceph-mon-b-565bc66f97-64q84 1/1 Terminating 0 17h 10.42.2.121 worker2 <none> <none>
rook-ceph pod/rook-ceph-crashcollector-worker2-67895bf8df-f8cqr 1/1 Terminating 0 17h 10.42.2.126 worker2 <none> <none>
cert-manager pod/cert-manager-cainjector-6659d6844d-krnhk 1/1 Terminating 2 45h 10.42.2.116 worker2 <none> <none>
rook-ceph pod/rook-ceph-operator-6d794bf987-plntb 1/1 Terminating 0 17h 10.42.2.118 worker2 <none> <none>
rook-ceph pod/rook-ceph-mon-b-565bc66f97-gs8h5 0/1 Pending 0 3h27m <none> <none> <none> <none>
rook-ceph pod/rook-ceph-osd-0-6786789854-6v765 0/1 Pending 0 3h27m <none> <none> <none> <none>
default pod/wordpress-6c7c6fcccf-8mhdd 0/1 ContainerCreating 0 3h27m <none> worker1 <none> <none>
rook-ceph pod/rook-ceph-crashcollector-worker2-67895bf8df-5sksv 0/1 Pending 0 3h27m <none> <none> <none> <none>
rook-ceph pod/rook-ceph-operator-6d794bf987-bq6zm 1/1 Running 0 3h27m 10.42.0.133 master <none> <none>
cert-manager pod/cert-manager-cainjector-6659d6844d-7p7p5 1/1 Running 0 3h27m 10.42.1.145 worker1 <none> <none>
rook-ceph pod/rook-ceph-osd-prepare-master-bphzw 0/1 Completed 0 3h5m 10.42.0.135 master <none> <none>
rook-ceph pod/rook-ceph-osd-prepare-worker1-mdhmt 0/1 Completed 0 3h5m 10.42.1.146 worker1 <none> <none>
rook-ceph pod/rook-ceph-mon-d-canary-666965574c-62b2f 0/1 Pending 0 15m <none> <none> <none> <none>
NAMESPACE NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/worker2 NotReady <none> 10d v1.16.3-k3s.2 192.168.15.9 <none> Ubuntu 19.10 5.3.0-1014-raspi2 containerd://1.3.0-k3s.5
node/master Ready master 11d v1.16.3-k3s.2 192.168.15.10 <none> Ubuntu 19.10 5.3.0-1014-raspi2 containerd://1.3.0-k3s.5
node/worker1 Ready <none> 11d v1.16.3-k3s.2 192.168.15.11 <none> Ubuntu 19.10 5.3.0-1014-raspi2 containerd://1.3.0-k3s.5
So powering up the 'failed' node allowed all the 'Terminating' instances to finally end; the Rook config sorted itself out, and my Wordpress instance finally came back, along with cert-manager, as the PVC (on Wordpress) was finally released.
I learned that there's a difference between having a node in NotReady state and deleting the node. When a node goes into NotReady state, Kubernetes will not reschedule its running pods to other nodes, because Kubernetes cannot distinguish between a node restart, a network error, or a kubelet error. Kubernetes will reschedule pods only when it is sure they are no longer running, and a node being NotReady does not mean its pods are not running: they might still be running, and the fact that Kubernetes cannot communicate with the kubelet does not prove otherwise :/ It's really a bummer for me, as they still show up as 1/1 Running via kubectl.
Although that's just my point of view, it's really weird that k3s on its own does not seem to support the --pod-eviction-timeout flag, which is 5 minutes by default.
The script that I published cordons the faulty nodes, drains them, and then eventually deletes them; it uncordons a node once it's back in the Ready state. K3s seems to rejoin the master only when it restarts, though.
Please see https://kubernetes.io/docs/concepts/architecture/nodes/, from that link:
In versions of Kubernetes prior to 1.5, the node controller would force delete these unreachable pods from the apiserver. However, in 1.5 and higher, the node controller does not force delete pods until it is confirmed that they have stopped running in the cluster. You can see the pods that might be running on an unreachable node as being in the Terminating or Unknown state. In cases where Kubernetes cannot deduce from the underlying infrastructure if a node has permanently left a cluster, the cluster administrator may need to delete the node object by hand. Deleting the node object from Kubernetes causes all the Pod objects running on the node to be deleted from the apiserver, and frees up their names.
So pods stuck in a Terminating state, but with a duplicate running on another node, look to be expected. The --pod-eviction-timeout flag should be settable like:
k3s server --kube-controller-manager-arg pod-eviction-timeout=1m
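If k3s was set up with the official install script, one way to pass that argument is through INSTALL_K3S_EXEC (a sketch; assumes the get.k3s.io installer, which regenerates the systemd unit when re-run):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --kube-controller-manager-arg pod-eviction-timeout=1m" sh -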
The key line in the original issue is "Controller detected that all Nodes are not-Ready. Entering master disruption mode.", which looks to be related to https://github.com/kubernetes/kubernetes/issues/42733. If all of the nodes become NotReady, the controller manager may refuse to evict.
@erikwilson in my case none of the pods were in the Terminating/Unknown state (it was the same when only one node was NotReady), and that issue was fixed? 🤔 I will set that --kube-controller-manager-arg pod-eviction-timeout=1m flag and see what happens.
It looks like the expected behavior; also see, from that docs link:
The corner case is when all zones are completely unhealthy (i.e. there are no healthy nodes in the cluster). In such case, the node controller assumes that there’s some problem with master connectivity and stops all evictions until some connectivity is restored.
@erikwilson
Ok, thanks. That would tie in with the last time I tried this, as it was on an earlier version of Kubernetes and I was not aware of that change. Also, I think I have done this on a Rancher-managed cluster with some node-management options set, so I never had an issue.
Hi. I’m experiencing the same issue and mitigated it with the following script in my launch template user data:
kubectl get nodes |
awk -v "host=$(hostname)" '$1 != host && $2 == "NotReady" { print $1 }' |
xargs --no-run-if-empty kubectl delete node
So when one node goes down, the autoscaling group creates a new instance that will run the above script when booting.
I advise you to triple check that hostname returns the correct hostname for your nodes, otherwise you risk deleting the current node...
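A quick sanity check along those lines, as a sketch (assumes node names equal hostnames, as they do by default):
kubectl get node "$(hostname)" >/dev/null || echo "WARNING: $(hostname) does not match any node name"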
Node draining was not working; it got stuck forever since the target node was dead. So much for HA!
We're hitting this issue consistently as well, even when trying to drain the node (which is NotReady and disabled for scheduling):
NAME STATUS ROLES AGE VERSION
node1 NotReady,SchedulingDisabled master 43m v1.17.3+k3s1
node2 Ready master 43m v1.17.3+k3s1
The pods from node1 stay in "Terminating" state forever, until the node comes back up.
This is not just a minor issue. We have 1 DaemonSet (rabbitmq), and its pod doesn't terminate or get deleted, which causes other services to keep trying to connect to it, which in turn causes those pods not to come up correctly.
I noticed the same thing; I had to drain nodes and force-delete pods to get rid of them.
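For reference, the force-delete looks like this (pod name and namespace are hypothetical):
kubectl delete pod rabbitmq-0 -n default --grace-period=0 --force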
Same issue here. Only masters running rancher-server. Pods are stuck on Running even though those nodes have been in NotReady for more than 15 minutes.
Does this happen when using 3+ nodes?
I've only tested this on 2 or 3 nodes, and it happens for both setups.
It happens with HA when using 3 master nodes and taking 1 of the nodes down?
Using what type of database?
We were using a Postgres DB as the backend when 1 node was taken down. My main use case is a 2-node k3s cluster, and it's very easy to see this there.
I don't think Kubernetes supports 2-node clusters with 1 node down very well, as cited in the messages above.
Also having this issue with nodes not going away after they've been replaced:
NAME STATUS ROLES AGE VERSION
ip-10-12-82-234 NotReady <none> 12d v1.17.9+k3s1
ip-10-12-65-201 NotReady <none> 15d v1.17.9+k3s1
ip-10-12-90-123 NotReady <none> 15d v1.17.9+k3s1
ip-10-12-48-200 NotReady <none> 12d v1.17.9+k3s1
ip-10-12-78-179 NotReady <none> 12d v1.17.9+k3s1
ip-10-12-52-75 NotReady <none> 15d v1.17.9+k3s1
ip-10-12-67-220 NotReady master 29d v1.17.9+k3s1
ip-10-12-81-212 NotReady master 29d v1.17.9+k3s1
ip-10-12-55-185 NotReady master 14d v1.17.9+k3s1
ip-10-12-83-151 NotReady master 7d3h v1.17.9+k3s1
ip-10-12-49-50 NotReady master 7d3h v1.17.9+k3s1
ip-10-12-48-195 NotReady <none> 5d1h v1.17.9+k3s1
ip-10-12-68-212 NotReady <none> 5d1h v1.17.9+k3s1
ip-10-12-94-45 NotReady <none> 5d1h v1.17.9+k3s1
ip-10-12-95-46 NotReady master 3h10m v1.17.9+k3s1
ip-10-12-56-63 NotReady master 4h1m v1.17.9+k3s1
ip-10-12-79-230 NotReady master 4h13m v1.17.9+k3s1
ip-10-12-79-118 NotReady <none> 3h17m v1.17.9+k3s1
ip-10-12-88-104 NotReady <none> 3h17m v1.17.9+k3s1
ip-10-12-53-206 NotReady <none> 3h17m v1.17.9+k3s1
ip-10-12-90-16 NotReady <none> 3h1m v1.17.9+k3s1
ip-10-12-54-163 NotReady master 3h10m v1.17.9+k3s1
ip-10-12-53-78 NotReady <none> 3h1m v1.17.9+k3s1
ip-10-12-71-230 NotReady master 3h10m v1.17.9+k3s1
ip-10-12-86-199 NotReady master 4h7m v1.17.9+k3s1
ip-10-12-79-37 NotReady <none> 3h1m v1.17.9+k3s1
ip-10-12-91-161 NotReady master 146m v1.17.4+k3s1
ip-10-12-68-68 NotReady master 146m v1.17.4+k3s1
ip-10-12-57-50 NotReady master 146m v1.17.4+k3s1
ip-10-12-52-91 NotReady <none> 147m v1.17.4+k3s1
ip-10-12-84-159 NotReady <none> 146m v1.17.4+k3s1
ip-10-12-73-9 NotReady <none> 146m v1.17.4+k3s1
ip-10-12-49-200 Ready master 29m v1.17.9+k3s1
ip-10-12-70-140 Ready <none> 27m v1.17.9+k3s1
ip-10-12-84-215 Ready <none> 27m v1.17.9+k3s1
ip-10-12-55-103 Ready <none> 27m v1.17.9+k3s1
ip-10-12-83-6 Ready master 27m v1.17.9+k3s1
ip-10-12-76-62 Ready master 27m v1.17.9+k3s1
@rogersd k3s does not delete nodes on its own. It has no way of knowing if the nodes are just temporarily offline, or if they are gone forever.
If you install an out-of-tree cloud provider (such as https://github.com/kubernetes/cloud-provider-aws) it has the necessary hooks to talk to your cloud provider API, and delete nodes that have been terminated. You could also just script this manually using the Kubernetes API or kubectl, deleting nodes that have been offline (NotReady) for a period of time.
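A sketch of that scripted cleanup (assumes jq is installed; the one-hour threshold is arbitrary and worth tuning before use):
#!/bin/bash
# Delete nodes whose Ready condition has been non-True for over an hour.
THRESHOLD=3600
NOW=$(date +%s)
kubectl get nodes -o json \
  | jq -r --argjson now "$NOW" --argjson t "$THRESHOLD" '
      .items[]
      | .metadata.name as $name
      | .status.conditions[]
      | select(.type == "Ready" and .status != "True")
      | select(($now - (.lastTransitionTime | fromdateiso8601)) > $t)
      | $name' \
  | xargs --no-run-if-empty kubectl delete node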