K3s claims that pods are running but hosts (nodes) are dead

Created on 2 Jan 2020  ·  21 Comments  ·  Source: k3s-io/k3s

Version:
k3s version v1.0.0 (18bd921c)

Describe the bug
I have a cluster that consists of 1 master and 3 workers. After I unplugged the 3 workers, none of the running pods were reassigned to the master, and kubectl claims the workers' pods are still alive:

➜  ~ kubectl get nodes
NAME      STATUS     ROLES    AGE   VERSION
worker2   NotReady   node     15d   v1.16.3-k3s.2
worker1   NotReady   node     15d   v1.16.3-k3s.2
worker3   NotReady   node     15d   v1.16.3-k3s.2
master    Ready      master   16d   v1.16.3-k3s.2
➜  ~ kubectl get pods --all-namespaces -o wide
NAMESPACE              NAME                                                   READY   STATUS    RESTARTS   AGE    IP              NODE      NOMINATED NODE   READINESS GATES
kube-system            metrics-server-6d684c7b5-8fzld                         1/1     Running   29         16d    10.42.0.139     master    <none>           <none>
metallb-system         speaker-lv7cq                                          1/1     Running   7          3d2h   192.168.0.201   master    <none>           <none>
default                nginx-1-775985c86-4q5xq                                1/1     Running   18         5d7h   10.42.0.142     master    <none>           <none>
kube-system            coredns-d798c9dd-f2wrb                                 1/1     Running   28         16d    10.42.0.140     master    <none>           <none>
kube-system            local-path-provisioner-58fb86bdfd-8sbzq                1/1     Running   4          32h    10.42.0.141     master    <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-5996555fd8-k684f                  1/1     Running   23         15d    10.42.2.59      worker2   <none>           <none>
metallb-system         speaker-hdq7h                                          1/1     Running   5          3d2h   192.168.0.203   worker2   <none>           <none>
kube-system            nginx-nginx-ingress-controller-595c6b856c-m6997        1/1     Running   3          31h    10.42.2.58      worker2   <none>           <none>
kube-system            nginx-nginx-ingress-default-backend-6595d9d88b-vff2c   1/1     Running   2          30h    10.42.1.59      worker1   <none>           <none>
metallb-system         speaker-54h22                                          1/1     Running   5          3d2h   192.168.0.202   worker1   <none>           <none>
metallb-system         controller-57967b9448-mjgcb                            1/1     Running   5          3d2h   10.42.1.60      worker1   <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-76585494d8-bzccd             1/1     Running   31         15d    10.42.1.58      worker1   <none>           <none>
metallb-system         speaker-grzfq                                          1/1     Running   6          3d2h   192.168.0.204   worker3   <none>           <none>


pi@master:~ $ sudo journalctl -u k3s --since "20 minutes ago"
-- Logs begin at Wed 2020-01-01 22:17:01 CET, end at Thu 2020-01-02 21:30:46 CET. --
Jan 02 21:11:02 master k3s[550]: time="2020-01-02T21:11:02.232431780+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:11:02 master k3s[550]: E0102 21:11:02.273803     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:11:07 master k3s[550]: I0102 21:11:07.220756     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:12:07 master k3s[550]: I0102 21:12:07.249238     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:12:24 master k3s[550]: time="2020-01-02T21:12:24.244186839+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:12:24 master k3s[550]: E0102 21:12:24.293070     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:13:07 master k3s[550]: I0102 21:13:07.278093     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:14:07 master k3s[550]: I0102 21:14:07.292650     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.490492     550 controller.go:606] quota admission added evaluator for: replicasets.apps
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.505901     550 event.go:274] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"default", Name:"nginx-1", UID:"9f62d494-264e-43af
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.557512     550 event.go:274] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"nginx-1-775985c86", UID:"4984c18f
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.570165     550 event.go:274] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"nginx-1-775985c86", UID:"4984c18f
Jan 02 21:14:14 master k3s[550]: I0102 21:14:14.662089     550 event.go:274] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"default", Name:"nginx-1", UID:"55f3f3a9-ba7c-44de-
Jan 02 21:14:42 master k3s[550]: E0102 21:14:42.365014     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:15:07 master k3s[550]: I0102 21:15:07.319919     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:15:08 master k3s[550]: time="2020-01-02T21:15:08.153599154+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:15:08 master k3s[550]: E0102 21:15:08.180379     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:16:07 master k3s[550]: I0102 21:16:07.334593     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:17:04 master k3s[550]: time="2020-01-02T21:17:04.958137785+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.203:49916: i/o
Jan 02 21:17:07 master k3s[550]: I0102 21:17:07.362715     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:17:29 master k3s[550]: time="2020-01-02T21:17:29.298245766+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.202:35906: i/o
Jan 02 21:17:30 master k3s[550]: time="2020-01-02T21:17:30.580422869+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.201:59938: i/o
Jan 02 21:17:30 master k3s[550]: time="2020-01-02T21:17:30.580988438+01:00" level=error msg="Remotedialer proxy error" error="read tcp 192.168.0.201:59938->192.168.0.201:6443: i/o t
Jan 02 21:17:31 master k3s[550]: time="2020-01-02T21:17:31.605583377+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.204:52256: i/o
Jan 02 21:17:34 master k3s[550]: I0102 21:17:34.106828     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2", UID:"b2288a80-db5b-43be-9c00-4330be9
Jan 02 21:17:35 master k3s[550]: time="2020-01-02T21:17:35.587707038+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:37 master k3s[550]: E0102 21:17:37.476561     550 pod_workers.go:191] Error syncing pod ed9ae66f-71eb-49b0-b0d3-05dedd447d5f ("local-path-provisioner-58fb86bdfd-8sbzq_k
Jan 02 21:17:38 master k3s[550]: time="2020-01-02T21:17:38.200829872+01:00" level=error msg="Failed to connect to proxy" error="dial tcp 192.168.0.201:6443: connect: no route to hos
Jan 02 21:17:38 master k3s[550]: time="2020-01-02T21:17:38.200932464+01:00" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.0.201:6443: connect: no route to host"
Jan 02 21:17:39 master k3s[550]: E0102 21:17:39.318397     550 resource_quota_controller.go:407] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the ser
Jan 02 21:17:39 master k3s[550]: E0102 21:17:39.823064     550 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.233.67
Jan 02 21:17:42 master k3s[550]: W0102 21:17:42.211175     550 garbagecollector.go:640] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to
Jan 02 21:17:43 master k3s[550]: time="2020-01-02T21:17:43.201207406+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:44 master k3s[550]: time="2020-01-02T21:17:44.521597124+01:00" level=error msg="Failed to connect to proxy" error="dial tcp 192.168.0.201:6443: connect: no route to hos
Jan 02 21:17:44 master k3s[550]: time="2020-01-02T21:17:44.521784474+01:00" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.0.201:6443: connect: no route to host"
Jan 02 21:17:44 master k3s[550]: E0102 21:17:44.827117     550 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.233.67
Jan 02 21:17:45 master k3s[550]: E0102 21:17:45.545554     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:47 master k3s[550]: E0102 21:17:47.649744     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:47 master k3s[550]: time="2020-01-02T21:17:47.843063318+01:00" level=info msg="Handling backend connection request [worker2]"
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252599     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"nginx-nginx-ingress-controller-595c6
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252704     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"helm-install-traefik-7k7qv", UID:"",
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252742     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kubernetes-dashboard", Name:"kubernetes-dashboard-599655
Jan 02 21:17:49 master k3s[550]: I0102 21:17:49.252776     550 event.go:274] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"helm-install-traefik-vgn7b", UID:"",
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.522014337+01:00" level=info msg="Connecting to proxy" url="wss://192.168.0.201:6443/v1-k3s/connect"
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.557293978+01:00" level=info msg="Handling backend connection request [master]"
Jan 02 21:17:49 master k3s[550]: time="2020-01-02T21:17:49.767827024+01:00" level=info msg="Handling backend connection request [worker1]"
Jan 02 21:17:52 master k3s[550]: time="2020-01-02T21:17:52.492071111+01:00" level=info msg="Handling backend connection request [worker3]"
Jan 02 21:17:56 master k3s[550]: E0102 21:17:56.271504     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:17:58 master k3s[550]: E0102 21:17:58.666991     550 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
Jan 02 21:18:02 master k3s[550]: time="2020-01-02T21:18:02.494515040+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.204:52364: i/o
Jan 02 21:18:04 master k3s[550]: time="2020-01-02T21:18:04.771525434+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.202:36488: i/o
Jan 02 21:18:07 master k3s[550]: I0102 21:18:07.390554     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:18:07 master k3s[550]: time="2020-01-02T21:18:07.844839055+01:00" level=info msg="error in remotedialer server [400]: read tcp 192.168.0.201:6443->192.168.0.203:51336: i/o
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.293129     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2", UID:"b2288a80-db5b-43be-9c00-4330be9
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.442367     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker1", UID:"f4226dde-8c79-473b-88c3-9d65ffa
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.782884     550 event.go:274] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker3", UID:"23d267c0-0917-4852-82da-830b9e9
Jan 02 21:18:39 master k3s[550]: I0102 21:18:39.927947     550 node_lifecycle_controller.go:1058] Controller detected that all Nodes are not-Ready. Entering master disruption mode.
Jan 02 21:18:40 master k3s[550]: E0102 21:18:40.072909     550 daemon_controller.go:302] metallb-system/speaker failed with : error storing status for daemon set &v1.DaemonSet{TypeM
Jan 02 21:18:40 master k3s[550]: E0102 21:18:40.156328     550 daemon_controller.go:302] metallb-system/speaker failed with : error storing status for daemon set &v1.DaemonSet{TypeM
Jan 02 21:19:07 master k3s[550]: I0102 21:19:07.418583     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:19:42 master k3s[550]: E0102 21:19:42.370539     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:20:07 master k3s[550]: I0102 21:20:07.434826     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:20:35 master k3s[550]: time="2020-01-02T21:20:35.907270451+01:00" level=info msg="Updating TLS secret for k3s-serving (count: 8): map[listener.cattle.io/cn-10.43.0.1:10.43
Jan 02 21:20:35 master k3s[550]: E0102 21:20:35.950947     550 controller.go:117] error syncing 'kube-system/k3s-serving': handler tls-storage: Secret "k3s-serving" is invalid: meta
Jan 02 21:21:07 master k3s[550]: I0102 21:21:07.463278     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:22:07 master k3s[550]: I0102 21:22:07.512476     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:23:07 master k3s[550]: I0102 21:23:07.549374     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:24:07 master k3s[550]: I0102 21:24:07.564454     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:24:42 master k3s[550]: E0102 21:24:42.365024     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:25:07 master k3s[550]: I0102 21:25:07.579445     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:26:07 master k3s[550]: I0102 21:26:07.594475     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:27:07 master k3s[550]: I0102 21:27:07.623520     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:27:30 master k3s[550]: I0102 21:27:30.102377     550 node_lifecycle_controller.go:1085] Controller detected that some Nodes are Ready. Exiting master disruption mode.
Jan 02 21:27:30 master k3s[550]: time="2020-01-02T21:27:30.185484725+01:00" level=info msg="Handling backend connection request [worker2]"
Jan 02 21:28:07 master k3s[550]: I0102 21:28:07.664740     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:29:07 master k3s[550]: I0102 21:29:07.699334     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Jan 02 21:29:42 master k3s[550]: E0102 21:29:42.365156     550 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or d
Jan 02 21:30:07 master k3s[550]: I0102 21:30:07.713346     550 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
Unscheduled

Most helpful comment

We're hitting this issue consistently as well, even when trying to drain the node (which is NotReady and disabled for scheduling):

NAME STATUS ROLES AGE VERSION
node1 NotReady,SchedulingDisabled master 43m v1.17.3+k3s1
node2 Ready master 43m v1.17.3+k3s1

The pods from node1 stay in the "Terminating" state forever, until the node comes back up.

And this is not just a cosmetic issue. We have one DaemonSet (rabbitmq), and its pod doesn't terminate or get deleted, so other services keep trying to connect to it, which kept those pods from coming up properly.

All 21 comments

I restarted everything and booted only the master, and got the same result.

I waited 30 minutes but nothing happened. I managed to drain the nodes manually with this script:

#!/bin/bash

KUBECTL="/usr/local/bin/kubectl"

NOT_READY_NODES=$($KUBECTL get nodes | grep  'NotReady' | awk '{print $1}')

while IFS= read -r line; do
    if [[ ! $line =~ [^[:space:]] ]] ; then
        continue
    fi
    echo "Found $line node to be dead, draining..."
    $KUBECTL drain --ignore-daemonsets --force $line
done <<< "$NOT_READY_NODES"

READY_NODES=$($KUBECTL get nodes | grep '\sReady,SchedulingDisabled' | awk '{print $1}')

while IFS= read -r line; do
    if [[ ! $line =~ [^[:space:]] ]] ; then
        continue
    fi
    echo "Found $line node to be online again, undraining..."
    $KUBECTL uncordon $line
done <<< "$READY_NODES"

Although this script should never be needed: the whole point of Kubernetes is to have the ability to self-heal.

I found that when you delete a NotReady node, its pods actually do get reassigned, but the worker gets added back to the cluster only after the k3s-agent service is restarted.
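That workaround can be sketched as a small shell filter (a sketch, assuming the plain `kubectl get nodes` output format with STATUS in the second column; note it deliberately skips statuses like `NotReady,SchedulingDisabled`):

```shell
# not_ready_nodes: read `kubectl get nodes` output on stdin and print the
# names of nodes whose STATUS column is exactly "NotReady".
not_ready_nodes() {
  awk 'NR > 1 && $2 == "NotReady" { print $1 }'
}

# Sketch of live usage (deleting a node lets its pods reschedule, but the
# worker only rejoins the cluster after its k3s-agent service restarts):
#   kubectl get nodes | not_ready_nodes | xargs -r kubectl delete node
```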

I powered off a worker node (worker2) on my 3-node Raspberry Pi 4 cluster running Rook/Ceph some 3.5 hours ago, and my cluster still has not really recovered. Even if we overlook the Wordpress failure (the new instance cannot bind to the PVC because Kubernetes still thinks there is a claim from the terminating instance on the powered-off node), the k3s-provisioned Traefik LB instance is still listed as Terminating and hanging there.

The things that have recovered are the ones (mostly Rook) that do not have a PVC: even though their instances on the failed node are still listed as Terminating, that does not stop the new instances from coming up.

Am I missing something here regarding Kubernetes node failure?

NAMESPACE                 NAME                                                    READY   STATUS              RESTARTS   AGE     IP               NODE      NOMINATED NODE   READINESS GATES
kube-system               pod/helm-install-traefik-2zd8t                          0/1     Completed           0          11d     10.42.0.3        master    <none>           <none>
kubernetes-dashboard      pod/kubernetes-dashboard-544f4d6b8c-4bmbm               1/1     Running             1          2d      10.42.1.127      worker1   <none>           <none>
kubernetes-dashboard      pod/dashboard-metrics-scraper-744c77948-n2z5w           1/1     Running             1          2d      10.42.1.126      worker1   <none>           <none>
kube-system               pod/svclb-traefik-zq5sw                                 3/3     Running             30         11d     10.42.1.128      worker1   <none>           <none>
cert-manager              pod/cert-manager-5c47f46f57-ww4ql                       1/1     Running             1          45h     10.42.0.114      master    <none>           <none>
kube-system               pod/metrics-server-6d684c7b5-pgmtf                      1/1     Running             9          11d     10.42.0.117      master    <none>           <none>
kube-system               pod/local-path-provisioner-58fb86bdfd-xxkr9             1/1     Running             9          11d     10.42.0.118      master    <none>           <none>
kube-system               pod/svclb-traefik-q6tx6                                 3/3     Running             27         11d     10.42.0.115      master    <none>           <none>
cert-manager              pod/cert-manager-webhook-547567b88f-4nhx9               1/1     Running             1          45h     10.42.0.112      master    <none>           <none>
kube-system               pod/coredns-d798c9dd-b5h2l                              1/1     Running             9          11d     10.42.0.119      master    <none>           <none>
kube-system               pod/traefik-65bccdc4bd-2qglj                            1/1     Running             9          11d     10.42.0.116      master    <none>           <none>
rook-ceph                 pod/rook-discover-dthqw                                 1/1     Running             0          17h     10.42.0.120      master    <none>           <none>
rook-ceph                 pod/rook-discover-jb5gm                                 1/1     Running             0          17h     10.42.1.129      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-agent-fhct7                               1/1     Running             0          17h     192.168.10.107   worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-agent-wkl5s                               1/1     Running             0          17h     192.168.10.102   master    <none>           <none>
rook-ceph                 pod/rook-ceph-mon-a-7987b7749c-dqhv9                    1/1     Running             0          17h     10.42.1.132      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-c-59d7b8fb4d-7sqjj                    1/1     Running             0          17h     10.42.0.122      master    <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-worker1-6bbbbf6696-zxzqc   1/1     Running             0          17h     10.42.1.133      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-master-8cf749cdc-zw6ph     1/1     Running             0          17h     10.42.0.123      master    <none>           <none>
rook-ceph                 pod/rook-ceph-osd-1-dbb578859-6rv64                     1/1     Running             0          17h     10.42.1.135      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-osd-2-6c7d9966cd-56ggs                    1/1     Running             0          17h     10.42.0.125      master    <none>           <none>
rook-ceph                 pod/rook-ceph-tools-57d8bd875b-nzmdh                    1/1     Running             0          17h     192.168.10.107   worker1   <none>           <none>
default                   pod/adminer-69bcfb4764-bngsb                            1/1     Running             0          15h     10.42.0.129      master    <none>           <none>
rook-cockroachdb-system   pod/rook-cockroachdb-operator-784f89dcc5-hgzq7          1/1     Running             0          5h59m   10.42.0.130      master    <none>           <none>
default                   pod/mariadb-0                                           1/1     Running             0          4h4m    10.42.1.143      worker1   <none>           <none>
kube-system               pod/svclb-traefik-lxfjb                                 3/3     Running             21         10d     10.42.2.117      worker2   <none>           <none>
rook-ceph                 pod/rook-discover-grnvm                                 1/1     Running             0          17h     10.42.2.119      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-agent-5nz5d                               1/1     Running             0          17h     192.168.10.95    worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mgr-a-7f65b8f79f-kqzvw                    1/1     Terminating         2          17h     10.42.2.122      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mgr-a-7f65b8f79f-p7vrh                    1/1     Running             0          3h32m   10.42.1.144      worker1   <none>           <none>
default                   pod/wordpress-6c7c6fcccf-8hsvc                          1/1     Terminating         0          4h8m    10.42.2.134      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-osd-0-6786789854-6qzd5                    1/1     Terminating         0          17h     10.42.2.125      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-b-565bc66f97-64q84                    1/1     Terminating         0          17h     10.42.2.121      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-worker2-67895bf8df-f8cqr   1/1     Terminating         0          17h     10.42.2.126      worker2   <none>           <none>
cert-manager              pod/cert-manager-cainjector-6659d6844d-krnhk            1/1     Terminating         2          45h     10.42.2.116      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-operator-6d794bf987-plntb                 1/1     Terminating         0          17h     10.42.2.118      worker2   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-b-565bc66f97-gs8h5                    0/1     Pending             0          3h27m   <none>           <none>    <none>           <none>
rook-ceph                 pod/rook-ceph-osd-0-6786789854-6v765                    0/1     Pending             0          3h27m   <none>           <none>    <none>           <none>
default                   pod/wordpress-6c7c6fcccf-8mhdd                          0/1     ContainerCreating   0          3h27m   <none>           worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-crashcollector-worker2-67895bf8df-5sksv   0/1     Pending             0          3h27m   <none>           <none>    <none>           <none>
rook-ceph                 pod/rook-ceph-operator-6d794bf987-bq6zm                 1/1     Running             0          3h27m   10.42.0.133      master    <none>           <none>
cert-manager              pod/cert-manager-cainjector-6659d6844d-7p7p5            1/1     Running             0          3h27m   10.42.1.145      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-osd-prepare-master-bphzw                  0/1     Completed           0          3h5m    10.42.0.135      master    <none>           <none>
rook-ceph                 pod/rook-ceph-osd-prepare-worker1-mdhmt                 0/1     Completed           0          3h5m    10.42.1.146      worker1   <none>           <none>
rook-ceph                 pod/rook-ceph-mon-d-canary-666965574c-62b2f             0/1     Pending             0          15m     <none>           <none>    <none>           <none>

NAMESPACE   NAME           STATUS     ROLES    AGE   VERSION         INTERNAL-IP      EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION      CONTAINER-RUNTIME
            node/worker2   NotReady   <none>   10d   v1.16.3-k3s.2   192.168.15.9    <none>        Ubuntu 19.10   5.3.0-1014-raspi2   containerd://1.3.0-k3s.5
            node/master    Ready      master   11d   v1.16.3-k3s.2   192.168.15.10   <none>        Ubuntu 19.10   5.3.0-1014-raspi2   containerd://1.3.0-k3s.5
            node/worker1   Ready      <none>   11d   v1.16.3-k3s.2   192.168.15.11   <none>        Ubuntu 19.10   5.3.0-1014-raspi2   containerd://1.3.0-k3s.5

So powering up the 'failed' node allowed all the 'Terminating' instances to finally end: the Rook config sorted itself out, and my Wordpress instance finally came back, along with cert-manager, once the PVC (on Wordpress) was finally released.

I learned that there's a difference between a node being in the NotReady state and deleting the node. When a node goes NotReady, Kubernetes will not reschedule its running pods to other nodes, because it cannot distinguish between a node restart, a network error, and a kubelet error. Kubernetes reschedules pods only when it is sure they are no longer running, and a node being NotReady does not mean its pods have stopped: they might still be running, and the fact that Kubernetes cannot communicate with the kubelet does not prove otherwise :/ It's really a bummer for me, as:

  1. There should be a deadline: if a node is NotReady for 5 minutes, it should be drained with force, no matter whether something might still be running on it.
  2. Pods that are potentially running on NotReady nodes should be marked somehow, and definitely not shown as 1/1 Running by kubectl.

That's just my point of view, though. It's really weird that k3s on its own does not seem to support the --pod-eviction-timeout flag, which defaults to 5 minutes.

The script that I published cordons the faulty nodes, drains them, and eventually deletes them; it uncordons a node once it's Ready again. K3s seems to rejoin the master only when it restarts, though.

Please see https://kubernetes.io/docs/concepts/architecture/nodes/, from that link:

In versions of Kubernetes prior to 1.5, the node controller would force delete these unreachable pods from the apiserver. However, in 1.5 and higher, the node controller does not force delete pods until it is confirmed that they have stopped running in the cluster. You can see the pods that might be running on an unreachable node as being in the Terminating or Unknown state. In cases where Kubernetes cannot deduce from the underlying infrastructure if a node has permanently left a cluster, the cluster administrator may need to delete the node object by hand. Deleting the node object from Kubernetes causes all the Pod objects running on the node to be deleted from the apiserver, and frees up their names.

So pods stuck in a Terminating state, with a duplicate running on another node, look to be expected. The --pod-eviction-timeout flag can be set like:
k3s server --kube-controller-manager-arg pod-eviction-timeout=1m.

The key line in the original issue is "Controller detected that all Nodes are not-Ready. Entering master disruption mode.", which looks to be related to https://github.com/kubernetes/kubernetes/issues/42733. If all of the nodes become NotReady, the controller manager may refuse to evict.

@erikwilson in my case none of the pods were in the Terminating/Unknown state (it was the same when only one node was NotReady), and that issue was fixed? 🤔 I will set the --kube-controller-manager-arg pod-eviction-timeout=1m flag and see what happens.

It looks like the expected behavior, also see from that docs link:

The corner case is when all zones are completely unhealthy (i.e. there are no healthy nodes in the cluster). In such case, the node controller assumes that there’s some problem with master connectivity and stops all evictions until some connectivity is restored.

@erikwilson

Ok, thanks. That would tie in with the last time I tried this: it was on an earlier version of Kubernetes, and I was not aware of that change. I think I have also done this on a Rancher-managed cluster with some node-management options set, so I never had an issue there.

Hi. I’m experiencing the same issue and mitigated it with the following script in my launch template user data:

kubectl get nodes |
  awk -v "host=$(hostname)" '$1 != host && $2 == "NotReady" { print $1 }' |
  xargs --no-run-if-empty kubectl delete node

So when one node goes down, the autoscaling group creates a new instance that will run the above script when booting.

I advise you to triple check that hostname returns the correct hostname for your nodes, otherwise you risk deleting the current node...

The node draining was not working and got stuck forever, since the target node was dead. So much for HA!

We're hitting this issue consistently as well, even when trying to drain the node (which is NotReady and disabled for scheduling):

NAME STATUS ROLES AGE VERSION
node1 NotReady,SchedulingDisabled master 43m v1.17.3+k3s1
node2 Ready master 43m v1.17.3+k3s1

The pods from node1 stay in the "Terminating" state forever, until the node comes back up.

And this is not just a cosmetic issue. We have one DaemonSet (rabbitmq), and its pod doesn't terminate or get deleted, so other services keep trying to connect to it, which kept those pods from coming up properly.

I noticed the same thing. I had to drain the nodes and delete the pods with force to get rid of such pods.
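For reference, a hedged sketch of that force-cleanup (assuming the default `kubectl get pods -A` column layout, with STATUS in the fourth column; `--grace-period=0 --force` only removes the API object without waiting for the kubelet, so use it only when the node is truly gone):

```shell
# terminating_pods: read `kubectl get pods -A` output on stdin and print
# "namespace pod" for every pod stuck in the Terminating status.
terminating_pods() {
  awk 'NR > 1 && $4 == "Terminating" { print $1, $2 }'
}

# Sketch of live usage:
#   kubectl get pods -A | terminating_pods | while read -r ns pod; do
#     kubectl delete pod -n "$ns" "$pod" --grace-period=0 --force
#   done
```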

Same issue here. Only masters running rancher-server. Pods are stuck in Running even though their nodes have been NotReady for more than 15 minutes.

Does this happen when using 3+ nodes?

I've only tested this on 2 or 3 nodes and it happens for both the setups

It happens with HA when using 3 master nodes and taking 1 of the nodes down?
Using what type of database?

I was using a Postgres DB as the backend when 1 node was taken down. My main use case is a 2-node k3s cluster, and it's very easy to see this there.

I don't think Kubernetes supports 2-node clusters with 1 node taken down very well, as cited in the messages above.

Also having this issue with nodes not going away after they've been replaced:

NAME              STATUS     ROLES    AGE    VERSION
ip-10-12-82-234   NotReady   <none>   12d    v1.17.9+k3s1
ip-10-12-65-201   NotReady   <none>   15d    v1.17.9+k3s1
ip-10-12-90-123   NotReady   <none>   15d    v1.17.9+k3s1
ip-10-12-48-200   NotReady   <none>   12d    v1.17.9+k3s1
ip-10-12-78-179   NotReady   <none>   12d    v1.17.9+k3s1
ip-10-12-52-75    NotReady   <none>   15d    v1.17.9+k3s1
ip-10-12-67-220   NotReady   master   29d    v1.17.9+k3s1
ip-10-12-81-212   NotReady   master   29d    v1.17.9+k3s1
ip-10-12-55-185   NotReady   master   14d    v1.17.9+k3s1
ip-10-12-83-151   NotReady   master   7d3h   v1.17.9+k3s1
ip-10-12-49-50    NotReady   master   7d3h   v1.17.9+k3s1
ip-10-12-48-195   NotReady   <none>   5d1h   v1.17.9+k3s1
ip-10-12-68-212   NotReady   <none>   5d1h   v1.17.9+k3s1
ip-10-12-94-45    NotReady   <none>   5d1h   v1.17.9+k3s1
ip-10-12-95-46    NotReady   master   3h10m  v1.17.9+k3s1
ip-10-12-56-63    NotReady   master   4h1m   v1.17.9+k3s1
ip-10-12-79-230   NotReady   master   4h13m  v1.17.9+k3s1
ip-10-12-79-118   NotReady   <none>   3h17m  v1.17.9+k3s1
ip-10-12-88-104   NotReady   <none>   3h17m  v1.17.9+k3s1
ip-10-12-53-206   NotReady   <none>   3h17m  v1.17.9+k3s1
ip-10-12-90-16    NotReady   <none>   3h1m   v1.17.9+k3s1
ip-10-12-54-163   NotReady   master   3h10m  v1.17.9+k3s1
ip-10-12-53-78    NotReady   <none>   3h1m   v1.17.9+k3s1
ip-10-12-71-230   NotReady   master   3h10m  v1.17.9+k3s1
ip-10-12-86-199   NotReady   master   4h7m   v1.17.9+k3s1
ip-10-12-79-37    NotReady   <none>   3h1m   v1.17.9+k3s1
ip-10-12-91-161   NotReady   master   146m   v1.17.4+k3s1
ip-10-12-68-68    NotReady   master   146m   v1.17.4+k3s1
ip-10-12-57-50    NotReady   master   146m   v1.17.4+k3s1
ip-10-12-52-91    NotReady   <none>   147m   v1.17.4+k3s1
ip-10-12-84-159   NotReady   <none>   146m   v1.17.4+k3s1
ip-10-12-73-9     NotReady   <none>   146m   v1.17.4+k3s1
ip-10-12-49-200   Ready      master   29m    v1.17.9+k3s1
ip-10-12-70-140   Ready      <none>   27m    v1.17.9+k3s1
ip-10-12-84-215   Ready      <none>   27m    v1.17.9+k3s1
ip-10-12-55-103   Ready      <none>   27m    v1.17.9+k3s1
ip-10-12-83-6     Ready      master   27m    v1.17.9+k3s1
ip-10-12-76-62    Ready      master   27m    v1.17.9+k3s1

@rogersd k3s does not delete nodes on its own. It has no way of knowing if the nodes are just temporarily offline, or if they are gone forever.

If you install an out-of-tree cloud provider (such as https://github.com/kubernetes/cloud-provider-aws) it has the necessary hooks to talk to your cloud provider API, and delete nodes that have been terminated. You could also just script this manually using the Kubernetes API or kubectl, deleting nodes that have been offline (NotReady) for a period of time.
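Scripting it manually could look like the following sketch: given the `lastTransitionTime` of a node's Ready condition (fetchable with a jsonpath query such as `kubectl get node <name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}'` — verify that expression on your version), check whether the node has been offline longer than a threshold before deleting it. GNU `date` is assumed for timestamp parsing.

```shell
# offline_longer_than TIMESTAMP SECONDS
# Succeeds if TIMESTAMP (RFC 3339, e.g. 2020-01-02T21:18:39Z) is more than
# SECONDS seconds in the past. Requires GNU date for `date -d`.
offline_longer_than() {
  local transition="$1" threshold="$2" t0 now
  t0=$(date -d "$transition" +%s) || return 2
  now=$(date +%s)
  [ $((now - t0)) -gt "$threshold" ]
}

# Sketch: delete nodes that have been NotReady for more than 5 minutes.
#   for n in $(kubectl get nodes | awk '$2 == "NotReady" { print $1 }'); do
#     ts=$(kubectl get node "$n" \
#       -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')
#     offline_longer_than "$ts" 300 && kubectl delete node "$n"
#   done
```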
