K3s: CrashLoopBackOff after Reboot

Created on 8 Apr 2020 · 7 Comments · Source: k3s-io/k3s

Version:
k3s version v1.17.2+k3s1 (cdab19b0)

K3s arguments:
Raspberry PI4 running Raspbian lite

OS version:

pi@k3sserver:~ $ uname -a
Linux k3sserver 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
pi@k3sserver:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

K3S installed with curl -sfL https://get.k3s.io | sh -
Process written up here https://github.com/gazzyt/raspberry-pi-kubernetes

Describe the bug
After install, all pods were running successfully. After every reboot, the local-path-provisioner, metrics-server and (sometimes) kubernetes-dashboard pods flip between Error and CrashLoopBackOff and never start successfully.

pi@k3sserver:~ $ sudo kubectl get pod --all-namespaces
NAMESPACE              NAME                                         READY   STATUS             RESTARTS   AGE
kube-system            helm-install-traefik-7lkl7                   0/1     Completed          2          43d
kube-system            svclb-traefik-ppvqq                          2/2     Running            6          23d
default                svclb-http-server-pfxcg                      1/1     Running            3          23d
default                svclb-hello-world-tt9w5                      1/1     Running            3          23d
default                web-server-1                                 2/2     Running            0          8d
default                svclb-http-server-7bsvx                      1/1     Running            9          31d
default                web-server-0                                 2/2     Running            6          8d
kubernetes-dashboard   dashboard-metrics-scraper-7b8b58dc8b-mbcx9   1/1     Running            9          40d
default                svclb-hello-world-87gsq                      1/1     Running            9          40d
default                hello-world-6df9f4cc87-hmh5d                 1/1     Running            9          40d
kube-system            svclb-traefik-5lmw9                          2/2     Running            18         43d
kube-system            coredns-d798c9dd-xvhk6                       1/1     Running            9          43d
kube-system            traefik-6787cddb4b-z5vlg                     1/1     Running            9          43d
default                icinga-0                                     1/1     Running            2          4d7h
kubernetes-dashboard   kubernetes-dashboard-866f987876-qc2pz        0/1     CrashLoopBackOff   97         40d
kube-system            local-path-provisioner-58fb86bdfd-x65lm      0/1     Error              109        43d
kube-system            metrics-server-6d684c7b5-q2sxq               0/1     Error              106        43d
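
To pick the failing pods out of a long listing like the one above, a small filter can help. This is only a minimal sketch: it assumes the column layout shown above (STATUS is the fourth column when --all-namespaces is used), and the function name unhealthy_pods is made up for illustration.

```shell
#!/bin/sh
# unhealthy_pods: hypothetical helper (not part of k3s or kubectl).
# Reads `kubectl get pod --all-namespaces` output on stdin and prints the
# namespace, name and status of every pod whose STATUS column is neither
# Running nor Completed. NR > 1 skips the header row.
unhealthy_pods() {
    awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}

# Usage on the server:
#   sudo kubectl get pod --all-namespaces | unhealthy_pods
```

With the listing above, this should leave only the kubernetes-dashboard, local-path-provisioner and metrics-server rows.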

The logs show the same error in each pod:

pi@k3sserver:~ $ sudo kubectl  logs metrics-server-6d684c7b5-q2sxq -n kube-system
I0408 20:44:21.452853       1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
Error: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.43.0.1:443: connect: connection refused
Usage:
   [flags]
<<-- SNIP -->>

panic: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.43.0.1:443: connect: connection refused

goroutine 1 [running]:
main.main()
        /go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x10c

I have a workaround, which I apply after every reboot, that allows the pods to start up:

pi@k3sserver:~ $ sudo systemctl stop k3s
pi@k3sserver:~ $ sudo systemctl start k3s

To Reproduce
Reboot the PI

Expected behavior
All pods enter the Running state.

Actual behavior
Some pods flip between the Error and CrashLoopBackOff states.

Additional context / logs

Most helpful comment

I ran into this as well; judging by other reports in various forums, it seems to be fairly common now. Wants= on its own only pulls the target in and does not order anything. The "After" parameter makes systemd start things in the correct order: in this case, networking must be fully up before k3s.service starts.

-- Fix --
Edit /etc/systemd/system/k3s.service.
In the "[Unit]" section, add a line under Wants=network-online.target that says:
After=network-online.target
Save the file, then run:
systemctl daemon-reload
reboot

All 7 comments

With 1.17.4+k3s1 installed, I have the same issue which started after I upgraded my kernel last week to the latest version (same version as you are reporting):

$ uname -a
Linux pi4 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
$ k version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4+k3s1", GitCommit:"3eee8ac3a1cf0a216c8a660571329d4bda3bdf77", GitTreeState:"clean", BuildDate:"2020-03-25T16:13:40Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4+k3s1", GitCommit:"3eee8ac3a1cf0a216c8a660571329d4bda3bdf77", GitTreeState:"clean", BuildDate:"2020-03-25T16:13:40Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/arm"}

For whatever reason, this latest version of the Raspberry Pi kernel seems to have issues bringing the network up at boot, and k3s is started as if everything were ready, even though its unit declares Wants=network-online.target.

But I realized that it also takes up to 2 minutes now to even ssh into my pi4. By the time I do get in, k3s is in a mangled state and must be restarted in order for the pods to correctly connect to the API on default 10.43.0.1 address.

Seems like an issue with the latest raspberrypi version rather than k3s?

From journalctl:

Apr 13 08:55:09 pi4 kernel: bcmgenet: Skipping UMAC reset
Apr 13 08:55:09 pi4 dhcpcd[316]: eth0: waiting for carrier
Apr 13 08:55:09 pi4 kernel: bcmgenet fd580000.genet: configuring instance for external RGMII (no delay)
Apr 13 08:55:09 pi4 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 13 08:55:10 pi4 rpi-eeprom-update[315]: BOOTLOADER: up-to-date
Apr 13 08:55:10 pi4 rpi-eeprom-update[315]: CURRENT: Tue 10 Sep 2019 10:41:50 AM UTC (1568112110)
Apr 13 08:55:10 pi4 rpi-eeprom-update[315]:  LATEST: Tue 10 Sep 2019 10:41:50 AM UTC (1568112110)
Apr 13 08:55:10 pi4 rpi-eeprom-update[315]:  FW DIR: /lib/firmware/raspberrypi/bootloader/critical
Apr 13 08:55:10 pi4 rpi-eeprom-update[315]: VL805: up-to-date
Apr 13 08:55:10 pi4 rpi-eeprom-update[315]: CURRENT: 000137ad
Apr 13 08:55:10 pi4 rpi-eeprom-update[315]:  LATEST: 000137ad
Apr 13 08:55:10 pi4 systemd[1]: Started Check for Raspberry Pi EEPROM updates.
Apr 13 08:55:10 pi4 kernel: bcmgenet fd580000.genet eth0: Link is Down
Apr 13 08:55:12 pi4 k3s[380]: time="2020-04-13T08:55:12.131624774+02:00" level=info msg="Starting k3s v1.17.4+k3s1 (3eee8ac3)"
Apr 13 08:55:12 pi4 k3s[380]: time="2020-04-13T08:55:12.142885570+02:00" level=info msg="Cluster bootstrap already complete"
Apr 13 08:55:12 pi4 k3s[380]: time="2020-04-13T08:55:12.349750459+02:00" level=info msg="Kine listening on unix://kine.sock"
Apr 13 08:55:12 pi4 k3s[380]: time="2020-04-13T08:55:12.359983273+02:00" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonym
Apr 13 08:55:12 pi4 k3s[380]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release.
Apr 13 08:55:12 pi4 k3s[380]: Error: Unable to find suitable network address.error='no default routes found in "/proc/net/route" or "/proc/net/ipv6_route"'.

Unfortunate that systemd interprets this state as network-online.
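
The last log line above complains that no default route was found in /proc/net/route. The same condition can be checked by hand with something like the sketch below. It is not what k3s runs internally: the function name has_default_route is hypothetical, and it only looks at the IPv4 table, whereas the error message also mentions /proc/net/ipv6_route.

```shell
#!/bin/sh
# has_default_route: hypothetical helper (not part of k3s).
# Returns 0 if the routing table file (default: /proc/net/route) contains an
# IPv4 default route, i.e. a row whose Destination (column 2) and Mask
# (column 8) are both 00000000. Returns 1 otherwise. NR > 1 skips the header.
has_default_route() {
    awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { found = 1 }
         END { exit !found }' "${1:-/proc/net/route}"
}

# Usage:
#   has_default_route && echo "default route present" || echo "no default route yet"
```

Running this right after boot, before and after the link comes up, should show whether k3s was started before any default route existed.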

If systemd is starting services that depend on the network-online target before the network is up, that's not k3s's fault. The 'warnings and errors' are from the kernel and other system services.

Thanks for the info. Seems pretty clear it's not a k3s issue. I went down the rabbit hole a bit, learning how this is supposed to work and trying to configure Raspbian by enabling ifupdown-wait-online.service and then editing /etc/default/networking as mentioned here, but all to no avail.

I'll just live with this.

I ran into this as well; judging by other reports in various forums, it seems to be fairly common now. Wants= on its own only pulls the target in and does not order anything. The "After" parameter makes systemd start things in the correct order: in this case, networking must be fully up before k3s.service starts.

-- Fix --
Edit /etc/systemd/system/k3s.service.
In the "[Unit]" section, add a line under Wants=network-online.target that says:
After=network-online.target
Save the file, then run:
systemctl daemon-reload
reboot
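
To confirm that the edit took, the unit file can be checked for the new ordering line. A minimal sketch, assuming the unit lives at the path given in the fix; the function name orders_after_network_online is made up, and it only inspects the one file passed to it, not any drop-ins.

```shell
#!/bin/sh
# orders_after_network_online: hypothetical helper (not part of k3s or systemd).
# Returns 0 if the given unit file has an After= line that names
# network-online.target (alone or among other units), 1 otherwise.
orders_after_network_online() {
    grep -Eq '^After=(.*[[:space:]])?network-online\.target([[:space:]]|$)' "$1"
}

# Usage:
#   orders_after_network_online /etc/systemd/system/k3s.service \
#       && echo "ordering present" || echo "ordering missing"
```

On a running system, systemctl show -p After k3s.service should also list network-online.target once daemon-reload has been run.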

I can confirm @cconkrig's solution works on Raspbian Buster.

Same problem, same solution. Can we update the docs? I am not sure where this would belong; k3s / Networking, maybe?
I am also wondering whether this is a band-aid or a proper solution; if it is the latter, the change could be added to the .service file.
