Version: k3s version v1.17.2+k3s1 (cdab19b0)
Description:
The k3s master fails to start; the log shows: "starting kubernetes: preparing server: start cluster and https: raft_start(): io: load closed segment 0000000024946269-0000000024946590: found 321 entries (expected 322)"
This happened after the machines were forcefully shut down (power loss). There is no information on the web about how to resolve this error or what to do next.
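The segment file named in the error belongs to the embedded dqlite store. Assuming the default k3s data directory (no --data-dir is set in the unit below), it should be possible to locate it with something like the following; the path is an assumption, not taken from this report:

# assuming the default k3s data dir; adjust if --data-dir / -d was set
sudo find /var/lib/rancher/k3s/server/db -name '0000000024946269-*' -ls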
To Reproduce:
Run a k3s server with --cluster-init (embedded dqlite) and forcefully power off the machine.
Expected behavior:
k3s starts and the cluster recovers after the power loss.
Actual behavior:
k3s exits with the fatal raft_start() error above and systemd restarts it in a loop.
Additional context:
uname -a
Linux ariana 5.4.7-sunxi64 #19.11.6 SMP Sat Jan 4 19:40:10 CET 2020 aarch64 GNU/Linux
lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
cat /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target
[Service]
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s server --cluster-init --write-kubeconfig-mode 664
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
/var/log/syslog:
...
Feb 9 00:00:12 ariana systemd[1]: Starting Lightweight Kubernetes...
Feb 9 00:00:12 ariana systemd[1]: Started Lightweight Kubernetes.
Feb 9 00:00:13 ariana k3s[3961]: time="2020-02-09T00:00:13.429349422Z" level=info msg="Starting k3s v1.17.2+k3s1 (cdab19b0)"
Feb 9 00:00:16 ariana k3s[3961]: time="2020-02-09T00:00:16.592512841Z" level=fatal msg="starting kubernetes: preparing server: start cluster and https: raft_start(): io: load closed segment 0000000024946269-0000000024946590: found 321 entries (expected 322)"
Feb 9 00:00:16 ariana systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Feb 9 00:00:16 ariana systemd[1]: k3s.service: Failed with result 'exit-code'.
Feb 9 00:00:21 ariana systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Feb 9 00:00:21 ariana systemd[1]: k3s.service: Scheduled restart job, restart counter is at 5380.
Feb 9 00:00:21 ariana systemd[1]: Stopped Lightweight Kubernetes.
...
Seeing the same issue. I was purposefully deleting master nodes at various intervals and hit this error on reboot after a couple of tries.
This appears to be the upstream dqlite issue: https://github.com/canonical/dqlite/issues/190
dqlite is still experimental; there does not appear to be a way to recover from this at the moment. If you need more production-ready HA, you should probably be using an external DB.
Also, a two-node dqlite cluster won't meet Raft consensus requirements (no quorum if one node goes down), so this setup probably won't ever work as expected.
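For reference, Raft quorum is floor(N/2)+1 voters, so a two-server cluster needs both nodes up, while three servers can tolerate losing one. Switching to an external datastore would look roughly like the sketch below; the MySQL endpoint, credentials, and database name are placeholders, not values from this cluster:

# illustrative only -- endpoint, user, password, and db name are placeholders
# note: --cluster-init applies only to the embedded datastore and is dropped here
ExecStart=/usr/local/bin/k3s server \
    --datastore-endpoint="mysql://user:password@tcp(db.example.com:3306)/k3s" \
    --write-kubeconfig-mode 664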