Environmental Info:
K3s Version:
1.19.1-k3s1
Node(s) CPU architecture, OS, and Version:
Linux k8s-master-c 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux k8s-master-a 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux k8s-master-b 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
3 masters all same OS and arch, 0 workers
Describe the bug:
Duplicate of https://github.com/rancher/k3s/issues/2249 but this is on amd64
Steps To Reproduce:
k get nodes -o wide
Expected behavior:
After a shutdown, all master nodes should reconnect. At the very least, one of them should start correctly.
Actual behavior:
Connection error to nodes
Additional context / logs:
See my comments in https://github.com/rancher/k3s/issues/2249#issuecomment-694540765 and https://github.com/rancher/k3s/issues/2249#issuecomment-694941471
Just to confirm, you're on Ubuntu whereas #2249 was on Raspbian?
Yes, Ubuntu 20.04.1. I think that guy is also using Ubuntu 20.04 but on arm64.
Are the nodes rebooted one by one?
@liyimeng no, I turned them all off at once. The expectation is that this would work after they are all powered back on again. I will try the work-around in https://github.com/rancher/k3s/issues/2249 after I am done tinkering with other projects.
@onedr0p of course they won’t be able to come up, since you have shut off the entire cluster and forced it to lose quorum. Please google “etcd lost quorum” for the theory behind this.
I know enough about lost quorum. Maybe the Rancher folks need to provide more documentation around that, as well as ways to back up and restore etcd state with their embedded option?
But in any case @liyimeng it seems the workaround is to update the systemd files https://github.com/rancher/k3s/issues/2249#issuecomment-695181813 and then you can reboot or shut down all you want.
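For readers following the quorum exchange above, the arithmetic can be sketched as below. This is only an illustration of the usual majority rule for Raft-style clusters such as etcd, not k3s or etcd code; the function names `quorum_size` and `has_quorum` are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Majority needed in a cluster of `members` voting nodes."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def has_quorum(members: int, reachable: int) -> bool:
    """True when enough members are reachable to form a majority."""
    return reachable >= quorum_size(members)


# The 3-master cluster from this issue: show quorum for 0..3 masters up.
for up in range(4):
    print(f"{up}/3 masters up -> quorum: {has_quorum(3, up)}")
```

Under this rule a 3-master cluster needs 2 reachable members, so quorum is absent while all nodes are powered off, and the arithmetic is satisfied again once at least two of them are back and can reach each other.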
@onedr0p Thanks a lot! Given that it is an experimental feature, maybe we should not push them too hard :D. They have been busy getting 1.19.1 released. But your screaming does make this happen faster. LOL. Thanks for the info! I'd love to give it a try!
Yes, this is on my short list to fix, although it will probably involve more than just swapping a couple of lines of code. The workaround is easy enough in the meantime.
The whole team is excited to see the embedded etcd mature quickly. There are a lot of new features here to document and iron the kinks out of.
Since this is confirmed as a software issue, not an architecture or operating system bug, I will close this issue. The main issue is https://github.com/rancher/k3s/issues/2249