Version:
k3s version v1.17.2+k3s1 (cdab19b0)
Describe the bug
Three-node HA cluster:
kubectl get node
NAME STATUS ROLES AGE VERSION
cbo1 Ready master 38m v1.17.2+k3s1
cbo3 Ready master 35m v1.17.2+k3s1
cbo2 Ready master 37m v1.17.2+k3s1
cbo1 is the first server; cbo2 and cbo3 joined it.
When either cbo2 or cbo3 is down, the cluster stays up:
kubectl get node
NAME STATUS ROLES AGE VERSION
cbo1 Ready master 42m v1.17.2+k3s1
cbo2 Ready master 41m v1.17.2+k3s1
cbo3 NotReady master 38m v1.17.2+k3s1
However, when cbo1 is down:
kubectl get node
Error from server: rpc error: code = Unknown desc = failed to create dqlite connection: no available dqlite leader server found
To Reproduce
cbo1
/usr/local/bin/k3s server --cluster-init --tls-san cbo1.ip --flannel-backend ipsec
cbo2 and cbo3
/usr/local/bin/k3s server --server https://cbo1.ip:6443 --flannel-backend ipsec
Expected behavior
kubectl get node
NAME STATUS ROLES AGE VERSION
cbo1 NotReady master 38m v1.17.2+k3s1
cbo3 Ready master 35m v1.17.2+k3s1
cbo2 Ready master 37m v1.17.2+k3s1
Actual behavior
kubectl get node
Error from server: rpc error: code = Unknown desc = failed to create dqlite connection: no available dqlite leader server found
Additional context
Is your k3s.yaml pointed at just the one node, or does it point at an address or hostname that will remain available when the first node is down? You might have multiple masters but it doesn't do you much good if you can't talk to any of them but the first one.
Thanks, @brandond. I didn't think it was worth mentioning:
k3s.yaml points to all three servers.
Technically, there are three A records in DNS with a 1s TTL,
but I also addressed each of the active servers directly - the result is the same.
However, if I address the first server while it is completely down I get
Unable to connect to the server: dial tcp cbo1.ip:6443: connect: operation timed out
and if I just stop the k3s service (with the node still up)
The connection to the server cbo1.ip:6443 was refused - did you specify the right host or port?
that's as expected
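For what it's worth, the three behaviours above can be told apart from curl's exit code alone. A hedged diagnostic sketch - the hostname argument is a placeholder for cbo1.ip etc., and /ping is assumed here as the unauthenticated k3s health endpoint:

```shell
# Classify an API endpoint's failure mode (sketch, hostnames hypothetical).
# curl exit 7 = connection refused (host up, k3s service stopped),
# exit 28 = connect timeout (host down or filtered), 0 = k3s answered.
probe() {
  curl -sk --connect-timeout 3 -o /dev/null "https://$1:6443/ping"
  case $? in
    0)  echo "$1: k3s answered" ;;
    7)  echo "$1: connection refused - k3s stopped but host is up" ;;
    28) echo "$1: timed out - host down or unreachable" ;;
    *)  echo "$1: other failure" ;;
  esac
}

probe 127.0.0.1
```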
What do the logs say?
I built a 3 node dqlite cluster on Ubuntu in VirtualBox just a day or two ago, so I know it should work, mostly. CPU usage seemed unusually high so I went back to a single master.
I never said the cluster is not working.
I only said that it cannot find a dqlite leader when the first server is down.
the most interesting thing in the logs seems to be
ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
I see it every second, even when the cluster is up:
Feb 07 07:52:43 cbo2 k3s[24261]: E0207 07:52:43.717128 24261 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Feb 07 07:52:43 cbo2 k3s[24261]: E0207 07:52:43.778638 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:43 cbo2 k3s[24261]: I0207 07:52:43.778679 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:44 cbo2 k3s[24261]: E0207 07:52:44.647233 24261 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error trying to reach service: 'dial tcp 10.43.30.222:443: i/o timeout', Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
Feb 07 07:52:44 cbo2 k3s[24261]: I0207 07:52:44.647272 24261 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
Feb 07 07:52:44 cbo2 k3s[24261]: E0207 07:52:44.779024 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:44 cbo2 k3s[24261]: I0207 07:52:44.779067 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:45 cbo2 k3s[24261]: E0207 07:52:45.779385 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:45 cbo2 k3s[24261]: I0207 07:52:45.779430 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:46 cbo2 k3s[24261]: E0207 07:52:46.779827 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:46 cbo2 k3s[24261]: I0207 07:52:46.779877 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:47 cbo2 k3s[24261]: E0207 07:52:47.780244 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:47 cbo2 k3s[24261]: I0207 07:52:47.780289 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:48 cbo2 k3s[24261]: E0207 07:52:48.722943 24261 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Feb 07 07:52:48 cbo2 k3s[24261]: E0207 07:52:48.780575 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:48 cbo2 k3s[24261]: I0207 07:52:48.780622 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:49 cbo2 k3s[24261]: E0207 07:52:49.780820 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:49 cbo2 k3s[24261]: I0207 07:52:49.780852 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:50 cbo2 k3s[24261]: E0207 07:52:50.781207 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:50 cbo2 k3s[24261]: I0207 07:52:50.781250 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:51 cbo2 k3s[24261]: I0207 07:52:51.353469 24261 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
When the first server is down, it adds errors about dqlite failing:
Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.550290916Z" level=error msg="error while range on /registry/storageclasses /registry/storageclasses: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.812393865Z" level=info msg="Connecting to proxy" url="wss://167.86.89.20:6443/v1-k3s/connect"
Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.813253780Z" level=error msg="Failed to connect to proxy" error="dial tcp 167.86.89.20:6443: connect: connection refused"
Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.813287562Z" level=error msg="Remotedialer proxy error" error="dial tcp 167.86.89.20:6443: connect: connection refused"
Feb 07 07:58:11 cbo2 k3s[24261]: E0207 07:58:11.108462 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:11 cbo2 k3s[24261]: I0207 07:58:11.108506 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:12 cbo2 k3s[24261]: E0207 07:58:12.108723 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:12 cbo2 k3s[24261]: I0207 07:58:12.108766 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:12 cbo2 k3s[24261]: time="2020-02-07T07:58:12.333153888Z" level=error msg="error while range on /registry/leases /registry/leases: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:13 cbo2 k3s[24261]: time="2020-02-07T07:58:13.085579967Z" level=error msg="error while range on /registry/clusterrolebindings /registry/clusterrolebindings: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:13 cbo2 k3s[24261]: E0207 07:58:13.108975 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:13 cbo2 k3s[24261]: I0207 07:58:13.109016 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:14 cbo2 k3s[24261]: E0207 07:58:14.109329 24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:14 cbo2 k3s[24261]: I0207 07:58:14.109392 24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:14 cbo2 k3s[24261]: time="2020-02-07T07:58:14.399545671Z" level=error msg="error while range on /registry/csinodes /registry/csinodes: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:14 cbo2 k3s[24261]: time="2020-02-07T07:58:14.482484472Z" level=error msg="error while range on /registry/minions /registry/minions: failed to create dqlite connection: no available dqlite leader server found"
Oh, and now kubectl returns a different message:
kubectl get node
Error from server (ServiceUnavailable): the server is currently unable to handle the request
How long did you wait after adding the other servers before shutting down the first master? Is it possible that it hadn't finished replicating all the data and reaching quorum? Unfortunately the various k3s controllers don't export any metrics or write many non-error logs so it's quite hard to figure out what's going on, but I wonder if dqlite wasn't ready to run without the master when you shut it down.
Hey! I'm also having issues when stress testing the nodes in embedded HA with 3 masters and 2 workers.
There is one open PR that seems to be related: https://github.com/rancher/k3s/pull/1377
I am facing the exact same issue. It kinda defies the purpose of HA entirely, as the first node becomes a single point of failure.
I cannot add much more info than @eleaner already did. My logs look pretty much the same.
I see the same issues with my HA cluster, raft gets really upset when you start knocking out master nodes, especially if one happened to be the leader at the time.
I am also experiencing the same problem on a 2 master ARM cluster. I even have a floating IP using heartbeat so the advertised IP of the cluster has moved over to Master2. I attempted to stop/start k3s on Master2 but it hangs and never successfully resumes. After starting the service back up on Master1 I can start the service on Master2.
@castironclay 2 master nodes does not have a quorum, please see:
https://github.com/rancher/k3s/issues/1407
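To spell out the quorum arithmetic (a hedged sketch, not k3s code): raft-style stores like dqlite need a strict majority of voting members, so a 2-master cluster has a quorum of 2 and tolerates zero failures - no better than a single master:

```shell
# Majority quorum for a raft-style cluster of n voting members.
# tolerated failures = n - quorum; note n=2 tolerates 0, same as n=1.
quorum() { echo $(( $1 / 2 + 1 )); }

for n in 1 2 3 5; do
  q=$(quorum "$n")
  echo "members=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```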
I have hit this bug in my k3s setup in DO. I have 2 nodes in EU and 1 in US. I started the cluster with the node in US, and then added the other two nodes.
After stopping k3s service in US node, cluster became unresponsive, and although after restarting the service a few times workloads have shifted to the other 2 nodes, it seems that dqlite is still looking for the US node as a leader, and is incapable of handing over the leader role.
Is there any information you are currently lacking?
I have something similar with the latest version, though not exactly the same. I start a 3-node master setup with Vagrant: first start node1, wait for it to boot, then node2 and node3.
When I shut down node1, nodes 2 and 3 keep working, but as soon as I shut down node2, on node3 I get this error every time I call kubectl: "failed to create dqlite connection: no available dqlite leader server found"
If I only have 2 nodes, for example nodes 1 and 2, and I shut down node 1, the same happens.
Ran into this as well: take down the initial node, and the remaining masters are unable to determine a leader. I feel like this might be a documentation problem - the quickstart for dqlite differs from the rest of the multi-master configs in that there is no fixed registration address, just instructions to point the other nodes at the initial node by IP address. The remaining server processes are unable to mesh because they all point to one master and can't "figure it out".
@brontide I also noticed the different documentation so i setup a virtual IP with keepalived and pointed the individual nodes to that IP upon cluster creation, however the behaviour stayed the same, even though the IP switched to another node when i took the first node offline. the nodes fail to find a leader..
@philkry yeah, nodes would not even come up. Pretty clear that the bootstrapping of the dqlite HA cluster is lacking. I've gone back to a single master for now just to get off the ground.
@brontide @philkry I can confirm: I set up three master nodes with a load balancer in front of them, used as a virtual IP to join them. The problem persists then, too.
I tried working around this by chaining the master nodes (master3 -> master2 -> master1) and it failed completely, dqlite was continuously restarting on each node shortly after I connected the third node.
Actually I was hoping to make a ring of 5 master nodes this way, but it was already broken with three nodes so I gave up.
I'm hoping that they eventually fix the underlying bootstrapping issue; in the meantime I've switched back to a single master and am looking to back up the db via script in case I ever need to rebuild or replace the master. Seems like they have a good idea but the execution is still lacking.
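The backup script mentioned above could look something like this (a hedged sketch: in production the directory would be the default k3s state path /var/lib/rancher/k3s/server/db and the k3s service should be stopped first; here it runs against a scratch directory standing in for that path so the script is runnable anywhere):

```shell
# Archive a k3s DB directory into a tarball (sketch; paths hypothetical).
set -eu

backup_db() {  # $1 = db directory, $2 = output tarball
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
  echo "backed up $1 to $2"
}

# Demo against a throwaway directory standing in for
# /var/lib/rancher/k3s/server/db (contents invented for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/db/state.dqlite"
echo "data" > "$demo/db/state.dqlite/segment-1"
backup_db "$demo/db" "$demo/db-backup.tgz"
tar -tzf "$demo/db-backup.tgz"
```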
Update 11 May 2020: still checking out if HA is indeed true by analyzing log files and testing.
Update: unsure if fully solved with v1.18.2+k3s1
sudo curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest INSTALL_K3S_EXEC="--flannel-backend=wireguard" sh -
I also have the same issue (w/ v1.17.4+k3s1).
Question:
What is or should be the proper k3s.yaml config in a three-master HA setup with dqlite? I can't seem to find where this is documented.
My k3s.yaml on all three master nodes (they are identical):
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <removed>
    server: https://127.0.0.1:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
  user:
    password: <removed>
    username: admin
Isn't the _--server part_ the problem? All my other, subsequent master nodes start with:
ExecStart=/usr/local/bin/k3s server --flannel-backend=wireguard --server=https://k3s-mn-01.biz:6443 --token=<removed>
This is according to https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
My first master node starts with:
ExecStart=/usr/local/bin/k3s server --flannel-backend=wireguard
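Putting the above together, the docs' intent seems to be that the registration address given to joining servers should be a name that outlives any single node. A hedged sketch - the hostname and token are placeholders, and commenters above report the dqlite leader election still fails even behind a VIP, so this only removes the client/join-side dependence on the first node's address:

```shell
# First server bootstraps the cluster; the shared name (VIP or DNS
# record - "k3s-api.example.internal" is a placeholder) goes into the cert.
k3s server --cluster-init --flannel-backend=wireguard --tls-san k3s-api.example.internal

# Every additional server joins via the shared name, never a peer's own IP.
k3s server --flannel-backend=wireguard --server https://k3s-api.example.internal:6443 --token <token>

# k3s.yaml on clients should likewise point at the shared name:
#   server: https://k3s-api.example.internal:6443
```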
I can confirm a new cluster running v1.18.2+k3s1 on 3 Pi 4s exhibits this issue. I spun this up using k3sup but all the behaviours seem the same
Like @remkolems I'm not sure how later nodes should be joined in terms of configuration. The init script starts k3s with --server https://192.168.0.103:6443 (the first/initial node with --cluster-init). Does that mean it only ever looks for this node?
I would guess not, as /var/lib/rancher/k3s/server/db/state.dqlite/peers.db contains all the other peers, and logs talk about connecting WebSockets to other peers but nothing seems immediately dqlite-specific otherwise (in terms of reporting status etc).
(Also at the time of writing my node 2 appears to be whining about not finding node 3, all after rebooting node 1 - level=warning msg="no known leader address=192.168.0.105:6443 attempt=0". A simple find in the codebase shows this is from dqlite itself. kubectl get nodes reports everything as good though.)
Just tested by restarting my first node; during and after the successful restart of the node (and services) I ran the following command multiple times on all other master nodes:
sudo kubectl get nodes -o wide
I still get the expected results (Ready/NotReady/Ready), although I didn't dive into the log files yet.
Precisely which log files _should_ we look at to validate k3s?

Even after restarting my second node, I get the above result.
System specification
UP Squared ATOM Quad Core (memory 4GB / eMMC 32GB)

Behaviour changed for me with v1.18.2+k3s1 as well, but still no HA.
When I restart the first node, the cluster reassembles successfully after the node is back up (which was not the case with the previous version), but during the restart the control plane is not available.
During the restart the logs are flooded with
May 11 10:06:25 node3 k3s[21207]: time="2020-05-11T10:06:25.160331591+02:00" level=warning msg="server unavailable err=failed to establish network connection: tls dial: dial tcp 192.168.64.3:6443: connect: connection refused address=192.168.64.3:6443 attempt=20"
followed by
May 11 10:06:26 node3 k3s[21207]: I0511 10:06:26.557062 21207 trace.go:116] Trace[725908169]: "Create" url:/api/v1/namespaces,user-agent:k3s/v1.18.2+k3s1 (linux/amd64) kubernetes/698e444,client:127.0.0.1 (started: 2020-05-11 10:05:52.547149949 +0200 CEST m=+135.705775041) (total time: 34.009764336s):
May 11 10:06:26 node3 k3s[21207]: Trace[725908169]: [34.009764336s] [34.009468004s] END
May 11 10:06:26 node3 k3s[21207]: E0511 10:06:26.559413 21207 controller.go:207] unable to create required kubernetes system namespace kube-system: Timeout: request did not complete within requested timeout 34s
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.869245125+02:00" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.871508631+02:00" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.881083327+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=49"
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.881965120+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=68"
...
May 11 10:06:29 node3 k3s[21207]: I0511 10:06:29.169976 21207 trace.go:116] Trace[513043459]: "List etcd3" key:/minions,resourceVersion:,limit:500,continue: (started: 2020-05-11 10:05:29.165133706 +0200 CEST m=+112.323763994) (total time: 1m0.004767816s):
May 11 10:06:29 node3 k3s[21207]: Trace[513043459]: [1m0.004767816s] [1m0.004767816s] END
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.175960 21207 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.177627 21207 writers.go:105] apiserver was unable to write a JSON response: http: Handler timeout
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.178796 21207 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.179879 21207 writers.go:118] apiserver was unable to write a fallback JSON response: http: Handler timeout
May 11 10:06:29 node3 k3s[21207]: I0511 10:06:29.182012 21207 trace.go:116] Trace[16562761]: "List" url:/api/v1/nodes,user-agent:kubectl/v1.18.2+k3s1 (linux/amd64) kubernetes/698e444,client:127.0.0.1 (started: 2020-05-11 10:05:29.164887063 +0200 CEST m=+112.323517550) (total time: 1m0.016441022s):
May 11 10:06:29 node3 k3s[21207]: Trace[16562761]: [1m0.016441022s] [1m0.016235552s] END
May 11 10:06:29 node3 k3s[21207]: time="2020-05-11T10:06:29.188705508+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=41"
May 11 10:06:29 node3 k3s[21207]: time="2020-05-11T10:06:29.293854107+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=39"
Until the first node is back up, I cannot connect on any of the remaining master nodes:
ubuntu@node2:~$ sudo kubectl get nodes
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
I assume #1770 meant to replace dqlite by the embedded version of etcd - so I don't think any further fixes will be made to dqlite.