MicroK8s: when a node tries to join the cluster, the API server stops listening on port 16443 and the node doesn't show up in the list of nodes

Created on 10 Jul 2020 · 19 Comments · Source: ubuntu/microk8s

HA mode. microk8s deploys successfully.
I then run microk8s join.
The process runs fine, shows "node joining", and completes.
However, the new node does not show up in the list of nodes, and the API server no longer listens on its endpoint:

[root@bm2-microk8s-ha ~]# k get nodes
NAME              STATUS   ROLES    AGE   VERSION
bm2-microk8s-ha   Ready    <none>   85s   v1.18.5-33+2b6eed5dfebf7c
[root@bm2-microk8s-ha ~]# microk8s join 10.243.38.195:25000/cd0ea4934136619d03471f898506c288
Waiting for node to join the cluster.
[root@bm2-microk8s-ha ~]# k get nodes
The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?

grebennikov

All 19 comments

I cannot confirm/reproduce this. Does this happen consistently? Can you attach the microk8s.inspect tarball?

What is k an alias for? Is it possible that k uses the kubeconfig file the node had before joining? Can you try with microk8s.kubectl?

ktsakalozos on 13 Jul 2020

@ktsakalozos sorry, k was an alias for microk8s.kubectl.
This is very reproducible.
inspection-report-20200720_163758.tar.gz

grebennikov on 22 Jul 2020

In the attached tarball I see the apiserver failing to start with "Error: context deadline exceeded".

This may mean the dqlite cluster was not formed and most probably indicates some network configuration issue.

Dqlite listens on port 19001 by default on each node. The cluster nodes should be able to "see" each other directly, not through a NAT or proxy.
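
One rough way to test that requirement is to try opening a TCP connection from one node to port 19001 on another. The helper below is only a sketch: `check_port` is a hypothetical name, it assumes bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available, and a successful connection only shows that something accepts connections on that port, not that the path avoids NAT.

```shell
# check_port HOST PORT: succeed if a TCP connection to HOST:PORT opens within 3 seconds.
# Hypothetical helper, not part of MicroK8s; relies on bash's /dev/tcp and coreutils timeout.
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, run on each node against the other nodes' addresses (IPs from this thread):
#   for ip in 10.243.38.195 10.243.38.201 10.243.38.202; do
#     if check_port "$ip" 19001; then echo "$ip:19001 reachable"; else echo "$ip:19001 NOT reachable"; fi
#   done
```

A refused or timed-out connection would point at a firewall or routing problem between the nodes rather than at dqlite itself.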

Could you describe the exact steps you follow? Could you also describe the network on each node: how many interfaces and IPs do you see on each node? What does your /etc/hosts look like on the two nodes? Can you also share the contents of /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml?

ktsakalozos on 23 Jul 2020

👍1

sudo cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
[sudo] password for ubuntu:

Address: 10.243.38.195:19001
ID: 3297041220608546238
Role: 0
Address: 10.243.38.201:19001
ID: 1352879667948006471
Role: 2
Address: 10.243.38.202:19001
ID: 14991859051565241993
Role: 0

Single interface.

$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 bm-microk8s-node1

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.16.108.12 shared-isilon.lab176.local
172.16.108.13 shared-isilon.lab176.local
172.16.108.14 shared-isilon.lab176.local
172.16.108.15 shared-isilon.lab176.local

grebennikov on 23 Jul 2020

👍1

Did you ever fix this? I'm seeing the same behavior right now. Ubuntu 20.04, microk8s 1.19/edge (v1.19.2 (1735)).

sirpeanut on 30 Sep 2020

👍4

I've run into the same issue while trying to set up microk8s on my raspi4:

After installing via snap and confirming that microk8s is running, microk8s dies when I try to join a node...

root@pi4:~# snap install microk8s --classic
microk8s (1.19/stable) v1.19.0 from Canonical✓ installed
root@pi4:~# microk8s status --wait-ready
microk8s is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none
root@pi4:~# microk8s kubectl get no
NAME   STATUS   ROLES    AGE     VERSION
pi4    Ready    <none>   3m57s   v1.19.0-34+ff9309c628eb68
root@pi4:~# microk8s join k8:25000/0bf30a1da10ce263d45b78206d0a487f
Contacting cluster at k8
Waiting for this node to finish joining the cluster. .. .. .. .. .. .. .. .. .. ..  
root@pi4:~# microk8s kubectl get nodes
The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?
root@pi4:~# microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.

The other node is reachable over the network.

Schlump on 18 Oct 2020

👍1

I have the same issue :/

nguyenorlab on 21 Oct 2020

Could you attach the microk8s inspect tarball so we can see why the API server is not responding? Does the node show up in the output of microk8s kubectl get no?

ktsakalozos on 21 Oct 2020

inspection-report-20201022_041458.tar.gz
Hi @ktsakalozos
Here is my microk8s inspect tarball. Can you help me to solve it?
Thank you.

nguyenorlab on 22 Oct 2020

Same issue here @ktsakalozos. I have an 8-node rpi4 cluster where I successfully joined six nodes (one of them had to be retried a few times). All nodes have been installed and configured using the exact same Ansible playbook.
The last node however doesn't feel like joining. Attached are two inspection reports, one just before the join, one after.
The apiserver service keeps restarting in a loop and port 16443 never comes online.

Thanks!

before-join: inspection-report-20201028_003902.tar.gz
after-join: inspection-report-20201028_004243.tar.gz

kcalliauw on 28 Oct 2020

Same issue here. Three rpi4 nodes have already joined, and I used the same procedure for all of them. The fourth node doesn't join, and microk8s goes down.

otaviojr on 28 Oct 2020

👍1

Same issue, I found a workaround though.

microk8s stop
In /var/snap/microk8s/1769/var/kubernetes/backend/
- delete join and metadata1
- update contents of cluster.yaml with the contents from the same file on a working node
microk8s start
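
The steps above could be scripted roughly as follows. This is a sketch under assumptions, not an official procedure: `reset_dqlite_backend` is a hypothetical helper name, the backup step is an addition that the original workaround does not mention, and the reference cluster.yaml is assumed to have been copied over from a working node beforehand.

```shell
# reset_dqlite_backend BACKEND_DIR REFERENCE_CLUSTER_YAML
# Hypothetical helper: back up the backend directory, delete "join" and "metadata1",
# and overwrite cluster.yaml with a copy taken from a working node.
reset_dqlite_backend() {
  backend_dir="$1"
  reference_yaml="$2"
  # Keep a timestamped copy so the change can be undone (not part of the original workaround).
  cp -a "$backend_dir" "${backend_dir}.bak.$(date +%Y%m%d%H%M%S)" || return 1
  rm -rf "$backend_dir/join" "$backend_dir/metadata1"
  cp "$reference_yaml" "$backend_dir/cluster.yaml"
}

# Example, run as root on the failing node (/tmp/cluster.yaml is an assumed location
# for the file copied from a working node; the revision directory may differ from 1769):
#   microk8s stop
#   reset_dqlite_backend /var/snap/microk8s/1769/var/kubernetes/backend /tmp/cluster.yaml
#   microk8s start
```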

chasain on 28 Oct 2020

👍2

@chasain this unfortunately didn't work for me. I did notice that the node I'm trying to join is listed in the cluster.yaml file of the other nodes, though it doesn't show up in kubectl get nodes.
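
For anyone wanting to repeat that check, the sketch below lists the member addresses recorded in a cluster.yaml and tests whether a given IP is among them. Both function names are hypothetical, and the parsing simply assumes the `Address: IP:PORT` lines shown earlier in this thread.

```shell
# list_dqlite_addresses FILE: print the value of every "Address:" line in a cluster.yaml.
list_dqlite_addresses() {
  awk '/Address:/ { print $NF }' "$1"
}

# has_dqlite_member FILE IP: succeed if IP is recorded as a member address (any port).
has_dqlite_member() {
  awk -v ip="$2" '/Address:/ { split($NF, a, ":"); if (a[1] == ip) found = 1 }
                  END { exit !found }' "$1"
}

# Example, run as root on a working node (the file is not world-readable by default):
#   f=/var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
#   list_dqlite_addresses "$f"
#   has_dqlite_member "$f" 10.243.38.201 && echo "already registered in dqlite"
```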

kcalliauw on 29 Oct 2020

@freeekanayaka in the apiserver logs I see

Oct 28 00:41:32 raspi-8 microk8s.daemon-apiserver[9970]: Error: context deadline exceeded
Oct 28 00:41:32 raspi-8 systemd[1]: snap.microk8s.daemon-apiserver.service: Main process exited, code=exited, status=1/FAILURE
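
To see how often this error recurs on a failing node, the journal for the apiserver unit can be filtered. The snippet is a minimal sketch: `count_deadline_errors` is a hypothetical helper, and the unit name is taken from the log excerpt above.

```shell
# count_deadline_errors: read log text on stdin and print how many lines
# contain the "context deadline exceeded" error.
count_deadline_errors() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the function's status clean.
  grep -c 'context deadline exceeded' || true
}

# Example, on the failing node:
#   journalctl -u snap.microk8s.daemon-apiserver --since "10 minutes ago" --no-pager | count_deadline_errors
```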

Since the node already appears in cluster.yaml, is it possible dqlite assumes this node is already registered with a different IP?

@kcalliauw could you change the failing node's IP and hostname, snap install microk8s again and try to join it? This will tell us if there is an issue with past node registrations.

ktsakalozos on 29 Oct 2020

@ktsakalozos if the node is already in cluster.yaml before trying to join, it must mean that the node was already added before, at least in terms of dqlite clustering (not necessarily in terms of k8s clustering). If you try to join an existing node again, you should get an error. The context deadline exceeded error seems to indicate some network issue. You should be able to use the dqlite command line client to inspect the situation, at least in terms of dqlite clustering.

freeekanayaka on 29 Oct 2020

I had an issue adding another node today. I forgot to mention that I had disabled the firewall while trying to diagnose, and my procedure didn't work with it on. I've now got it working: I purged the install, added ufw rules to allow 16443, 19001 and 10250/tcp, then ran through my procedure above.
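
The ufw rules mentioned above might look like the following. This is a sketch: `allow_cluster_ports` and the `UFW_CMD` override are hypothetical names added for illustration, and only the three ports named in this comment are opened. Other setups may need more.

```shell
# allow_cluster_ports: allow inbound TCP on the ports listed in the comment above.
# Set UFW_CMD=echo for a dry run that only prints the arguments.
allow_cluster_ports() {
  for port in 16443 19001 10250; do
    ${UFW_CMD:-ufw} allow "${port}/tcp" || return 1
  done
}

# Example, run as root on every node:
#   allow_cluster_ports
#   ufw status
```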

chasain on 29 Oct 2020

I'm a little suspicious that the high-availability add-on is causing this. I don't know much about it, but it's installed by default now, and I wonder whether it is telling the node to talk to itself for the cluster API service before it has fully joined. If I get another bad node, I'll try disabling it before joining the node to the cluster.

chasain on 29 Oct 2020

Reinstalling microk8s on the "master" node fixed the issue.

Schlump on 30 Oct 2020

I ran into the same issue today on one node in my 3 node rpi cluster. A reinstall of the problematic node did not fix the issue, so I reinstalled all 3 nodes and joined them again without issues. Although I can't prove it, I have the suspicion that running the join for both nodes at the same time led to the issue initially.