Version:

```
k3s version v1.17.2+k3s1 (cdab19b0)
```

Install commands (server, then worker):

```
curl -sfL https://get.k3s.io | sh -
curl -sfL https://get.k3s.io | K3S_URL=https://controllerpi.local:6443 K3S_TOKEN=... sh -
```

Simple build here: two brand-new RPi4s running Raspbian Buster Lite; apt update/upgrade were run before I installed k3s. I also updated the RPis' eeprom (with rpi-eeprom-update). The server has hostname controllerpi and the worker has hostname workerpi1.
Worker won't connect to server. Log is full of:
```
=info msg="Starting k3s agent v1.17.2+k3s1 (cdab19b0)"
=info msg="module overlay was already loaded"
=info msg="module br_netfilter was already loaded"
=info msg="Running load balancer 127.0.0.1:41461 -> [controllerpi.local:6443]"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: read tcp 127.0.0.1:48874->127.0.0.1:41461: read: co
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
```
I've tried using curl on the load balancer URL (https://127.0.0.1:41461/cacerts), the server's hostname (https://controllerpi.local:6443/cacerts), and the server's IP (https://192.168.1.X:6443/cacerts). They all fail with:
```
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
```
If I add the -k flag to curl it works in all cases (load balancer, hostname, and IP). I'm wondering if that has something to do with it.
Thanks!
@bmatcuk do your pi's have the correct time, or were the certs possibly created when they didn't have the correct time?
What Linux distro is this on? Might be another issue with iptables vs nftables.
@dweomer that's an excellent question. I'm not sure what the system time was when I installed k3s, but, I checked the validity dates on all of the certs in /var/lib/rancher/k3s/server/tls and they look good:
```
root@controllerpi:/var/lib/rancher/k3s/server/tls# for f in *.crt; do openssl x509 -in "$f" -noout -text | grep -A2 Validity; done
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 4 02:55:00 2030 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 4 02:55:00 2030 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 4 02:55:00 2030 GMT
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 02:55:00 2021 GMT
root@controllerpi:/var/lib/rancher/k3s/server/tls# date
Fri 7 Feb 14:45:44 GMT 2020
```
I also read in another GitHub issue that this is related to the `k3s-serving` secret, so I checked that cert as well:
```
root@controllerpi:/var/lib/rancher/k3s/server/tls# kubectl get secret -o yaml k3s-serving -n kube-system
apiVersion: v1
data:
tls.crt: ...
etc
root@controllerpi:/var/lib/rancher/k3s/server/tls# echo 'tls.crt here' | base64 --decode | openssl x509 -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
2a:c7:56:64:94:35:ea:2e
Signature Algorithm: ecdsa-with-SHA256
Issuer: CN = k3s-server-ca@1581044100
Validity
Not Before: Feb 7 02:55:00 2020 GMT
Not After : Feb 6 04:49:32 2021 GMT
Subject: O = k3s, CN = k3s
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:3b:69:66:8e:d6:56:0e:f9:ce:9b:e9:88:8f:c3:
c6:08:fe:7b:e4:76:2b:76:5c:df:02:2a:6f:da:f4:
13:65:79:77:c7:6c:d5:32:ad:44:d6:d4:17:87:f4:
6f:1a:d2:e2:87:29:fb:2f:6c:f2:74:eb:4e:85:e3:
9e:65:a7:ad:a3
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication
X509v3 Subject Alternative Name:
DNS:controllerpi.local, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc.cluster.local, DNS:localhost, IP Address:10.43.0.1, IP Address:127.0.0.1, IP Address:169.254.231.129, IP Address:192.168.1.168
Signature Algorithm: ecdsa-with-SHA256
etc
```
Date looks fine and the cert includes hostname (controllerpi.local) and IP (192.168.1.168).
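As an aside, the hand copy/paste of the base64 blob can be skipped by pulling it straight out of the secret. A sketch of that pipeline, using a throwaway self-signed cert as stand-in data (the real blob would come from the `kubectl ... jsonpath` command in the comment):

```shell
# In practice, pull the blob straight from the secret:
#   kubectl -n kube-system get secret k3s-serving -o jsonpath='{.data.tls\.crt}'
# Here a throwaway self-signed cert stands in for that secret data.
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem -nodes -days 1 \
  -subj '/CN=demo' 2>/dev/null

# Base64-encode it, as kubernetes stores it in the secret.
BLOB=$(base64 -w0 /tmp/demo-cert.pem)

# Decode and show the validity dates and subject, as done by hand above.
echo "$BLOB" | base64 --decode | openssl x509 -noout -dates -subject
```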
I tried curl'ing `https://localhost:6443/cacerts` on the server: it also fails to verify the cert issuer. If I run `curl --cacert /var/lib/rancher/k3s/server/tls/server-ca.crt https://localhost:6443/cacerts` it works. I tried copying server-ca.crt to the worker node and installing it (via `update-ca-certificates`), but that did not solve the problem =(
@brandond Raspbian Buster Lite, updated and upgraded before k3s install.
I tried uninstalling k3s on both my controller and worker, ran apt update/upgrade again to make sure I'm on latest, and reinstalled. Same issue. I checked my iptables - looks like it's saying there are legacy tables present? Does that indicate an iptables issue? Here's the output:
```
root@controllerpi:/home/pi# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@controllerpi:/home/pi# iptables-legacy -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
ACCEPT all -- 10.42.0.0/16 anywhere
ACCEPT all -- anywhere 10.42.0.0/16
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain KUBE-EXTERNAL-SERVICES (1 references)
target prot opt source destination
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 10.42.0.0/16 anywhere /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere 10.42.0.0/16 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-PROXY-CANARY (0 references)
target prot opt source destination
Chain KUBE-SERVICES (3 references)
target prot opt source destination
```
I found issue #1353 which, combined with some googlin', explained the nftables vs iptables-legacy issue in kubernetes. So, I tried uninstalling both the server and agent, running the update-alternatives command on both, and reinstalling. Still no dice.
So, I found some expanded documentation in the official kubernetes documentation (link at bottom of this comment). Once again, uninstall, run the prescribed commands to use legacy mode, reinstall. Still no dice.
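For anyone else following along, the backend switch prescribed in that kubernetes doc boils down to pointing the alternatives at the legacy binaries (paths per Buster's iptables package; run on every node, then reinstall or restart k3s):

```shell
# Switch the iptables family to the legacy (non-nftables) backend
# on Debian/Raspbian Buster, per the linked kubeadm instructions.
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
```

A reboot afterwards helps ensure all rules land in one backend rather than being split across both.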
However, I have noticed a small change. If I try to curl controller_ip:6443/cacerts on the agent node, I still get an error about being unable to verify the cert. However, if I try to curl /cacerts through the load balancer on the agent node, I get a different error:
```
root@workerpi1:/etc/alternatives# curl -k -vvv https://127.0.0.1:36999/cacerts
* Expire in 0 ms for 6 (transfer 0x1e7e880)
* Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x1e7e880)
* Connected to 127.0.0.1 (127.0.0.1) port 36999 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
    CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:36999
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:36999
```
By the way, the "Known Issues" page recommends iptables v1.6.1+ to avoid the nftables issue, but I was using v1.8.2 so I thought I was "safe". That section should be expanded. The official kubernetes documentation has instructions: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#ensure-iptables-tooling-does-not-use-the-nftables-backend
After trying many things, I eventually figured out the issue: when I installed the agent, I used `K3S_URL=https://controllerpi.local:6443`. This seemed to work fine - I can resolve controllerpi.local everywhere on my network. But it appears the k3s agent load balancer cannot resolve that hostname. When I edited the `/etc/systemd/system/k3s-agent.service.env` file to change the K3S_URL to `https://192.168.1.168:6443` and then restarted the k3s-agent service, it worked fine.
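In case it saves someone a few keystrokes, a sketch of that edit (192.168.1.168 is my server's address; substitute your own). It's demonstrated on a throwaway copy here since the real file needs root:

```shell
# Rewrite K3S_URL in the agent's env file. Demonstrated on a throwaway
# copy; the real file is /etc/systemd/system/k3s-agent.service.env
# (edit it with sudo). The token value here is just a placeholder.
printf 'K3S_URL=https://controllerpi.local:6443\nK3S_TOKEN=redacted\n' > /tmp/k3s-agent.service.env
sed -i 's|^K3S_URL=.*|K3S_URL=https://192.168.1.168:6443|' /tmp/k3s-agent.service.env
cat /tmp/k3s-agent.service.env

# On the real agent, follow the edit with:
#   sudo systemctl restart k3s-agent
```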
@bmatcuk - Thanks for posting your solution here; I had a similar problem when setting up a Pi cluster today. I had changed the hostname of the master node after installing K3s, but forgot that the hostname is embedded in the agent nodes' k3s service file config. So I had to update all those configs to point at the right DNS name/IP of the master, and they all connected pretty much immediately after a service restart!
Thanks for putting up your solution, @bmatcuk . There was an important hint in here, namely that the error could be caused by a node not being able to talk to the master. In my case I had an incorrect firewall policy that prevented some of the nodes from reaching the master, but the error thrown left me clueless. You likely saved me a lot of time today.
One of the things to note about the .local domain is that this is usually served by mDNS (aka Bonjour). While you might be able to resolve .local addresses from the host command line, k3s uses the golang native resolver that does not support mDNS. You can use hosts file entries, a local DNS server, or IP addresses, instead.
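So if you'd rather keep using a name than hard-code an IP, a hosts-file entry on each agent sidesteps mDNS entirely (a sketch; 192.168.1.168 stands in for your server's LAN IP):

```shell
# Pin the server's name locally so Go's native resolver can find it
# without mDNS, then restart the agent to pick it up.
echo '192.168.1.168 controllerpi.local' | sudo tee -a /etc/hosts
sudo systemctl restart k3s-agent
```

A proper DNS record on your router or local DNS server achieves the same thing without per-node edits.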
Aha! That finally explains it!