K3s: Worker agent: "failed to get CA certs: EOF"

Created on 7 Feb 2020 · 10 Comments · Source: k3s-io/k3s

Version:

k3s version v1.17.2+k3s1 (cdab19b0)

  • Server was installed with `curl -sfL https://get.k3s.io | sh -`
  • Agent was installed with `curl -sfL https://get.k3s.io | K3S_URL=https://controllerpi.local:6443 K3S_TOKEN=... sh -`

Simple build here: two brand-new RPi4s running Raspbian Buster Lite; apt update/upgrade were run before I installed k3s. I also updated each RPi's EEPROM (with rpi-eeprom-update). The server has hostname controllerpi and the worker has hostname workerpi1.

Worker won't connect to server. Log is full of:

```
=info msg="Starting k3s agent v1.17.2+k3s1 (cdab19b0)"
=info msg="module overlay was already loaded"
=info msg="module br_netfilter was already loaded"
=info msg="Running load balancer 127.0.0.1:41461 -> [controllerpi.local:6443]"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: read tcp 127.0.0.1:48874->127.0.0.1:41461: read: co
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
```

I've tried using curl on the load balancer url (https://127.0.0.1:41461/cacerts), the server's hostname (https://controllerpi.local:6443/cacerts), and the server's IP (https://192.168.1.X:6443/cacerts). They all fail with:

```
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
```

If I add the -k flag to curl it works in all cases (load balancer, hostname, and IP). I'm wondering if that has something to do with it.

Thanks!

All 10 comments

@bmatcuk do your Pis have the correct time, or were the certs possibly created when they didn't have the correct time?

What Linux distro is this on? Might be another issue with iptables vs nftables.

@dweomer that's an excellent question. I'm not sure what the system time was when I installed k3s, but I checked the validity dates on all of the certs in /var/lib/rancher/k3s/server/tls and they look good:

```
root@controllerpi:/var/lib/rancher/k3s/server/tls# for f in *.crt; do openssl x509 -in "$f" -noout -text | grep -A2 Validity; done
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  4 02:55:00 2030 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  4 02:55:00 2030 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  4 02:55:00 2030 GMT
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 02:55:00 2021 GMT

root@controllerpi:/var/lib/rancher/k3s/server/tls# date
Fri  7 Feb 14:45:44 GMT 2020
```
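A small variation on that loop labels each expiry with its filename, which makes the dates easier to match up with specific certs. This is just a sketch: the temp dir and throwaway cert below stand in for /var/lib/rancher/k3s/server/tls so the example is self-contained.

```shell
# Self-contained sketch: generate a throwaway cert, then print "<file>: notAfter=..."
# per cert, the way you would inside /var/lib/rancher/k3s/server/tls.
dir=$(mktemp -d) && cd "$dir"
openssl req -x509 -newkey rsa:2048 -nodes -keyout /dev/null \
  -out demo.crt -days 365 -subj '/CN=demo' 2>/dev/null
for f in *.crt; do printf '%s: ' "$f"; openssl x509 -in "$f" -noout -enddate; done
```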

I also read in another GitHub issue that this is related to the `k3s-serving` secret, so I checked that cert as well:

```
root@controllerpi:/var/lib/rancher/k3s/server/tls# kubectl get secret -o yaml k3s-serving -n kube-system
apiVersion: v1
data:
tls.crt: ...
etc

root@controllerpi:/var/lib/rancher/k3s/server/tls# echo 'tls.crt here' | base64 --decode | openssl x509 -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            2a:c7:56:64:94:35:ea:2e
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: CN = k3s-server-ca@1581044100
        Validity
            Not Before: Feb  7 02:55:00 2020 GMT
            Not After : Feb  6 04:49:32 2021 GMT
        Subject: O = k3s, CN = k3s
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:3b:69:66:8e:d6:56:0e:f9:ce:9b:e9:88:8f:c3:
                    c6:08:fe:7b:e4:76:2b:76:5c:df:02:2a:6f:da:f4:
                    13:65:79:77:c7:6c:d5:32:ad:44:d6:d4:17:87:f4:
                    6f:1a:d2:e2:87:29:fb:2f:6c:f2:74:eb:4e:85:e3:
                    9e:65:a7:ad:a3
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Subject Alternative Name:
                DNS:controllerpi.local, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc.cluster.local, DNS:localhost, IP Address:10.43.0.1, IP Address:127.0.0.1, IP Address:169.254.231.129, IP Address:192.168.1.168
    Signature Algorithm: ecdsa-with-SHA256
etc
```

Date looks fine and the cert includes hostname (controllerpi.local) and IP (192.168.1.168).
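As an aside, the secret-decoding step above can be done in a single pipeline. The kubectl command below is an assumption based on standard kubectl JSONPath syntax (the `\.` escapes the dot in the `tls.crt` key); the runnable part is an offline stand-in that round-trips a throwaway cert through base64 the same way secret data is stored.

```shell
# With a live cluster, the whole decode could be done as (hedged, untested here):
#   kubectl get secret k3s-serving -n kube-system -o jsonpath='{.data.tls\.crt}' \
#     | base64 --decode | openssl x509 -noout -text
# Offline stand-in: make a throwaway cert, base64 it like secret data, decode, inspect.
crt=$(mktemp)
openssl req -x509 -newkey rsa:2048 -nodes -keyout /dev/null \
  -out "$crt" -days 1 -subj '/O=k3s/CN=k3s' 2>/dev/null
base64 < "$crt" | base64 --decode | openssl x509 -noout -subject
```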

I tried curl'ing https://localhost:6443/cacerts on the server: it also fails to verify the cert issuer. If I run `curl --cacert /var/lib/rancher/k3s/server/tls/server-ca.crt https://localhost:6443/cacerts` it works. I tried copying server-ca.crt to the worker node and installing it (via `update-ca-certificates`), but that did not solve the problem =(

@brandond Raspbian Buster Lite, updated and upgraded before k3s install.

I tried uninstalling k3s on both my controller and worker, ran apt update/upgrade again to make sure I'm on the latest packages, and reinstalled. Same issue. I checked my iptables, and it warns that legacy tables are present. Does that indicate an iptables issue? Here's the output:

```
root@controllerpi:/home/pi# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@controllerpi:/home/pi# iptables-legacy -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
ACCEPT     all  --  10.42.0.0/16         anywhere
ACCEPT     all  --  anywhere             10.42.0.0/16

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  10.42.0.0/16         anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             10.42.0.0/16         /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-PROXY-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-SERVICES (3 references)
target     prot opt source               destination
```

I found issue #1353 which, combined with some googlin', explained the nftables vs iptables-legacy issue in kubernetes. So, I tried uninstalling both the server and agent, running the update-alternatives command on both, and reinstalling. Still no dice.

So, I found some expanded documentation in the official kubernetes documentation (link at bottom of this comment). Once again, uninstall, run the prescribed commands to use legacy mode, reinstall. Still no dice.
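For the record, the prescribed commands from the kubernetes documentation are roughly the following (Debian/Raspbian, run as root on both nodes; treat this as a sketch and check the linked docs for your distro before running them):

```shell
# Switch the iptables tooling to the legacy backend (Debian Buster defaults to
# nftables). Run on both server and agent, then reinstall/restart k3s.
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy
```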

However, I have noticed a small change. If I try to curl controller_ip:6443/cacerts on the agent node, I still get an error about being unable to verify the cert. However, if I try to curl /cacerts through the load balancer on the agent node, I get a different error:

```
root@workerpi1:/etc/alternatives# curl -k -vvv https://127.0.0.1:36999/cacerts
* Expire in 0 ms for 6 (transfer 0x1e7e880)
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x1e7e880)
* Connected to 127.0.0.1 (127.0.0.1) port 36999 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:36999
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:36999
```

By the way, the "Known Issues" page recommends iptables v1.6.1+ to avoid the nftables issue, but I was using v1.8.2 so I thought I was "safe". That section should be expanded. The official kubernetes documentation has instructions: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#ensure-iptables-tooling-does-not-use-the-nftables-backend

After trying many things, I eventually figured out the issue: when I installed the agent, I used K3S_URL=https://controllerpi.local:6443. This seemed to work fine - I can resolve controllerpi.local everywhere on my network. But it appears the k3s agent load balancer cannot resolve that hostname. When I edited the /etc/systemd/system/k3s-agent.service.env file to change K3S_URL to https://192.168.1.168:6443 and then restarted the k3s-agent service, everything worked fine.
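The fix boils down to a one-line change in the agent's env file plus a service restart. A sketch (using a temp file in place of /etc/systemd/system/k3s-agent.service.env, and 192.168.1.168 as the example server IP):

```shell
# Stand-in for /etc/systemd/system/k3s-agent.service.env:
env_file=$(mktemp)
printf 'K3S_URL=https://controllerpi.local:6443\n' > "$env_file"

# Replace the mDNS hostname with the server's IP, as described above.
sed -i 's|https://controllerpi.local:6443|https://192.168.1.168:6443|' "$env_file"
cat "$env_file"
# On a real agent, follow with: sudo systemctl restart k3s-agent
```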

@bmatcuk - Thanks for posting your solution here; I had a similar problem, when setting up a Pi cluster today. I had changed the hostname of the master node after installing K3s, but forgot that the hostname is embedded in the agent nodes' k3s service file config. So I had to update all those configs and make sure they were pointed at the right DNS name/IP of the master, and they all connected pretty much immediately after a service restart!

Thanks for putting up your solution, @bmatcuk . There was an important hint in here, namely that the error could be caused by a node not being able to talk to the master. In my case I had an incorrect firewall policy that prevented some of the nodes from reaching the master, but the error thrown left me clueless. You likely saved me a lot of time today.

One thing to note about the .local domain is that it is usually served by mDNS (aka Bonjour). While you might be able to resolve .local addresses from the host command line, k3s uses the Go native resolver, which does not support mDNS. You can use hosts file entries, a local DNS server, or IP addresses instead.
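If you'd rather keep using the name, a plain hosts-file entry on each agent sidesteps mDNS entirely (a config sketch; substitute your server's real IP):

```
# /etc/hosts on each agent node
192.168.1.168   controllerpi.local controllerpi
```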

Aha! That finally explains it!
