Environmental Info:
K3s Master Version: k3s version v1.19.2+k3s-48ed47c4 (48ed47c4)
K3s Agent Version: same as the latest k3os release
Node(s) CPU architecture, OS, and Version:
K3s Master: Linux
Cluster Configuration:
1 master, 4 workers, with plans to add more masters later (i.e. HA)
Describe the bug:
Cannot connect an agent running an older k3s version (1.18) to a master running a newer version (1.19)
Steps To Reproduce:
Expected behavior:
A k3s agent on 1.18 should be able to connect to the 1.19 master.
Actual behavior:
It fails with 401 Unauthorized.
Can you provide additional information on the error you're getting? The message you're describing is not actually one that is returned when trying to add nodes, so I suspect something else is going on.
@brandond Just Unauthorized. On the master side there is also nothing but a generic "tls: bad certificate" error.
First, connecting a 1.19 agent to the 1.19 master works fine, as expected.
Then, as a comparison, I destroyed the master and downgraded it to the most recent 1.18 release to match the agents' version, reusing the same token as before.
That worked, which confirms the agents and token are fine against a 1.18 master and that the failure is specific to a 1.18 agent connecting to a 1.19 master.
Thus I think this is a regression.
I will see if there's more useful data in the debug log... Also, I suspect the converse wouldn't be a problem (i.e. a 1.19 agent connecting to a 1.18 master). A rough sketch of the comparison setup is shown below.
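For reference, a sketch of that downgrade step using the k3s install script. INSTALL_K3S_VERSION and K3S_TOKEN are documented install-script variables, but the exact version string and token value here are placeholders, not the reporter's actual ones:

# Reinstall the master as a 1.18 release while reusing the existing cluster token
# (version string and token value are placeholders)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.18.9+k3s1" K3S_TOKEN="<existing-token>" sh -s - server

# The existing 1.18 agents, still pointed at the master with the same token, then rejoin successfully.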
The following docker-compose.yaml demonstrates the problem:
version: "3.7"
services:
  leader:
    container_name: leader
    hostname: leader
    image: "rancher/k3s:v1.19.2-k3s1"
    command: ["server"]
    privileged: true
    environment:
      - K3S_TOKEN=issue/2311
    ports:
      - "6443:6443"
  worker:
    depends_on:
      - leader
    container_name: worker
    hostname: worker
    privileged: true
    image: "rancher/k3s:v1.18.9-k3s1"
    command: ["agent"]
    environment:
      - K3S_TOKEN=issue/2311
      - K3S_URL=https://leader:6443
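Assuming the file above is saved as docker-compose.yaml, bringing the stack up is enough to trigger the failure:

docker-compose up -d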
I see matching errors from the leader and worker, e.g. in docker-compose logs -f:
worker | time="2020-09-26T09:13:51.720481710Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:53.723018107Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:53 http: TLS handshake error from 172.23.0.3:54876: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:53.733182682Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:55.735863801Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:55 http: TLS handshake error from 172.23.0.3:54894: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:55.749880143Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:57.752446742Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:57 http: TLS handshake error from 172.23.0.3:54916: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:57.758415643Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:59.760852992Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:59 http: TLS handshake error from 172.23.0.3:54934: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:59.767157866Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:14:01.768755883Z" level=info msg="Cluster-Http-Server 2020/09/26 09:14:01 http: TLS handshake error from 172.23.0.3:54954: remote error: tls: bad certificate"
worker | time="2020-09-26T09:14:01.772217224Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
Hmm, interesting. That's not a URI I was expecting to see it hitting with a token. Did we not get our basic authenticator plugged back into some of the routes? From the lack of a "bad username/password" message on the leader, I don't think it's even being used.
@bradtopol Yes, with 1.19 I tried to access the control plane on port 6443, and the previous behavior (Basic auth) is gone.
Yes, basic auth was dropped from upstream Kubernetes in 1.19. We kept a copy around that we use just for bootstrapping, but apparently it's not registered in all the right places.
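For illustration only (not from the original report): the agent's token check boils down to a basic-auth request against /apis, the URI seen in the worker logs above. A rough curl equivalent, assuming the token's password part is sent with a default "node" username:

# Hypothetical probe; hostname, username, and token value are placeholders
curl -sk -u "node:<token-password>" https://<master>:6443/apis
# Against a 1.19 master this returns 401 Unauthorized, since upstream removed
# basic auth and k3s's bootstrap copy isn't registered on this route.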
Looks like the issue has something to do with the addition of the supervisor port https://github.com/rancher/k3s/commit/e5fe184a441ec5a61420a30aaf3d5e6524ebc08e
Validated with k3s v1.19.3-rc1+k3s2: a 1.18 agent successfully joined the 1.19 master node.
Master: v1.19.3-rc1+k3s2
Agent: v1.18.10+k3s1
kubectl get nodes
NAME               STATUS   ROLES    AGE   VERSION
ip-172-31-47-192   Ready    master   31m   v1.19.3-rc1+k3s2
ip-172-31-35-140   Ready    <none>   13s   v1.18.10+k3s1
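For anyone reproducing the validation outside of docker-compose, a rough sketch with the k3s install script; hostnames and the node token below are placeholders:

# On the master
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.19.3-rc1+k3s2" sh -s - server
cat /var/lib/rancher/k3s/server/node-token

# On the agent
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.18.10+k3s1" K3S_URL="https://<master>:6443" K3S_TOKEN="<node-token>" sh -s - agent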