Environmental Info:
K3s Master Version: k3s version v1.19.2+k3s-48ed47c4 (48ed47c4)
K3s Agent Version: same as the latest k3os release
Node(s) CPU architecture, OS, and Version:
K3s Master: Linux
Cluster Configuration:
1 master, 4 workers, with plans to add more masters later (i.e. HA)
Describe the bug:
Cannot connect an agent running an older k3s version (1.18) to a master running a newer version (1.19)
Steps To Reproduce:
Expected behavior:
A k3s agent on 1.18 should be able to connect to the 1.19 master.
Actual behavior:
It fails with 401 Unauthorized.
Can you provide additional information on the error you're getting? The message you're describing is not actually one that is returned when trying to add nodes, so I suspect something else is going on.
@brandond Just Unauthorized. On the master side there is also nothing but a generic "tls: bad certificate" error.
First, connecting a 1.19 agent to the 1.19 master works fine, as expected.
Then, as a comparison, I destroyed the master and downgraded it to the most recent 1.18 release to match the agents' version, reusing the same token as before.
That worked, which confirms the agents and token are fine against a 1.18 master and that the failure is specific to a 1.18 agent connecting to a 1.19 master.
Thus I think this is a regression.
I will see if there's more useful data in the debug log... Also, I suspect the converse wouldn't be a problem (i.e. a 1.19 agent connecting to a 1.18 master). A rough sketch of the comparison setup is shown below.
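For reference, a sketch of that downgrade step using the k3s install script. INSTALL_K3S_VERSION and K3S_TOKEN are documented install-script variables, but the exact version string and token value here are placeholders, not the reporter's actual ones:

# Reinstall the master as a 1.18 release while reusing the existing cluster token
# (version string and token value are placeholders)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.18.9+k3s1" K3S_TOKEN="<existing-token>" sh -s - server

# The existing 1.18 agents, still pointed at the master with the same token, then rejoin successfully.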
The following docker-compose.yaml demonstrates the problem:
version: "3.7"
services:
  leader:
    container_name: leader
    hostname: leader
    image: "rancher/k3s:v1.19.2-k3s1"
    command: ["server"]
    privileged: true
    environment:
      - K3S_TOKEN=issue/2311
    ports:
      - "6443:6443"
  worker:
    depends_on:
      - leader
    container_name: worker
    hostname: worker
    privileged: true
    image: "rancher/k3s:v1.18.9-k3s1"
    command: ["agent"]
    environment:
      - K3S_TOKEN=issue/2311
      - K3S_URL=https://leader:6443
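Assuming the file above is saved as docker-compose.yaml, bringing the stack up is enough to trigger the failure:

docker-compose up -d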
I see matching errors from the leader and worker, e.g. in docker-compose logs -f:
worker | time="2020-09-26T09:13:51.720481710Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:53.723018107Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:53 http: TLS handshake error from 172.23.0.3:54876: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:53.733182682Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:55.735863801Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:55 http: TLS handshake error from 172.23.0.3:54894: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:55.749880143Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:57.752446742Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:57 http: TLS handshake error from 172.23.0.3:54916: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:57.758415643Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:13:59.760852992Z" level=info msg="Cluster-Http-Server 2020/09/26 09:13:59 http: TLS handshake error from 172.23.0.3:54934: remote error: tls: bad certificate"
worker | time="2020-09-26T09:13:59.767157866Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
leader | time="2020-09-26T09:14:01.768755883Z" level=info msg="Cluster-Http-Server 2020/09/26 09:14:01 http: TLS handshake error from 172.23.0.3:54954: remote error: tls: bad certificate"
worker | time="2020-09-26T09:14:01.772217224Z" level=error msg="token is not valid: https://127.0.0.1:34973/apis: 401 Unauthorized"
Hmm, interesting. That's not a URI I was expecting to see it hitting with a token. Did we not get our basic authenticator plugged back into some of the routes? From the lack of a "bad username/password" message on the leader, I don't think it's even being used.
@bradtopol Yes, with 1.19 I tried to access the control plane on port 6443, and the previous behavior (Basic auth) is gone.
Yes, basic auth was dropped from upstream Kubernetes in 1.19. We kept a copy around that we use just for bootstrapping, but apparently it's not registered in all the right places.
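For illustration only (not from the original report): the agent's token check boils down to a basic-auth request against /apis, the URI seen in the worker logs above. A rough curl equivalent, assuming the token's password part is sent with a default "node" username:

# Hypothetical probe; hostname, username, and token value are placeholders
curl -sk -u "node:<token-password>" https://<master>:6443/apis
# Against a 1.19 master this returns 401 Unauthorized, since upstream removed
# basic auth and k3s's bootstrap copy isn't registered on this route.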
Looks like the issue has something to do with the addition of the supervisor port https://github.com/rancher/k3s/commit/e5fe184a441ec5a61420a30aaf3d5e6524ebc08e
Validated with k3s v1.19.3-rc1+k3s2: a 1.18 agent successfully joined the 1.19 master node.
Master: v1.19.3-rc1+k3s2
Agent: v1.18.10+k3s1
kubectl get nodes
NAME               STATUS   ROLES    AGE   VERSION
ip-172-31-47-192   Ready    master   31m   v1.19.3-rc1+k3s2
ip-172-31-35-140   Ready    <none>   13s   v1.18.10+k3s1
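For anyone reproducing the validation outside of docker-compose, a rough sketch with the k3s install script; hostnames and the node token below are placeholders:

# On the master
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.19.3-rc1+k3s2" sh -s - server
cat /var/lib/rancher/k3s/server/node-token

# On the agent
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.18.10+k3s1" K3S_URL="https://<master>:6443" K3S_TOKEN="<node-token>" sh -s - agent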