K3s: Failed to start agent (node.crt) 403 forbidden

Created on 10 May 2019 · 19 Comments · Source: k3s-io/k3s

Command I run to start k3s:

```
root@pi02:/home/pirate# sudo k3s agent --server https://192.168.178.129:6443 --token MyLongTokenHere
INFO[2019-05-10T09:41:07.179226896Z] Starting k3s agent v0.5.0 (8c0116dd)
ERRO[2019-05-10T09:41:09.748260722Z] https://192.168.178.129:6443/v1-k3s/node.crt: 403 Forbidden
ERRO[2019-05-10T09:41:15.767480924Z] https://192.168.178.129:6443/v1-k3s/node.crt: 403 Forbidden
```

kind/bug

All 19 comments

The k3s server console also logs this message:

ERRO[2019-05-10T09:52:55.556246642Z] Node password validation failed for [pi02]

Hello, I have the same error.

I started a k3s master on a Raspberry Pi 3, then added a node on another raspberry. All worked well.

I then tried to add an old laptop with amd64 architecture, and the node join command fails with the error you mentioned.

I'm not sure yet why this is happening.

It does not seem to be device related.
I tried setting up a small cluster using VMs and I got the same problem.
After reinstalling the master and the VM that was supposed to be the node, it joined fine.

Encountered the same issue. I believe it happens when you connect an agent, then reinstall the agent and try to connect again. I fixed the issue by reinstalling the master and reconnecting the agents. Maybe it is enough to rename the hostname of the agent, but since installing / uninstalling k3s is so quick I've chosen the reinstall.

Same issue on my end. I use PiBakery to flash SD cards regularly, and one of the steps is to change the SSH host key. Could that be the culprit?

Agent nodes are now registering with the server using a randomly generated password, where the server will save the password and verify against future attempts.

See /var/lib/rancher/k3s/server/cred/passwd and /var/lib/rancher/k3s/agent/node-password.txt.

We should improve the behavior of this process and/or documentation.
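To make the mechanism above concrete, here is a minimal sketch of checking whether an agent's saved password agrees with what the server recorded for that node. The function names are hypothetical, and the file layout is an assumption: comma-separated rows with the password first and the node name second. Look at your own file before relying on it.

```shell
# Hypothetical helper: print the password the server stored for a node.
# ASSUMPTION: rows look like "<password>,<node-name>,..." (comma-separated).
server_node_password() {
    file="$1"
    node="$2"
    awk -F, -v n="$node" '$2 == n { print $1; exit }' "$file"
}

# Hypothetical helper: succeed only if the agent's saved password equals
# the server's entry for that node. Whitespace in the agent file is ignored.
node_password_matches() {
    server_file="$1"
    agent_file="$2"
    node="$3"
    expected="$(server_node_password "$server_file" "$node")"
    actual="$(tr -d '[:space:]' < "$agent_file")"
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

The two files live on different machines, so you would copy the agent's node-password.txt to the server (or the other way around) before running something like `node_password_matches <server file> <agent file> pi02`. A non-zero exit status would be consistent with the 403 shown in the report.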

@erikwilson Thanks for your quick response. Now I understand the current behaviour.

I don't think it is a bug -- it is more a feature which is not described in the documentation. The question which should be answered there: how can we connect a new agent from a machine that previously had an agent installed (and hence has a different password)?

For now I would simply edit /var/lib/rancher/k3s/agent/node-password.txt to match the password on the server. Not sure if it makes sense to add an additional environment variable (e.g. K3S_AGENT_PASSWORD), since it adds complexity to the public API for an edge-case scenario. It could make sense if you want to make the password(s) configurable (e.g. to automatically bootstrap K3S via some other framework), but then you would need to add password configuration APIs to the agent and the server.

@gonzochic I simply remove the row for the agent node in /var/lib/rancher/k3s/server/cred/passwd; the agent then tries to connect again and succeeds.
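A rough sketch of that row removal, for anyone who would rather not edit the file by hand. `remove_node_entry` is a hypothetical name, and it assumes comma-separated rows with the node name in the second field, which you should confirm against your own file. It keeps a `.bak` copy next to the original.

```shell
# Hypothetical helper: drop the row for one node from a credentials file.
# ASSUMPTION: rows look like "<password>,<node-name>,..." (comma-separated).
remove_node_entry() {
    file="$1"
    node="$2"
    # Keep a backup so the change can be undone.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite the original in place, keeping every row for other nodes.
    awk -F, -v n="$node" '$2 != n' "$file.bak" > "$file"
}
```

Run it as root on the server, passing the file path and the agent's node name (for example `pi02` from the original report). The path is a parameter on purpose, since it may differ between k3s versions.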

#359 adds some information to the logs with https://github.com/rancher/k3s/commit/2c9444399b427ffb706818f5bf3892a8880673bf#diff-3bad6c3d4f6253f430990a390e851fdbR87 and https://github.com/rancher/k3s/commit/2c9444399b427ffb706818f5bf3892a8880673bf#diff-c45ec7f2fffc93a75fb235d3efcda580R292; it still needs documentation.

Docs update at rancher/docs/pull/1593

None of the solutions commented above are working for me (@erikwilson @derekhe). I've installed the servers and agents three times and still hit the same issue. I have also changed the password on the agent to match the server's, with the same result. The steps that I followed:

  1. Install server on node 1
    curl -sfL https://get.k3s.io | sh -

  2. Once it finishes, I check the token and passwords:
    $ sudo cat /var/lib/rancher/k3s/server/node-token
    K10d0b0a18a5ee13bb5d2d7169b464b7bda4caf7a608aae38570cd0d68c5d7e5a66::node:a98d365cf84b08b10b7423ea55df6563
    $ sudo cat /var/lib/rancher/k3s/server/cred/passwd
    9e9b5473d1c841d28097a6856bd44152,admin,admin,system:masters
    385b9ce5edb6b6cb9f2a026c3997667b,system,system,system:masters
    a98d365cf84b08b10b7423ea55df6563,node,node,system:masters

I guess the correct passwd entry is the third one. And it also matches with the part from the token: _node:a98d365cf84b08b10b7423ea55df6563_. I imagine that is the way the agent knows about the server password.
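The observation above (the tail of the token matching a passwd row) can be checked mechanically. This is only a sketch with hypothetical function names, and it assumes the token has the shape `<prefix>::<user>:<password>` as in the output shown here. Per the earlier comment in this thread, the per-node password in node-password.txt is generated separately, so a match here says nothing about that file.

```shell
# Hypothetical helpers. ASSUMPTION: token looks like "<prefix>::<user>:<password>".
token_password() {
    # Everything after the last ':' in the token.
    printf '%s\n' "${1##*:}"
}

token_user() {
    # Text between the "::" separator and the next ':'.
    rest="${1#*::}"
    printf '%s\n' "${rest%%:*}"
}

# Succeed if FILE has a row starting with "<password>,<user>," for TOKEN.
token_matches_passwd() {
    file="$1"
    pw="$(token_password "$2")"
    user="$(token_user "$2")"
    awk -F, -v p="$pw" -v u="$user" \
        '$1 == p && $2 == u { found = 1 } END { exit !found }' "$file"
}
```

With the values pasted above, `token_user` gives `node` and `token_password` gives the same string as the third passwd row, which is the match the comment describes.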

  3. Install agent on node 2
    curl -sfL https://get.k3s.io | K3S_URL=https://192.168.0.115:6443 K3S_TOKEN=K10d0b0a18a5ee13bb5d2d7169b464b7bda4caf7a608aae38570cd0d68c5d7e5a66::node:a98d365cf84b08b10b7423ea55df6563 sh -

  4. The journal logs show an authentication error:
    Node password rejected, contents of '/var/lib/rancher/k3s/agent/node-password.txt' may not match server passwd entry

  5. I checked the server password stored on the agent (/var/lib/rancher/k3s/agent/node-password.txt) and it was wrong, so I set it to _a98d365cf84b08b10b7423ea55df6563_

  6. Still the same error as in point 4.

Do you have any idea what could be wrong? Thanks :)

@alvgarvilla please reinstall the master node after running k3s-uninstall.sh. That solved it in my case.

The node password to remove is in:
/var/lib/rancher/k3s/server/cred/passwd/node-passwd

You will see a random string followed by the node name.

@steve-winter The node password to remove is in:
/var/lib/rancher/k3s/server/cred/passwd/node-passwd

Clarification... the file is /var/lib/rancher/k3s/server/cred/node-passwd

Worked like a champ. I had a mixed-arch cluster (4 arm64s and an Intel agent). I screwed up Flannel so just decided to reinstall k3s. I ran the uninstall script on all 5 nodes successfully, and the reinstall worked fine on the arm64 master and three workers. But the Intel agent gave me fits until I found this issue.

Maybe it is enough to rename the hostname of the agent, but since installing / uninstalling k3s is so quick I've chosen the reinstall.

@gonzochic, After renaming the agent node, it worked!!

@franciscojsc that is correct. I had both the agent and the server with the same hostname (raspberry) and they were conflicting. I renamed the agent and everything started working again. Is there any place in the documentation specifying that the hostnames have to be different? I could not find it.

@erikwilson Can the entry in /var/lib/rancher/k3s/server/cred/node-passwd be removed once the node is removed from the cluster? I am running a cluster with an autoscaler, and there is quite a chance that a node with the same name as a removed node will be added to the cluster at some point in the future.
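Until that is handled by k3s itself, one possible workaround is to prune stale rows yourself. The sketch below uses a hypothetical `prune_node_entries` helper and assumes comma-separated rows with the node name in the second field. It keeps only rows whose node name appears in a list of current nodes, and refuses to run on an empty list so a failed node query cannot wipe the file. Whether the server needs a restart to notice the change is not established in this thread.

```shell
# Hypothetical helper: keep only rows whose node name is listed in NAMES_FILE
# (one node name per line). A ".bak" copy of the original is kept.
# ASSUMPTION: rows look like "<password>,<node-name>,..." (comma-separated).
prune_node_entries() {
    file="$1"
    names="$2"
    # Refuse to prune against an empty list of current nodes.
    [ -s "$names" ] || return 1
    cp -p "$file" "$file.bak" || return 1
    # First input fills the keep set; second input is filtered against it.
    awk -F, 'NR == FNR { keep[$1]; next } $2 in keep' "$names" "$file.bak" > "$file"
}
```

The list of current nodes could come from something like `kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > /tmp/current-nodes`, followed by `prune_node_entries /var/lib/rancher/k3s/server/cred/node-passwd /tmp/current-nodes` as root on the server. The `/tmp/current-nodes` path is just an illustration.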

I have added a gist for this, in case anyone else is facing the same issue.

Node names within the cluster do need to be unique, for the record. This is a k8s requirement, not just k3s. I'm not aware of any distributed system that works well with duplicate node names...
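A quick way to spot a clash before joining nodes is to collect the hostnames you plan to use and look for repeats. This is a small sketch; `duplicate_names` is a hypothetical helper that reads one name per line on standard input and prints each name that occurs more than once.

```shell
# Hypothetical helper: print every name that appears more than once on stdin.
duplicate_names() {
    sort | uniq -d
}
```

For example, `printf 'raspberry\npi02\nraspberry\n' | duplicate_names` prints `raspberry`, the situation described a few comments up. As far as I know the k3s agent also accepts a `--node-name` option to override the hostname, but check `k3s agent --help` for your version.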
