Kubeadm: JWS token not being created in cluster-info ConfigMap

Created on 3 Jul 2017 · 37 comments · Source: kubernetes/kubeadm

Versions

kubeadm version (use kubeadm version): 1.7.0, commit d3ada0119e776222f11ec7945e6d860061339aad

Environment:

  • Kubernetes version (use kubectl version): 1.7.0, commit d3ada0119e776222f11ec7945e6d860061339aad
  • Cloud provider or hardware configuration: Vagrant environment being configured by https://github.com/erhudy/kubeadm-vagrant
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.2 LTS (Xenial)
  • Kernel (e.g. uname -a): 4.4.0-81-generic
  • Others: N/A

What happened?

The current version of kubeadm does not appear to be inserting the JWS token into the cluster-info ConfigMap. I first tried providing it the token I wanted it to use (the mode used by the Vagrantfile referenced above); when that failed, I reset kubeadm and re-ran init, allowing it to generate the token itself. Both modes failed. As a consequence, joining nodes to the master is not possible unless the JWS token is manually created and inserted into the cluster-info ConfigMap.

Rolling back to 1.6.6 (in the Vagrantfile, modifying the package installation line to apt-get install -y docker.io kubelet=1.6.6-00 kubeadm=1.6.6-00 kubectl=1.6.6-00 kubernetes-cni) causes everything to function as expected.

When I compared the ConfigMaps generated by 1.6.6 and 1.7.0, the JWS key was indeed missing from 1.7.0. In 1.6.6, under the top-level data key, there was a key beginning with jws-kubeconfig- whose value was a JWS token. No such key exists when the cluster is bootstrapped by kubeadm 1.7.0.
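
To check this on a running cluster, the ConfigMap can be inspected roughly like this (assuming the usual kubeadm admin kubeconfig at /etc/kubernetes/admin.conf):

kubectl --kubeconfig /etc/kubernetes/admin.conf -n kube-public get configmap cluster-info -o yaml

On a healthy cluster the data section contains a kubeconfig entry plus a jws-kubeconfig-<token-id> entry; on the broken 1.7.0 clusters only kubeconfig is present.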

What you expected to happen?

Joining workers to the master should be possible in 1.7.0 without manually editing the cluster-info ConfigMap.

How to reproduce it (as minimally and precisely as possible)?

Run the Vagrantfile from https://github.com/erhudy/kubeadm-vagrant with vagrant up. When it attempts to join the first worker, kubeadm will fail with the error message "there is no JWS signed token in the cluster-info ConfigMap".

Anything else we need to know?

No.

kind/bug priority/important-soon

All 37 comments

I have the same issue and also had to roll back to 1.6.6.
However, kubeadm 1.6.6 with kubelet 1.7.0 and Kubernetes 1.7.0 works as expected, so the problem is in kubeadm.
I'm using CentOS 7.2 instances on AWS.

Facing the same problem as above. Worker nodes (minions) are not able to join the cluster and keep failing with:
Failed to connect to API Server "host:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "4e9c3a" is invalid for this cluster, can't connect
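
As a quick sanity check in this situation (a suggestion, assuming the kubeadm v1.7 token subcommands are available), you can confirm on the master that the token the worker is using actually exists:

kubeadm token list

The token id from the error (here "4e9c3a") should appear in the output.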

Tested on a different Mac with the same Vagrant setup - this one bootstrapped successfully. Not sure what differences there could be, aside from the computer where it's functional being older and slower (which always leads to suspicions of some sort of race condition).

What do the logs of the controller-manager say in the faulty deployment?
The problem seems to be in the controller-manager, since the cluster-info ConfigMap isn't updated.

I'm having trouble reproducing this...

Just ran the bootstrap on the computer where it was failing - failed again. Here is the controller-manager log from the failing deployment: https://gist.github.com/erhudy/65029423cfbe35983c32ff69d2eec0c8

By way of comparison, here are the controller-manager logs from a successful deployment, immediately after kubeadm joins the first worker to the master: https://gist.github.com/erhudy/102af7fe0394edcfae49c75c9192e187

No question about it:

E0704 17:15:23.523852       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)
E0704 17:15:24.527348       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)
E0704 17:15:25.530671       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)
E0704 17:15:26.532794       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)
E0704 17:15:27.535508       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)
E0704 17:15:28.537732       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)
E0704 17:15:29.541843       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)
E0704 17:15:30.543980       1 reflector.go:201] k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.ConfigMap: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list configmaps in the namespace "kube-public". (get configmaps)

What does kubectl -n kube-public get role system:controller:bootstrap-signer -oyaml output?

ubuntu@master:~$ kubectl -n kube-public get role system:controller:bootstrap-signer -oyaml
Error from server (NotFound): roles.rbac.authorization.k8s.io "system:controller:bootstrap-signer" not found

Strangely enough, while rebuilding the environment again on the computer where it's been consistently failing, it actually joined a worker successfully to the master, so I had to destroy the environment and rebuild it again to get a failure. There definitely seems to be something timing-related going on.

cc @kubernetes/sig-auth-bugs

Seems like it takes a lot of time sometimes to create auto-bootstrapped RBAC rules...

@erhudy The API server is responsible for creating RBAC rules specified here: https://github.com/kubernetes/kubernetes/tree/master/plugin/pkg/auth/authorizer/rbac

It seems like the API server somehow doesn't do that for you (at least not fast enough), which results in a broken state where the BootstrapSigner can't sign the cluster-info ConfigMap that kubeadm join needs in order to succeed.

As a workaround, here is what the rule should look like:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:controller:bootstrap-signer
  namespace: kube-public
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resourceNames:
  - cluster-info
  resources:
  - configmaps
  verbs:
  - update
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
  - update

Applying that to a faulty deployment should fix it...
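
For example (a sketch; the filename is arbitrary):

# Save the Role manifest above as bootstrap-signer-role.yaml, then:
kubectl apply -f bootstrap-signer-role.yaml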

If the signer only attempts once, it should wait until the server is healthy (via /healthz) before attempting. If it is done via a controller loop, it should requeue on failure.

@liggitt I think the signer tries again and again (see the log), but the RBAC Role for it just isn't created, as @erhudy confirmed with the kubectl command.

The apiserver log would be helpful in that case, as well as the /healthz status.

@erhudy ^

I'm also hitting this: kubeadm/k8s 1.7.0 on GCE/Ubuntu.

I could work around it by applying the missing Role AND RoleBinding to the kube-public namespace:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:controller:bootstrap-signer
  namespace: kube-public
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resourceNames:
  - cluster-info
  resources:
  - configmaps
  verbs:
  - update
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
  - update
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:controller:bootstrap-signer
  namespace: kube-public
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: system:controller:bootstrap-signer
subjects:
- kind: ServiceAccount
  name: bootstrap-signer
  namespace: kube-system
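
After applying, you can verify that both objects now exist (the bootstrap-signer errors should then stop appearing in the controller-manager log):

kubectl -n kube-public get role system:controller:bootstrap-signer
kubectl -n kube-public get rolebinding system:controller:bootstrap-signer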

healthz status while the join attempts from the worker are ongoing and failing:

ubuntu@master:~$ curl -k https://10.96.0.1/healthz
ok

Looks like there's something causing the kube-public namespace to not be created in time?

E0704 22:35:57.681740       1 storage_rbac.go:235] \
unable to reconcile role.rbac.authorization.k8s.io/system:controller:bootstrap-signer \
in kube-public: namespaces "kube-public" not found

Namespace doesn't exist at reconcile time:

E0704 22:35:57.681740 1 storage_rbac.go:235] unable to reconcile role.rbac.authorization.k8s.io/system:controller:bootstrap-signer in kube-public: namespaces "kube-public" not found

The kube-public namespace is created by the bootstrap controller, which can race with storage post-start hooks.

bootstrap controller

I suppose you're talking about this code: https://github.com/kubernetes/kubernetes/blob/master/pkg/master/controller.go#L148

Yeah, very unlucky that our e2e CI didn't catch this race condition a single time :/

Thanks to @erhudy @alexpekurovsky @shekharupland and @Dirbaio we are now aware of it and were able to fix the race condition between the controller-manager and the apiserver post-start hooks :+1:!

Thanks for the fix!

Does this mean we need to wait for Kubernetes 1.7.1?
Thanks for the fix.

@alexpekurovsky Yes. Meanwhile you can just kubectl apply the Role and RoleBinding above.

https://github.com/kubernetes/kubeadm/issues/335#issuecomment-312962574 worked for me too. Thanks for the fix and the workaround.

An update on this issue from my lab tests with 3 different scenarios (VirtualBox, Google Cloud, VMware on-premise): we are facing this problem only on Oracle VirtualBox, and only with a different value for the --apiserver-advertise-address parameter. Is that the issue?

Hi,
same as @praparn, I'm facing the same problem on a QEMU VM with kubeadm 1.8.1, with the --apiserver-advertise-address=0.0.0.0 parameter changed.

OS is CentOS 7.

@praparn @dimitrijezivkovic If you think you've found a new issue with v1.8.1, please create a new issue with more details.

Same problem with 1.8. Any possible fix for:
"Failed to connect to API Server "XXXX:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "fb0a7d" is invalid for this cluster, can't connect".
That same token was OK last week.
Any possible, logical explanation or workaround? Next year you will have this working 100%?

@vglisin That is because the token has expired. We have already announced this policy in the kubeadm v1.7 CLI output and in the release notes: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#behavioral-changes

The default Bootstrap Token created with kubeadm init v1.8 expires and is deleted after 24 hours by default to limit the exposure of the valuable credential. You can create a new Bootstrap Token with kubeadm token create or make the default token permanently valid by specifying --token-ttl 0 to kubeadm init. The default token can later be deleted with kubeadm token delete.

Note that the issue you're describing is vastly different from the topic of this issue. That's why I asked you to open new issues instead of commenting on old, resolved ones.

Also, please keep in mind that this is open source. If you find things that are sub-optimal, no one is going to stop you from contributing a good change.

That was my problem @luxas, I missed that piece of information and was trying to join with an expired token. Thank you :)

Thanks @luxas. kubeadm init --token-ttl 0 works for me. I'll use it as a workaround.
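
Putting the two options from the release notes side by side (roughly):

# Option 1: create a fresh bootstrap token on the master whenever you need to join a node
kubeadm token create
# Option 2: initialize with a token that never expires (less secure)
kubeadm init --token-ttl 0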

@luxas same here. In case you are using kubespray, do the following to check whether the problem is exactly that (a condensed sketch follows after the steps):

On the master node, run this command:

kubeadm token create and copy the generated token.

On the worker node, edit /etc/kubernetes/kubeadm-client.conf and put your new token into the token field.

Then run kubeadm join --config /etc/kubernetes/kubeadm-client.conf --ignore-preflight-errors=all and it should join the cluster.
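
A condensed version of those steps (a sketch; the kubeadm-client.conf path is the one kubespray uses, as mentioned above):

# On the master: create a fresh bootstrap token
kubeadm token create
# On the worker: put the new token into the "token:" field of /etc/kubernetes/kubeadm-client.conf, then:
kubeadm join --config /etc/kubernetes/kubeadm-client.conf --ignore-preflight-errors=all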

…signature for token ID "w30hqq", will try again
I0114 15:02:45.146194 5300 round_trippers.go:445] GET https://10.128.0.57:80/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 7 milliseconds
I0114 15:02:45.146496 5300 token.go:221] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "w30hqq", will try again
I0114 15:02:51.009632 5300 round_trippers.go:445] GET https://10.128.0.57:80/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 6 milliseconds
I0114 15:02:51.009999 5300 token.go:221] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "w30hqq", will try again

kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:25:59Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

kubeadm init --token-ttl 0 - has no effect.

I am facing this issue intermittently. While joining multiple control-plane nodes in a loop, one or two of them hit this error while the others succeed.

I am on cri-containerd-cni 1.3.4.

I0114 15:02:51.009999 5300 token.go:221] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "w30hqq", will try again

There is a controller (the BootstrapSigner) that is responsible for adding the bootstrap token signatures to "cluster-info". kubeadm waits for that to happen for a while. If the signature is never added, there must be a problem elsewhere, e.g. in the controller in question or in the controller-manager.
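
A couple of things worth checking in that situation (a sketch; <node-name> is a placeholder for your control-plane node's name):

# Confirm the bootstrap token actually exists as a secret in kube-system
kubectl -n kube-system get secrets | grep bootstrap-token
# Look for bootstrap-signer errors in the controller-manager log
kubectl -n kube-system logs kube-controller-manager-<node-name> | grep -i bootstrapsigner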
