Steps to recreate:
Build a cluster with Kubernetes v1.8.0-alpha.1 (or building off master) and --authorization rbac (an example command follows the log excerpt below). The API fails to start successfully and you can see logs such as the following:
[reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: nodes is forbidden: User "kubelet" cannot list nodes at the cluster scope
[reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "kubelet" cannot list pods at the cluster scope
[reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: Failed to list *v1.Service: services is forbidden: User "kubelet" cannot list services at the cluster scope
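For context, a cluster creation command along these lines reproduces it (a sketch only; the cluster name and zones are placeholders, not taken from the original report):

kops create cluster \
  --name=repro.example.k8s.local \
  --zones=us-east-1a \
  --kubernetes-version=v1.8.0-alpha.1 \
  --authorization=rbac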
This is now occurring due to the following change in Kubernetes v1.8: https://github.com/kubernetes/kubernetes/pull/49638 (related documentation: https://kubernetes.io/docs/admin/authorization/node/#rbac-node-permissions)
I validated this by using my admin token to add the system:nodes Group back into the system:node ClusterRoleBinding, which resolved these issues and built the cluster successfully.
What do you suggest would be the appropriate way forward to fix this for kops?
Friendly ping @chrislovecnm, @liggitt :)
@liggitt / @justin what new changes do we need in terms of RBAC or kubelet?
I think the docs tell us what to do.
The question is what the node admission plugin is ...
To ensure requests from nodes are authorized, you can:
a. ensure nodes user/group names conform to the Node authorizer requirements (in the system:nodes group, with a username of system:node:<nodeName>), and add in the Node authorization mode with --authorization-mode=Node,RBAC
b. OR create an RBAC clusterrolebinding from the system:node role to the system:nodes group (this replicates what was done automatically prior to 1.8, but has all the same issues with any node being able to read any secret and modify status for any other node or pod)
If you use the Node authorization mode and your node usernames/groups conform to its requirements, it would make sense to also enable the NodeRestriction admission plugin (--admission-control=...,NodeRestriction,...), which provides the other half of restricting kubelets to only modify their own node's status and have write permission to pods on their node.
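Putting option (a) and the NodeRestriction plugin together, the relevant kube-apiserver flags would look roughly like this (a sketch; the other admission plugins listed are just a typical baseline, not taken from this thread, and all other flags are omitted):

kube-apiserver \
  --authorization-mode=Node,RBAC \
  --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,NodeRestriction \
  ...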
This also leads us down the security challenge of kubelet TLS bootstrapping: https://kubernetes.io/docs/admin/kubelet-tls-bootstrapping/, which I am uncertain we are currently doing.
It appears that if we get bootstrapping working, the name is set up correctly.
As @liggitt mentioned, the way to resolve it until a proper fix is to edit the system:node cluster role binding as follows:
kubectl edit clusterrolebinding system:node
add the group binding to the subjects section, which should look like this:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
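Equivalently, instead of editing the auto-managed binding, you can create a separate bridging ClusterRoleBinding (this is option (b) quoted above) and apply it with kubectl apply -f. A sketch; the binding name here is made up:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-node-bridge   # hypothetical name, choose your own
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes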
I think we should create the bridging policy in kops 1.8, and aim to support the full NodeAuthorizer in kops 1.9.
@KashifSaadat I do not think we are blocked on this now. Have you tested master?
Hey @chrislovecnm, tested with Justin's interim fix in PR #3683 and this is working successfully now for a freshly built cluster.
Will close this issue as the future work required is documented in the roadmap, thanks! :)
$ kubectl set subject clusterrolebinding system:node --group=system:nodes
@JinsYin LGTM
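To confirm that command added the group, you can dump the binding back out and check the subjects section:

kubectl get clusterrolebinding system:node -o yaml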
Would it be possible to break the solution down in layman's terms?
I'm experiencing the same problem with k8s 1.12.1 (nodes can't join a brand new kubeadm-created cluster, same error messages) after following the standard kubeadm instructions and using Flannel.
When I follow the advice of @etiennetremel above and edit the clusterrolebinding, I get an error that isn't very meaningful to me:
error: no original object found for &unstructured.Unstructured{Object:map[string]interface {}{"metadata":map[string]interface {}{"annotations":map[string]interface {}{"rbac.authorization.kubernetes.io/autoupdate":"true"}, "labels":map[string]interface {}{"kubernetes.io/bootstrapping":"rbac-defaults"}, "name":"system:node"}, "roleRef":map[string]interface {}{"apiGroup":"rbac.authorization.k8s.io", "kind":"ClusterRole", "name":"system:node"}, "subjects":[]interface {}{map[string]interface {}{"kind":"Group", "name":"system:nodes", "apiGroup":"rbac.authorization.k8s.io"}}, "apiVersion":"rbac.authorization.k8s.io/v1", "kind":"ClusterRoleBinding"}}
Running the command above:
kubectl set subject clusterrolebinding system:node --group=system:nodes
also did nothing. Nodes still can't join:
Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.693409 5016 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "system:bootstrap:kt708z" cannot list resource "pods" in API group "" at the cluster scope
Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.694203 5016 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: services is forbidden: User "system:bootstrap:kt708z" cannot list resource "services" in API group "" at the cluster scope
Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.695741 5016 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: nodes "tc-k8s-002.some.private.domain" is forbidden: User "system:bootstrap:kt708z" cannot list resource "nodes" in API group "" at the cluster scope
Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.761084 5016 kubelet.go:2236] node "tc-k8s-002.some.private.domain" not found
In the past (on 1.11) my cluster worked fine with a permissive clusterrolebinding, but this doesn't seem to work anymore.
Update: I just downgraded everything to 1.11, did a heavy reset and things are working again.
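Note that the errors in those logs are for a bootstrap-token identity (system:bootstrap:kt708z), not a node identity, so the system:node binding discussed above would not apply to it. A quick way to inspect what that identity is permitted to do (a sketch; run with admin credentials and substitute your own token ID):

kubectl auth can-i list pods --as=system:bootstrap:kt708z --as-group=system:bootstrappers
kubectl get clusterrolebindings | grep -i bootstrap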