Kops: RBAC Authorization fails when deploying a fresh cluster on Kubernetes v1.8.0

Created on 6 Oct 2017 · 11 Comments · Source: kubernetes/kops

Steps to recreate:

  1. Use kops release v1.8.0-alpha.1 (or a build from master)
  2. Create a Kubernetes cluster (mostly default settings are fine), setting RBAC Authorization (--authorization rbac); see the example command after these steps
  3. Login to a master node
  4. Follow the journal log for the kubelet service or view the kube-controller-manager logs
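For reference, a minimal kops invocation for step 2 might look like the following (the cluster name, state store bucket, and zone are placeholders, not taken from the original report):

kops create cluster \
  --name=test.example.com \
  --state=s3://example-kops-state \
  --zones=us-east-1a \
  --authorization rbac \
  --yes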

The API fails to start successfully and you can see logs such as the following:

[reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: nodes is forbidden: User "kubelet" cannot list nodes at the cluster scope
[reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "kubelet" cannot list pods at the cluster scope
[reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: Failed to list *v1.Service: services is forbidden: User "kubelet" cannot list services at the cluster scope

This is now occurring due to the following change in Kubernetes v1.8: https://github.com/kubernetes/kubernetes/pull/49638 (related documentation: https://kubernetes.io/docs/admin/authorization/node/#rbac-node-permissions)

I validated this by using my admin token to add the system:nodes Group back into the system:node ClusterRoleBinding, which resolved these issues and built the cluster successfully.
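A sketch of that validation step (assuming kubectl is already configured with admin credentials; the set subject command is equivalent to editing the binding by hand):

# On a fresh v1.8 cluster the default binding is created with no subjects:
kubectl get clusterrolebinding system:node -o yaml

# Add the system:nodes group back as a subject:
kubectl set subject clusterrolebinding system:node --group=system:nodes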

What do you suggest would be the appropriate way forward to fix this for kops?

Friendly ping @chrislovecnm , @liggitt :)

blocks-next

Most helpful comment

As @liggitt mentioned, the way to resolve it until a proper fix is available is to edit the system:node ClusterRoleBinding as follows:

kubectl edit clusterrolebinding system:node

Add the system:nodes group to the subjects section, so that the binding looks like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes

All 11 comments

@liggitt / @justin what new changes do we need in terms of RBAC or kubelet?

I think the docs tell us what to do

  1. Enable the Node authorization mode (--authorization-mode=Node,RBAC) and the NodeRestriction admission plugin (see the example flags after this list)
  2. Ensure all kubelets’ credentials conform to the group/username requirements
  3. Audit apiserver logs to ensure the Node authorizer is not rejecting requests from kubelets (no persistent NODE DENY messages logged)
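A sketch of the kube-apiserver flags involved in step 1 (only the relevant flags are shown, and the other admission plugins listed are just a typical set; on a kops cluster these would normally be rendered from the cluster spec rather than set by hand):

kube-apiserver \
  --authorization-mode=Node,RBAC \
  --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,ResourceQuota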

The question is what the Node admission plugin is ...

To ensure requests from nodes are authorized, you can:
a. ensure nodes user/group names conform to the Node authorizer requirements (in the system:nodes group, with a username of system:node:<nodeName>), and add in the Node authorization mode with --authorization-mode=Node,RBAC
b. OR create an RBAC clusterrolebinding from the system:node role to the system:nodes group (this replicates what was done automatically prior to 1.8, but has all the same issues with any node being able to read any secret and modify status for any other node or pod)
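A one-liner for option (b), creating the bridging binding (the binding name kubelet-node-bridge is just an illustrative choice; adding the group to the existing system:node ClusterRoleBinding, as shown above, works equally well):

kubectl create clusterrolebinding kubelet-node-bridge \
  --clusterrole=system:node \
  --group=system:nodes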

If you use the Node authorization mode and your node usernames/groups conform to its requirements, it would make sense to also enable the NodeRestriction admission plugin (--admission-control=...,NodeRestriction,...), which provides the other half of restricting kubelets to only modify their own node's status and have write permission to pods on their node.
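For option (a), a sketch of what a conforming kubelet identity looks like when using X.509 client certificates, where the certificate CN maps to the username and O to the group (the key path and node name are placeholders):

openssl req -new -key kubelet.key \
  -subj "/O=system:nodes/CN=system:node:node-1.example.com" \
  -out kubelet.csr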

This also leads us down the security challenge of kubelet TLS bootstrapping: https://kubernetes.io/docs/admin/kubelet-tls-bootstrapping/, which I am uncertain we are currently doing.

It appears that if we get bootstrapping working, the name is set up correctly.

As @liggitt mentioned, the way to resolve it until a proper fix is available is to edit the system:node ClusterRoleBinding as follows:

kubectl edit clusterrolebinding system:node

Add the system:nodes group to the subjects section, so that the binding looks like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes

I think we should create the bridging policy in kops 1.8, and aim to support the full NodeAuthorizer in kops 1.9.

@KashifSaadat I do not think we are blocked on this now. Have you tested master?

Hey @chrislovecnm, tested with Justin's interim fix in PR #3683 and this is working successfully now for a freshly built cluster.

Will close this issue as the future work required is documented in the roadmap, thanks! :)

$ kubectl set subject clusterrolebinding system:node --group=system:nodes

@JinsYin LGTM

Would it be possible to break the solution down in layman's terms?
I'm experiencing the same problem with k8s 1.12.1 (nodes can't join a brand new kubeadm-created cluster, same error messages) after following the standard kubeadm instructions and using Flannel.

When I follow the advice of @etiennetremel above and edit the clusterrolebinding, I get an error that isn't very meaningful to me:

error: no original object found for &unstructured.Unstructured{Object:map[string]interface {}{"metadata":map[string]interface {}{"annotations":map[string]interface {}{"rbac.authorization.kubernetes.io/autoupdate":"true"}, "labels":map[string]interface {}{"kubernetes.io/bootstrapping":"rbac-defaults"}, "name":"system:node"}, "roleRef":map[string]interface {}{"apiGroup":"rbac.authorization.k8s.io", "kind":"ClusterRole",
"name":"system:node"}, "subjects":[]interface {}{map[string]interface {}{"kind":"Group", "name":"system:nodes", "apiGroup":"rbac.authorization.k8s.io"}}, "apiVersion":"rbac.authorization.k8s.io/v1", "kind":"ClusterRoleBinding"}}

Running the command above:

 kubectl set subject clusterrolebinding system:node --group=system:nodes

also did nothing. Nodes still can't join:

Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.693409 5016 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "system:bootstrap:kt708z" cannot list resource "pods" in API group "" at the cluster scope
Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.694203 5016 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: services is forbidden: User "system:bootstrap:kt708z" cannot list resource "services" in API group "" at the cluster scope
Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.695741 5016 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: nodes "tc-k8s-002.some.private.domain" is forbidden: User "system:bootstrap:kt708z" cannot list resource "nodes" in API group "" at the cluster scope
Oct 24 08:32:02 tc-k8s-002.some.private.domain kubelet[5016]: E1024 08:32:02.761084 5016 kubelet.go:2236] node "tc-k8s-002.some.private.domain" not found

In the past (on 1.11) my cluster worked fine with permissive clusterrolebinding, but this doesn't seem to work anymore.
Update: I just downgraded everything to 1.11, did a heavy reset and things are working again.
