kops cluster with RBAC enabled is failing

Created on 21 Feb 2017 · 21 comments · Source: kubernetes/kops

I created a Kubernetes cluster (v1.5.3) with a master and 3 nodes using kops v1.5.1 on AWS.
I added the following additional config to support RBAC (before running kops update cluster --yes):

kubeAPIServer:
  authorizationMode: RBAC,AlwaysAllow
  authorizationRbacSuperUser: admin
  runtimeConfig:
    rbac.authorization.k8s.io/v1alpha1: "true"
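(For reference, a rough sketch of how such a spec change is typically applied with kops, assuming the cluster name is exported as NAME and KOPS_STATE_STORE points at the state store:)

kops edit cluster $NAME                  # add the kubeAPIServer settings shown above
kops update cluster $NAME --yes          # push the new configuration
kops rolling-update cluster $NAME --yes  # replace instances so the apiserver picks it up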

The cluster was working fine, and I added the necessary ServiceAccounts, ClusterRoles, ClusterRoleBindings, etc.
Then I removed the "AlwaysAllow" string from authorizationMode in the kops config file and tried to do an update / rolling-update. Nothing happened (I think kops doesn't yet detect these changes), so I forced a rolling update (using --force), after which all the master / node instances were recreated and the service accounts seem to work as expected. But the nodes are no longer joined to the cluster: kubectl get nodes -o wide shows only the master, although I can see all 3 instances in the nodes autoscaling group. I thought the forced rolling update had broken the cluster networking, so I created a fresh cluster, this time directly with RBAC (without AlwaysAllow), and hit the same issue: only the master was visible via kubectl.

I also tried editing the instance groups to increase / decrease the node count, and the issue remained: the autoscaling group adds / removes instances, but none of them become accessible via kubectl.

I just want to know whether this behavior is expected (and whether it is going to be fixed in future versions). If I am doing something wrong, what is the best way to enable RBAC on a cluster installed with kops?

All 21 comments

@sethpollack any ideas?

What do the clusterroles/clusterrolebindings look like?

I created a new client 'admin' in the system:masters group and used its client certificates to access the cluster after re-creating all the master and node instances (via kops rolling-update --force).
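(For reference, a minimal sketch of generating such a client certificate with openssl; it assumes access to the cluster CA key and certificate that kops keeps in its state store, and the file names are illustrative:)

openssl genrsa -out admin.key 2048
openssl req -new -key admin.key -out admin.csr -subj "/CN=admin/O=system:masters"
openssl x509 -req -in admin.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out admin.crt -days 365
# the certificate CN becomes the username ("admin"), O becomes the group ("system:masters")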
Along with that I also added the following ClusterRole and ClusterRoleBinding so that the Kubernetes dashboard wouldn't break (it didn't work anyway, since the nodes themselves are not joined to the cluster).

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: na-ca-role
rules:
- apiGroups:
  - '*'
  attributeRestrictions: null
  resources:
  - '*'
  verbs:
  - '*'
- attributeRestrictions: null
  nonResourceURLs:
  - '*'
  verbs:
  - '*'

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: na-ca-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: na-ca-role
subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system
- kind: User
  name: admin
- kind: Group
  name: admin

In kubernetes 1.6 RBAC is moving to beta, will have default roles/bindings, and will start each controller with a distinct service account.

For 1.5, the equivalent would be granting cluster-admin to the kube-controller-manager user.

I am waiting for 1.6 to fix up my roles, and just added name: system:authenticated to cluster-admin for now.

- apiVersion: rbac.authorization.k8s.io/v1alpha1
  kind: ClusterRoleBinding
  metadata:
    creationTimestamp: null
    labels:
      kubernetes.io/bootstrapping: rbac-defaults
    name: cluster-admin
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: cluster-admin
  subjects:
  - kind: Group
    name: system:masters
  - kind: Group
    name: system:authenticated

@sethpollack thanks for the suggestion. But adding name: system:authenticated would give root-level access to any authenticated client of the cluster, right? So do I still have to add system:authenticated to the cluster-admin binding even after granting cluster-admin to the kube-controller-manager user? (I mean, is adding system:authenticated to cluster-admin compulsory in 1.5 to make RBAC work?)
If not, I am guessing that the nodes are failing because this kube-controller-manager user doesn't have cluster-admin privileges. Am I right?

No you don't need to add name: system:authenticated, It's probably a bad idea, I just used it as a shortcut for now.

All you should need is the kube-controller-manager user.
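(A minimal sketch of such a binding, using the same v1alpha1 API as elsewhere in this thread; the binding name is illustrative:)

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: kube-controller-manager-cluster-admin  # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: User
  name: kube-controller-manager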

Oh, great, thanks. Would you mind clarifying a small query about this kube-controller-manager user and another user called kubelet (both in the kube-system namespace, if I am not wrong)? What exactly are the roles of these users in the cluster? And I believe they will be retained in 1.6 as well (correct me if I am wrong).

I am not sure.

@sethpollack I gave the cluster-admin privilege to the kube-controller-manager user, and now the nodes are joined to the cluster. But I have an issue with the kube-dns deployment:

Kumuds-MacBook-Pro:Kubernetes RBAC kumud$ kubectl get deployments --all-namespaces -o wide
NAMESPACE     NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   dns-controller            1         1         1            1           5d
kube-system   heapster                  1         1         1            1           5d
kube-system   kube-dns                  2         2         2            0           5d
kube-system   kube-dns-autoscaler       1         1         1            1           5d

Also, the kube-dns pods show this status:

kube-system   kube-dns-782804071-15jwd   2/4       CrashLoopBackOff   26         49m
kube-system   kube-dns-782804071-qrhsn   2/4       CrashLoopBackOff   27         49m

The container logs from the kube-dns pods are below:

Kumuds-MacBook-Pro:~ kumud$ kubectl log kube-dns-782804071-bf2ts kubedns --namespace=kube-system
W0227 17:30:14.724024   11440 cmd.go:325] log is DEPRECATED and will be removed in a future version. Use logs instead.
I0227 11:59:07.138970       1 dns.go:42] version: v1.6.0-alpha.0.680+3872cb93abf948-dirty
I0227 11:59:07.139486       1 server.go:107] Using https://100.64.0.1:443 for kubernetes master, kubernetes API: <nil>
I0227 11:59:07.140028       1 server.go:68] Using configuration read from ConfigMap: kube-system:kube-dns
I0227 11:59:07.140115       1 server.go:113] FLAG: --alsologtostderr="false"
I0227 11:59:07.140160       1 server.go:113] FLAG: --config-map="kube-dns"
I0227 11:59:07.140218       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0227 11:59:07.140256       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0227 11:59:07.140289       1 server.go:113] FLAG: --dns-port="10053"
I0227 11:59:07.140326       1 server.go:113] FLAG: --domain="cluster.local."
I0227 11:59:07.140379       1 server.go:113] FLAG: --federations=""
I0227 11:59:07.140418       1 server.go:113] FLAG: --healthz-port="8081"
I0227 11:59:07.140455       1 server.go:113] FLAG: --kube-master-url=""
I0227 11:59:07.140490       1 server.go:113] FLAG: --kubecfg-file=""
I0227 11:59:07.140521       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0227 11:59:07.140592       1 server.go:113] FLAG: --log-dir=""
I0227 11:59:07.140627       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0227 11:59:07.140662       1 server.go:113] FLAG: --logtostderr="true"
I0227 11:59:07.140716       1 server.go:113] FLAG: --stderrthreshold="2"
I0227 11:59:07.140751       1 server.go:113] FLAG: --v="2"
I0227 11:59:07.140783       1 server.go:113] FLAG: --version="false"
I0227 11:59:07.140819       1 server.go:113] FLAG: --vmodule=""
I0227 11:59:07.140893       1 server.go:155] Starting SkyDNS server (0.0.0.0:10053)
I0227 11:59:07.143224       1 server.go:165] Skydns metrics enabled (/metrics:10055)
I0227 11:59:07.143297       1 dns.go:144] Starting endpointsController
I0227 11:59:07.143334       1 dns.go:147] Starting serviceController
I0227 11:59:07.143768       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0227 11:59:07.143838       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
E0227 11:59:37.143823       1 sync.go:105] Error getting ConfigMap kube-system:kube-dns err: Get https://100.64.0.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp 100.64.0.1:443: i/o timeout
E0227 11:59:37.144029       1 dns.go:190] Error getting initial ConfigMap: Get https://100.64.0.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp 100.64.0.1:443: i/o timeout, starting with default values
I0227 11:59:37.144120       1 dns.go:163] Waiting for Kubernetes service
I0227 11:59:37.144157       1 dns.go:169] Waiting for service: default/kubernetes
E0227 11:59:37.145483       1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://100.64.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
E0227 11:59:37.145651       1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://100.64.0.1:443/api/v1/services?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
I0227 12:00:06.848283       1 server.go:150] Ignoring signal terminated (can only be terminated by SIGKILL)
E0227 12:00:07.145233       1 reflector.go:199] pkg/dns/config/sync.go:114: Failed to list *api.ConfigMap: Get https://100.64.0.1:443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dkube-dns&resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
E0227 12:00:08.168417       1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://100.64.0.1:443/api/v1/services?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
E0227 12:00:08.168699       1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://100.64.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
Kumuds-MacBook-Pro:~ kumud$ kubectl log kube-dns-782804071-bf2ts dnsmasq --namespace=kube-system
W0227 17:32:09.681666   11455 cmd.go:325] log is DEPRECATED and will be removed in a future version. Use logs instead.
dnsmasq[1]: started, version 2.76 cachesize 1000
dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
dnsmasq[1]: using nameserver 127.0.0.1#10053
dnsmasq[1]: read /etc/hosts - 7 addresses

Kumuds-MacBook-Pro:~ kumud$ kubectl log kube-dns-782804071-bf2ts dnsmasq-metrics --namespace=kube-system
W0227 17:32:38.969032   11460 cmd.go:325] log is DEPRECATED and will be removed in a future version. Use logs instead.
ERROR: logging before flag.Parse: I0227 11:54:55.132991       1 main.go:38] dnsmasq-metrics v1.0
ERROR: logging before flag.Parse: I0227 11:54:55.133036       1 server.go:44] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:dnsmasq PrometheusSubsystem:cache})
ERROR: logging before flag.Parse: W0227 11:57:10.156076       1 server.go:53] Error getting metrics from dnsmasq: read udp 127.0.0.1:43983->127.0.0.1:53: read: connection refused

Kumuds-MacBook-Pro:~ kumud$ kubectl log kube-dns-782804071-bf2ts healthz --namespace=kube-system
W0227 17:32:59.905283   11463 cmd.go:325] log is DEPRECATED and will be removed in a future version. Use logs instead.
2017/02/27 11:55:56 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-02-27 11:55:54.608151211 +0000 UTC, error exit status 1

What could have gone wrong with these deployments now?

Also, please suggest the best way to restart the Kubernetes API server. Currently I am restarting the whole master instance to achieve this.

Not sure.

I usually just kill my master nodes to restart the kubernetes API server.
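(A lighter-weight option, assuming the control plane runs as static pods under the kubelet with Docker as the runtime, as on kops-provisioned masters of that era: kill just the apiserver container and let the kubelet recreate it from its static manifest.)

# on the master instance
sudo docker ps | grep kube-apiserver   # find the container ID
sudo docker kill <container-id>        # the kubelet restarts it from /etc/kubernetes/manifests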

Could you please refer me to someone who can help with this kube-dns issue?

If its auth related, maybe @liggitt can help?

No you don't need to add name: system:authenticated, It's probably a bad idea, I just used it as a shortcut for now.

All you should need is the kube-controller-manager user.

if you have pods that access the API (like the kube-dns pod does), you need to give some permissions (not cluster-admin, generally) to that pod's service account.

the role needed by kube-dns is at https://github.com/kubernetes/kubernetes/pull/38816/files#diff-0dd2098231b4213ca11a4c4734757936 (and is included by default in 1.6)

more detailed RBAC doc is in progress for 1.6 at https://github.com/kubernetes/kubernetes.github.io/pull/2618 (preview at https://deploy-preview-2618--kubernetes-io-master-staging.netlify.com/docs/admin/authorization/rbac/). The concepts all apply to 1.5, though the version (v1beta1), the command-line helpers, and many of the default roles are new in 1.6
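(A rough sketch in that direction, using the v1alpha1 API enabled in this cluster; the rules are an approximation rather than the exact contents of the linked PR, and the binding targets the default service account in kube-system because that is what the kube-dns pods in this cluster run as:)

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: kube-dns-minimal  # illustrative name, not the 1.6 default
rules:
- apiGroups: [""]
  resources: ["endpoints", "services"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["kube-dns"]
  verbs: ["get"]  # lets kube-dns read its optional ConfigMap
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: kube-dns-minimal
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-dns-minimal
subjects:
- kind: ServiceAccount
  name: default  # the account the kube-dns pods run as in this cluster
  namespace: kube-system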

thanks!

@liggitt Thanks a lot for the clarification.

But the service account of the kube-dns pods already has the cluster-admin role; see the ClusterRoleBinding I applied (embedded below).

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: na-ca-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system

So, I don't think the issue is because of the permissions. Please let me know if you need any further details. Thanks.

@liggitt @sethpollack @chrislovecnm Thank you all

Adding this fixed all my issues

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: na-ca-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: Group
  name: system:serviceaccounts:kube-system

I'd recommend targeting the specific service accounts you want to escalate

@liggitt currently there is only one service account, default, in the kube-system namespace. I escalated that one as mentioned in my penultimate comment, but that didn't fix the kube-dns issue.
I don't know why, but when I checked the pod description it is still using the default secret of the default service account in the kube-system namespace to access the API.
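(As a quick check, something like the following shows which service account a pod actually runs as and which token secret backs it; the pod name is taken from the earlier output:)

# which service account the kube-dns pod runs as
kubectl get pod kube-dns-782804071-15jwd -n kube-system \
  -o jsonpath='{.spec.serviceAccountName}'

# the token secret(s) attached to that service account
kubectl get serviceaccount default -n kube-system -o yaml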

@liggitt @sethpollack @chrislovecnm

I modified my ClusterRoleBinding to this, and kube-dns is working with it as well. Is this better than my previous version?

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: na-ca-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: User
  name: system:serviceaccount:kube-system:default
- kind: User
  name: kubelet
- kind: User
  name: kube-apiserver
- kind: User
  name: kube-controller-manager
- kind: User
  name: kube-proxy
- kind: User
  name: kube-scheduler

That's a lot more targeted, yes. For the first subject, this should work as well:

- kind: ServiceAccount
  name: default
  namespace: kube-system

Great...Thank you all for the support. I am now closing this issue.
