1. What kops version are you running? The command kops version will display
this information.
Tried both 1.10 and 1.11-alpha1
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
1.11.4
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Created a cluster from the template below.
5. What happened after the commands executed?
The cluster was created, but the API DNS records never get updated, so the cluster/masters never become available. I also do not see the dns-controller Docker containers running when SSH'ing into the machines.
6. What did you expect to happen?
The cluster to spin up as normal. Spin-up does work when using 1.10 + RBAC or 1.11 + AlwaysAllow, so the problem seems related to the combination of 1.11 and RBAC.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: test.example.com
spec:
  cloudProvider: aws
  authorization:
    rbac: {}
  kubeAPIServer:
    authorizationRbacSuperUser: admin
  etcdClusters:
  - name: main
    enableEtcdTLS: true
    version: 3.2.24
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: us-east-1a
      encryptedVolume: true
    - instanceGroup: master-us-east-1d
      name: us-east-1d
      encryptedVolume: true
    - instanceGroup: master-us-east-1e
      name: us-east-1e
      encryptedVolume: true
  - name: events
    enableEtcdTLS: true
    version: 3.2.24
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: us-east-1a
      encryptedVolume: true
    - instanceGroup: master-us-east-1d
      name: us-east-1d
      encryptedVolume: true
    - instanceGroup: master-us-east-1e
      name: us-east-1e
      encryptedVolume: true
  kubernetesApiAccess:
  - 10.50.0.0/16
  kubernetesVersion: 1.11.4
  networkCIDR: 10.50.0.0/16
  networking:
    kubenet: {}
  sshAccess:
  - 10.50.0.0/16
  subnets:
  - name: us-east-1a
    type: Public
    zone: us-east-1a
  - name: us-east-1d
    type: Public
    zone: us-east-1d
  - name: us-east-1e
    type: Public
    zone: us-east-1e
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
  kubelet:
    anonymousAuth: false
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: master-us-east-1a
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  rootVolumeOptimization: false
  rootVolumeSize: 64
  subnets:
  - us-east-1a
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: master-us-east-1d
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  rootVolumeOptimization: false
  rootVolumeSize: 64
  subnets:
  - us-east-1d
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: master-us-east-1e
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  rootVolumeOptimization: false
  rootVolumeSize: 64
  subnets:
  - us-east-1e
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: nodes
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: m5.large
  maxSize: 1
  minSize: 1
  role: Node
  rootVolumeOptimization: true
  rootVolumeSize: 128
  subnets:
  - us-east-1a
  - us-east-1d
  - us-east-1e
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Cluster creation was successful from kops' point of view, so I am not sure whether this is applicable.
9. Anything else we need to know?
If I check the kubelet logs through journalctl, I see quite a few "dial tcp 127.0.0.1:443: connect: connection refused" entries, which I guess come from the kubelet trying to reach the API server and being refused?
I also tried using a single master and rebooting it after it had been up for a moment, to avoid race conditions, but that did not seem to matter.
So after spinning up a new cluster from the CLI and comparing the output, it seems these lines are what is causing the issue:
kubeAPIServer:
  authorizationRbacSuperUser: admin
I can definitely confirm that the authorization-rbac-super-user flag was removed. I think these days you can just bind your admin user(s) to the system:masters group.
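As a rough sketch of the RBAC replacement (the user name admin is only an assumption taken from the manifest above, not something kops generates), a ClusterRoleBinding to the built-in cluster-admin ClusterRole gives that user the same full access the old flag did; credentials issued with the system:masters group already get this implicitly:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-cluster-admin   # hypothetical binding name
subjects:
- kind: User
  name: admin                 # assumed former super-user name from the manifest above
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io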
I'll also add that I've run into certificate issues with enableEtcdTLS=true and etcd 3.2. I was, however, able to successfully boot up a new cluster with etcd 3.1, k8s 1.11.4, and kops 1.11.0-alpha.1.
@ripta Should kops error out when using that authorization-rbac-super-user flag? I guess I should also remove it when I update my existing 1.10 cluster to 1.11.
I have not really noticed any issues with enableEtcdTLS, but I will check more carefully now that you mention it.
Thanks, guys. I can confirm that too. These lines should definitely be removed from the cluster config during the kops upgrade:
kubeAPIServer:
  authorizationRbacSuperUser: admin
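For clarity, after removing those lines the authorization-related part of the cluster spec should look roughly like this sketch (based on the manifest earlier in this issue); RBAC stays enabled, only the obsolete super-user setting goes away:
spec:
  authorization:
    rbac: {}
  # kubeAPIServer:
  #   authorizationRbacSuperUser: admin   <- delete; the apiserver flag no longer exists in 1.11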
@legal90 I am not sure what it will break. Can you confirm that RBAC is working fine after this change?
@zonorti removing that line works perfectly for my RBAC setup. It's a flag that is no longer needed in 1.11.
Removing that line worked for me as well.
@legal90 I am not sure what it will break. Can you confirm that RBAC is working fine after this change?
@zonorti Yes, I confirm that too
@bcorijn @while1eq1 @legal90 Thanks! I can also confirm that it did not break anything; the clusters have been working fine for at least 6 days since the change.
I'm having the same issue when upgrading from 1.10 to 1.11 with the authorizationRbacSuperUser field; removing it fixes the problem.
We hit this too. Kops should throw a warning rather than blow up the cluster.
I also hit the issue, and this resolved it for me. A note about this in the kops release/upgrade notes would have saved some searching around for the fix.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.