1. What kops version are you running? The command kops version will display
this information.
Tried both 1.10 and 1.11-alpha1
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
1.11.4
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Created a cluster from the template below.
5. What happened after the commands executed?
The cluster was created, but the API DNS records never get updated, so the cluster/masters never become available. I also do not see the dns-controller Docker containers running when SSH'ing into the machines.
6. What did you expect to happen?
The cluster to spin up as normal. Spin-up does work when using 1.10 + RBAC or 1.11 + AlwaysAllow, so the problem seems related to the combination of 1.11 and RBAC.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: test.example.com
spec:
  cloudProvider: aws
  authorization:
    rbac: {}
  kubeAPIServer:
    authorizationRbacSuperUser: admin
  etcdClusters:
  - name: main
    enableEtcdTLS: true
    version: 3.2.24
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: us-east-1a
      encryptedVolume: true
    - instanceGroup: master-us-east-1d
      name: us-east-1d
      encryptedVolume: true
    - instanceGroup: master-us-east-1e
      name: us-east-1e
      encryptedVolume: true
  - name: events
    enableEtcdTLS: true
    version: 3.2.24
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: us-east-1a
      encryptedVolume: true
    - instanceGroup: master-us-east-1d
      name: us-east-1d
      encryptedVolume: true
    - instanceGroup: master-us-east-1e
      name: us-east-1e
      encryptedVolume: true
  kubernetesApiAccess:
  - 10.50.0.0/16
  kubernetesVersion: 1.11.4
  networkCIDR: 10.50.0.0/16
  networking:
    kubenet: {}
  sshAccess:
  - 10.50.0.0/16
  subnets:
  - name: us-east-1a
    type: Public
    zone: us-east-1a
  - name: us-east-1d
    type: Public
    zone: us-east-1d
  - name: us-east-1e
    type: Public
    zone: us-east-1e
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
  kubelet:
    anonymousAuth: false
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: master-us-east-1a
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  rootVolumeOptimization: false
  rootVolumeSize: 64
  subnets:
  - us-east-1a
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: master-us-east-1d
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  rootVolumeOptimization: false
  rootVolumeSize: 64
  subnets:
  - us-east-1d
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: master-us-east-1e
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  rootVolumeOptimization: false
  rootVolumeSize: 64
  subnets:
  - us-east-1e
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: nodes
spec:
  associatePublicIp: true
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: m5.large
  maxSize: 1
  minSize: 1
  role: Node
  rootVolumeOptimization: true
  rootVolumeSize: 128
  subnets:
  - us-east-1a
  - us-east-1d
  - us-east-1e
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Cluster creation was successful from kops' point of view, so I am not sure whether this is applicable.
9. Anything else we need to know?
If I check the kubelet logs through journalctl, I see quite a few "dial tcp 127.0.0.1:443: connect: connection refused" entries, which I guess come from the kubelet trying to reach the API server and being refused?
I also tried using a single master and rebooting it after it had been up for a moment, to avoid race conditions, but that did not seem to matter.
So after spinning up a new cluster from the CLI and comparing the output, it seems these lines are what is causing the issue:
kubeAPIServer:
  authorizationRbacSuperUser: admin
I can definitely confirm that the authorization-rbac-super-user flag was removed. I think these days you can just bind your admin user(s) to the system:masters group.
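As a rough sketch of the RBAC replacement (the user name admin is only an assumption taken from the manifest above, not something kops generates), a ClusterRoleBinding to the built-in cluster-admin ClusterRole gives that user the same full access the old flag did; credentials issued with the system:masters group already get this implicitly:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-cluster-admin   # hypothetical binding name
subjects:
- kind: User
  name: admin                 # assumed former super-user name from the manifest above
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io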
I'll also add that I've run into certificate issues with enableEtcdTLS=true and etcd 3.2. I was, however, able to successfully boot up a new cluster with etcd 3.1, k8s 1.11.4, and kops 1.11.0-alpha.1.
@ripta Should kops error out when using that authorization-rbac-super-user flag? I guess I should also remove it when I update my existing 1.10 cluster to 1.11.
I have not really noticed any issues with enableEtcdTLS, but I will check more carefully now that you mention it.
Thanks, guys. I can confirm that too. These lines should definitely be removed from the cluster config during the kops upgrade:
kubeAPIServer:
  authorizationRbacSuperUser: admin
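For clarity, after removing those lines the authorization-related part of the cluster spec should look roughly like this sketch (based on the manifest earlier in this issue); RBAC stays enabled, only the obsolete super-user setting goes away:
spec:
  authorization:
    rbac: {}
  # kubeAPIServer:
  #   authorizationRbacSuperUser: admin   <- delete; the apiserver flag no longer exists in 1.11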
@legal90 I am not sure what it will break. Can you confirm that RBAC is working fine after this change?
@zonorti removing that line works perfectly for my RBAC setup. It's a flag that is no longer needed in 1.11.
Removing that line worked for me as well.
@legal90 I am not sure what it will break. Can you confirm that RBAC is working fine after this change?
@zonorti Yes, I confirm that too
@bcorijn @while1eq1 @legal90 Thanks! I can also confirm that it did not break anything; the clusters have been working fine for at least 6 days since the change.
I'm having the same issue when upgrading from 1.10 to 1.11 with the authorizationRbacSuperUser field; removing it fixes the problem.
We hit this too. Kops should throw a warning rather than blow up the cluster.
I also hit the issue, and this resolved it for me. A note about this in the kops release/upgrade notes would have saved some searching around for the fix.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.