1. What kops version are you running? The command kops version will display
this information.
Kops 1.11.
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Kubernetes 1.11.7
3. What cloud provider are you using?
AWS
After upgrading from Kubernetes 1.10.6 to 1.11.7, I have started getting this error in 2 of my 3 kube-controller-manager pods:
I0325 21:43:01.642353 1 controllermanager.go:123] Version: v1.11.7
W0325 21:43:01.643354 1 authentication.go:55] Authentication is disabled
I0325 21:43:01.643378 1 insecure_serving.go:49] Serving insecurely on [::]:10252
I0325 21:43:01.643615 1 leaderelection.go:203] attempting to acquire leader lease kube-system/kube-controller-manager...
E0325 21:43:01.644524 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:05.383757 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:09.565787 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:12.001546 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:16.144831 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:18.734905 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:21.105232 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:25.483964 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:28.156557 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:443: connect: connection refused
E0325 21:43:38.311152 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: endpoints "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get endpoints in the namespace "kube-system": RBAC: [clusterrole.rbac.authorization.k8s.io "system:basic-user" not found, clusterrole.rbac.authorization.k8s.io "system:discovery" not found, clusterrole.rbac.authorization.k8s.io "system:kube-controller-manager" not found]
I have checked, and the cluster roles it is looking for are there, and the other kube-controller-manager appears to be working correctly. The cluster also validates properly with kops validate cluster.
Any help would be much appreciated!
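For anyone debugging the same RBAC errors, the roles and bindings named in the log can be checked directly. This is a hedged sketch assuming you have kubectl access to the affected cluster; the object names come straight from the error message above:

```sh
# Confirm the cluster roles named in the error actually exist
kubectl get clusterrole system:basic-user system:discovery system:kube-controller-manager

# Inspect the binding that grants the role to the controller-manager user
kubectl get clusterrolebinding system:kube-controller-manager -o yaml

# Ask the API server whether the user is actually authorized (impersonation)
kubectl auth can-i get endpoints --namespace kube-system \
  --as system:kube-controller-manager
```

If the last command prints `no` even though the roles exist, the problem is likely in the bindings rather than the roles themselves.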
+1 having the same issue with 1.11.7 even without an upgrade on a previously working cluster
Just had this issue with an existing 1.11.7 cluster. kube-controller-manager was using the flag --use-service-account-credentials but was not using the roles specified here: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#controller-roles
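To expand on that: with --use-service-account-credentials set, each controller loop authenticates with its own service account instead of the shared system:kube-controller-manager identity, so the per-controller system:controller:* roles and bindings from the linked docs must also exist. A quick check, assuming kubectl access (the node-controller example is just one illustrative controller):

```sh
# Per-controller roles and bindings that should exist when
# --use-service-account-credentials is enabled
kubectl get clusterroles | grep '^system:controller:'
kubectl get clusterrolebindings | grep '^system:controller:'

# Example: check permissions of one controller's service account
kubectl auth can-i list nodes \
  --as system:serviceaccount:kube-system:node-controller
```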
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
@pydavid were you able to resolve the issue?
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I am also facing this issue currently.
I'm also facing this issue after upgrading from 1.13.0 to 1.13.10:
E1005 12:44:08.494759 1 leaderelection.go:270] error retrieving resource lock kube-system/kube-controller-manager: endpoints "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "endpoints" in API group "" in the namespace "kube-system"
I'm also in this boat.
For me the story is a little different: a cluster running 1.14.8 was healthy, and when I supplied values for the following switches and applied the update, the cluster got fried:
--service-account-signing-key-file
--api-audiences
--service-account-issuer
Logs (/var/log/kube-controller-manager.log) filled with this line:
...
E1113 23:33:05.660819 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-controller-manager: Get https://127.0.0.1/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: dial tcp 127.0.0.1:443: connect: connection refused
...
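For context, those three switches are kube-apiserver flags for service account token projection. A hedged sketch of how they end up on the apiserver command line; the paths and issuer value below are placeholders, not the ones from my cluster:

```
kube-apiserver \
  --service-account-signing-key-file=/srv/kubernetes/service-account.key \
  --service-account-issuer=https://kubernetes.default.svc \
  --api-audiences=https://kubernetes.default.svc \
  ...
```

If the signing key or issuer is misconfigured, the apiserver can fail to come up, which would explain the controller-manager's connection-refused errors against 127.0.0.1:443.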
/reopen
@axozoid: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen