Autoscaler: cluster-autoscaler 1.15.5 does not work with IAM Roles for Service Accounts

Created on 11 Mar 2020 · 26 Comments · Source: kubernetes/autoscaler

Just upgraded my EKS cluster to 1.15, and upgraded cluster-autoscaler along with it.

It appears that 1.15.5 has the same problem that we used to have with 1.14. It does not work with IAM Roles for Service Accounts.

Getting the following error:

E0311 03:36:42.052777 1 aws_manager.go:259] Failed to regenerate ASG cache: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::<my-account>:assumed-role/eks-worker/i-059d924ed6f52a032 is not authorized to perform: autoscaling:DescribeTags

This should not be happening, as the autoscaling:DescribeTags permission is assigned to the service-account-based IAM Role, not to the eks-worker instance profile.
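The ARN in the error already shows which identity was used. Below is a minimal Python sketch. It assumes the standard `arn:aws:sts::ACCOUNT:assumed-role/ROLE/SESSION` format, and the helper names are hypothetical. EC2 instance-profile credentials use the instance ID as the session name. A session name like `i-059d924ed6f52a032` therefore suggests the SDK fell back to the node's role rather than the service-account role.

```python
import re

# Hypothetical helper pattern for an STS assumed-role ARN.
# Assumed format: arn:aws:sts::<account>:assumed-role/<role>/<session>
ASSUMED_ROLE_RE = re.compile(
    r"^arn:aws:sts::(?P<account>[^:]*):assumed-role/(?P<role>[^/]+)/(?P<session>.+)$"
)

# EC2 instance IDs are "i-" followed by 8 or 17 lowercase hex characters.
INSTANCE_ID_RE = re.compile(r"^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def parse_assumed_role_arn(arn: str) -> dict:
    """Return the account, role and session name, or raise ValueError."""
    match = ASSUMED_ROLE_RE.match(arn)
    if match is None:
        raise ValueError(f"not an assumed-role ARN: {arn}")
    return match.groupdict()


def looks_like_instance_profile(arn: str) -> bool:
    """True if the session name is an EC2 instance ID (node credentials)."""
    return bool(INSTANCE_ID_RE.match(parse_assumed_role_arn(arn)["session"]))


if __name__ == "__main__":
    arn = "arn:aws:sts::<my-account>:assumed-role/eks-worker/i-059d924ed6f52a032"
    print(parse_assumed_role_arn(arn)["role"])  # eks-worker
    print(looks_like_instance_profile(arn))     # True
```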

This problem was fixed in 1.14.6 but doesn't seem to have made it into 1.15.5.

I hit the same issue, downgrading to 1.14.7 seems to work fine. I believe we need to cherry-pick #2323 (or a more recent AWS SDK update).

/assign @Jeffwan

I will fix this issue. I remember doing the cherry-pick, but I need to double-check.

Update: #2323 is missing in 1.15. I cherry-picked the change that uses the new Session, but not the Go modules update. I will make the change.

@Jeffwan Thank you!!

@losipiuk @aleksandra-malinowska I'm trying to cherry-pick a change to v1.15. That branch doesn't have ./hack/update-vendor.sh, and the k8s/kubernetes dependencies point to /tmp/abc/kubernetes.

I also noticed that build-binary and build-in-docker may be using different dependencies:

# Uses go mod
build-binary: clean deps
	$(ENVVAR) GOOS=$(GOOS) go build -o cluster-autoscaler ${TAGS_FLAG}

# Seems to use godep (GOPATH layout inside the builder image)
build-in-docker: clean docker-builder
	docker run -v `pwd`:/gopath/src/k8s.io/autoscaler/cluster-autoscaler/ autoscaling-builder:latest bash -c 'cd /gopath/src/k8s.io/autoscaler/cluster-autoscaler && BUILD_TAGS=${BUILD_TAGS} make build-binary'

How can I manage dependencies in this case?

Bumping the version should also solve #2829 (STS private endpoints).

@Jeffwan: go.mod in the cluster-autoscaler-release-1.15 branch does not pin specific versions of the k8s.io modules, which makes updating dependencies with go mod difficult. One option is to assume all k8s.io modules are on a k8s-v1.15.x branch, then run the tests after upgrading aws-sdk-go. But then we would need to check out every module at that version and mount k8s.io at the /tmp/abc/kubernetes path.

Another, hackier way is to manually import the transitive (but not indirect) dependencies of aws-sdk-go v1.29.29 (the latest as of 30/03). I tested this in my fork: the build and tests passed without any issues, and the resulting Docker image no longer throws exceptions. Could this be a viable approach?

@canhnt I synced with the release team. We backported ./hack/update-vendor.sh from a later branch and updated the base for those changes. I updated aws-sdk-go to v1.28.14, so the problem should be resolved. Thanks for the feedback!

For folks who cannot wait for the upstream release, either build cluster-autoscaler-release-1.15 yourselves or use my image seedjeffwan/cluster-autoscaler:1.15-dev as a short-term mitigation.

FYI: We'll be doing new patch releases on Monday: #2988.

Note that a similar patch will be required in the cluster-autoscaler 1.16 branch (since 1.16 is using aws-sdk-go 1.23.12, and IRSA support was introduced in 1.23.13). This isn't an immediate issue, but it will become one once AWS releases EKS with 1.16; if you're already doing a PR for 1.15, you might as well do the same for 1.16.
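The version dependency described here can be modeled roughly as follows. This is a deliberately simplified Python sketch, not the actual aws-sdk-go credential chain. It assumes only two sources. The first is web identity, signalled by the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables that EKS injects for IRSA. The second is the EC2 instance profile. It takes the 1.23.13 threshold from this comment. All function names are hypothetical.

```python
# Minimum aws-sdk-go version with IRSA (web identity) support, per this thread.
IRSA_MIN_SDK = (1, 23, 13)


def parse_version(version: str) -> tuple:
    """Turn "v1.23.12" or "1.23.12" into (1, 23, 12)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def sdk_supports_irsa(sdk_version: str) -> bool:
    return parse_version(sdk_version) >= IRSA_MIN_SDK


def credential_source(sdk_version: str, env: dict) -> str:
    """Simplified model of which identity a pod ends up using.

    If the SDK understands web identity and the IRSA env vars are set, the
    service-account role is used; otherwise the lookup falls through to the
    node's instance profile (the eks-worker role in the original error).
    """
    irsa_env = "AWS_ROLE_ARN" in env and "AWS_WEB_IDENTITY_TOKEN_FILE" in env
    if irsa_env and sdk_supports_irsa(sdk_version):
        return "web-identity"
    return "instance-profile"
```

Versions are compared as integer tuples so that, for example, 1.9.0 sorts below 1.23.13, which a plain string comparison would get wrong.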

@mbarrien Thanks for the reminder, yes. I will update to 1.28.x to support regional STS endpoints as well.
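For context on regional STS endpoints: the global endpoint is `sts.amazonaws.com`, while regional ones follow `sts.<region>.amazonaws.com`. STS private (VPC) endpoints are regional. Newer AWS SDKs let you opt in through the `AWS_STS_REGIONAL_ENDPOINTS` setting (`legacy` or `regional`). Below is a minimal Python sketch of that choice for the standard `aws` partition. It is simplified: in practice `legacy` mode still resolves to a regional endpoint for some newer regions. The function name is hypothetical.

```python
def sts_endpoint(region: str, mode: str = "legacy") -> str:
    """Hypothetical helper: pick an STS hostname in the aws partition.

    "regional" -> sts.<region>.amazonaws.com (what STS VPC endpoints serve)
    "legacy"   -> the global sts.amazonaws.com (simplified, see lead-in)
    """
    if mode == "regional":
        return f"sts.{region}.amazonaws.com"
    if mode == "legacy":
        return "sts.amazonaws.com"
    raise ValueError(f"unknown mode: {mode}")
```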

/close

@Jeffwan: Closing this issue.

A PR has been filed to address the 1.16 issues for EKS IRSA:
https://github.com/kubernetes/autoscaler/pull/3003

It will be included in the next version.

Thank you @Jeffwan !!!

Please use this version https://github.com/kubernetes/autoscaler/releases/tag/cluster-autoscaler-1.15.6

@Jeffwan The image for the 1.15 version isn't in the registry:

โฏ docker pull k8s.gcr.io/cluster-autoscaler:v1.15.5
v1.15.5: Pulling from cluster-autoscaler
07508dcb5d70: Pull complete
57be50ba1432: Pull complete
845bb582d551: Pull complete
7696b1c466cf: Pull complete
Digest: sha256:36438427003b380d7ab3b28fb1818355a36de0f1cc623e7d13b4ac8680e14259
Status: Downloaded newer image for k8s.gcr.io/cluster-autoscaler:v1.15.5
k8s.gcr.io/cluster-autoscaler:v1.15.5

โฏ docker pull k8s.gcr.io/cluster-autoscaler:v1.15.6
Error response from daemon: manifest for k8s.gcr.io/cluster-autoscaler:v1.15.6 not found: manifest unknown: Failed to fetch "v1.15.6" from request "/v2/cluster-autoscaler/manifests/v1.15.6".

Use one of the official registry URLs listed at the bottom of the release notes. There was a reorganization of the registry locations.

Thanks, that worked. I'll put in a PR on the cluster-autoscaler helm chart, which is still pointing at the old URL.

@johnjeffers Hi, I'm facing the same issue. Could you please share your YAML file? I can't figure out what I'm doing wrong. Thanks!

Hi,

I have the same problem with eu.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.18.2 while it works with eu.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.18.1

@allamand are you running cluster-autoscaler 1.18 in EKS?

@johnjeffers Yes, but after switching back and forth several times, it seems to also work with the .2 version. Maybe it was a problem with my IRSA config.

@allamand If you are on EKS, you should not be using 1.18.2. cluster-autoscaler docs say to use the version that corresponds to your Kubernetes version, and the latest Kubernetes for EKS is 1.17. So, for example, if your EKS is on Kubernetes 1.16, you should use cluster-autoscaler 1.16.x.

We have been scoping 1.18 recently, and it will be available soon. Currently, there's no support. Users have to choose the matching minor version with the largest patch version. CA uses the scheduler's logic to run its simulations, so keep in mind that this is important! :D
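The selection rule described here can be sketched in Python: use the same minor version as the cluster, then the highest patch release. The tag list in the example uses only versions mentioned in this thread, not a real registry listing. `pick_autoscaler_tag` is a hypothetical name.

```python
def parse_tag(tag: str) -> tuple:
    """Turn "v1.15.6" into (1, 15, 6)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def pick_autoscaler_tag(k8s_version: str, tags: list) -> str:
    """Return the highest-patch tag whose major.minor matches the cluster.

    Only the major.minor part of k8s_version is compared, so "1.15",
    "v1.15.11" and similar inputs all select the 1.15 line.
    Raises ValueError if no tag matches.
    """
    major, minor = (int(x) for x in k8s_version.lstrip("v").split(".")[:2])
    candidates = [t for t in tags if parse_tag(t)[:2] == (major, minor)]
    if not candidates:
        raise ValueError(f"no cluster-autoscaler release for {major}.{minor}")
    return max(candidates, key=parse_tag)
```

For example, with `tags = ["v1.14.6", "v1.14.7", "v1.15.5", "v1.15.6", "v1.18.1", "v1.18.2"]`, `pick_autoscaler_tag("1.15", tags)` returns `"v1.15.6"`. A 1.16 cluster raises `ValueError` because no 1.16 tag is in the list. Failing in that case is deliberate: it avoids silently falling back to a different minor version.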
