Autoscaler: Having a volume makes CA incompatible with EKS Fargate

Created on 31 Mar 2020 · 11 Comments · Source: kubernetes/autoscaler

Hi,

What would be the expected impact if one were to deploy CA without this volume:

https://github.com/kubernetes/autoscaler/blob/b3a95a82debe214e9d85728218ea2af6c8558ea8/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml#L164

Fargate on EKS doesn't support running pods with volumes, but I was hoping to run CA there in order to guarantee it can run independent of compute capacity provided by EC2 ASGs.

Kind regards

Andrew

drewhemm

👍1

All 11 comments

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 29 Jun 2020

/remove-lifecycle stale

TBBle on 29 Jul 2020

Since this hostPath volume is just pulling in the node-local CA Certificate list, couldn't we just have one included in the container image instead? Or provided as a ConfigMap, perhaps.

TBBle on 29 Jul 2020

Adding the list of CAs as a ConfigMap does not allow it to connect to the endpoint used to obtain the AWS region:

aws_cloud_provider.go:361] Failed to get AWS Region: Error fetching http://169.254.169.254/latest/dynamic/instance-identity/document

Is this endpoint accessible from fargate?

javierdompablo on 14 Sep 2020

That problem will be unrelated to the CA list, because it's using http, not https. So it won't have any certificates to check. I'm reasonably sure the Instance Identity Documents aren't available on Fargate, based on poking around at Google.

It can get the Availability Zone from the Task Metadata Endpoint v4 though, if that's all it needs from the instance identity. Or even just allow the region to be specified in the config, which reduces dependency on the Instance Metadata Service, which helps when trying to disable Pod access to IMDS.

TBBle on 14 Sep 2020

A quick look at the implementation suggests that if the AWS_REGION env-var is set for Cluster Autoscaler, it won't try to access IMDS anyway.

The Cluster Autoscaler 1.0.3 Helm chart will set the AWS_REGION env-var to the awsRegion value. This chart currently deploys CA 1.18.1, but if the image.tag value is set to v1.18.2, it should work fine.

So getting past this step should be doable.

TBBle on 22 Sep 2020

I just noticed that the Helm chart does not have the hostPath volume shown in the ticket description. It's in many (perhaps all?) of the per-cloudprovider examples, and mentioned in the AWS README, but it's not mentioned _why_ this is needed. It appears to have been in the README since it was first written in 2016 and, going further back, was in the very first version of the Deployment, before AWS was added.

So I _suspect_ that the inclusion of this hostPath mount was because the first version of the CA Dockerfile was based on a busybox image or older Ubuntu image with out-of-date ssl certs, and has just been carried forwards into every example, but was _not_ carried into the Helm charts.

It might be worth identifying if this hostPath still serves a purpose (do we still have a problem with base images with too-old ca-certificates?) and if not, drop it from all the example docs and manifests as a simplification, and to better-align the Helm chart and the example manifests.

TBBle on 24 Oct 2020

For reference I've been running the CA on EKS Fargate for months just fine. If the image contains a not too old ca-bundle everything should just work I think.
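
One way to check that an image ships a usable CA bundle is to try parsing it. The Go sketch below is only an assumption-laden illustration: `loadCABundle` is a hypothetical helper, and the bundle path in `main` is a common Linux location that may differ in the actual Cluster Autoscaler image. It confirms that the file contains at least one parseable certificate, not that the bundle is recent.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// loadCABundle parses PEM data into a certificate pool and fails when
// the data contains no usable certificate. Hypothetical helper.
func loadCABundle(pemBytes []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no usable certificates found in bundle")
	}
	return pool, nil
}

func main() {
	// Assumed path: a common CA bundle location on Linux images.
	path := "/etc/ssl/certs/ca-certificates.crt"

	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("could not read CA bundle:", err)
		return
	}
	if _, err := loadCABundle(data); err != nil {
		fmt.Println("CA bundle is not usable:", err)
		return
	}
	fmt.Println("CA bundle parsed successfully from", path)
}
```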

Unichron on 27 Oct 2020

👍2

For anyone else trying to set up cluster-autoscaler on a fargate node, I had to:

- Remove the ssl-certs volume: it doesn't seem to have an effect?
- Set the AWS_REGION env var in the deployment since I think the function that infers the region is dependent on some EC2 instance API calls
- Add the permissions specified here to the cluster-autoscaler service account by following these steps
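
The first two steps above amount to a small transformation of the pod spec, sketched here in Go. The types are minimal hypothetical stand-ins rather than the real Kubernetes API types, and `prepareForFargate` is an illustrative name. The sketch drops every hostPath volume together with its mounts and sets AWS_REGION on each container. The IAM permissions step is not covered.

```go
package main

import "fmt"

// Minimal, hypothetical stand-ins for the Kubernetes pod spec types.
type EnvVar struct{ Name, Value string }

type Volume struct {
	Name     string
	HostPath string // empty when the volume is not a hostPath volume
}

type VolumeMount struct{ Name, MountPath string }

type Container struct {
	Name         string
	Env          []EnvVar
	VolumeMounts []VolumeMount
}

type PodSpec struct {
	Containers []Container
	Volumes    []Volume
}

// setEnv returns a copy of env with name set to value, replacing an
// existing entry or appending a new one.
func setEnv(env []EnvVar, name, value string) []EnvVar {
	out := make([]EnvVar, 0, len(env)+1)
	replaced := false
	for _, e := range env {
		if e.Name == name {
			e.Value = value
			replaced = true
		}
		out = append(out, e)
	}
	if !replaced {
		out = append(out, EnvVar{Name: name, Value: value})
	}
	return out
}

// prepareForFargate removes hostPath volumes and their mounts, and sets
// AWS_REGION on every container. The input spec is left unmodified.
func prepareForFargate(spec PodSpec, region string) PodSpec {
	removed := map[string]bool{}
	out := PodSpec{}
	for _, v := range spec.Volumes {
		if v.HostPath != "" {
			removed[v.Name] = true
			continue
		}
		out.Volumes = append(out.Volumes, v)
	}
	for _, c := range spec.Containers {
		var mounts []VolumeMount
		for _, m := range c.VolumeMounts {
			if !removed[m.Name] {
				mounts = append(mounts, m)
			}
		}
		out.Containers = append(out.Containers, Container{
			Name:         c.Name,
			Env:          setEnv(c.Env, "AWS_REGION", region),
			VolumeMounts: mounts,
		})
	}
	return out
}

func main() {
	// Paths and region are illustrative values only.
	spec := PodSpec{
		Containers: []Container{{
			Name:         "cluster-autoscaler",
			VolumeMounts: []VolumeMount{{Name: "ssl-certs", MountPath: "/etc/ssl/certs/ca-certificates.crt"}},
		}},
		Volumes: []Volume{{Name: "ssl-certs", HostPath: "/etc/ssl/certs/ca-bundle.crt"}},
	}
	fmt.Printf("%+v\n", prepareForFargate(spec, "eu-west-1"))
}
```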

bubba on 22 Dec 2020

👍2

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

fejta-bot on 22 Mar 2021

/remove-lifecycle stale

It'd be good to follow through some or all of @bubba's changes into the source, so we can close this ticket out.

TBBle on 22 Mar 2021
