Hi,
What would be the expected impact if one were to deploy CA without this volume:
Fargate on EKS doesn't support running pods with volumes, but I was hoping to run CA there in order to guarantee it can run independent of compute capacity provided by EC2 ASGs.
Kind regards
Andrew
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Since this hostPath volume is just pulling in the node-local CA Certificate list, couldn't we just have one included in the container image instead? Or provided as a ConfigMap, perhaps.
Adding it the list of CAs as configmap, does not allow to connect to the endpoint to obtain the AWS region
aws_cloud_provider.go:361] Failed to get AWS Region: Error fetching http://169.254.169.254/latest/dynamic/instance-identity/document
Is this endpoint accessible from fargate?
That problem will be unrelated to the CA list, because it's using http, not https. So it won't have any certificates to check. I'm reasonably sure the Instance Identity Documents aren't available on Fargate, based on poking around at Google.
It can get the Availability Zone from the Task Metadata Endpoint v4 though, if that's all it needs from the instance identity. Or even just allow the region to be specified in the config, which reduces dependency on the Instance Metadata Service, which helps when trying to disable Pod access to IMDS.
A quick look at the implementation suggests that if the AWS_REGION env-var is set for Cluster Autoscaler, it won't try to access IMDS anyway.
The Cluster Autoscaler 1.0.3 Helm chart will set the AWS_REGION env-var to the the awsRegion value. This chart currently deploys CA 1.18.1, but if the image.tag value is set to v1.18.2, it should work fine.
So getting past this step should be doable.
I just noticed that the Helm chart does not have the hostPath volume shown in the ticket description. It's in many (perhaps all?) of the per-cloudprovider examples, and mentioned in the AWS README, but it's not mentioned _why_ this is needed, it appears to have been in the README since it was first written in 2016, and going further back, was in the very first version of the Deployment, before AWS was added.
So I _suspect_ that the inclusion of this hostPath mount was because the first version of the CA Dockerfile was based on a busybox image or older Ubuntu image with out-of-date ssl certs, and has just been carried forwards into every example, but was _not_ carried into the Helm charts.
It might be worth identifying if this hostPath still serves a purpose (do we still have a problem with base images with too-old ca-certificates?) and if not, drop it from all the example docs and manifests as a simplification, and to better-align the Helm chart and the example manifests.
For reference I've been running the CA on EKS Fargate for months just fine. If the image contains a not too old ca-bundle everything should just work I think.
For anyone else trying to set up cluster-autoscaler on a fargate node, I had to:
ssl-certs volume: it doesn't seem to have an effect?AWS_REGION env var in the deployment since I think the function that infers the region is dependent on some EC2 instance API callscluster-autoscaler service account by following these stepsIssues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
It'd be good to follow through some or all of @bubba's changes into the source, so we can close this ticket out.
Most helpful comment
For anyone else trying to set up cluster-autoscaler on a fargate node, I had to:
ssl-certsvolume: it doesn't seem to have an effect?AWS_REGIONenv var in the deployment since I think the function that infers the region is dependent on some EC2 instance API callscluster-autoscalerservice account by following these steps