AWS EKS recently added support for EC2 Instance Metadata Service v2 (IMDSv2).
In my testing environment, I created a worker node that is IMDSv2-only, so token-backed sessions are required to access IMDS.
However, in this configuration CA does not seem to be able to unmarshal the instance identity document:
I1008 18:57:01.160950 1 aws_util.go:150] fetching http://169.254.169.254/latest/dynamic/instance-identity/document
..........
W1008 18:57:01.760556 1 aws_util.go:166] Error unmarshalling http://169.254.169.254/latest/dynamic/instance-identity/document, skip...
Checking the CA pod, it keeps getting OOMKilled and ends up in CrashLoopBackOff.
# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-5b5489859f-2pkdt 0/1 CrashLoopBackOff 6 13m
# kubectl describe pod cluster-autoscaler-5b5489859f-2pkdt -n kube-system
Name: cluster-autoscaler-5b5489859f-2pkdt
Namespace: kube-system
Priority: 0
Node: ip-172-31-23-13.ap-northeast-1.compute.internal/172.31.23.13
Start Time: Thu, 08 Oct 2020 19:22:15 +0000
Labels: app=cluster-autoscaler
pod-template-hash=5b5489859f
Annotations: kubernetes.io/psp: eks.privileged
prometheus.io/port: 8085
prometheus.io/scrape: true
Status: Running
IP: 172.31.20.73
IPs: <none>
Controlled By: ReplicaSet/cluster-autoscaler-5b5489859f
Containers:
cluster-autoscaler:
Container ID: docker://8cea864df872af960650f9f01061ca52e62855f680306238f75a12cbc798f8a5
Image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.15.7
Image ID: docker-pullable://k8s.gcr.io/autoscaling/cluster-autoscaler@sha256:6641a69b4ea5f911ccbb11b75b2675261d90bf169f612c9e960f60036336d664
Port: <none>
Host Port: <none>
Command:
./cluster-autoscaler
--v=4
--stderrthreshold=info
--cloud-provider=aws
--skip-nodes-with-local-storage=false
--expander=least-waste
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/LAB-EKS-15
--balance-similar-node-groups
--skip-nodes-with-system-pods=false
State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Thu, 08 Oct 2020 19:32:27 +0000
Finished: Thu, 08 Oct 2020 19:33:06 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Thu, 08 Oct 2020 19:29:07 +0000
Finished: Thu, 08 Oct 2020 19:29:46 +0000
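As a side note on reading the exit codes in the output above: codes greater than 128 conventionally mean "128 plus the signal number", so 137 corresponds to signal 9 (SIGKILL), which is consistent with the OOMKilled reason. A small Python sketch of that decoding follows; the function name is hypothetical and this only illustrates the convention, it is not part of kubectl or the autoscaler.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code == 0:
        return "exited successfully"
    if code > 128:
        try:
            # Codes above 128 usually encode the terminating signal number.
            name = signal.Signals(code - 128).name
            return f"killed by {name}"
        except ValueError:
            # Not a known signal number, so treat it as a plain status.
            pass
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (137, 255):
        print(code, "->", describe_exit_code(code))
```

The same helper treats 255 as an ordinary non-zero status, since 127 is not a signal number.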
If I switch back to IMDSv1, it works without issue, as follows:
I1008 19:05:20.256839 1 aws_util.go:150] fetching http://169.254.169.254/latest/dynamic/instance-identity/document
I1008 19:05:38.256216 1 aws_cloud_provider.go:380] Successfully load 354 EC2 Instance Types [u-9tb1 m5n.8xlarge z1d.12xlarge m5dn.12xlarge m5.12xlarge c5d.4xlarge c5d.xlarge r6g.2xlarge m4.4xlarge c5.24xlarge r3.8xlarge i3en.24xlarge i3.4xlarge a1.xlarge r5ad.large r5dn.metal x1e u-9tb1.metal m5dn.16xlarge r5n.4xlarge t3.small c5n.2xlarge m5ad.large t3.micro c5d.2xlarge c1.xlarge r5a.24xlarge t3.large r6g.metal r5a.xlarge c6g.xlarge i3en.metal g4dn.xlarge r6g.16xlarge c3.large i2.4xlarge r5d.xlarge t4g.small t3a.xlarge c3.8xlarge m5d.4xlarge r5ad.xlarge h1 c5d.18xlarge u-6tb1.metal p2.8xlarge m6g.2xlarge c5d.metal i3en.2xlarge
........
I1008 19:05:44.609556 1 auto_scaling_groups.go:354] Regenerating instance to ASG map for ASGs: []
I1008 19:05:44.609579 1 aws_manager.go:266] Refreshed ASG list, next refresh after 2020-10-08 19:06:44.609574794 +0000 UTC m=+102.445561263
I1008 19:05:44.609801 1 main.go:271] Registered cleanup signal handler
I1008 19:05:44.610023 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I1008 19:05:44.610039 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 1.791µs
I1008 19:05:54.610015 1 static_autoscaler.go:187] Starting main loop
I1008 19:05:54.610119 1 utils.go:622] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I1008 19:05:54.610130 1 filter_out_schedulable.go:63] Filtering out schedulables
I1008 19:05:54.610168 1 filter_out_schedulable.go:80] No schedulable pods
I1008 19:05:54.610188 1 static_autoscaler.go:334] No unschedulable pods
I1008 19:05:54.610203 1 static_autoscaler.go:381] Calculating unneeded nodes
I suspect CA does not use token-backed sessions to access IMDS.
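For reference, IMDSv2 expects a session token to be requested with a PUT and then sent as a header on every metadata request. Below is a minimal Python sketch of that flow, not the autoscaler's actual code: the helper names are hypothetical, and the 21600-second TTL is just an example value.

```python
import json
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def build_token_request(base_url=IMDS_BASE, ttl_seconds=21600):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl_seconds)},
    )


def build_document_request(token, base_url=IMDS_BASE):
    """Build the GET request for the instance identity document."""
    return urllib.request.Request(
        f"{base_url}/latest/dynamic/instance-identity/document",
        headers={TOKEN_HEADER: token},
    )


def fetch_identity_document(base_url=IMDS_BASE, timeout=2.0):
    """Fetch the identity document using a token-backed session."""
    token_req = build_token_request(base_url)
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode("utf-8")
    doc_req = build_document_request(token, base_url)
    with urllib.request.urlopen(doc_req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Calling `fetch_identity_document()` only works from a host that can reach IMDS. A plain GET without the token header is what a token-required endpoint rejects.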
Got hit with this too, EKS 1.17
We worked around this issue by injecting the AWS_REGION environment variable to the cluster-autoscaler container. Obviously not an ideal solution, which would be to add support for it, but it works.
I was not able to work around this issue by injecting AWS_REGION or AWS_DEFAULT_REGION into the aws-cluster-autoscaler container. With the v1 metadata service [token optional], cluster-autoscaler does not error and has no issues.
Error log / behavior with IMDSv2 [token required]:
I1130 21:13:10.946968 1 aws_cloud_provider.go:371] Successfully load 392 EC2 Instance Types [...truncated...]
E1130 21:13:14.176281 1 aws_manager.go:262] Failed to regenerate ASG cache: cannot autodiscover ASGs: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
F1130 21:13:14.176302 1 aws_cloud_provider.go:376] Failed to create AWS Manager: cannot autodiscover ASGs: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Here's our cluster-autoscaler helm release [chart v9.1.0, setting awsRegion and autoDiscovery.clusterName], along with our attempt to set the environment variable:
resource "helm_release" "cluster_autoscaler" {
  depends_on = [
    module.eks, # Wait for cluster to be ready
  ]
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  version    = "9.1.0"
  name       = "cluster-autoscaler"
  namespace  = "kube-system"
  values = [
    # Values set from terraform outputs
    <<EOL
awsRegion: ${module.eks.cluster_region}
autoDiscovery:
  clusterName: ${module.eks.cluster_name}
EOL
    ,
    # Workaround issue with IMDSv2
    # Inject AWS_DEFAULT_REGION into environment
    # https://github.com/kubernetes/autoscaler/issues/3592
    <<EOL
extraEnv:
  AWS_DEFAULT_REGION: ${module.eks.cluster_region}
EOL
    ,
  ] # End helm_release.values[]
}
and resulting pod description -- AWS_REGION is already set from the chart:
Name: cluster-autoscaler-aws-cluster-autoscaler-c4b7bdd58-cm2d2
Namespace: kube-system
Priority: 0
Node: ip-10-100-1-57.us-west-2.compute.internal/10.100.1.57
Start Time: Mon, 30 Nov 2020 13:06:38 -0800
Labels: app.kubernetes.io/instance=cluster-autoscaler
app.kubernetes.io/name=aws-cluster-autoscaler
pod-template-hash=c4b7bdd58
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.100.0.110
IPs:
IP: 10.100.0.110
Controlled By: ReplicaSet/cluster-autoscaler-aws-cluster-autoscaler-c4b7bdd58
Containers:
aws-cluster-autoscaler:
Container ID: docker://f91c44b21712ebcf385dfd687c5631dd44ceeb76d25afb765e6b9a5cfc43f96c
Image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.18.1
Image ID: docker-pullable://us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler@sha256:1f5b11617389b8e4ce15eb45fdbbfd4321daeb63c234d46533449ab780b6ca9a
Port: 8085/TCP
Host Port: 0/TCP
Command:
./cluster-autoscaler
--cloud-provider=aws
--namespace=kube-system
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/kg-cet-917-staging-us-west-2
--logtostderr=true
--stderrthreshold=info
--v=4
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Mon, 30 Nov 2020 13:10:10 -0800
Finished: Mon, 30 Nov 2020 13:10:16 -0800
Ready: False
Restart Count: 5
Liveness: http-get http://:8085/health-check delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
AWS_REGION: us-west-2
AWS_DEFAULT_REGION: us-west-2
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from cluster-autoscaler-aws-cluster-autoscaler-token-dlxmc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cluster-autoscaler-aws-cluster-autoscaler-token-dlxmc:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-autoscaler-aws-cluster-autoscaler-token-dlxmc
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m43s default-scheduler Successfully assigned kube-system/cluster-autoscaler-aws-cluster-autoscaler-c4b7bdd58-cm2d2 to ip-10-100-1-57.us-west-2.compute.internal
Normal Pulling 4m42s kubelet Pulling image "us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.18.1"
Normal Pulled 4m40s kubelet Successfully pulled image "us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.18.1"
Warning BackOff 2m52s (x9 over 4m10s) kubelet Back-off restarting failed container
Normal Created 2m38s (x5 over 4m40s) kubelet Created container aws-cluster-autoscaler
Normal Started 2m38s (x5 over 4m39s) kubelet Started container aws-cluster-autoscaler
Normal Pulled 2m38s (x4 over 4m16s) kubelet Container image "us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.18.1" already present on machine
kubectl version:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:09:16Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.12-eks-7684af", GitCommit:"7684af4ac41370dd109ac13817023cb8063e3d45", GitTreeState:"clean", BuildDate:"2020-10-20T22:57:40Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
helm version:
version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"dirty", GoVersion:"go1.15.4"}
I was not able to work around this issue by injecting the AWS_REGION or AWS_DEFAULT_REGION environment variables into the aws-cluster-autoscaler container either.
Also, there are other issues (#3276, #3216) related to loading the instance type list from the pricing API. So I upgraded to the latest version, 1.20, and added the --aws-use-static-instance-list=true flag. However, the pod still keeps terminating with exit code 255 and ends up in CrashLoopBackOff status.
Here are the error log messages with IMDSv2 [token required]:
$ kubectl -n kube-system logs deployment.apps/cluster-autoscaler
...
...
I0108 07:20:04.590454 1 reflector.go:255] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I0108 07:20:04.944164 1 cloud_provider_builder.go:29] Building aws cloud provider.
W0108 07:20:04.944198 1 aws_cloud_provider.go:349] Use static EC2 Instance Types and list could be outdated. Last update time: 2019-10-14
I0108 07:20:04.945035 1 reflector.go:219] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.945051 1 reflector.go:255] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.945402 1 reflector.go:219] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.945415 1 reflector.go:255] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.945683 1 reflector.go:219] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.945695 1 reflector.go:255] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.945952 1 reflector.go:219] Starting reflector *v1.StatefulSet (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.945964 1 reflector.go:255] Listing and watching *v1.StatefulSet from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.946231 1 reflector.go:219] Starting reflector *v1.PersistentVolume (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.946242 1 reflector.go:255] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.946531 1 reflector.go:219] Starting reflector *v1.StorageClass (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.946542 1 reflector.go:255] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.946838 1 reflector.go:219] Starting reflector *v1.CSINode (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:04.946850 1 reflector.go:255] Listing and watching *v1.CSINode from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.039201 1 reflector.go:219] Starting reflector *v1beta1.PodDisruptionBudget (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.039225 1 reflector.go:255] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.539276 1 reflector.go:219] Starting reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.539475 1 reflector.go:255] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.543333 1 reflector.go:219] Starting reflector *v1.ReplicationController (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.543349 1 reflector.go:255] Listing and watching *v1.ReplicationController from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.543835 1 reflector.go:219] Starting reflector *v1.ReplicaSet (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:20:05.543850 1 reflector.go:255] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:134
$ kubectl get po -A -w | grep "cluster"
kube-system cluster-autoscaler-bcbc77bc7-lcsf5 1/1 Running 0 2m7s
kube-system cluster-autoscaler-bcbc77bc7-lcsf5 0/1 Error 0 2m21s
kube-system cluster-autoscaler-bcbc77bc7-lcsf5 1/1 Running 1 2m23s
$ kubectl -n kube-system describe po cluster-autoscaler-bcbc77bc7-lcsf5
Name: cluster-autoscaler-bcbc77bc7-lcsf5
Namespace: kube-system
Priority: 0
Node: ip-192-168-33-189.ap-northeast-1.compute.internal/192.168.33.189
Start Time: Fri, 08 Jan 2021 07:19:44 +0000
Labels: app=cluster-autoscaler
pod-template-hash=bcbc77bc7
Annotations: kubectl.kubernetes.io/restartedAt: 2021-01-08T05:40:22Z
kubernetes.io/psp: eks.privileged
prometheus.io/port: 8085
prometheus.io/scrape: true
Status: Running
IP: 192.168.43.50
IPs:
IP: 192.168.43.50
Controlled By: ReplicaSet/cluster-autoscaler-bcbc77bc7
Containers:
cluster-autoscaler:
Container ID: docker://2f0a7f6f1f514c0c75c75499020e788886da125fe1c865cebd0647bb3bf95a64
Image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0
Image ID: docker-pullable://k8s.gcr.io/autoscaling/cluster-autoscaler@sha256:1c19fa17b29db548d0304e9444adf84e8a6f38ee4c0a12d2ecaf262cb10c0e50
Port: <none>
Host Port: <none>
Command:
./cluster-autoscaler
--v=4
--stderrthreshold=info
--cloud-provider=aws
--skip-nodes-with-local-storage=false
--expander=least-waste
--aws-use-static-instance-list=true
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/EKS-LAB
State: Running
Started: Fri, 08 Jan 2021 07:22:07 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Fri, 08 Jan 2021 07:19:46 +0000
Finished: Fri, 08 Jan 2021 07:22:05 +0000
Ready: True
Restart Count: 1
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 100m
memory: 300Mi
Environment:
AWS_REGION: ap-northeast-1
AWS_DEFAULT_REGION: ap-northeast-1
AWS_ROLE_ARN: arn:aws:iam::561333300361:role/eksctl-EKS-LAB-addon-iamserviceaccount-kube-Role1-ZKVBFVVOBNUX
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from cluster-autoscaler-token-vkd8b (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs/ca-bundle.crt
HostPathType:
cluster-autoscaler-token-vkd8b:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-autoscaler-token-vkd8b
Optional: false
QoS Class: Guaranteed
Node-Selectors: ng=console
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m37s default-scheduler Successfully assigned kube-system/cluster-autoscaler-bcbc77bc7-lcsf5 to ip-192-168-33-189.ap-northeast-1.compute.internal
Normal Pulling 77s (x2 over 3m37s) kubelet Pulling image "k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0"
Normal Pulled 76s (x2 over 3m36s) kubelet Successfully pulled image "k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0"
Normal Created 76s (x2 over 3m36s) kubelet Created container cluster-autoscaler
Normal Started 75s (x2 over 3m36s) kubelet Started container cluster-autoscaler
After rolling back to a worker node with IMDSv1:
$ kubectl -n kube-system logs deployment.apps/cluster-autoscaler
...
...
I0108 07:15:03.847604 1 reflector.go:255] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:134
I0108 07:15:03.847633 1 reflector.go:219] Starting reflector *v1.StatefulSet (0s) from k8s.io/client-go/informers/factory.go:134
I0108 07:15:03.847640 1 reflector.go:255] Listing and watching *v1.StatefulSet from k8s.io/client-go/informers/factory.go:134
I0108 07:15:03.943862 1 request.go:591] Throttling request took 96.568619ms, request: GET:https://10.100.0.1:443/api/v1/persistentvolumes?limit=500&resourceVersion=0
I0108 07:15:04.243872 1 request.go:591] Throttling request took 396.383321ms, request: GET:https://10.100.0.1:443/api/v1/pods?limit=500&resourceVersion=0
I0108 07:15:07.069368 1 auto_scaling_groups.go:351] Regenerating instance to ASG map for ASGs: [eks-96bb7009-0e0a-3450-075d-3c7ed43c94e6]
I0108 07:15:07.180416 1 auto_scaling.go:199] 1 launch configurations already in cache
I0108 07:15:07.180443 1 auto_scaling_groups.go:136] Registering ASG eks-96bb7009-0e0a-3450-075d-3c7ed43c94e6
I0108 07:15:07.180456 1 aws_manager.go:269] Refreshed ASG list, next refresh after 2021-01-08 07:16:07.180451669 +0000 UTC m=+81.757019680
I0108 07:15:07.180599 1 main.go:279] Registered cleanup signal handler
I0108 07:15:07.180643 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0108 07:15:07.180654 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 4.43µs
I0108 07:15:17.180736 1 static_autoscaler.go:229] Starting main loop
W0108 07:15:17.181232 1 clusterstate.go:436] AcceptableRanges have not been populated yet. Skip checking
I0108 07:15:17.181367 1 filter_out_schedulable.go:65] Filtering out schedulables
I0108 07:15:17.181381 1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I0108 07:15:17.181390 1 filter_out_schedulable.go:170] 0 pods were kept as unschedulable based on caching
I0108 07:15:17.181397 1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I0108 07:15:17.181464 1 filter_out_schedulable.go:82] No schedulable pods
I0108 07:15:17.181490 1 static_autoscaler.go:402] No unschedulable pods
I0108 07:15:17.181509 1 static_autoscaler.go:449] Calculating unneeded nodes
Hi Contributors @mwielgus @losipiuk @aleksandra-malinowska @bskiba. As this is preventing EKS clusters from being upgraded to IMDSv2, can this issue be prioritized? I suspect CA does not use token-backed sessions to access IMDS. The CA pod keeps getting OOMKilled and ends up in CrashLoopBackOff. Thank you.
It appears there are multiple symptoms here.
NoCredentialProviders: no valid providers in chain.
My guess is that (1) is a spurious error, but it's difficult to tell. @hans72118, can you follow up with your memory settings? I'll take a look at how IMDSv2 works and what the path forward is here to make sure CAS can use these tokens.
It looks like there's some custom imds logic here https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_util.go#L77. It's not clear why we don't rely on https://docs.aws.amazon.com/sdk-for-go/api/aws/ec2metadata/#EC2Metadata.GetMetadata
It should be possible to skip this logic by using --aws-use-static-instance-list=true https://github.com/kubernetes/autoscaler/blob/43ab0309697271e6b2ad82dd4fc3a28132456399/cluster-autoscaler/main.go#L175
Alternatively, it should be possible to skip by including the AWS_REGION environment variable:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_util.go#L155
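To illustrate the idea behind that workaround, here is a rough Python sketch of "prefer the environment variable, and only fall back to IMDS when it is unset". This is an assumption about the shape of the logic and not a port of aws_util.go: `resolve_region` and the `imds_lookup` callable are hypothetical names, and the only fact relied on is that the instance identity document carries a `region` field.

```python
import os


def resolve_region(environ=None, imds_lookup=None):
    """Return the AWS region, preferring AWS_REGION over an IMDS lookup.

    environ: mapping to read AWS_REGION from (defaults to os.environ).
    imds_lookup: hypothetical callable returning the parsed instance
    identity document as a dict. It is only called when AWS_REGION is unset.
    """
    env = os.environ if environ is None else environ
    region = env.get("AWS_REGION", "")
    if region:
        # The metadata service is never contacted on this path.
        return region
    if imds_lookup is None:
        raise RuntimeError("AWS_REGION is unset and no IMDS lookup was provided")
    return imds_lookup()["region"]
```

With AWS_REGION set, the lookup callable is never invoked, which is why setting the variable can sidestep a failing metadata request.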
@focaaby, it's not clear from your logs or describe pods that this wasn't working for you. Looks like the CA started up normally and populated all listers/watchers?