Hi all,
I'm running EKS 1.10, so I've been attempting to use the 1.2 release of Cluster Autoscaler.
The pod spins up, but in the error logs I notice:
1 static_autoscaler.go:118] Failed to update node registry: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
I can confirm all the EKS worker nodes have the IAM policy attached to the instance.
If there's any additional information I can give, please let me know and I'll update this ASAP.
Hi, it seems like a configuration issue specific to your environment. None of Cluster Autoscaler's maintainers run it on AWS, so we don't really know how to debug it. You may have better luck getting help in the #sig-aws Slack channel or via some EKS support channel.
I also encountered this same issue today.
Cloud Provider: AWS
Kubernetes Version: 1.10.8
Cluster Autoscaler Version: 1.2.3
F1017 10:30:35.116205 1 cloud_provider_builder.go:112] Failed to create AWS cloud provider: Failed to get ASGs: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
goroutine 114 [running]:
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.stacks(0xc420b73300, 0xc422247180, 0xfa, 0x13c)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:766 +0xa7
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.(*loggingT).output(0x5618fa0, 0xc400000003, 0xc42035e4d0, 0x528d12b, 0x19, 0x70, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:717 +0x348
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.(*loggingT).printf(0x5618fa0, 0x3, 0x3739198, 0x27, 0xc422dbaff0, 0x1, 0x1)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:655 +0x14f
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.Fatalf(0x3739198, 0x27, 0xc422dbaff0, 0x1, 0x1)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:1145 +0x67
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.CloudProviderBuilder.Build(0x7ffc87c6c835, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go:112 +0x76a
k8s.io/autoscaler/cluster-autoscaler/core.NewAutoscalingContext(0xa, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0x0, 0x4e200, 0x0, 0x186a00000, 0x0, 0x7ffc87c6c892, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaling_context.go:148 +0x466
k8s.io/autoscaler/cluster-autoscaler/core.NewStaticAutoscaler(0xa, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0x0, 0x4e200, 0x0, 0x186a00000, 0x0, 0x7ffc87c6c892, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:56 +0x14d
k8s.io/autoscaler/cluster-autoscaler/core.(*AutoscalerBuilderImpl).Build(0xc420ef8000, 0x0, 0x0, 0xc422dbbb38, 0x1477780)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler_builder.go:71 +0x10e
k8s.io/autoscaler/cluster-autoscaler/core.(*PollingAutoscaler).Poll(0xc420b18780, 0x4, 0xed35905d0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/polling_autoscaler.go:81 +0x5a
k8s.io/autoscaler/cluster-autoscaler/core.(*PollingAutoscaler).RunOnce(0xc420b18780, 0xed35905d0, 0xe26e9aba5, 0x5618b60, 0x5618b60, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/polling_autoscaler.go:68 +0x76
main.run(0xc420b4b2c0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:263 +0x46f
main.main.func2(0xc420d96180)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:345 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:145 +0x97
I1017 10:30:58.799308 1 request.go:480] Throttling request took 66.209885ms, request: GET:https://100.64.0.1:443/api/v1/namespaces/kube-system/endpoints/cluster-autoscaler
I've done a little more digging into this problem to try to get a better understanding of what's going on.
The above error is being produced here, therefore a call to glog.Fatalf() _should_ run os.Exit(255) according to the docs and the source code.
I assume this is supposed to terminate the process which in turn should result in the pod being restarted. However, this doesn't seem to be the case. In my experience, the container continues to run and log but doesn't seem to be performing any scaling tasks. Almost like it's stuck in a weird state?
I'm happy to work on a "fix" for this if anyone can confirm the desired functionality.
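For anyone wanting to check the exit-code assumption above, here is a minimal Python sketch. The child command is a hypothetical stand-in that simply exits with status 255 (the code the comment says glog.Fatalf() uses); it is not the cluster-autoscaler binary. It only shows how a supervising parent observes such an exit.

```python
import subprocess
import sys


def exit_code_of(argv):
    """Run argv as a child process and return its exit status."""
    return subprocess.run(argv).returncode


# Hypothetical stand-in for a binary that hits a fatal error and exits with 255.
STAND_IN = [sys.executable, "-c", "import sys; sys.exit(255)"]

if __name__ == "__main__":
    print("child exit status:", exit_code_of(STAND_IN))
```

If the container's main process did exit like this, the container should be reported as terminated; a container that keeps logging suggests the process never actually exited, which may be worth ruling out first.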
Can you share the full log? It should exit and be automatically restarted.
Unfortunately I don't have the logs to hand, but this has been a regular occurrence over the past week so I'll post them in full next time it happens.
I've definitely seen Fatalf restart the container on GCP. I wonder if the difference in behavior could be because of how the binary is started? I think the AWS documentation runs the binary itself, rather than using our run.sh wrapper script (GCP manifest: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/manifests/cluster-autoscaler.manifest#L30).
Hi @Office-Manager, kube2iam maybe??
GKE aside. Like you, I look after a cluster on AWS. However, I'm not using EKS, rather a cluster provisioned with Kops.
I saw the same behaviour after deploying the cluster-autoscaler. I tried the stable chart and the example and saw the same NoCredentialProviders error in the logs. I was positive that I had the correct IAM permissions, and noticed that I was unable to query AWS with the awscli from inside the cluster-autoscaler container. I then remembered that we also use kube2iam, which can impose namespace restrictions. Going against the documentation, I deployed cluster-autoscaler to a different namespace, i.e. default. You could try that? Personally, I'd rather not modify the kube-system namespace. Also, if using kube2iam, you may need to add the appropriate annotation iam.amazonaws.com/role: your-iam-role to the deployment.
Hope that helps and I'll be interested to know how you fare? Good luck.
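As a rough illustration of the kube2iam annotation mentioned above, this Python sketch builds a patch body that sets iam.amazonaws.com/role on a deployment's pod template (kube2iam reads the annotation from pods, so it has to land under spec.template). The role name your-iam-role is the placeholder from the comment, and the function name is made up for this example.

```python
import json

ROLE_ANNOTATION = "iam.amazonaws.com/role"


def kube2iam_patch(role):
    """Build a patch that annotates the pod template with an IAM role."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {ROLE_ANNOTATION: role},
                },
            },
        },
    }


if __name__ == "__main__":
    # "your-iam-role" is a placeholder; substitute the role the pods should assume.
    print(json.dumps(kube2iam_patch("your-iam-role")))
```

The printed JSON could be passed to kubectl patch deployment with -p, assuming the deployment is named after your own install. If namespace restrictions are enabled in kube2iam, the namespace would also need to allow that role.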
Hey @djsd123,
Yeah, I ended up having to use kube2iam. The original intention was always to use that, but I thought that as a quick POC I would just apply the IAM policy directly on the instance profile.
I never did get it working with the instance profile, but it's working fine with kube2iam.
Just leaving this here for others who might be searching for this error.
I had the same thing, but in my case, it was self-inflicted.
I had enabled HTTP_PROXY environment variables, but forgot to include 169.254.169.254 in NO_PROXY.
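A quick way to catch the proxy mistake described above is to inspect the environment the container will run with. The sketch below is a simplified check, assuming exact-match entries in NO_PROXY; real HTTP clients may also honour domain suffixes or CIDR ranges, which this deliberately ignores. The function name is made up for this example.

```python
import os

METADATA_IP = "169.254.169.254"  # EC2 instance metadata address


def metadata_bypasses_proxy(env):
    """Return True if no HTTP proxy is set, or NO_PROXY lists the metadata IP."""
    proxy = env.get("HTTP_PROXY") or env.get("http_proxy")
    if not proxy:
        return True  # no proxy configured, so nothing to bypass
    no_proxy = env.get("NO_PROXY") or env.get("no_proxy") or ""
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    return METADATA_IP in entries or "*" in entries


if __name__ == "__main__":
    if not metadata_bypasses_proxy(os.environ):
        print("HTTP_PROXY is set but NO_PROXY does not include", METADATA_IP)
```

When the check fails, requests for instance credentials would be sent to the proxy instead of the metadata service, which matches the NoCredentialProviders symptom in this thread.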