I originally posted this here: https://github.com/kubernetes/autoscaler/issues/1064
Greetings,
I'm running cluster-autoscaler as part of a Kops cluster running on AWS. The problem is simple: when I run cluster-autoscaler on a regular node, it works fine, but when it runs on a master node, the pod times out, enters a CrashLoopBackOff, and retries to no avail. The following log message is the most relevant:
F0715 16:55:30.458743 1 main.go:319] Failed to get nodes from apiserver: Get https://100.64.0.1:443/api/v1/nodes: dial tcp 100.64.0.1:443: i/o timeout
goroutine 1 [running]:
Kops version: 1.9.1
Kubernetes version: 1.9.6
Cloud provider: AWS
Commands run: Followed the commands in the Kops cluster-autoscaler documentation.
What happened after the commands executed?: The pod starts, and then goes into a CrashLoopBackOff loop.
What did you expect to happen?: The cluster-autoscaler app goes into its normal loop of checking whether the ASG is at its target node size.
Extra notes:
annotations:
  iam.amazonaws.com/role: CompanyIamRole-ClusterAutoscalerEc2
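For reference, a minimal sketch of where that annotation ends up in the Deployment's pod template (assuming kube2iam or a similar tool is what consumes it; the role name is the example above):

spec:
  template:
    metadata:
      annotations:
        # Picked up by kube2iam (or similar) to grant the pod this IAM role
        iam.amazonaws.com/role: CompanyIamRole-ClusterAutoscalerEc2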
Cluster manifest and full log output: https://gist.github.com/sc250024/81525c85f3cdfc60349b3bfdcce755af
I had the exact same issue, and I fixed it by deleting the calico-node pod on the master node, then deleting the autoscaler pod. The autoscaler started fine afterwards.
I am thinking of using the cluster-autoscaler chart instead, but it doesn't carry the toleration for running only on master nodes that the kops autoscaler resource has. So, I was wondering whether there is a specific reason to run cluster-autoscaler on master nodes instead of on regular worker nodes. (Is it to prevent the node running cluster-autoscaler from being removed by the autoscaler itself?)
@prat0318 You can use the chart and run it on the master. Put the following in your values.yaml file.
First, add a toleration for the master taint so the pod can be scheduled there:
tolerations:
- effect: "NoSchedule"
  key: "node-role.kubernetes.io/master"
Optionally, you can pin it to run only on the master nodes:
nodeSelector:
  kubernetes.io/role: "master"
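Putting the two together, the relevant part of values.yaml would look like this (a sketch; on a kops cluster the masters already carry the kubernetes.io/role=master label, so the selector should work out of the box):

# Tolerate the master taint so the pod may land there...
tolerations:
- effect: "NoSchedule"
  key: "node-role.kubernetes.io/master"
# ...and restrict it to master nodes only
nodeSelector:
  kubernetes.io/role: "master"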
@sc250024 Thanks a lot, that worked. I'm still wondering whether there is a benefit to running it on master nodes instead of regular worker nodes?
@prat0318 It depends on your setup, of course. I think the idea behind running it on the master nodes is that they are more stable than the worker nodes. Personally, my worker nodes all run on spot instances, whereas my master nodes are dedicated.
If you're using EKS, AFAIK you can't schedule pods on the master nodes since they're managed by AWS. In that case, you can use the following podAnnotations value to give the Cluster Autoscaler pods higher priority on your worker nodes:
podAnnotations:
  scheduler.alpha.kubernetes.io/critical-pod: ""
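As an aside, the critical-pod annotation is deprecated on newer Kubernetes versions in favor of pod priority; if your chart version exposes a priorityClassName value (an assumption — check your chart), the equivalent would be something like:

# Uses the built-in system-cluster-critical PriorityClass instead of the
# deprecated critical-pod annotation (assumes the chart passes this through)
priorityClassName: "system-cluster-critical"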
Thanks @sc250024 !