cluster-autoscaler is not launching correctly on AWS; the pod logs show the following error:

```
E0129 06:27:24.264163 1 static_autoscaler.go:135] Failed to update node registry: RequestError: send request failed
caused by: Post https://autoscaling.ap-southeast-2b.amazonaws.com/: dial tcp: lookup autoscaling.ap-southeast-2b.amazonaws.com on 172.20.0.2:53: no such host
```
Bug report information:
I have a running cluster with a number of instance groups. I am using the following config for the autoscaler deployment:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: cluster-autoscaler
  template:
    metadata:
      labels:
        k8s-addon: cluster-autoscaler.addons.k8s.io
        k8s-app: cluster-autoscaler
      annotations:
        # For 1.6, we keep the old tolerations in case of a downgrade to 1.5
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"dedicated", "value":"master"}]'
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: gcr.io/google_containers/cluster-autoscaler:v1.1.0
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --nodes={{MIN_NODES_1}}:{{MAX_NODES_1}}:{{ASG_NAME_1}}
          env:
            - name: AWS_REGION
              value: {{AWS_REGION}}
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs/ca-certificates.crt
      dnsPolicy: Default
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
        - key: "node-role.kubernetes.io/master"
          effect: NoSchedule
```
I have gone through the solutions in Issue 1796, but none of them helped. I have verified that {{ASG_NAME_1}} is being replaced with the correct name of the AWS autoscaling group. Also, the minimum number of nodes for the ASG is 0.
Finally, I have tried different DNS policies and different image versions (v0.4.0 and v0.6.0, neither of which I could get to spin up without crashing), all to no avail.
Any help is greatly appreciated :)
I realised that I had been putting in the AZ instead of the region into the AWS_REGION placeholder. Please disregard.
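For anyone hitting the same error: the AWS SDK builds the Auto Scaling endpoint hostname from the region, so passing an Availability Zone (e.g. `ap-southeast-2b`) instead of the region (`ap-southeast-2`) yields a hostname that does not exist in DNS, which is exactly the `no such host` failure in the log above. A minimal illustrative sketch (the helper names here are made up for illustration, not part of cluster-autoscaler or the AWS SDK):

```python
def autoscaling_endpoint(region: str) -> str:
    """Build the regional AWS Auto Scaling endpoint hostname."""
    return f"autoscaling.{region}.amazonaws.com"

def az_to_region(az: str) -> str:
    """Strip the trailing zone letter from an AZ name,
    e.g. 'ap-southeast-2b' -> 'ap-southeast-2'."""
    return az.rstrip("abcdefghijklmnopqrstuvwxyz")

# The AZ produces the hostname that failed to resolve in the log:
print(autoscaling_endpoint("ap-southeast-2b"))
# -> autoscaling.ap-southeast-2b.amazonaws.com (does not exist in DNS)

# The region produces the real endpoint:
print(autoscaling_endpoint(az_to_region("ap-southeast-2b")))
# -> autoscaling.ap-southeast-2.amazonaws.com
```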