Hello,
I installed cluster-autoscaler using Helm with the following command:
helm install stable/cluster-autoscaler \
--namespace kube-system \
--name cluster-autoscaler \
--set image.tag=v1.1.2 \
--set awsRegion=$REGION \
--set rbac.create=true \
--set autoscalingGroups\[0\].name=nodes.$NAME.$DNS_ZONE \
--set autoscalingGroups\[0\].minSize=$MIN_NODES \
--set autoscalingGroups\[0\].maxSize=$MAX_NODES \
--set podAnnotations."iam\.amazonaws\.com/role"=arn:aws:iam::$ACCOUNT_NUMBER:role/masters.$DNS_ZONE \
--set nodeSelector."node-role\.kubernetes\.io/master"="" \
--set tolerations\[0\].effect=NoSchedule \
--set tolerations\[0\].key="node-role.kubernetes.io/master"
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-27T00:13:02Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
The cluster has 15 nodes and one master. Not many pods are running, and I wanted to see whether the autoscaler would scale down. From the logs:
I0609 09:33:13.875968 1 static_autoscaler.go:332] Starting scale down
I0609 09:33:13.902816 1 scale_down.go:387] ip-4-5-6-7.region.compute.internal was unneeded for 6m48.344665061s
I0609 09:33:13.902839 1 scale_down.go:387] 1.2.3.region.compute.internal was unneeded for 6m58.532568459s
I0609 09:33:13.902849 1 scale_down.go:446] No candidates for scale down
I0609 09:33:14.243737 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0609 09:33:16.315108 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0609 09:33:18.345279 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0609 09:33:20.352730 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0609 09:33:22.360454 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0609 09:33:23.916035 1 static_autoscaler.go:108] Starting main loop
I0609 09:33:24.023676 1 static_autoscaler.go:240] Filtering out schedulables
I0609 09:33:24.024058 1 static_autoscaler.go:250] No schedulable pods
I0609 09:33:24.024076 1 static_autoscaler.go:257] No unschedulable pods
I0609 09:33:24.024085 1 static_autoscaler.go:299] Calculating unneeded nodes
I0609 09:33:24.059105 1 utils.go:399] Skipping ip-6-7-8-9.region.compute.internal - no node group config
I0609 09:33:24.059593 1 scale_down.go:175] Scale-down calculation: ignoring 12 nodes, that were unremovable in the last 5m0s
I0609 09:33:24.059612 1 scale_down.go:207] Node ip-4-5-6-7.region.compute.internal - utilization 0.162500
I0609 09:33:24.059625 1 scale_down.go:207] Node 1.2.3.region.compute.internal - utilization 0.216500
I0609 09:33:24.059638 1 scale_down.go:207] Node example.ip.compute.internal - utilization 0.725000
I0609 09:33:24.059644 1 scale_down.go:211] Node example.ip.compute.internal is not suitable for removal - utilization too big (0.725000)
I0609 09:33:24.101380 1 cluster.go:78] Fast evaluation: 1.2.3.region.compute.internal for removal
I0609 09:33:24.101519 1 cluster.go:200] Pod monitoring/prometheus-operator-5d564c684d-x8shd can be moved to ip-172-20-51-4.eu-west-1.compute.internal
I0609 09:33:24.101724 1 cluster.go:200] Pod monitoring/kube-prometheus-exporter-kube-state-5cd969f745-td4z9 can be moved to ip-172-20-51-4.eu-west-1.compute.internal
I0609 09:33:24.102028 1 cluster.go:109] Fast evaluation: node 1.2.3.region.compute.internal may be removed
I0609 09:33:24.102193 1 static_autoscaler.go:314] 1.2.3.region.compute.internal is unneeded since 2018-06-09 09:26:15.199568305 +0000 UTC duration 7m8.716444732s
I0609 09:33:24.102216 1 static_autoscaler.go:314] ip-4-5-6-7.region.compute.internal is unneeded since 2018-06-09 09:26:25.387471703 +0000 UTC duration 6m58.528541334s
I0609 09:33:24.102226 1 static_autoscaler.go:329] Scale down status: unneededOnly=false lastScaleUpTime=2018-06-06 21:12:31.471420721 +0000 UTC lastScaleDownDeleteTime= 2018-06-06 21:12:31.471421196 +0000 UTC lastScaleDownFailTime=2018-06-06 21:12:31.471421535 +0000 UTC schedulablePodsPresent=false isDeleteInProgress=false
I0609 09:33:24.102243 1 static_autoscaler.go:332] Starting scale down
I0609 09:33:24.139374 1 scale_down.go:387] ip-4-5-6-7.region.compute.internal was unneeded for 6m58.528541334s
I0609 09:33:24.139397 1 scale_down.go:387] 1.2.3.region.compute.internal was unneeded for 7m8.716444732s
I0609 09:33:24.139408 1 scale_down.go:446] No candidates for scale down
I0609 09:33:24.368897 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
Can someone give me a clue?
I have reduced the number of pods running in kube-system, since I read that by default the autoscaler will not scale down a node that has a kube-system pod running on it.
Thanks
I see this behaviour in the logs. These are my settings:
Command:
./cluster-autoscaler
--cloud-provider=aws
--namespace=kube-system
--nodes=5:15:nodes.qa-lab.com
--logtostderr=true
--stderrthreshold=info
--v=4
But the logs show:
I0611 08:09:08.533810 1 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"0a16f454-6c83-11e8-83c0-0a7944bf8e00", APIVersion:"v1", ResourceVersion:"32365261", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-a-b-c-d.eu-west-1.compute.internal
I0611 08:09:08.540473 1 delete.go:53] Successfully added toBeDeletedTaint on node ip-a-b-c-d.eu-west-1.compute.internal
E0611 08:09:08.617737 1 scale_down.go:641] Problem with empty node deletion: failed to delete ip-a-b-c-d.eu-west-1.compute.internal: ValidationError: Currently, desiredSize equals minSize (15). Terminating instance without replacement will violate group's min size constraint. Either set shouldDecrementDesiredCapacity flag to false or lower group's min size.
status code: 400, request id: b75b2329-6d4e-11e8-8abc-51bdbac6ff0c
E0611 08:09:08.617781 1 static_autoscaler.go:350] Failed to scale down: <nil>
I0611 08:09:08.620770 1 delete.go:106] Releasing taint {Key:ToBeDeletedByClusterAutoscaler Value:1528704548 Effect:NoSchedule TimeAdded:<nil>} on node ip-a-b-c-d.eu-west-1.compute.internal
I0611 08:09:08.625658 1 delete.go:119] Successfully released toBeDeletedTaint on node ip-a-b-c-d.eu-west-1.compute.internal
I0611 08:09:08.625982 1 factory.go:33] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-a-b-c-d.eu-west-1.compute.internal", UID:"05cbc43c-3f1e-11e8-8ca5-0a003cf6c876", APIVersion:"v1", ResourceVersion:"32365311", FieldPath:""}): type: 'Warning' reason: 'ScaleDownFailed' failed to delete empty node: failed to delete ip-a-b-c-d.eu-west-1.compute.internal: ValidationError: Currently, desiredSize equals minSize (15). Terminating instance without replacement will violate group's min size constraint. Either set shouldDecrementDesiredCapacity flag to false or lower group's min size.
Thanks
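For anyone reading the ValidationError above: the message says the Auto Scaling group refuses to terminate an instance and lower its desired capacity when that would drop the group below its minimum size. Here is a rough Python sketch of that rule as I understand it from the error text. The `AsgState` class and `can_terminate_with_decrement` function are made-up names for illustration, not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass
class AsgState:
    """Hypothetical stand-in for the two ASG fields named in the error message."""
    min_size: int
    desired: int


def can_terminate_with_decrement(asg: AsgState) -> bool:
    # Terminating one instance while decrementing desired capacity
    # must not take the group below its minimum size.
    return asg.desired - 1 >= asg.min_size


# State reported in the log: desiredSize equals minSize (15).
print(can_terminate_with_decrement(AsgState(min_size=15, desired=15)))  # False

# With a lower minimum (5, as passed to cluster-autoscaler) the removal is allowed.
print(can_terminate_with_decrement(AsgState(min_size=5, desired=15)))  # True
```

So even though cluster-autoscaler was started with a minimum of 5, the limit enforced on the AWS side is the one configured on the group itself.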
I think I found the cause. I am using kops to create my cluster, and the minSize for nodes in the instance group settings was set to 15. This appears to have conflicted with cluster-autoscaler, which was configured with a minimum of 5, whenever it tried to remove a node.
After I updated the nodes' minSize and maxSize in the kops config to match the values given to cluster-autoscaler, scale-down happened.
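A quick way to catch this kind of mismatch is to compare the `--nodes=min:max:name` value passed to cluster-autoscaler against the sizes configured on the group. Below is a minimal Python sketch of that comparison. The helper names are hypothetical, the group-side numbers are typed in by hand rather than fetched from AWS or kops, and the group maximum of 15 is an assumption (only the minimum of 15 is confirmed above).

```python
from typing import List, Tuple


def parse_nodes_flag(value: str) -> Tuple[int, int, str]:
    """Split a cluster-autoscaler --nodes value of the form min:max:name."""
    min_str, max_str, name = value.split(":", 2)
    return int(min_str), int(max_str), name


def size_mismatches(nodes_flag: str, group_min: int, group_max: int) -> List[str]:
    """List the differences between autoscaler limits and group limits."""
    ca_min, ca_max, name = parse_nodes_flag(nodes_flag)
    problems = []
    if ca_min != group_min:
        problems.append(f"{name}: autoscaler min {ca_min} != group min {group_min}")
    if ca_max != group_max:
        problems.append(f"{name}: autoscaler max {ca_max} != group max {group_max}")
    return problems


# Before the fix: the kops instance group still had minSize 15 (max assumed 15).
print(size_mismatches("5:15:nodes.qa-lab.com", group_min=15, group_max=15))

# After aligning the kops instance group with the autoscaler settings.
print(size_mismatches("5:15:nodes.qa-lab.com", group_min=5, group_max=15))
```

The first call reports the minimum-size mismatch and the second returns an empty list.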