Autoscaler: How do I gain access to the cluster autoscaler on GKE?

Created on 14 Jun 2018 · 12 Comments · Source: kubernetes/autoscaler

I am looking to modify some of the auto scaling options, but this does not seem to be possible on GKE?

It's not clear where to set the 'flags' mentioned in the FAQ, or even which machine these command-line flags need to be run on.

area/provider/gcp cluster-autoscaler

chrissound

👍2

Most helpful comment

We spin up expensive high-memory instances on demand as slaves for our integration tests. The load is intermittent, so that extra 10 minutes for 10-30 instances, multiple times a day, gets quite expensive.

joshwand on 7 Jan 2019

👍20

All 12 comments

You're correct. On GKE, Cluster Autoscaler is always configured automatically. If you run your own cluster on GCE and have access to the master machine, you can change the flags in the Cluster Autoscaler pod's manifest.

aleksandra-malinowska on 14 Jun 2018

👎9
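On a self-managed cluster, the flags end up as arguments in the autoscaler container's command inside that manifest. As a rough illustration only, the Python sketch below renders such an argument list from a dictionary. The flag names are the ones mentioned in this thread; the binary path `./cluster-autoscaler`, the helper name `autoscaler_command`, and the example values are assumptions, not taken from any real manifest.

```python
import json


def autoscaler_command(flags):
    """Render a container command list from a {flag_name: value} mapping.

    The binary path below is an assumption; check the manifest you
    actually run for the real command.
    """
    command = ["./cluster-autoscaler"]
    for name, value in sorted(flags.items()):
        # Boolean flags are written as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        command.append(f"--{name}={value}")
    return command


if __name__ == "__main__":
    # Example values are illustrative, not recommendations.
    example = {
        "scale-down-unneeded-time": "2m",
        "skip-nodes-with-system-pods": False,
        "skip-nodes-with-local-storage": False,
    }
    print(json.dumps(autoscaler_command(example), indent=2))
```

The resulting list would replace the `command` entry of the autoscaler container in the pod manifest. None of this applies to GKE, where the manifest is not accessible.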

Thanks!

chrissound on 14 Jun 2018

It would be good to be able to do some configuration on CA in GKE-- as in the referenced issue, I'd like to reduce the --scale-down-unneeded-time so as to not waste money for 10 minutes of unneeded capacity.

joshwand on 7 Dec 2018

👍14
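For a rough sense of what the idle window costs, here is a back-of-the-envelope sketch in Python. Only the 10-minute default and the 10-30 instance range come from this thread. The hourly price and the number of scale-downs per day are made-up placeholders, not GCP pricing.

```python
def idle_cost_per_day(instances, idle_minutes, hourly_price, scale_downs_per_day):
    """Cost of nodes sitting unneeded before the autoscaler removes them.

    Each scale-down event pays for `idle_minutes` of every instance
    at `hourly_price` (currency units per instance-hour).
    """
    instance_minutes = instances * idle_minutes * scale_downs_per_day
    return instance_minutes * hourly_price / 60


if __name__ == "__main__":
    HOURLY_PRICE = 1.00      # placeholder price, not a real GCP rate
    SCALE_DOWNS_PER_DAY = 4  # placeholder for "multiple times a day"

    for instances in (10, 30):
        default = idle_cost_per_day(instances, 10, HOURLY_PRICE, SCALE_DOWNS_PER_DAY)
        shorter = idle_cost_per_day(instances, 2, HOURLY_PRICE, SCALE_DOWNS_PER_DAY)
        print(f"{instances} instances: {default:.2f}/day at 10 min, {shorter:.2f}/day at 2 min")
```

The cost scales linearly with the unneeded time, so under these assumptions cutting the wait from 10 minutes to 2 would remove about 80% of the idle spend.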

I wonder if emptying a node completely helps the autoscaler to quickly remove it. For interactive analytic applications, the default value of 10 minutes for --scale-down-unneeded-time seems too large.

glapark on 7 Jan 2019

> I wonder if emptying a node completely helps the autoscaler to quickly remove it. For interactive analytic applications, the default value of 10 minutes for --scale-down-unneeded-time seems too large.

It helps by eliminating drain time, and also increases throughput by allowing bulk deletes.

As for the default 10-minute wait, it's a compromise of sorts: we don't want the user to wait for nodes to be added because we removed them too quickly between jobs. That said, we haven't revised this value for a while, so if you have any feedback regarding this behavior, especially production experience with it, please let us know.

aleksandra-malinowska on 7 Jan 2019

😕1

Thanks for the reply. At the moment, we are still implementing a new service and don't have any production-level experience with it yet (but will publish the result when it is ready).

glapark on 7 Jan 2019

joshwand on 7 Jan 2019

👍20

I wonder if there is any update on the default value of --scale-down-unneeded-time. I think the default value of 10 minutes is fine, but I hope GKE allows users to change the value for their own cluster, because if --scale-down-unneeded-time is set to a new value, the users should know what that actually means.

For us, we would like to implement an autoscaling logic for an analytics system based on Apache Hive, and we would like to remove nodes as soon as possible once the autoscaling logic decides to retire them.

glapark on 28 Jun 2019

👍13 👀3

It would be nice to be able to configure things like skip-nodes-with-system-pods or skip-nodes-with-local-storage. There's tons of config that we can't touch.

Luke-Vear on 15 Jul 2020

👍6

You can now choose predefined config for more aggressive scale-down: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#autoscaling_profiles (which doesn't help for the flags you listed, but it is what is requested in comments above).

MaciekPytel on 15 Jul 2020

I've been using the optimize-utilization profile, and unfortunately, as you've said, it doesn't solve this issue. Linkerd, similarly to Istio, creates an emptyDir volume on every pod that has a sidecar. This prevents the cluster autoscaler from scaling down nodes that run pretty much any application we have in the cluster.

The current workaround I've had to resort to is this: https://github.com/kubernetes/autoscaler/issues/3322

The other solution I've been considering, so we don't have to maintain a fork of the autoscaler, is building some kind of admission controller that adds the safe-to-evict annotation to every pod _unless_ an opt-out annotation (unsafe-to-evict?) is present, since local storage should be an extremely exceptional scenario in the cluster I'm working with. Using PDBs for the kube-system pods is good; I'd rather know that those pods are being migrated more gracefully.

Being able to just configure the GKE autoscaler would completely solve this though. Perhaps configuration could be exposed in a ConfigMap instead, allowing the solution to be more platform agnostic.
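The admission-controller idea above could be sketched roughly as follows. This is only the mutation logic of a mutating webhook, written in Python under some assumptions: the opt-out annotation key `example.com/unsafe-to-evict` is hypothetical (the comment only floats the name "unsafe-to-evict"), and the HTTPS server, TLS setup, and webhook registration are left out entirely.

```python
import base64
import json

# Annotation the cluster autoscaler checks before evicting a pod.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"
# Hypothetical opt-out key; pick whatever name suits your cluster.
OPT_OUT = "example.com/unsafe-to-evict"


def _escape(key):
    """Escape a key for use in a JSON Pointer path (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")


def build_patch(pod):
    """Return JSON Patch operations that mark the pod safe to evict.

    Returns an empty list when the pod opts out or already carries
    an explicit safe-to-evict annotation.
    """
    annotations = pod.get("metadata", {}).get("annotations")
    if annotations and (OPT_OUT in annotations or SAFE_TO_EVICT in annotations):
        return []
    if not annotations:
        # No annotations map yet, so create it in one operation.
        return [{"op": "add", "path": "/metadata/annotations",
                 "value": {SAFE_TO_EVICT: "true"}}]
    return [{"op": "add",
             "path": "/metadata/annotations/" + _escape(SAFE_TO_EVICT),
             "value": "true"}]


def review_response(review):
    """Build an admission.k8s.io/v1 AdmissionReview response for a pod."""
    request = review["request"]
    response = {"uid": request["uid"], "allowed": True}
    patch = build_patch(request["object"])
    if patch:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1",
            "kind": "AdmissionReview",
            "response": response}
```

Leaving an existing safe-to-evict annotation untouched is a design choice in this sketch: a pod that explicitly sets it to "false" keeps that value, which gives a second way to opt out.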