I am looking to modify some of the auto scaling options, but this does not seem to be possible on GKE?
It's not clear where to set the 'flags' mentioned in the FAQ, or even which machine these command-line flags need to be run on.
A similar issue is brought up here:
https://stackoverflow.com/questions/48963625/where-to-config-the-kubernetes-cluster-autoscaler-on-google-cloud
You're correct. On GKE, Cluster Autoscaler is always configured automatically. If you run your own cluster on GCE and have access to the master machine, you can change these flags in the Cluster Autoscaler pod's manifest.
Thanks!
It would be good to be able to do some configuration on CA in GKE. As in the referenced issue, I'd like to reduce the --scale-down-unneeded-time so as not to waste money on 10 minutes of unneeded capacity.
I wonder if emptying a node completely helps the autoscaler to quickly remove it. For interactive analytic applications, the default value of 10 minutes for --scale-down-unneeded-time seems too large.
It helps by eliminating drain time, and also increases throughput by allowing bulk deletes.
As for the default 10-minute wait, it's a compromise of sorts: we don't want the user to wait for nodes to be added because we removed them too quickly between jobs. That being said, we haven't revised this value for a while, so if you have any feedback regarding this behavior, especially production experience with it, please let us know.
Thanks for the reply. At the moment, we are still implementing a new service and don't have any production-level experience with it yet (but will publish the result when it is ready).
We spin up expensive high-memory instances on demand as slaves for our integration tests. The load is intermittent, so that extra 10 minutes for 10-30 instances, multiple times a day, gets quite expensive.
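To put a rough number on that, here is a minimal sketch of the arithmetic. The hourly price and the number of scale-down events per day are placeholder assumptions, not figures from this thread; only the 10 minutes and the 10-30 instances come from the comment above.

```python
def idle_cost_per_day(instances: int, idle_minutes: float,
                      events_per_day: int, hourly_price: float) -> float:
    """Cost of capacity that sits unneeded before the autoscaler removes it.

    instances      -- nodes idling at the end of each burst (10-30 in the comment)
    idle_minutes   -- how long each node waits before removal (10 by default)
    events_per_day -- how many bursts per day (assumed value)
    hourly_price   -- price of one node per hour (assumed value)
    """
    idle_hours = idle_minutes / 60.0
    return instances * idle_hours * events_per_day * hourly_price


# Hypothetical inputs: 30 nodes, 5 bursts a day, 1.00 per node-hour.
daily = idle_cost_per_day(instances=30, idle_minutes=10,
                          events_per_day=5, hourly_price=1.00)
print(f"idle cost per day: {daily:.2f}")
```

The cost scales linearly with the wait time, so halving --scale-down-unneeded-time would halve this figure under the same assumptions.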
I wonder if there is any update on the default value of --scale-down-unneeded-time. I think the default value of 10 minutes is fine, but I hope GKE allows users to change the value for their own cluster, because if --scale-down-unneeded-time is set to a new value, the users should know what that actually means.
For us, we would like to implement an autoscaling logic for an analytics system based on Apache Hive, and we would like to remove nodes as soon as possible once the autoscaling logic decides to retire them.
It would be nice to be able to configure things like skip-nodes-with-system-pods or skip-nodes-with-local-storage; there's tons of config that we can't touch.
You can now choose predefined config for more aggressive scale-down: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#autoscaling_profiles (which doesn't help for the flags you listed, but it is what is requested in comments above).
I've been using the optimize-utilization profile, and unfortunately, as you've said, it doesn't solve this issue. Linkerd, similarly to Istio, creates an emptyDir volume on every pod that has a sidecar. Since pretty much every application we have in the cluster has one, this prevents the cluster autoscaler from scaling down.
The current workaround I've had to resort to is this: https://github.com/kubernetes/autoscaler/issues/3322
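For anyone trying to find which pods are holding nodes back, the local-storage rule can be approximated as below. This is a simplified sketch, not the autoscaler's actual code: it only looks at emptyDir volumes and the cluster-autoscaler.kubernetes.io/safe-to-evict annotation, and it assumes the pod is given as a plain dict (for example, parsed from `kubectl get pod -o json`). The function name is made up for illustration.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def blocks_scale_down(pod: dict) -> bool:
    """Return True if the pod would likely keep its node from being removed
    because of local storage (simplified: emptyDir volumes only)."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    # An explicit safe-to-evict annotation overrides the local-storage check.
    if annotations.get(SAFE_TO_EVICT) == "true":
        return False
    volumes = pod.get("spec", {}).get("volumes") or []
    return any("emptyDir" in volume for volume in volumes)


# A pod with a sidecar-style emptyDir volume and no annotation.
meshed_pod = {
    "metadata": {"name": "web"},
    "spec": {"volumes": [{"name": "sidecar-scratch", "emptyDir": {}}]},
}
print(blocks_scale_down(meshed_pod))
```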
The other solution I've been considering, so we don't have to maintain a fork of the autoscaler, is building some kind of admission controller to add the safe-to-evict annotation to every pod _unless_ an annotation (unsafe-to-evict?) is present, since in the cluster I'm working with, local storage should be an extremely exceptional scenario. Using PDBs for the kube-system pods is good; I'd rather know that those pods are being migrated more gracefully.
Being able to just configure the GKE autoscaler would completely solve this though. Perhaps configuration could be exposed in a ConfigMap instead, allowing the solution to be more platform agnostic.
Also, I'd prefer to be able to monitor the cluster autoscaler using Prometheus.
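The admission-controller idea above could be sketched roughly as follows. This only shows the mutation logic, building the JSON Patch a mutating webhook would return; the HTTP server, TLS, and AdmissionReview wrapping are left out. The opt-out key `example.com/unsafe-to-evict` is a hypothetical name invented for this sketch, as is the function name.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"
# Hypothetical opt-out annotation; not a real Kubernetes or autoscaler key.
OPT_OUT = "example.com/unsafe-to-evict"


def safe_to_evict_patch(pod: dict) -> list:
    """Build JSON Patch operations that mark a pod safe to evict,
    unless it opted out or already sets the annotation itself."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    if annotations.get(OPT_OUT) == "true" or SAFE_TO_EVICT in annotations:
        return []
    if not annotations:
        # The annotations map is missing or empty, so create it in one step.
        return [{"op": "add", "path": "/metadata/annotations",
                 "value": {SAFE_TO_EVICT: "true"}}]
    # JSON Pointer escaping: "~" becomes "~0" and "/" becomes "~1".
    key = SAFE_TO_EVICT.replace("~", "~0").replace("/", "~1")
    return [{"op": "add", "path": "/metadata/annotations/" + key,
             "value": "true"}]


print(safe_to_evict_patch({"metadata": {"annotations": {"team": "data"}}}))
```

Leaving pods that already set the annotation untouched means an explicit "false" from an application owner is respected as well.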