Hello,
Is there a way to disregard daemonsets (or certain pods) when the autoscaler evaluates a node's utilization for scale-down?
I have tried adding the annotation:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
but the cluster autoscaler still seems to not kill nodes with high utilization due to daemonsets:
<node name> is not suitable for removal - utilization too big (0.631092)
Not really. Annotation marks pod as safe to evict, but the decision to drain the node is based on total utilization.
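To make the distinction concrete, here is a minimal sketch of the kind of check described above. It assumes utilization is the larger of the CPU and memory request fractions of the node's allocatable resources, compared against a threshold of 0.5 (which, to my knowledge, is the default of `--scale-down-utilization-threshold`). The `Pod` type, the pod names, and all the numbers are made up for illustration; this is not the autoscaler's actual code.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str             # illustrative pod name
    cpu_request_m: int    # CPU request in millicores
    mem_request_mib: int  # memory request in MiB


def node_utilization(pods, alloc_cpu_m, alloc_mem_mib):
    """Return the larger of the CPU and memory request fractions."""
    cpu = sum(p.cpu_request_m for p in pods) / alloc_cpu_m
    mem = sum(p.mem_request_mib for p in pods) / alloc_mem_mib
    return max(cpu, mem)


# Assumed threshold; a node is a scale-down candidate only below it.
THRESHOLD = 0.5

# Hypothetical small node that runs nothing but two daemonset pods.
pods = [Pod("logging-agent", 300, 400), Pod("mesh-proxy", 290, 300)]
util = node_utilization(pods, alloc_cpu_m=940, alloc_mem_mib=2700)
removable = util < THRESHOLD
print(f"utilization={util:.6f} removable={removable}")
```

Marking either pod as safe to evict changes nothing here, because the check never looks at evictability, only at summed requests.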
Thanks for the response @aleksandra-malinowska !
I do wonder if this should be the way it behaves. It seems like some of the other logic used to determine whether a node can be removed does not consider pods that run on all nodes by default, like daemonsets.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
So, would subtracting the utilization of daemonset-originating pods be a viable solution?
Yes, I think that would solve our issue.
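A rough sketch of what that subtraction could look like, for CPU only. The `Pod` type, the `owner_kind` field, the `skip_daemonsets` parameter, and the numbers are all assumptions made for this example, not the autoscaler's real types or API.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_request_m: int  # CPU request in millicores
    owner_kind: str     # kind of the owning controller, e.g. "DaemonSet"


def cpu_utilization(pods, allocatable_cpu_m, skip_daemonsets=False):
    """Sum CPU requests over allocatable, optionally leaving out daemonset pods."""
    counted = [
        p for p in pods
        if not (skip_daemonsets and p.owner_kind == "DaemonSet")
    ]
    return sum(p.cpu_request_m for p in counted) / allocatable_cpu_m


# Hypothetical node: two daemonset pods plus one small user workload.
pods = [
    Pod("fluentd", 300, "DaemonSet"),
    Pod("istio-proxy", 250, "DaemonSet"),
    Pod("app", 50, "ReplicaSet"),
]
with_ds = cpu_utilization(pods, 940)
without_ds = cpu_utilization(pods, 940, skip_daemonsets=True)
print(f"with daemonsets: {with_ds:.3f}, without: {without_ds:.3f}")
```

With these numbers the node drops from roughly 64% to roughly 5% utilization once daemonset pods are left out. Whether the daemonset requests should also be removed from the allocatable figure in the denominator is a separate design choice that this sketch does not address.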
Seeing a similar thing. This largely affects our development staging clusters where scale-to-zero is an appealing way to make sure we at least provision the various MIGs that back the different flavors of node groups we use, while ensuring we don't continuously run 16 node clusters with one userland pod.
With smaller nodes, e.g. n1-standard-1, our logging and service mesh pods will bring utilization over 50%. These pods only exist to provide a common substrate, and the node would not otherwise be utilized or necessary if not for these DaemonSet pods.
I think in general, the heuristic mentioned by @WebSpider is a good one. _Generally_ DaemonSet pods exist to provide this kind of baseline functionality for any node in the cluster and are typically _not_ user-scheduled workloads.
That's quite a backwards-incompatible change. Perhaps a new flag to ignore daemonset-induced node utilization is the way to go? I can't think of a case right now that justifies the added complexity, but a more granular solution could be a pod annotation that explicitly tells the autoscaler to ignore utilization induced by a given pod.
We are currently seeing this same issue. We have a large number of nodes that run only daemonsets; they persist in the cluster and are not terminated by CA. These daemonsets provide metrics and system functionality, and are not user-scheduled pods.
The solution should be either a flag to ignore daemonset pods in the utilisation calculation or an annotation that can be placed on daemonset pods to ignore them.
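For comparison, the two options could be expressed as a single per-pod predicate. This is a hypothetical sketch: the annotation key `example.com/ignore-in-utilization`, the `ignore_daemonsets` switch, and the dict-shaped pods are invented for illustration and do not correspond to anything the autoscaler is known to support.

```python
# Hypothetical annotation key, invented for this example.
IGNORE_ANNOTATION = "example.com/ignore-in-utilization"


def counts_toward_utilization(pod, ignore_daemonsets=False):
    """Decide whether a pod's requests should be included in node utilization."""
    # Option 1: a cluster-wide switch that skips every daemonset pod.
    if ignore_daemonsets and pod.get("owner_kind") == "DaemonSet":
        return False
    # Option 2: a per-pod opt-out via annotation, whatever the owner is.
    return pod.get("annotations", {}).get(IGNORE_ANNOTATION) != "true"


metrics_agent = {"name": "metrics-agent", "owner_kind": "DaemonSet"}
annotated = {
    "name": "sidecar-injector",
    "owner_kind": "Deployment",
    "annotations": {IGNORE_ANNOTATION: "true"},
}
user_app = {"name": "api", "owner_kind": "ReplicaSet"}

for pod in (metrics_agent, annotated, user_app):
    print(pod["name"], counts_toward_utilization(pod, ignore_daemonsets=True))
```

The switch is simpler to reason about but all-or-nothing. The annotation is more granular and travels with the workload, which matters when the autoscaler's own flags cannot be changed.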
/remove-lifecycle rotten
/assign @MaciekPytel
@MaciekPytel @aleksandra-malinowska do you think this is feasible?
The proposal looks very reasonable to me. And easy to implement.
With that said I am not sure if we will have time to address that. PRs are very welcome.
I've raised https://github.com/kubernetes/autoscaler/pull/1407 which will add a flag to ignore DaemonSets when performing the resource utilization calculations. Wasn't sure how to test it, but happy for pointers on what tests to add.
This looks really good, although an annotation would be great too, so we can use it on managed clusters (e.g. GKE).
Wdyt?
This is fixed with #1407