The Daemonset pod detection is currently done in the code by looking at the OwnerRef.Kind that reference ether a Daemonset, ReplicaSet, StatefulSet.
Internally at Datadog, we implemented thanks to CRD a custom Daemonset that aims to fix several issues that we have with the current Daemonset implementation.
Like the official Daemonset controller, our CustomDaemonset controller is creating and assigning on each Node a Pod, but in our case the OwnerRef.Kind is not Daemonset but CustomDaemonset, which has the consequence that the Autoscaler considers those pods as normal pods, and tries to control them for the cluster scaling and also the bin packing.
To allow the support of our use case, we are open to participating in the project. We see two possible implementations:
1) Add new configuration flag to provide several ApiGroup/Kind to treat as Daemonsets
2) Introduce a new Annotation on Pod to allow the Autoscaler to consider the Pod as a Daemonset pod. cluster-autoscaler.kubernetes.io/is-daemonset-pod: "true" (annotation name TBD)
Daemonset ApiGroup/KindSolution 2 is our preferred solution because it opens the support of more use cases. Also, it seems aligned with other use cases logic already implemented in the Autoscaler like:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"Currently with fix it with the following workaround: https://github.com/kubernetes/autoscaler/commit/25754c89d4d6cdbc84a1e5bf9f5e37b1e97a39ef
Feel free to comment on this issue, if you want more information about our use case, if you see corner cases or you think of another possible solution.
It seems to me that approach 2) is less invasive and I do not see any hard reasons not to implement it. @MaciekPytel WDYT?
If we go for that it would be nice to clean up the getRequiredPodsForNode code. How it uses drain.GetPodsForDeletionOnNodeDrain under the hood is pretty disgusting.
I wouldn't mind solution 2 if the autoscaler could retain its current ability to detect _DaemonSet_-managed pods. The reason is that some daemons we run come from upstream manifests that we'd have to patch鈥攑erhaps using _kustomize_鈥攊n order to place such an annotation on those pods.
Option 2 SGTM. As mentioned there is precedent for using annotations in similar way in CA and I don't really see any downsides to adding it.
edit: I assume that annotation would work in addition to current DS pod detection, not replace it.
I wouldn't mind solution 2 if the autoscaler could retain its current ability to detect _DaemonSet_-managed pods. The reason is that some daemons we run come from upstream manifests that we'd have to patch鈥攑erhaps using _kustomize_鈥攊n order to place such an annotation on those pods.
Yeah - that is the idea. To add to what we have right now. Not replace current logic.
Thanks all for your feedback,
I will start implementing "Option 2" soon, since everyone is agree with it.
Most helpful comment
I wouldn't mind solution 2 if the autoscaler could retain its current ability to detect _DaemonSet_-managed pods. The reason is that some daemons we run come from upstream manifests that we'd have to patch鈥攑erhaps using _kustomize_鈥攊n order to place such an annotation on those pods.