When minReplicaCount is zero for a ScaledObject, the topologySpreadConstraints of the scaled pods are discarded.
The topologySpreadConstraints should be preserved when the target deployment is scaled above zero.
The topologySpreadConstraints do not appear in the pods once they are scaled up, so the pods are not spread properly across the topology key that the constraints define.
When minReplicaCount is greater than zero, the topologySpreadConstraints are properly preserved.
A ScaledObject with minReplicaCount=0 that scales a deployment whose PodSpec contains topologySpreadConstraints (PodSpec.topologySpreadConstraints).
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-21T14:51:23Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T23:18:00Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
@zroubalik, I'm sorry to tag you directly like that, but do you have any idea what would cause this? I tried looking into the code, but I don't know enough about Go or about how an operator such as KEDA interacts with the manifests and the Kubernetes API.
Is this even caused by KEDA or is this an issue with the HPA/Kubernetes?
This issue prevents a deployment scaled up from zero from being spread properly across topology domains. Instead, almost all of the replicas are scheduled on the same node, which can be inefficient in terms of resource utilization and prevents a highly available topology.
@zroubalik I was able to replicate the behavior above in the current version of KEDA. I believe this is because topologySpreadConstraints was introduced in k8s version 1.16 and the client-go library being used did not yet account for that portion of the spec. I tried the same deployment with KEDA v2, which has a newer client-go library, and the pods contained the topologySpreadConstraints, so v2 will fix this issue.
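To illustrate the hypothesis above, here is a minimal sketch of how a read-modify-write update through an older typed struct can silently drop a field it does not know about. This is not KEDA's or client-go's actual code: `oldPodSpec`, `oldDeployment` and `roundTrip` are hypothetical names, the JSON is heavily simplified, and the sketch only assumes that the object is decoded into a type that predates topologySpreadConstraints and then written back.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldPodSpec is a hypothetical stand-in for a PodSpec type that predates
// topologySpreadConstraints. It is NOT the real client-go type.
type oldPodSpec struct {
	NodeName string `json:"nodeName,omitempty"`
}

// oldDeployment is a hypothetical, heavily simplified deployment object.
type oldDeployment struct {
	Replicas int32      `json:"replicas"`
	Template oldPodSpec `json:"template"`
}

// roundTrip decodes the server's JSON into the old typed struct, changes the
// replica count, and encodes the struct again, as a full-object update would.
// encoding/json ignores unknown fields on decode, so they are lost on encode.
func roundTrip(serverJSON []byte, replicas int32) ([]byte, error) {
	var d oldDeployment
	if err := json.Unmarshal(serverJSON, &d); err != nil {
		return nil, err
	}
	d.Replicas = replicas
	return json.Marshal(d)
}

func main() {
	server := []byte(`{"replicas":0,"template":{"topologySpreadConstraints":[{"maxSkew":1,"topologyKey":"zone"}]}}`)
	out, err := roundTrip(server, 1)
	if err != nil {
		panic(err)
	}
	// The constraints are gone from the object that would be sent back.
	fmt.Println(string(out)) // prints {"replicas":1,"template":{}}
}
```

An update that only touches the replica count, without sending the whole decoded object back, would not go through this lossy round trip.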
@tbickford thanks a lot for the confirmation! For v2 we have also slightly changed how scaling 1 <-> 0 works (using the /scale subresource instead of changing the spec and replicas), so I am glad this is solved.
@mboutet sorry for the delay, I was super busy the last few days. This issue likely won't be solved in v1, but the upcoming v2 should do the job. I hope that's ok for you.
Perfect! Looking forward to the v2 of KEDA 😃 Thank you. Should we wait until v2 is released to close the issue?
Let's close this one once it is verified on some release of v2 (beta, etc.); there should be a release pretty soon.
This seems similar to my case. I have a deployment that has:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: agentpool
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: service-12345
I also have a KEDA ScaledObject that uses Prometheus, on KEDA version 1.5. The weird thing is that the topologySpreadConstraints section is missing from the deployment and its pods when I check them on dashboards. This leads to the wrong scaling topology: pods are assigned to the wrong nodes.
Does v2.0 solve this issue?
Yes, I tested with v2 and it seems to work.
@mboutet thanks for the confirmation. Closing this then.