Keda: topologySpreadConstraints are discarded when minReplicaCount is zero for rabbitmq scaler (and possibly the other scalers too)

Created on 13 Jun 2020 · 8 comments · Source: kedacore/keda

When minReplicaCount is zero for a ScaledObject, the topologySpreadConstraints of the scaled pods are discarded.

Expected Behavior

The topologySpreadConstraints should be preserved when the target deployment is scaled above zero.

Actual Behavior

The topologySpreadConstraints do not appear in the pods when they are scaled up, so the pods are not spread correctly across the topology key defined in the topologySpreadConstraints.

When minReplicaCount is greater than zero, the topologySpreadConstraints are preserved as expected.

Steps to Reproduce the Problem

1. Create a ScaledObject with minReplicaCount=0 that scales a deployment containing topologySpreadConstraints in its PodSpec.
2. Trigger a scale-up of the deployment.
3. Fetch the deployment spec from Kubernetes and observe that the topologySpreadConstraints are missing.

Specifications

KEDA Version: Using the helm chart version 1.4.2
Platform & Version: AKS, v1.18.2
Kubernetes Version:

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-21T14:51:23Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T23:18:00Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

Scaler(s): RabbitMQ Queue latest version

bug


mboutet

👍1

Most helpful comment

Yes, I tested with v2 and it seems to work.

mboutet on 23 Sep 2020

👍2

All 8 comments

@zroubalik, I'm sorry to tag you directly like this, but do you have any idea what would cause this? I tried looking into the code, but I'm not knowledgeable enough in Go or in how an operator such as KEDA interacts with the manifests and the Kubernetes API.

Is this even caused by KEDA or is this an issue with the HPA/Kubernetes?

This issue prevents a deployment scaled up from zero from being spread properly across topology domains. Instead, almost all replicas are scheduled on the same node, which can be inefficient in terms of resource utilization and prevents a highly available topology.

mboutet on 29 Jun 2020

@zroubalik I was able to reproduce the behavior above in the current version of KEDA. I believe this is because topologySpreadConstraints was introduced in Kubernetes 1.16 and the client-go library in use did not yet account for that part of the spec. I tried the same deployment with KEDA v2, which uses a newer client-go library, and the pods contained the topologySpreadConstraints, so v2 will fix this issue.
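
A minimal sketch of the mechanism suspected above, assuming the cause really is an older typed client library: decoding an object into a struct that does not know a field and then writing the whole object back silently drops that field. The types oldPodSpec and newPodSpec are hypothetical, heavily trimmed stand-ins and are not the real Kubernetes or KEDA types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldPodSpec is a hypothetical stand-in for a PodSpec type from a client
// library that predates topologySpreadConstraints.
type oldPodSpec struct {
	NodeName string `json:"nodeName,omitempty"`
}

// newPodSpec is a hypothetical stand-in for a newer library version that
// knows about the field.
type newPodSpec struct {
	NodeName                  string            `json:"nodeName,omitempty"`
	TopologySpreadConstraints []json.RawMessage `json:"topologySpreadConstraints,omitempty"`
}

// roundTripOld decodes the spec into the old type and re-encodes it, the
// way a full-object update built from typed structs would.
func roundTripOld(in []byte) ([]byte, error) {
	var s oldPodSpec
	// encoding/json ignores unknown fields by default, so nothing fails here.
	if err := json.Unmarshal(in, &s); err != nil {
		return nil, err
	}
	return json.Marshal(s)
}

// roundTripNew does the same with the newer type.
func roundTripNew(in []byte) ([]byte, error) {
	var s newPodSpec
	if err := json.Unmarshal(in, &s); err != nil {
		return nil, err
	}
	return json.Marshal(s)
}

// hasField reports whether the top-level JSON object contains key.
func hasField(doc []byte, key string) bool {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(doc, &m); err != nil {
		return false
	}
	_, ok := m[key]
	return ok
}

// sample is an illustrative spec fragment, not a complete PodSpec.
const sample = `{"nodeName":"node-a","topologySpreadConstraints":[{"maxSkew":1,"topologyKey":"agentpool","whenUnsatisfiable":"DoNotSchedule"}]}`

func main() {
	oldOut, err := roundTripOld([]byte(sample))
	if err != nil {
		panic(err)
	}
	newOut, err := roundTripNew([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println("old types keep field:", hasField(oldOut, "topologySpreadConstraints"))
	fmt.Println("new types keep field:", hasField(newOut, "topologySpreadConstraints"))
}
```

In this sketch the old type loses the field on the round trip and the new type keeps it. Whether KEDA v1 hits exactly this path is not confirmed in this thread.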

tbickford on 30 Jun 2020

👍1

@tbickford thanks a lot for the confirmation! For v2 we have also changed the 1 <-> 0 scaling a little (it uses the /scale subresource instead of changing the spec and replicas), so I am glad this is solved.

@mboutet sorry for the delay, I was super busy the last few days. This issue likely won't be solved in v1, but the upcoming v2 should do the job. I hope that's OK for you.

zroubalik on 30 Jun 2020

Perfect! Looking forward to KEDA v2 😃 Thank you. Should we wait until v2 is released to close the issue?

mboutet on 30 Jun 2020

Let's close this one once it is verified on some release of v2 (beta, etc.). There should be a release pretty soon.

zroubalik on 30 Jun 2020

👍1

This seems similar to my case. I have a deployment that has:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: agentpool
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: service-12345

I also have a ScaledObject using the KEDA Prometheus scaler, on KEDA version 1.5. The weird thing is that the topologySpreadConstraints section is missing from the deployment and its pods when I check them on the dashboards. This leads to incorrect spreading when scaling: pods are assigned to the wrong nodes.

Does v2.0 solve this issue?