KEDA: topologySpreadConstraints are discarded when minReplicaCount is zero for the RabbitMQ scaler (and possibly other scalers too)

Created on 13 Jun 2020 · 8 Comments · Source: kedacore/keda

When minReplicaCount is zero for a ScaledObject, the topologySpreadConstraints of the scaled pods are discarded.

Expected Behavior

The topologySpreadConstraints should be preserved when the target deployment is scaled above zero.

Actual Behavior

The topologySpreadConstraints field does not appear in the pods when they are scaled up, so the pods are not spread properly across the topology key defined by the constraints.

When minReplicaCount is greater than zero, topologySpreadConstraints is properly preserved.

Steps to Reproduce the Problem

  1. Create a ScaledObject with minReplicaCount=0 for scaling a deployment containing topologySpreadConstraints in the PodSpec.
  2. Trigger scale up of the deployment.
  3. Fetch the deployment spec from Kubernetes and observe that topologySpreadConstraints is missing.

Specifications

  • KEDA Version: Helm chart version 1.4.2
  • Platform & Version: AKS, v1.18.2
  • Kubernetes Version:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-21T14:51:23Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T23:18:00Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
  • Scaler(s): RabbitMQ Queue latest version
Label: bug

@zroubalik, sorry to tag you directly like this, but do you have any idea what could cause this? I tried looking into the code, but I'm not knowledgeable enough about Go or about how an operator such as KEDA interacts with the manifests and the Kubernetes API.

Is this even caused by KEDA, or is it an issue with the HPA/Kubernetes?

This issue prevents a deployment scaled up from zero from being spread properly across topology domains. Instead, almost all replicas are scheduled on the same node, which is inefficient in terms of resource utilization and prevents a highly available topology.

@zroubalik I was able to reproduce the behavior above with the current version of KEDA. I believe this is because topologySpreadConstraints was introduced in Kubernetes 1.16, and the client-go library KEDA currently uses did not yet account for that part of the spec. I tried the same deployment with KEDA v2, which uses a newer client-go library, and the pods contained the topologySpreadConstraints, so v2 will fix this issue.

@tbickford thanks a lot for the confirmation! For v2 we have also changed the 1 <-> 0 scaling a little (it uses the /scale subresource instead of modifying the spec and replicas directly), so I am glad this is solved.
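Using the /scale subresource avoids rewriting the pod template at all: a Scale object only carries the replica count, and the API server applies that count to the full object it stores. The Go sketch below simulates that server-side step with untyped maps, so fields the client has never heard of survive. `storedDeployment`, `scaleSpec` and `applyScale` are hypothetical names for illustration; a real client would call the deployment's /scale endpoint rather than edit JSON itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedDeployment is a hypothetical, trimmed-down Deployment as stored by the
// API server, including a field an older client would not know about.
const storedDeployment = `{"spec":{"replicas":0,"template":{"spec":{"topologySpreadConstraints":[{"maxSkew":1,"topologyKey":"agentpool"}]}}}}`

// scaleSpec mirrors the shape of the autoscaling/v1 Scale spec: it carries
// only the desired replica count, nothing from the pod template.
type scaleSpec struct {
	Replicas int32 `json:"replicas"`
}

// applyScale simulates the server side of a /scale update: it sets
// spec.replicas on the full stored object and leaves every other field as-is.
func applyScale(stored string, s scaleSpec) (string, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(stored), &obj); err != nil {
		return "", err
	}
	spec, ok := obj["spec"].(map[string]interface{})
	if !ok {
		return "", fmt.Errorf("object has no spec")
	}
	spec["replicas"] = s.Replicas
	out, err := json.Marshal(obj)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Scale from zero to three replicas, as KEDA would on activation.
	out, err := applyScale(storedDeployment, scaleSpec{Replicas: 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The design point is that the scaler never needs a complete, up-to-date model of the pod template; it only ever sends the replica count.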

@mboutet sorry for the delay, I was super busy the last few days. This issue likely won't be solved in v1, but the upcoming v2 should do the job. I hope that's OK for you.

Perfect! Looking forward to v2 of KEDA 😃 Thank you. Should we wait until v2 is released to close the issue?

Let's close this one once it is verified on some release of v2 (beta, etc.); there should be a release pretty soon.

I seem to have a similar issue. I have a deployment with:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: agentpool
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: service-12345

I also have a ScaledObject using the KEDA Prometheus scaler, on KEDA version 1.5. The weird thing is that the topologySpreadConstraints section disappears from the deployment and the pods when I check them on the dashboard. This leads to the wrong topology when scaling, that is, pods are assigned to the wrong nodes.

Does v2.0 solve this issue?

Yes, I tested with v2 and it seems to work.

@mboutet thanks for the confirmation. Closing this then.
