Keda: Keda scaleType Job scales Jobs twice

Created on 2 Jan 2020  路  9Comments  路  Source: kedacore/keda

When using Keda to scale a Job type, KEDA seems to scale the Job twice on first scale. This happens when creating a ScaledObject when there is already a message existing in the queue.
Succeeding scaling works as expected. It only happens on first time scaling.

Expected Behavior

Keda should not scale Jobs twice on first time scaling.

Actual Behavior

Keda scales the Jobs twice.

Steps to Reproduce the Problem

  1. Put one message into a Queue.
  2. Create a ScaledObject of scaleType = job
  3. Watch Keda creates 2 jobs almost at the sametime.

Specifications

  • Version: from master branch
  • Platform: amd64,Ubuntu 18.04
  • Scaler(s): RabbitMQ scaler with scaleType: job

Sample ScaledObject

apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-consumer
data:
  RabbitMqHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTAuMTUyLjE4My4xMDM6NTY3Mg==
---
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: rabbit-queue-scaledobject-jobs
  namespace: default
spec:
  scaleType: job
  pollingInterval: 10
  maxReplicaCount: 0
  jobTargetRef:
    activeDeadlineSeconds: 600
    ttlSecondsAfterFinished: 1
    backoffLimit: 6
    template:
      spec:
        containers:
        - name: pi
          image: perl
          command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(3000);"]
          env:
          - name: RabbitMqHost
            valueFrom:
              secretKeyRef:
                name: rabbitmq-consumer
                key: RabbitMqHost
  triggers:
  - type: rabbitmq
    metadata:
      queueName: hello
      host: RabbitMqHost
      queueLength  : '1'

I think the issue is at this line

https://github.com/kedacore/keda/blob/61d7959ec98f6d91e7d7232c3a0cc8d63a460f73/pkg/handler/scale_loop.go#L14

Would it be better to let the pollingInterval to pass before the scaling happens?
I tried removing the line above, and it seems to prevent it from doubling the scale. But it waits for the pollingInterval to pass before it starts scaling up or down.

bug

Most helpful comment

I tested it on v2 logic. Using the new scaling logic, It doesn't create two jobs against one message. I tested your code with v2 configuration. However, this yaml has an issue. It doesn't any message. Just execute some perl then finished. That means, once polling interval has finished, it is going to execute one more job. (since it doesn't consume the queue). A job should consume the queue (at least one message).

apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-consumer
data:
  RabbitMqHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTAuMTUyLjE4My4xMDM6NTY3Mg==
  LocalHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTI3LjAuMC4xOjU2NzI=
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: rabbit-queue-scaledobject-jobs
  namespace: default
spec:
  pollingInterval: 10
  jobTargetRef:
    activeDeadlineSeconds: 600
    ttlSecondsAfterFinished: 1
    backoffLimit: 6
    template:
      spec:
        containers:
        - name: pi
          image: perl
          command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(3000);"]
          envFrom:
            - secretRef:
                name: rabbitmq-consumer
  triggers:
  - type: rabbitmq
    metadata:
      queueName: hello
      host: RabbitMqHost
      localhost: LocalHost
      queueLength  : '1'

New Scale Logic you can find on this PR.
https://github.com/kedacore/keda/pull/951

This PR is enable us to test rabbit mq easily.
https://github.com/kedacore/keda/pull/953

I'l also provide samples for updating the rabbit mq sample with scaled job. If you want to test the behavior.
https://github.com/kedacore/sample-go-rabbitmq/pull/15

All 9 comments

@balchua good catch, thanks.

We shouldn't remove the first h.handleScale() call, KEDA should be able to scale imidiately and not wait for the pollingInterval for the first scale and this change would effect deploymentType as well.

This could be a better place where to fix this issue (since it is jobType related problem): https://github.com/kedacore/keda/blob/10c92ff1348b20df14921b156f611b0243924f89/pkg/handler/scale_loop.go#L50

@zroubalik i thought i will be able to prevent this when i prevent the number of Jobs to scale beyond the maxScale in the source scale_jobs.go.
But seems to me that it is happening almost instantaneously. Is this even possible that it is triggered 2x at almost the same time?

@balchua I saw your PR for preventing jobs from exceeding maxScale earlier.

Combining your solution in that PR with preventing more jobs to be created than there are messages seems to fix this scaling issue for me.

This might not be the cause of the problem though...

It seems I spoke too soon! Might have been some funny race condition but when I tried it again, the scaling triggered twice.

I did fetch the logs this time, using the keda:master image:

{"level":"info","ts":1578071613.7561276,"logger":"cmd","msg":"Go Version: go1.13.3"}
{"level":"info","ts":1578071613.7562637,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1578071613.7563248,"logger":"cmd","msg":"Version of operator-sdk: v0.11.0"}
{"level":"info","ts":1578071613.7563548,"logger":"cmd","msg":"Keda Commit: 61d7959ec98f6d91e7d7232c3a0cc8d63a460f73"}
{"level":"info","ts":1578071613.756532,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1578071616.3683953,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1578071616.372165,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1578071618.9748654,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1578071618.975012,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1578071618.9752345,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"scaledobject-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1578071621.5783374,"logger":"cmd","msg":"Could not generate and serve custom resource metrics","error":"discovering resource information failed for ScaledObject in keda.k8s.io/v1alpha1: unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request"}
{"level":"info","ts":1578071624.2016356,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"keda-operator-metrics","Service.Namespace":"keda"}
{"level":"info","ts":1578071626.8036015,"logger":"cmd","msg":"Could not create ServiceMonitor object","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1578071626.803761,"logger":"cmd","msg":"Install prometheus-operator in your cluster to create ServiceMonitor objects","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1578071626.8037968,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1578071626.8040416,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1578071626.9041996,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"scaledobject-controller"}
{"level":"info","ts":1578071627.0045247,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"scaledobject-controller","worker count":1}
{"level":"info","ts":1578071627.004942,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.004993,"logger":"controller_scaledobject","msg":"Adding Finalizer for the ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.015848,"logger":"controller_scaledobject","msg":"Detecting ScaleType from ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.015923,"logger":"controller_scaledobject","msg":"Detected ScaleType = Job","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.1162179,"logger":"controller_scaledobject","msg":"Deleting jobs owned by the previous version of the ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler","Number of jobs to delete":8}
{"level":"info","ts":1578071627.1163352,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.1163745,"logger":"controller_scaledobject","msg":"Detecting ScaleType from ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.11638,"logger":"controller_scaledobject","msg":"Detected ScaleType = Job","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.116399,"logger":"controller_scaledobject","msg":"Deleting jobs owned by the previous version of the ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler","Number of jobs to delete":8}
{"level":"info","ts":1578071627.1260946,"logger":"scalehandler","msg":"Scalers count","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Count":1}
{"level":"info","ts":1578071627.1261015,"logger":"scalehandler","msg":"Scalers count","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Count":1}
{"level":"info","ts":1578071628.2392795,"logger":"scalehandler","msg":"Active trigger","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"isTriggerActive":true}
{"level":"info","ts":1578071628.2394001,"logger":"scalehandler","msg":"Scaler max value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"MaxValue":5}
{"level":"info","ts":1578071628.2424963,"logger":"scalehandler","msg":"Active trigger","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"isTriggerActive":true}
{"level":"info","ts":1578071628.2425148,"logger":"scalehandler","msg":"Scaler max value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"MaxValue":5}
{"level":"info","ts":1578071628.247716,"logger":"scalehandler","msg":"QueueLength Metric value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"queueLength":1}
{"level":"info","ts":1578071628.2478073,"logger":"scalehandler","msg":"Scaler is active","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{}}
{"level":"info","ts":1578071628.2492247,"logger":"scalehandler","msg":"QueueLength Metric value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"queueLength":1}
{"level":"info","ts":1578071628.2492595,"logger":"scalehandler","msg":"Scaler is active","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{}}
{"level":"info","ts":1578071628.2533662,"logger":"scalehandler","msg":"Creating jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}
{"level":"error","ts":1578071628.2555327,"logger":"scalehandler","msg":"Error updating scaledObject status","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","error":"Operation cannot be fulfilled on scaledobjects.keda.k8s.io \"consumer-scaler\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).updateScaledObjectStatus\n\tkeda/pkg/handler/scale_handler.go:47\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).scaleJobs\n\tkeda/pkg/handler/scale_jobs.go:22\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleJob\n\tkeda/pkg/handler/scale_loop.go:99\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:42\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:14"}
{"level":"info","ts":1578071628.2556405,"logger":"scalehandler","msg":"Creating jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}
{"level":"info","ts":1578071628.2740905,"logger":"scalehandler","msg":"Created jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}
{"level":"info","ts":1578071628.2803454,"logger":"scalehandler","msg":"Created jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}

The same problem also happens with the keda:1.0.0 image.

I added some logging and noticed that startScaleLoop is called twice from the reconcileJobType method, I'm not sure if this might be part of the problem?

I also saw an error Operation cannot be fulfilled..... don't know if this is benign.

@balchua @lee0c may be able to help. I tried to add @stgricci but somehow cant seem to.

@anirudhgarg I can take a look. Not sure why it's not assignable to me.

I tested it on v2 logic. Using the new scaling logic, It doesn't create two jobs against one message. I tested your code with v2 configuration. However, this yaml has an issue. It doesn't any message. Just execute some perl then finished. That means, once polling interval has finished, it is going to execute one more job. (since it doesn't consume the queue). A job should consume the queue (at least one message).

apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-consumer
data:
  RabbitMqHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTAuMTUyLjE4My4xMDM6NTY3Mg==
  LocalHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTI3LjAuMC4xOjU2NzI=
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: rabbit-queue-scaledobject-jobs
  namespace: default
spec:
  pollingInterval: 10
  jobTargetRef:
    activeDeadlineSeconds: 600
    ttlSecondsAfterFinished: 1
    backoffLimit: 6
    template:
      spec:
        containers:
        - name: pi
          image: perl
          command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(3000);"]
          envFrom:
            - secretRef:
                name: rabbitmq-consumer
  triggers:
  - type: rabbitmq
    metadata:
      queueName: hello
      host: RabbitMqHost
      localhost: LocalHost
      queueLength  : '1'

New Scale Logic you can find on this PR.
https://github.com/kedacore/keda/pull/951

This PR is enable us to test rabbit mq easily.
https://github.com/kedacore/keda/pull/953

I'l also provide samples for updating the rabbit mq sample with scaled job. If you want to test the behavior.
https://github.com/kedacore/sample-go-rabbitmq/pull/15

@TsuyoshiUshio so can we close this one then?

Was this page helpful?
0 / 5 - 0 ratings