When using Keda to scale a Job type, KEDA seems to scale the Job twice on first scale. This happens when creating a ScaledObject when there is already a message existing in the queue.
Succeeding scaling works as expected. It only happens on first time scaling.
Keda should not scale Jobs twice on first time scaling.
Keda scales the Jobs twice.
ScaledObject of scaleType = jobSample ScaledObject
apiVersion: v1
kind: Secret
metadata:
name: rabbitmq-consumer
data:
RabbitMqHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTAuMTUyLjE4My4xMDM6NTY3Mg==
---
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
name: rabbit-queue-scaledobject-jobs
namespace: default
spec:
scaleType: job
pollingInterval: 10
maxReplicaCount: 0
jobTargetRef:
activeDeadlineSeconds: 600
ttlSecondsAfterFinished: 1
backoffLimit: 6
template:
spec:
containers:
- name: pi
image: perl
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(3000);"]
env:
- name: RabbitMqHost
valueFrom:
secretKeyRef:
name: rabbitmq-consumer
key: RabbitMqHost
triggers:
- type: rabbitmq
metadata:
queueName: hello
host: RabbitMqHost
queueLength : '1'
I think the issue is at this line
Would it be better to let the pollingInterval to pass before the scaling happens?
I tried removing the line above, and it seems to prevent it from doubling the scale. But it waits for the pollingInterval to pass before it starts scaling up or down.
@balchua good catch, thanks.
We shouldn't remove the first h.handleScale() call, KEDA should be able to scale imidiately and not wait for the pollingInterval for the first scale and this change would effect deploymentType as well.
This could be a better place where to fix this issue (since it is jobType related problem): https://github.com/kedacore/keda/blob/10c92ff1348b20df14921b156f611b0243924f89/pkg/handler/scale_loop.go#L50
@zroubalik i thought i will be able to prevent this when i prevent the number of Jobs to scale beyond the maxScale in the source scale_jobs.go.
But seems to me that it is happening almost instantaneously. Is this even possible that it is triggered 2x at almost the same time?
@balchua I saw your PR for preventing jobs from exceeding maxScale earlier.
Combining your solution in that PR with preventing more jobs to be created than there are messages seems to fix this scaling issue for me.
This might not be the cause of the problem though...
It seems I spoke too soon! Might have been some funny race condition but when I tried it again, the scaling triggered twice.
I did fetch the logs this time, using the keda:master image:
{"level":"info","ts":1578071613.7561276,"logger":"cmd","msg":"Go Version: go1.13.3"}
{"level":"info","ts":1578071613.7562637,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1578071613.7563248,"logger":"cmd","msg":"Version of operator-sdk: v0.11.0"}
{"level":"info","ts":1578071613.7563548,"logger":"cmd","msg":"Keda Commit: 61d7959ec98f6d91e7d7232c3a0cc8d63a460f73"}
{"level":"info","ts":1578071613.756532,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1578071616.3683953,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1578071616.372165,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1578071618.9748654,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1578071618.975012,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1578071618.9752345,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"scaledobject-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1578071621.5783374,"logger":"cmd","msg":"Could not generate and serve custom resource metrics","error":"discovering resource information failed for ScaledObject in keda.k8s.io/v1alpha1: unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request"}
{"level":"info","ts":1578071624.2016356,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"keda-operator-metrics","Service.Namespace":"keda"}
{"level":"info","ts":1578071626.8036015,"logger":"cmd","msg":"Could not create ServiceMonitor object","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1578071626.803761,"logger":"cmd","msg":"Install prometheus-operator in your cluster to create ServiceMonitor objects","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1578071626.8037968,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1578071626.8040416,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1578071626.9041996,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"scaledobject-controller"}
{"level":"info","ts":1578071627.0045247,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"scaledobject-controller","worker count":1}
{"level":"info","ts":1578071627.004942,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.004993,"logger":"controller_scaledobject","msg":"Adding Finalizer for the ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.015848,"logger":"controller_scaledobject","msg":"Detecting ScaleType from ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.015923,"logger":"controller_scaledobject","msg":"Detected ScaleType = Job","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.1162179,"logger":"controller_scaledobject","msg":"Deleting jobs owned by the previous version of the ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler","Number of jobs to delete":8}
{"level":"info","ts":1578071627.1163352,"logger":"controller_scaledobject","msg":"Reconciling ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.1163745,"logger":"controller_scaledobject","msg":"Detecting ScaleType from ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.11638,"logger":"controller_scaledobject","msg":"Detected ScaleType = Job","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler"}
{"level":"info","ts":1578071627.116399,"logger":"controller_scaledobject","msg":"Deleting jobs owned by the previous version of the ScaledObject","Request.Namespace":"keda-sample","Request.Name":"consumer-scaler","Number of jobs to delete":8}
{"level":"info","ts":1578071627.1260946,"logger":"scalehandler","msg":"Scalers count","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Count":1}
{"level":"info","ts":1578071627.1261015,"logger":"scalehandler","msg":"Scalers count","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Count":1}
{"level":"info","ts":1578071628.2392795,"logger":"scalehandler","msg":"Active trigger","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"isTriggerActive":true}
{"level":"info","ts":1578071628.2394001,"logger":"scalehandler","msg":"Scaler max value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"MaxValue":5}
{"level":"info","ts":1578071628.2424963,"logger":"scalehandler","msg":"Active trigger","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"isTriggerActive":true}
{"level":"info","ts":1578071628.2425148,"logger":"scalehandler","msg":"Scaler max value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"MaxValue":5}
{"level":"info","ts":1578071628.247716,"logger":"scalehandler","msg":"QueueLength Metric value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"queueLength":1}
{"level":"info","ts":1578071628.2478073,"logger":"scalehandler","msg":"Scaler is active","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{}}
{"level":"info","ts":1578071628.2492247,"logger":"scalehandler","msg":"QueueLength Metric value","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{},"queueLength":1}
{"level":"info","ts":1578071628.2492595,"logger":"scalehandler","msg":"Scaler is active","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Scaler":{}}
{"level":"info","ts":1578071628.2533662,"logger":"scalehandler","msg":"Creating jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}
{"level":"error","ts":1578071628.2555327,"logger":"scalehandler","msg":"Error updating scaledObject status","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","error":"Operation cannot be fulfilled on scaledobjects.keda.k8s.io \"consumer-scaler\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).updateScaledObjectStatus\n\tkeda/pkg/handler/scale_handler.go:47\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).scaleJobs\n\tkeda/pkg/handler/scale_jobs.go:22\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScaleJob\n\tkeda/pkg/handler/scale_loop.go:99\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).handleScale\n\tkeda/pkg/handler/scale_loop.go:42\ngithub.com/kedacore/keda/pkg/handler.(*ScaleHandler).HandleScaleLoop\n\tkeda/pkg/handler/scale_loop.go:14"}
{"level":"info","ts":1578071628.2556405,"logger":"scalehandler","msg":"Creating jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}
{"level":"info","ts":1578071628.2740905,"logger":"scalehandler","msg":"Created jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}
{"level":"info","ts":1578071628.2803454,"logger":"scalehandler","msg":"Created jobs","ScaledObject.Namespace":"keda-sample","ScaledObject.Name":"consumer-scaler","ScaledObject.ScaleType":"job","Number of jobs":1}
The same problem also happens with the keda:1.0.0 image.
I added some logging and noticed that startScaleLoop is called twice from the reconcileJobType method, I'm not sure if this might be part of the problem?
I also saw an error Operation cannot be fulfilled..... don't know if this is benign.
@balchua @lee0c may be able to help. I tried to add @stgricci but somehow cant seem to.
@anirudhgarg I can take a look. Not sure why it's not assignable to me.
I tested it on v2 logic. Using the new scaling logic, It doesn't create two jobs against one message. I tested your code with v2 configuration. However, this yaml has an issue. It doesn't any message. Just execute some perl then finished. That means, once polling interval has finished, it is going to execute one more job. (since it doesn't consume the queue). A job should consume the queue (at least one message).
apiVersion: v1
kind: Secret
metadata:
name: rabbitmq-consumer
data:
RabbitMqHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTAuMTUyLjE4My4xMDM6NTY3Mg==
LocalHost: YW1xcDovL3VzZXI6UEFTU1dPUkRAMTI3LjAuMC4xOjU2NzI=
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: rabbit-queue-scaledobject-jobs
namespace: default
spec:
pollingInterval: 10
jobTargetRef:
activeDeadlineSeconds: 600
ttlSecondsAfterFinished: 1
backoffLimit: 6
template:
spec:
containers:
- name: pi
image: perl
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(3000);"]
envFrom:
- secretRef:
name: rabbitmq-consumer
triggers:
- type: rabbitmq
metadata:
queueName: hello
host: RabbitMqHost
localhost: LocalHost
queueLength : '1'
New Scale Logic you can find on this PR.
https://github.com/kedacore/keda/pull/951
This PR is enable us to test rabbit mq easily.
https://github.com/kedacore/keda/pull/953
I'l also provide samples for updating the rabbit mq sample with scaled job. If you want to test the behavior.
https://github.com/kedacore/sample-go-rabbitmq/pull/15
@TsuyoshiUshio so can we close this one then?
Most helpful comment
I tested it on v2 logic. Using the new scaling logic, It doesn't create two jobs against one message. I tested your code with v2 configuration. However, this yaml has an issue. It doesn't any message. Just execute some perl then finished. That means, once polling interval has finished, it is going to execute one more job. (since it doesn't consume the queue). A job should consume the queue (at least one message).
New Scale Logic you can find on this PR.
https://github.com/kedacore/keda/pull/951
This PR is enable us to test rabbit mq easily.
https://github.com/kedacore/keda/pull/953
I'l also provide samples for updating the rabbit mq sample with scaled job. If you want to test the behavior.
https://github.com/kedacore/sample-go-rabbitmq/pull/15