I’ve been using KEDA Jobs to execute a rather CPU-intensive job based on an Azure Storage Queue trigger. However, due to it needing quite some resources, this sometimes means the cluster itself has to scale by provisioning extra nodes in order to be able to assign the necessary resources and start the job. Because this takes some time (we're talking minutes here), extra jobs for the same message in the queue start to spawn in a pending state; resulting in another scale up of nodes. This goes on until the maximum amount of pods is reached.
I expected KEDA to trigger only once per message added to the queue, resulting in 1 pod per message.
Currently, KEDA spawns a pod each time the pollingInterval has passed. This results in pods being spawned every minute until the maxReplicaCount has been reached. Next to that, due to Azure's Autoscaling feature, additional nodes will be provisioned to handle the extra pods.
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
name: consumer
namespace: default
spec:
scaleType: job
pollingInterval: 60
maxReplicaCount: 100
cooldownPeriod: 30
jobTargetRef:
parallelism: 1
completions: 1
backoffLimit: 1
template:
spec:
containers:
- name: consumer
image: registry.gitlab.xxx.com/consumer:latest
resources:
requests:
memory: "1024Mi"
cpu: "1000m"
env:
- name: AzureWebJobsStorage
valueFrom:
secretKeyRef:
name: az-queue-secret
key: AzureWebJobsStorage
- name: QUEUE_NAME
value: consumer-test
triggers:
- type: azure-queue
metadata:
queueName: consumer-test
queueLength: '1'
connection: AzureWebJobsStorage # Based on secret
How would I be able to force KEDA to only trigger once per message added to the queue?
Looking at the code, I don't think there is a way. Perhaps the job scaler needs to first check how many Job are not completed/failed before starting a new Job.
This will mean that the job must dequeue the message when it is done and not before.
Will this be a potential feature or should I look into alternatives?
I am trying to work on this one. Should we just set maxScale to be maxScale - runningJobs. This should limit the scaling.
If we want to have one message per job, this would mean that the queueLength must be 1.
Many thanks @balchua, your explanation to resolve this issue sounds completely valid to me. Love to see that you've already submitted a pull request. Looking forward to it being merged :). If I can do anything in terms of documentation or something else; please let me know.
@JoooostB the pr has been merged. Can help test it? It might be in the master tag.
docker pull kedacore/keda:master
@balchua I've been running it for a few days based on a custom build from your repo. The main issue seems to be resolved, no extra pods are spawned on the same message. However, when more messages enter the queue, the amount of pods/jobs never exceeds 1.
The code seems to look at the queueLength and use this as the maximum scale target. But the issue with setting queueLength to a higher value and a low pollingInterval is that whenever the polling occurs, a new job gets deployed until the message has been dequeued. My application takes a rough 30 seconds to start before it dequeues the message, resulting in at least 6 pods (30/5s). An alternative would be setting the pollingInterval to 45-60s, but that results in some massive latency from creating the message until executing. I think we should reconsider the following comment from @Cottonglow https://github.com/kedacore/keda/pull/528#issuecomment-571197975
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
name: queue-job
namespace: default
spec:
scaleType: job
pollingInterval: 5
maxReplicaCount: 10
cooldownPeriod: 30
jobTargetRef:
parallelism: 10
completions: 1
backoffLimit: 1
template:
spec:
containers:
- name: queue
image: registry.gitlab.xxx.nl/k8s/queue:slim
imagePullPolicy: Always
resources:
requests:
memory: "512Mi"
cpu: "1000m"
env:
- name: AzureWebJobsStorage
valueFrom:
secretKeyRef:
name: az-queue-secret
key: AzureWebJobsStorage
- name: QUEUE_NAME
value: queue-test
imagePullSecrets:
- name: gitlab-registry
triggers:
- type: azure-queue
metadata:
queueName: queue-test
queueLength: '1'
connection: AzureWebJobsStorage # Based on secret
Apologies @JoooostB. Let me visit this again. Maybe playing with the maxScale alone isnt enough.
To prevent any misunderstanding, I'll try to explain it as thorough as possible below 😃
Strictly speaking, I never want more than one job or pod for a single message, I just want it to spawn a new one whenever a new message comes in.
All jobs are long running (multiple days), let me try to explain using an example:
If message A enters the queue, the queueLength becomes 1 and a new pod should be spawned which will work on that message until it completes. To prevent confusion on if the message has already been ‘used’ or not, we instantly destroy the message from the queue. The queueLength becomes 0 again.
longrunning-job-856b464c88-hpnqj (message A)
One minute later another message (message B) enters the queue, so the queueLength becomes 1 again. I now expect it to just spawn a new pod next to the already running pod:
longrunning-job-856b464c88-hpnqj (message A)
longrunning-job-442a356c55-dojaq (message B)
But in the current situation, KEDA will always spawn the amount of pods as defined in the queueLength or it will wait for the previous job to be finished. This is not a wanted solution, I want the separate jobs to be running at the same time but never want more than one pod spawned per job/message in the queue. Would this be possible to implement?
To add one more scenario, if the job creation is slow (for example it takes minutes), and the polling time kicks in, no new pods must be created.
Prior to my pr, jobs are scaled from 0 up to the queueLength. Perhaps i need to bring this change back. @jeffhollan ?
Hi i am going to send a PR to address this one.
The approach i took on this is to check the pod status.
How the number of Jobs to scale is calculated is <number of msgs in queue> - <jobs where all pods are not in "Running" or "Completed"> states. Still limited by the threshold.
Thank you @balchua for your PR, it will help when pulling big images or scaling the cluster to handle the Job requirements.
This will also work for @JoooostB usecase where he directly deletes the message at pod startup.
But his solution is not taking advantage of the visibilityTimeout of the Azure Storage Queue.
If you want to take a look at my issue #636 where I suggest another approach for the remaining bug.
Thanks 😄
Thanks @thomas-lamure the PR is to tackle more general scaling situations like you mentioned. I will be happy if someone else can test the PR.
I also saw your issue you created, and i think it is best that it is handled at the scaler itself.
Most helpful comment
I am trying to work on this one. Should we just set
maxScaleto bemaxScale - runningJobs. This should limit the scaling.If we want to have one message per job, this would mean that the
queueLengthmust be1.