The KEDA job scaler spawns new pods indefinitely if the spawned pods don't consume the messages in the queue before the next polling iteration.
When the specified strategy is "accurate", the scaler's SDK is expected to return a queue length that excludes messages currently being processed (for queues such as AWS SQS), so that scaling depends solely on the queue length. However, when the cluster is short on resources, the spawned pods might not be scheduled onto nodes before the next polling iteration, which means they don't consume any messages from the queue either. On the next poll, the scaler again spawns a number of new pods equal to the queue length.
Example:
Trigger: SQS
Available messages: 100
Messages locked: 50
Pods running: 50
Poll duration: 60 seconds
Target average value: 1
At polling iteration T1, 100 new pods are spawned. None of the pods are successfully scheduled, so no messages are consumed.
Expected: at polling iteration T2, 100 messages are still available in the queue, but no new pods are created because TotalPodsCurrentlyActive = messagesAvailable + messagesLocked.
Actual: at polling iteration T2, 100 messages are still available in the queue, so 100 new pods are created again, on the assumption that the messages arrived after the last polling iteration.
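The T1/T2 arithmetic above can be reproduced with a small simulation. This is only a sketch of the reported behaviour, not KEDA's actual code: the function names, the `pending_aware` flag, and the idea of subtracting not-yet-scheduled jobs are assumptions made for illustration, and limits such as a maximum replica count are ignored.

```python
def jobs_to_create_naive(available: int) -> int:
    """Behaviour described in this report: one new job per available message."""
    return available


def jobs_to_create_pending_aware(available: int, pending_jobs: int) -> int:
    """Hypothetical alternative: discount jobs created but not yet scheduled."""
    return max(0, available - pending_jobs)


def simulate(polls: int, available: int, pending_aware: bool) -> list[int]:
    """Return the number of jobs created at each poll.

    Assumes no pod is ever scheduled, so the available message count never
    drops and every created job stays pending.
    """
    pending = 0
    created = []
    for _ in range(polls):
        if pending_aware:
            n = jobs_to_create_pending_aware(available, pending)
        else:
            n = jobs_to_create_naive(available)
        pending += n
        created.append(n)
    return created


# Two polls (T1, T2) with 100 available messages, as in the example.
naive = simulate(polls=2, available=100, pending_aware=False)
aware = simulate(polls=2, available=100, pending_aware=True)
print(naive)  # jobs created per poll with the reported behaviour
print(aware)  # jobs created per poll when pending jobs are discounted
```

With the reported behaviour, the second poll creates another 100 jobs; when pending jobs are discounted, it creates none.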
@TsuyoshiUshio PTAL
As we discussed on the closed PR, scheduledJobCount might help. I'm not sure how we can implement it, so let me investigate. Until then, using a longer polling interval and having the container quit when it finds 0 messages might work.
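The "quit when there are 0 messages" part of this workaround could look roughly like the sketch below. The `run_job` function and the `receive`/`handle` callables are hypothetical stand-ins, and an in-memory `deque` takes the place of a real queue client; it only illustrates that a surplus job started against an empty queue exits immediately instead of lingering.

```python
from collections import deque
from typing import Callable, Optional


def run_job(receive: Callable[[], Optional[str]],
            handle: Callable[[str], None]) -> int:
    """Process messages until the queue is empty, then return.

    `receive` must return None when no message is available; the job then
    exits and reports how many messages it handled.
    """
    processed = 0
    while True:
        message = receive()
        if message is None:
            return processed
        handle(message)
        processed += 1


# In-memory stand-in for the real queue (assumption for this sketch).
queue = deque(["m1", "m2", "m3"])
handled = []


def receive() -> Optional[str]:
    return queue.popleft() if queue else None


first_job = run_job(receive, handled.append)   # drains the queue
second_job = run_job(receive, handled.append)  # surplus job, exits at once
```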
Thanks @TsuyoshiUshio, I am already using a longer polling interval and exiting the job if the queue is empty, so this is a non-blocker. For our use case, though, we want scaling to be very responsive, without any delay in the system caused by the polling interval. I will wait for your update.
Thank you for your feedback. I'm glad people are using scaled jobs for many use cases!