Keda: KEDA ScaledJob with accurate strategy doesn't consider messages in flight when calculating scale

Created on 12 Nov 2020 · 4 Comments · Source: kedacore/keda

The KEDA job scaler spawns new pods indefinitely if the pods it has already spawned don't consume the messages in the queue before the next polling iteration.

When the specified strategy is accurate, the scaler's SDK is expected to return a queue length that excludes messages currently being processed (as with queues like AWS SQS), so that the scale depends solely on the queue length. But when the cluster is short on resources, the spawned pods might not be scheduled onto nodes before the next polling iteration, which means they don't consume any messages from the queue either. On the next poll, the scaler again spawns new pods equal to the queue length.

Expected Behavior

E.g.:

Trigger: SQS
Available Messages: 100
Messages Locked: 50
Pods Running: 50
Poll Duration: 60 seconds
Target Average Value: 1

At polling iteration T1, 100 new pods are spawned. None of the pods are successfully scheduled, so no messages are consumed.
At polling iteration T2, 100 messages are still available in the queue, but no new pods are created because TotalPodsCurrentlyActive = messagesAvailable + messagesLocked.
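
The expected arithmetic can be sketched roughly as follows. This is only an illustration of the rule described above, not KEDA's implementation: the function name `pods_to_create` and its parameters are hypothetical, and it assumes a target value of 1 message per pod and that pending (unscheduled) pods count as active.

```python
def pods_to_create(available: int, locked: int, active_pods: int) -> int:
    """Hypothetical scale rule: one pod per message, minus pods that already exist.

    active_pods is assumed to include both running and pending
    (created but not yet scheduled) pods.
    """
    total_messages = available + locked
    return max(0, total_messages - active_pods)


# Before T1: 50 pods are running and hold the 50 locked messages.
at_t1 = pods_to_create(available=100, locked=50, active_pods=50)

# At T2: the 100 pods created at T1 are still pending, so 150 pods are active.
at_t2 = pods_to_create(available=100, locked=50, active_pods=150)

print(at_t1, at_t2)  # prints "100 0"
```

With the numbers from the example, T1 creates 100 pods and T2 creates none, because the pods from T1 already cover the available messages.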

Actual Behavior

At polling iteration T2, 100 messages are still available in the queue, so 100 new pods are created again, on the assumption that the messages arrived after the last polling iteration.
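
A minimal simulation of the reported behavior, assuming no pod gets scheduled between polls and the available message count stays constant. The function `simulate_stuck_polls` is a hypothetical illustration, not KEDA code.

```python
from typing import List


def simulate_stuck_polls(available: int, polls: int) -> List[int]:
    """Return the pending-pod count after each poll when nothing gets scheduled."""
    pending = 0
    history = []
    for _ in range(polls):
        # Reported behavior: the scale is derived from the queue length alone,
        # so pods created in earlier polls are not subtracted.
        pending += available
        history.append(pending)
    return history


print(simulate_stuck_polls(available=100, polls=3))  # prints "[100, 200, 300]"
```

Under these assumptions the number of pending pods grows by the queue length on every poll, which is the unbounded growth described in this report.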

Specifications

KEDA Version: 2.0.0-rc2
Scaler(s): SQS

bug

Source

kiran-bjn

All 4 comments

@TsuyoshiUshio PTAL

zroubalik on 12 Nov 2020

As we discussed on the closed PR, scheduledJobCount might help. I'm not sure how we can implement it, so let me investigate. Until then, a longer polling interval, combined with having the container quit when it finds 0 messages, might work.

TsuyoshiUshio on 19 Nov 2020

Thanks @TsuyoshiUshio, I am already using a longer polling interval and exiting the job if the queue is empty as a workaround, so this is not a blocker. For our use case, we want the scaling to be very responsive, without any delay in the system caused by the polling interval. I will wait for your update.

kiran-bjn on 19 Nov 2020

👍1

Thank you for your feedback. I'm glad people are using ScaledJob for many use cases!

TsuyoshiUshio on 19 Nov 2020

👍3
