Keda: Prometheus Scaler - Maintain last known state if Prometheus is unavailable

Created on 3 Aug 2020 · 17 Comments · Source: kedacore/keda

I am in the process of migrating to Keda. I currently use https://github.com/DirectXMan12/k8s-prometheus-adapter and it has a very useful feature. In the event that Prometheus goes down, prom-adapter maintains the last known state of the metric. This means scaling is not triggered either up or down.

With Keda, if Prometheus is not available, my deployments are scaled to zero after the cooldownPeriod has expired, regardless of whether the last known value was above 0.

Use-Case

We are using prom-adapter to scale Google Pub/Sub subscribers and RabbitMQ workers. In the unlikely event that Prometheus goes down, we would want the existing workload to continue processing based on the numbers it knew before Prometheus stopped responding.

feature-request needs-discussion

bryanhorstmann

Most helpful comment

"Maintain last known state" - I think this approach has its drawbacks, especially when autoscaling to zero via minReplicaCount: 0. Imagine that you can't wake up your system, because the Keda Operator can't temporarily reach the source of metrics.

I just hit this problem with the PostgreSQL trigger. After a security group change in our AWS account, the Keda Operator suddenly couldn't reach our Postgres database and the whole system scaled down to zero, making the service unavailable.

I propose a new (optional) field onErrorReplicaCount that would serve as a default value when the Operator can't read current values, i.e.:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: my-deployment-autoscaler
spec:
  scaleTargetRef:
    deploymentName: my-deployment
  pollingInterval: 1
  cooldownPeriod: 1200
  minReplicaCount: 0
  maxReplicaCount: 10
  onErrorReplicaCount: 2     # <==  2 pods, in case of a trigger source being unavailable
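
A minimal sketch of how such a field might behave, written in Python purely for illustration. Nothing here is KEDA code: `replicas_for_poll`, `fetch_metric`, and `replicas_from_metric` are hypothetical names, and falling back to `min_replica_count` when the field is unset is an assumption that mirrors the scale-to-zero behaviour described in this issue (ignoring the cooldownPeriod delay).

```python
from typing import Callable, Optional


def replicas_for_poll(
    fetch_metric: Callable[[], float],
    replicas_from_metric: Callable[[float], int],
    min_replica_count: int,
    on_error_replica_count: Optional[int] = None,
) -> int:
    """Pick a replica count for one polling round (hypothetical helper)."""
    try:
        value = fetch_metric()
    except ConnectionError:
        # The trigger source is unreachable: use the proposed fallback.
        # Assumption: with the field unset, behave as today and drop to the minimum.
        if on_error_replica_count is None:
            return min_replica_count
        return on_error_replica_count
    return replicas_from_metric(value)


def unreachable_source() -> float:
    """Stand-in for a trigger source that cannot be reached."""
    raise ConnectionError("trigger source unavailable")


# Values taken from the manifest above: minReplicaCount 0, onErrorReplicaCount 2.
with_fallback = replicas_for_poll(unreachable_source, lambda v: 5, 0, on_error_replica_count=2)
without_fallback = replicas_for_poll(unreachable_source, lambda v: 5, 0)
healthy = replicas_for_poll(lambda: 42.0, lambda v: 5, 0, on_error_replica_count=2)
print(with_fallback, without_fallback, healthy)
```

With the fallback set, an unreachable source yields 2 replicas instead of 0; a healthy source is unaffected by the field.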

VojtechVitek on 18 Sep 2020

👍3

All 17 comments

We probably shouldn't limit this feature to Prometheus scalers only, but make it available to all scalers.
To make this work, we will need to store the last known metrics (probably in ScaledObject.Status) and use them in case of scaler failure?

zroubalik on 3 Aug 2020

Fair ask, but how do we deal with cases where the metric is higher than the threshold and Prometheus goes down? How do we avoid scaling loops on stale data?

What if we keep track of:

1. When we were last able to connect
2. What the last known value was
3. What the instance count was

But we remove the HPA and ensure the workload stays on the instance count of (3). That way it doesn't scale to 0, but you don't get scaling loops either; otherwise you can flood clusters.
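
The three tracked items could be sketched roughly as follows. This is an illustrative Python sketch, not KEDA's data model: `LastKnownState` and `replicas_to_hold` are hypothetical names, and keeping the current count when nothing has been recorded yet is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastKnownState:
    """Hypothetical record of the three items listed above."""
    last_connected_at: Optional[float] = None  # (1) when the source was last reachable
    last_value: Optional[float] = None         # (2) last known metric value
    last_replicas: Optional[int] = None        # (3) instance count at that time

    def record_success(self, now: float, value: float, replicas: int) -> None:
        """Update the record after a successful metric fetch."""
        self.last_connected_at = now
        self.last_value = value
        self.last_replicas = replicas


def replicas_to_hold(state: LastKnownState, current_replicas: int) -> int:
    """While the source is down, pin the workload to the recorded count (3)
    instead of feeding the stale value (2) back into the HPA."""
    if state.last_replicas is None:
        # Assumption: nothing recorded yet, so leave the workload untouched.
        return current_replicas
    return state.last_replicas


state = LastKnownState()
state.record_success(now=100.0, value=15.0, replicas=2)
held = replicas_to_hold(state, current_replicas=0)
untouched = replicas_to_hold(LastKnownState(), current_replicas=3)
print(held, untouched)
```

Because the stale value is only stored and never re-evaluated, the replica count cannot drift while the source is unreachable.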

tomkerkhove on 3 Aug 2020

I'm not sure I understand what you mean by scaling loop? If the last known value is above the threshold, you'll have x pods. The last known value will never change so your pods should be static.

Example:

ScaledObject with threshold of 10
Current value is 15. You have 2 pods.
Prometheus goes down. Last known value is still 15.
Even if the queue that should be processed goes to 0 (or 100), the HPA still receives 15 from the metrics server
Pods remain on 2.

bryanhorstmann on 4 Aug 2020

Well, if the value is still 15, it will be bigger than 10, so KEDA would add an instance, and another one, and another one.

tomkerkhove on 4 Aug 2020

I might be misunderstanding how the calculations work, but my understanding is that the metric value would be 15, but as there are 2 pods, the average value would be 7.5. If the threshold is 10, then the HPA would not add any new pods.
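
The disagreement above can be worked through numerically. The sketch below follows the formula in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue) with a default tolerance of 0.1, for both an AverageValue target (the total is divided across the pods) and a Value target (the raw metric is compared directly). The function names, the cap of 10 replicas, and the four rounds are illustrative assumptions, not anything taken from KEDA.

```python
import math

TOLERANCE = 0.1  # default HPA tolerance per the Kubernetes documentation


def desired_average_value(current_replicas: int, total_metric: float, target_average: float) -> int:
    """AverageValue target: the total metric is divided across the current pods."""
    ratio = total_metric / (target_average * current_replicas)
    if abs(1.0 - ratio) <= TOLERANCE:
        return current_replicas
    return math.ceil(total_metric / target_average)


def desired_value(current_replicas: int, metric: float, target: float) -> int:
    """Value target: the raw metric is compared with the target directly."""
    ratio = metric / target
    if abs(1.0 - ratio) <= TOLERANCE:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def simulate(step, replicas: int, rounds: int, max_replicas: int = 10) -> list:
    """Apply the same stale metric for several HPA sync rounds."""
    history = [replicas]
    for _ in range(rounds):
        replicas = min(max_replicas, step(replicas))
        history.append(replicas)
    return history


# Stale value of 15, threshold of 10, starting from 2 pods.
average_history = simulate(lambda r: desired_average_value(r, 15, 10), 2, 4)
value_history = simulate(lambda r: desired_value(r, 15, 10), 2, 4)
print(average_history)  # [2, 2, 2, 2, 2]
print(value_history)    # [2, 3, 5, 8, 10]
```

Under an AverageValue target the stale 15 keeps the deployment at 2 pods, as in the example above; under a Value target the same stale number keeps adding pods until the maximum is reached, which is the scaling loop being warned about.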

bryanhorstmann on 4 Aug 2020

Notes from standup yesterday:

Prometheus metric adapter will maintain the same instance count, while KEDA will scale back to 0 if it doesn't have the count

HPAs receiving errors from metric servers stop taking any action; this seems like a safe approach for us. Otherwise we can create autoscaling loops flooding the whole cluster.

This is how it works today so we are good to go

We shouldn’t do 1 -> 0 nor 0 -> 1 scaling if we cannot fetch metrics

Would be nice for 2.0, otherwise afterwards

@bryanhorstmann We just feed current & target metric to the HPA so it will keep on scaling.

tomkerkhove on 7 Aug 2020

Thanks for the feedback @tomkerkhove. I'm glad this is being considered as a nice to have for 2.0.

bryanhorstmann on 7 Aug 2020

@bryanhorstmann do you think this is something that you could contribute?

zroubalik on 7 Aug 2020

Hi @zroubalik,

I'm happy to have a go at it. Will need to do some digging and research as I've not dug too deeply into the code base.

My understanding is that the only part that needs actual work is:

We shouldn’t do 1 -> 0 nor 0 -> 1 scaling if we cannot fetch metrics

bryanhorstmann on 7 Aug 2020

Yes, but unfortunately it is not that trivial. Currently we check and mark the status of a scaler in the isActive property. In short: it is set to false if there are no metrics for scaling or if the scaler is unavailable. This behavior needs to be changed (and, for consistency, for all scalers). The 1 <-> 0 scaling then needs to be performed based on the new status; currently it is done based on isActive after the cooldownPeriod is over.

There should be settings as well, where users can specify the behaviour of this feature (timeout, enable/disable this feature, etc.)
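
One way to picture the change is to split the single isActive flag into three states so that "no metrics" and "scaler unavailable" are no longer conflated. The Python sketch below is purely illustrative: `ScalerStatus` and `next_replicas` are hypothetical names, it models only the 0 <-> 1 edge, and it is not how KEDA is implemented.

```python
from enum import Enum


class ScalerStatus(Enum):
    """Hypothetical three-way replacement for the boolean isActive."""
    ACTIVE = "active"            # metrics fetched, there is work to do
    IDLE = "idle"                # metrics fetched, nothing to do
    UNAVAILABLE = "unavailable"  # metrics could not be fetched


def next_replicas(
    status: ScalerStatus,
    current_replicas: int,
    seconds_since_active: float,
    cooldown_period: float,
) -> int:
    """Decide the 1 <-> 0 transition for one polling round."""
    if status is ScalerStatus.UNAVAILABLE:
        # No 1 -> 0 and no 0 -> 1 scaling when metrics cannot be fetched.
        return current_replicas
    if status is ScalerStatus.ACTIVE:
        # Wake up from zero; above zero the HPA decides the exact count.
        return max(current_replicas, 1)
    # IDLE: scale to zero only once the cooldown has elapsed.
    if seconds_since_active >= cooldown_period:
        return 0
    return current_replicas


held = next_replicas(ScalerStatus.UNAVAILABLE, 3, 5000, 1200)
scaled_down = next_replicas(ScalerStatus.IDLE, 3, 1300, 1200)
cooling = next_replicas(ScalerStatus.IDLE, 3, 100, 1200)
woken = next_replicas(ScalerStatus.ACTIVE, 0, 0, 1200)
print(held, scaled_down, cooling, woken)
```

Only a confirmed idle state past the cooldown reaches zero; an unreachable source leaves the replica count where it was.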

@bryanhorstmann and no pressure on you, if you don't feel confident enough to do such a big change :)

@ahmelsayed FYI^

zroubalik on 7 Aug 2020

Hi @zroubalik, thank you. I think I'll step away from this one, but will be watching the progress.

bryanhorstmann on 13 Aug 2020

👍1

@VojtechVitek makes sense. Is it something you are willing to contribute?

zroubalik on 18 Sep 2020

Is there any process for getting a proposal officially accepted?

I'm thinking a better name for the field would be defaultReplicaCount, which would equal the minReplicaCount value by default unless explicitly specified.