Keda: Enable KEDA as a single source of truth on scaler metrics

Created on 22 Oct 2020 · 10 Comments · Source: kedacore/keda

Enable KEDA to always output the scaler metrics it determines through a Prometheus metrics endpoint for ScaledObjects and ScaledJobs. This allows it to become the single source of truth for scaling metrics.

Use-Case

A very common requirement in job queuing/event processing use cases is having metrics available for monitoring and alerting. The metrics allow alerting on work backing up and observing the progress of work. Even though KEDA has to know this information for its own work, it currently does not make it available in a way that allows it to be used consistently for these purposes.

KEDA 2 provides keda_metrics_adapter_scaler_metrics_value, but it is only available for ScaledObjects with active deployments because it is produced by the metrics adapter. This means the metric is not available when no work is pending and, more importantly, is never available when ScaledJobs are used.

Currently the only way around this is to run another always-on service/exporter that replicates the work the KEDA scalers perform to determine this information and uses it to generate the metric. This nullifies much of the benefit of using the KEDA scalers in the first place: rather than using the different scalers, you could just as well scale directly on that exporter's metric, which would at least ensure that what your monitoring sees is consistent with what KEDA uses for scaling.

KEDA is in the ideal position to produce the metric as the single source of truth. It already needs the information and is always running. Adding a ScaledJob/ScaledObject would naturally make the metric for it appear, which lets developers self-serve the metric based solely on KEDA CRDs. It also means dashboards and alerting are based on the same information that was actually used for scaling.

Specification

[ ] Output scaler metrics values also for ScaledJobs
[ ] Make scaler metrics presentation consistent for ScaledObjects and ScaledJobs (tags can be used to differentiate)
[ ] Provide scaler metrics continuously even if no work is pending (e.g. through KEDA Operator)

feature-request needs-discussion

hacst

All 10 comments

I think having a centralized way of pulling metrics (as opposed to having the metrics adapter and operator pull them separately) would be great but that would require some serious rework I believe.

As an interim step maybe it would make sense to look at what it would take to add Prometheus metrics collecting and exporting to the KEDA Operator first?

tbickford on 22 Oct 2020

I have basically no knowledge of KEDA's internals, so I'm definitely not a good candidate to judge implementation complexity.

From what I understand adding a metrics endpoint to the operator would be able to provide ScaledJobs metrics and do so continuously? That would be a valuable thing on its own I think. We could use it.

Does the operator also have continuous knowledge of the ScaledObject metrics? Or does it stop polling those once the deployment becomes active (I kinda assumed it would)? If it polls continuously then the metrics in the operator if provided to the outside could already feel pretty SSOT to a user.

Having KEDA itself also avoid redundant queries and guarantee internal consistency would be a benefit, but that could definitely be a longer-term thing. I didn't even consider that aspect when writing this FR.

hacst on 22 Oct 2020

> Does the operator also have continuous knowledge of the ScaledObject metrics? Or does it stop polling those once the deployment becomes active (I kinda assumed it would)? If it polls continuously then the metrics in the operator if provided to the outside could already feel pretty SSOT to a user.

It polls continuously to find out whether the scaler is still active or not, so that should be doable.

Btw, KEDA is built on operator-sdk framework, which is based on Kubebuilder. There should be an API to expose metrics, see the docs: https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#metrics
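
Illustrative sketch, not part of the original comment and not KEDA's code: the point above is that the operator already polls every scaler to decide whether it is active, so the same poll could also feed an exported metric. The Python below only models that idea; `MetricsRecorder`, `poll_once`, and the shape of the scaler check are hypothetical names, and none of this is the operator-sdk or controller-runtime API.

```python
import threading
from typing import Callable, Dict, Tuple

# A scaler check returns (metric_name, value, is_active).
# This shape is an assumption made for the sketch.
ScalerCheck = Callable[[], Tuple[str, float, bool]]


class MetricsRecorder:
    """Thread-safe store of the latest value per (target, metric)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[Tuple[str, str], float] = {}

    def record(self, target: str, metric: str, value: float) -> None:
        with self._lock:
            self._values[(target, metric)] = value

    def snapshot(self) -> Dict[Tuple[str, str], float]:
        # Return a copy so a concurrent scrape never sees a half-updated dict.
        with self._lock:
            return dict(self._values)


def poll_once(scalers: Dict[str, ScalerCheck],
              recorder: MetricsRecorder) -> Dict[str, bool]:
    """Run every scaler check once and return the activity decision.

    The value used for the decision is recorded in the same step, whether
    or not the target is active, so monitoring sees exactly what the
    scaling logic saw.
    """
    active: Dict[str, bool] = {}
    for target, check in scalers.items():
        metric, value, is_active = check()
        recorder.record(target, metric, value)
        active[target] = is_active
    return active


if __name__ == "__main__":
    recorder = MetricsRecorder()
    scalers = {
        "scaledjob/default/worker": lambda: ("queueLength", 0.0, False),
        "scaledobject/default/api": lambda: ("queueLength", 12.0, True),
    }
    print(poll_once(scalers, recorder))
    print(recorder.snapshot())
```

A metrics endpoint would then only need to serialize `recorder.snapshot()` on each scrape, without triggering extra queries against the event source.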

zroubalik on 23 Oct 2020

Sounds good. If you think this could be a good first issue for someone and no one else already plans to tackle this short term I could try taking a stab at an implementation.

hacst on 23 Oct 2020

Added it to our roadmap to consider and our standup to discuss.

I like the idea, but we need to think through whether we would end up trying to solve two problems badly. But I'm not opposed to it.

tomkerkhove on 26 Oct 2020

The reason I'm saying that is that I also maintain https://github.com/tomkerkhove/promitor, and it starts with just adding Prometheus support, until somebody asks for support for another metric system, and so on.

And then we're not even talking about the performance impact if we start having Prometheus et al poke us very often.

So I'm just trying to avoid KEDA becoming a metric scraper rather than focusing on app scaling.

(To be clear - I'm definitely not opposed given I have a solution for scraping Azure Monitor metrics)

tomkerkhove on 26 Oct 2020

> It polls continuously to find out whether the scaler is still active or not, so that should be doable.
>
> Btw, KEDA is built on operator-sdk framework, which is based on Kubebuilder. There should be an API to expose metrics, see the docs: sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#metrics

If we just expose the metrics there, I don't see much of a problem to be honest.

tomkerkhove on 26 Oct 2020

Bottom line - We need a design doc on this I think :)

tomkerkhove on 26 Oct 2020

We've discussed this on our standup and decided to provide metrics for all systems we scrape for metrics so they can be re-used in other systems.

tomkerkhove on 29 Oct 2020

Bumping this.