Enable KEDA to always expose the scaler metrics it determines for ScaledObjects and ScaledJobs through a Prometheus metrics endpoint. This allows it to become the single source of truth for scaling metrics.
A very common requirement in a job queuing/event processing use-case is having metrics available for monitoring and alerting. The metrics allow alerting on work backing up and observing progress of work. Even though KEDA has to know this information for its own work it currently does not make it available in a way that allows it to be consistently used for these purposes.
KEDA 2 provides keda_metrics_adapter_scaler_metrics_value but that is only available for ScaledObjects with active deployments as it is produced by the metrics adapter. This means the metric is not available when no work is pending and more importantly never when ScaledJobs are used.
Currently the only way around this is to run another always-on service/exporter that replicates the work the KEDA scalers perform to determine this information and uses it to generate the metric. This nullifies much of the benefit of using KEDA scalers to begin with: instead of making use of the different scalers, you could just as well attach directly to this exported metric, which ensures that what your monitoring sees is consistent with what KEDA uses for scaling.
KEDA is in the ideal position to produce the metric as the single source of truth. It already needs to have the information and is always running. A user adding a ScaledJob/ScaledObject would naturally make the metric for it appear. This enables self-serving of the metric for developers solely based on KEDA CRDs. Also this way dashboards and alerting are based on the same information that was actually used for scaling.
ScaledJobs
ScaledObjects and ScaledJobs (tags can be used to differentiate)
I think having a centralized way of pulling metrics (as opposed to having the metrics adapter and operator pull them separately) would be great, but that would require some serious rework, I believe.
As an interim step maybe it would make sense to look at what it would take to add Prometheus metrics collecting and exporting to the KEDA Operator first?
I have basically no knowledge of KEDA's internals, so I'm definitely not a good candidate to judge implementation complexity.
From what I understand adding a metrics endpoint to the operator would be able to provide ScaledJobs metrics and do so continuously? That would be a valuable thing on its own I think. We could use it.
Does the operator also have continuous knowledge of the ScaledObject metrics? Or does it stop polling those once the deployment becomes active (I kinda assumed it would)? If it polls continuously then the metrics in the operator if provided to the outside could already feel pretty SSOT to a user.
Making KEDA itself avoid redundant queries and guaranteeing that it is internally consistent could definitely be a longer-term benefit. I didn't even consider that aspect when writing this FR.
> Does the operator also have continuous knowledge of the ScaledObject metrics? Or does it stop polling those once the deployment becomes active (I kinda assumed it would)? If it polls continuously then the metrics in the operator if provided to the outside could already feel pretty SSOT to a user.
It polls continuously to find out whether the scaler is still active or not, so that should be doable.
Btw, KEDA is built on operator-sdk framework, which is based on Kubebuilder. There should be an API to expose metrics, see the docs: https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#metrics
Sounds good. If you think this could be a good first issue for someone and no one else already plans to tackle this short term I could try taking a stab at an implementation.
Added it to our roadmap to consider and our standup to discuss.
I like the idea, but we need to think through whether we would end up trying to solve two problems badly. But I'm not opposed to it.
The reason I say that is that I also maintain https://github.com/tomkerkhove/promitor, and it starts with just adding Prometheus support, until somebody asks for support for another metric system, and so on.
And then we're not even talking about the performance impact if we start having Prometheus et al poke us very often.
So I'm just trying to avoid KEDA becoming a metric scraper rather than focusing on app scaling.
(To be clear - I'm definitely not opposed given I have a solution for scraping Azure Monitor metrics)
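One way to address the scrape-load concern above, sketched under assumptions: the polling loop writes the latest scaler value into a cache, and a scrape only reads that cache, so scrape frequency never increases the number of queries against the upstream system. `ScalerValueCache` and `poll_once` are hypothetical names for illustration and do not reflect KEDA's actual internals.

```python
import threading


class ScalerValueCache:
    """Holds the latest value per scaler, written by the polling loop."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def record(self, key, value):
        # Called once per polling interval, after the scaler was queried.
        with self._lock:
            self._values[key] = value

    def snapshot(self):
        # Called on every scrape; copies cached values, never queries upstream.
        with self._lock:
            return dict(self._values)


def poll_once(cache, scalers):
    """Query every scaler once and cache the result (hypothetical polling step)."""
    for key, query in scalers.items():
        cache.record(key, query())
```

With this split, a hundred scrapes between two polls cost a hundred dictionary copies and zero extra calls to the queue, database, or cloud API behind the scaler. The trade-off is that a scraped value can be as old as one polling interval.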
> It polls continuously to find out whether the scaler is still active or not, so that should be doable.
> Btw, KEDA is built on operator-sdk framework, which is based on Kubebuilder. There should be an API to expose metrics, see the docs: sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#metrics
If we just expose the metrics there, I don't see much of a problem to be honest.
Bottom line - We need a design doc on this I think :)
We discussed this at our standup and decided to provide metrics for all systems we scrape for metrics so they can be reused in other systems.
Bumping this.