Serving: HPA on concurrency

Created on 8 Feb 2019 · 2 Comments · Source: knative/serving

Proposal

HPA-class PodAutoscalers should be able to scale on concurrency as well as CPU. This requires exposing the Knative-calculated concurrency metric as a K8s custom metric and creating a v2beta2 HPA that points to it (the v1 HPA does not support custom metrics).
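As a rough illustration of what "a v2beta2 HPA with a pointer to the custom metric" could look like, here is a minimal sketch that builds such a manifest as a plain dictionary. The metric name `concurrency`, the helper `build_concurrency_hpa`, and the Deployment scale target are assumptions made for the example. They are not names taken from Knative.

```python
import json


def build_concurrency_hpa(name, namespace, target_concurrency,
                          min_replicas=1, max_replicas=10,
                          metric_name="concurrency"):
    """Build an autoscaling/v2beta2 HPA manifest that scales on a per-pod custom metric.

    `metric_name` is a hypothetical name for the Knative concurrency metric;
    the real name depends on how the metric ends up being exported.
    """
    return {
        "apiVersion": "autoscaling/v2beta2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the revision is backed by a Deployment of the same name.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                # "Pods" metrics are averaged across the pods of the scale target.
                "type": "Pods",
                "pods": {
                    "metric": {"name": metric_name},
                    "target": {
                        "type": "AverageValue",
                        # Quantities are serialized as strings.
                        "averageValue": str(target_concurrency),
                    },
                },
            }],
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl accepts as well as YAML.
    print(json.dumps(build_concurrency_hpa("my-revision", "default", 10), indent=2))
```

The HPA can only act on this metric if something in the cluster serves it through the custom metrics API, which is the second half of the proposal.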

The custom metrics server implementation can vary from provider to provider, but we will probably want to configure the Prometheus instance (which comes out of the box) to collect the custom metric and export it with the Prometheus adapter.

area/autoscale kind/feature

Source

josephburnett

Most helpful comment

I think we should implement a very lightweight version of the custom-metrics-server (https://github.com/kubernetes-incubator/custom-metrics-apiserver). Once that's done and working, we can write up guides on how to connect the Knative installation to Prometheus and configure the Prometheus custom-metrics adapter to work just as well.

That way people wouldn't need to buy into Prometheus (there have been concerns about its "heaviness").
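To give a sense of how small such a server's core could be, here is a sketch of the response body a custom metrics endpoint might return for a per-pod concurrency metric, assuming the `custom.metrics.k8s.io/v1beta1` `MetricValueList` shape. The function name `metric_value_list`, the metric name, and the input dictionary of per-pod values are hypothetical. A real server would also need API registration, authentication, and discovery, all omitted here.

```python
import json
from datetime import datetime, timezone


def metric_value_list(namespace, metric_name, per_pod_values, now=None):
    """Build a MetricValueList-style payload from {pod_name: concurrency}.

    Sketch only: assumes the custom.metrics.k8s.io/v1beta1 item fields
    (describedObject, metricName, timestamp, value).
    """
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    items = []
    for pod, value in sorted(per_pod_values.items()):
        items.append({
            "describedObject": {
                "kind": "Pod",
                "namespace": namespace,
                "name": pod,
                "apiVersion": "v1",
            },
            "metricName": metric_name,
            "timestamp": timestamp,
            # Express the value as a Kubernetes quantity in milli-units,
            # so fractional concurrency (e.g. 2.5) survives serialization.
            "value": f"{round(value * 1000)}m",
        })
    return {
        "kind": "MetricValueList",
        "apiVersion": "custom.metrics.k8s.io/v1beta1",
        "metadata": {},
        "items": items,
    }


if __name__ == "__main__":
    body = metric_value_list("default", "concurrency", {"pod-a": 2.5, "pod-b": 1.0})
    print(json.dumps(body, indent=2))
```

In this sketch the per-pod values are passed in directly. In the lightweight server suggested above they would presumably come from the concurrency figures the Knative autoscaler already computes.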