Is your feature request related to a problem? Please describe.
Pulsar offers many built-in metrics, which is pretty awesome. You can find metrics on any layer, from the pulsar-proxy to BookKeeper. They are easy to plot in Grafana because most of the rates are already computed. For example, pulsar_throughput_in directly gives the total throughput of the namespace coming into this broker in bytes/second.
However, for some use cases, rate-based metrics are not enough. For example, to bill users according to the number of messages produced, you cannot rely on a rate; you need the associated raw counter.
Describe the solution you'd like
I would like to add a new boolean to the configuration, exposeTopicLevelMetricsAsCounterInPrometheus, which defaults to false. When it is set to true, the broker will expose new counter metrics, which will be the raw counters used internally to compute the rates. The new metrics could be suffixed with _counter, for example:
and so on. We could also do the same for any other metric that is a rate, but I think we should start with the topic metrics.
Describe alternatives you've considered
Using the rate to recompute the counter does not offer enough precision.
What do you think about it? We can discuss the name of the parameter and the naming scheme.
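To illustrate why a sampled rate loses precision, here is a minimal sketch. It is not Pulsar code: the traffic numbers, the 3-second scrape interval, and the assumption that each scraped rate only reflects the last second are all hypothetical, chosen to make the gap visible.

```java
public class RateVsCounterSketch {

    /**
     * Estimate the total by assuming each sampled rate held for the whole
     * scrape interval. Hypothetical model: the rate seen at scrape time
     * only covers the last second before the scrape.
     */
    public static long estimateFromSampledRates(long[] perSecondCounts, int scrapeIntervalSec) {
        long estimate = 0;
        for (int t = scrapeIntervalSec; t <= perSecondCounts.length; t += scrapeIntervalSec) {
            long rateAtScrape = perSecondCounts[t - 1]; // msg/s over the last second only
            estimate += rateAtScrape * scrapeIntervalSec;
        }
        return estimate;
    }

    /** A monotonic counter accumulates every message, so the total is exact. */
    public static long exactFromCounter(long[] perSecondCounts) {
        long counter = 0;
        for (long c : perSecondCounts) {
            counter += c;
        }
        return counter;
    }

    public static void main(String[] args) {
        // Hypothetical messages produced during each of six consecutive seconds.
        long[] traffic = {10, 50, 0, 20, 5, 15};
        System.out.println("estimate from rates = " + estimateFromSampledRates(traffic, 3));
        System.out.println("exact from counter  = " + exactFromCounter(traffic));
    }
}
```

With bursty traffic like this, the rate-based estimate can be far from the real total, while a counter only needs two samples (first and last) to give the exact number of messages in between.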
Thanks @PierreZ for working on this. It sounds good to me. Adding @sijie and @merlimat to see if they have more suggestions.
@PierreZ +1
Can I suggest using names like "pulsar_in_bytes_total" or "pulsar_in_bytes_counter", "pulsar_in_messages_total" or "pulsar_in_messages_counter"?
I like pulsar_in_bytes_total!
I started working on it. From what I can see, the rate is computed through Rate, which seems to be held by Producer, which represents a currently connected producer. The rate is polled frequently. Because these classes represent connected {Producer, Consumer} instances and the update is asynchronous, we can miss events between two polls if a {Producer, Consumer} disconnects.
I'm wondering if the metrics can be generated from a place less volatile, maybe at the opening of the managed ledger? What do you think?
I found a naive solution: creating a Counter directly on the AbstractTopic class. I am not sure if this is recommended, though. What do you think @sijie?
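A rough sketch of the idea, to make the disconnect problem concrete. This is not the actual AbstractTopic or Producer code: TopicStats, ProducerStats, and their methods are hypothetical stand-ins, and java.util.concurrent.atomic.LongAdder is used here as an assumption in place of a Prometheus Counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class TopicCounterSketch {

    /** Hypothetical stand-in for a connected producer; its stats vanish on disconnect. */
    public static class ProducerStats {
        final LongAdder msgIn = new LongAdder();
    }

    /** Hypothetical stand-in for a topic, which outlives its producers. */
    public static class TopicStats {
        public final LongAdder msgInTotal = new LongAdder();
        public final LongAdder bytesInTotal = new LongAdder();
        private final List<ProducerStats> connected = new ArrayList<>();

        public ProducerStats connect() {
            ProducerStats p = new ProducerStats();
            connected.add(p);
            return p;
        }

        public void disconnect(ProducerStats p) {
            connected.remove(p);
        }

        /** Count on both levels: per producer and directly on the topic. */
        public void recordPublish(ProducerStats p, int payloadBytes) {
            p.msgIn.increment();
            msgInTotal.increment();
            bytesInTotal.add(payloadBytes);
        }

        /** What a poll sees if it only walks the currently connected producers. */
        public long sumOverConnectedProducers() {
            long sum = 0;
            for (ProducerStats p : connected) {
                sum += p.msgIn.sum();
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        TopicStats topic = new TopicStats();
        ProducerStats producer = topic.connect();
        topic.recordPublish(producer, 100);
        topic.recordPublish(producer, 50);
        topic.disconnect(producer); // disconnects before the next poll

        System.out.println("poll over producers = " + topic.sumOverConnectedProducers());
        System.out.println("topic msgInTotal    = " + topic.msgInTotal.sum());
        System.out.println("topic bytesInTotal  = " + topic.bytesInTotal.sum());
    }
}
```

In this sketch the poll over connected producers misses the two messages, while the topic-level totals still hold them, which is the behavior a billing counter would need.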
I don't think we have used Prometheus counter directly.
I would suggest:
Thanks, I will do this 👍