VictoriaMetrics: Alert Manager Integration

Created on 24 Jul 2019 · 14 Comments · Source: VictoriaMetrics/VictoriaMetrics

VM has no alerting support because it does not support remote read.

We need a solution that supports Prometheus alert rules and integrates with AlertManager.

Basic requirements:

support of the Prometheus protocol

Nice to have:

adding new rules on the fly (no annoying HUP signals)

enhancement

tenmozes

👍21

Most helpful comment

I'm in favor of this request to provide integration with AlertManager, rather than using Grafana or promxy as the alerting system for VM, or building yet another, VM-specific vmalert system. Let me clarify the reasons:

- Grafana is only aware of metrics provided by its data sources (Prometheus, VM or any other), so writing alerts requires additional care to distinguish a data-source outage from a real metric issue. For example, if you set an alert on a metric, you should write additional rules to handle the case where your back-end is down; otherwise you end up with firing alerts just because Grafana does not see the metrics during the outage.
- Another reason against Grafana alerts: when the number of alerts grows, it is much better to write them like any other program (with unit tests, etc.) rather than going through individual graphs. At some scale, alert management dictates using a codebase instead of a UI.
- vmproxy is not an alert middleware. Even though it can support alert rules/records, it deals strictly with Prometheus and remote back-ends and does not support alerts fired outside of those sources. For example, I can fire an alert in AlertManager from amtool (from a job, a shell script, or simply on the command line); the alert is fired independently of the data source, yet it is still a valid alert in the alerting system.
- If VM developers build a custom vmalert tool, it would be yet another middleware to support, and clients who rely on Prometheus/AlertManager would need to support yet another system. In my mind, if VM provides integration with Prometheus, then it is nice to have similar integration with AlertManager as common infrastructure.

vkuznet on 8 Jan 2020

👍2

All 14 comments

Is the remote read just needed to read back the rules that are stored?

gregwebs on 24 Dec 2019

> Is the remote read just needed to read back the rules that are stored?

I'm afraid I don't understand the question :(

There are plans to create a dedicated service, vmalert, which would accept Prometheus-compatible alerting configs and evaluate the PromQL queries from these configs against the configured remote system via /api/v1/query. vmalert should work with any system that supports /api/v1/query, such as VictoriaMetrics, Prometheus, Thanos, Cortex, M3DB, etc.
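
To make the idea of evaluating rule expressions via /api/v1/query concrete, here is a minimal sketch, not vmalert's actual code. It builds the instant-query URL and parses a vector result in the Prometheus querying API format. The base URL `http://localhost:8428`, the helper names, and the `SAMPLE` response body are assumptions made for illustration.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, expr: str) -> str:
    """Return the /api/v1/query URL that evaluates `expr` at the current time."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(body: str) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


# Hypothetical response body, shaped like a Prometheus instant-query result.
SAMPLE = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"up","job":"node"},"value":[1577232000,"0"]}]}}'
)

if __name__ == "__main__":
    print(build_instant_query_url("http://localhost:8428", "up == 0"))
    for labels, value in parse_vector(SAMPLE):
        print(labels, value)
```

Any series returned for an alerting expression is a candidate alert, which is why the same loop can run against any backend that speaks this API.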

valyala on 25 Dec 2019

I am still learning about the Prom ecosystem. It seems it should be possible to use an existing open source tool that does this, like Ruler from Cortex?

gregwebs on 25 Dec 2019

It looks like Ruler for Cortex speaks custom gRPC instead of the Prometheus querying API :(

There is Promxy, which can be used as an alerter for VictoriaMetrics.

valyala on 25 Dec 2019

Why not just use promxy then?

gregwebs on 26 Dec 2019

Promxy can be used as an alerter for VictoriaMetrics, as I already mentioned above. But we want to create our own vmalert service dedicated specifically to alerting, since the main purpose of Promxy is slightly different from alerting: it merges and de-duplicates data obtained from multiple sources that speak the Prometheus querying API.

valyala on 26 Dec 2019

Okay, so it seems like the use case is probably a deployment that is not already running Promxy. I think that an HA deployment will already require Promxy somewhere, though perhaps less so once replication or object storage is implemented.

gregwebs on 27 Dec 2019

👍1

> vmproxy is not an alert middleware

Do you mean promxy?

> clients who rely on Prometheus/AlertManager will need to support yet another system

Why can't you still send alerts to AlertManager directly? I think the idea is that vmalert will still be sending alerts to AlertManager.
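
As a rough illustration of sending an alert to AlertManager directly, independent of any data source, the sketch below builds the JSON body for Alertmanager's v2 `POST /api/v2/alerts` endpoint. The address `http://localhost:9093`, the helper names, and the label values are assumptions; `send_alerts` is shown for completeness and is not called here.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert(labels: dict, annotations: dict, starts_at: datetime) -> dict:
    """Build one alert object; `labels` identify the alert in AlertManager."""
    return {
        "labels": labels,
        "annotations": annotations,
        "startsAt": starts_at.isoformat(),  # RFC 3339 timestamp
    }


def build_request_body(alerts: list) -> bytes:
    """The endpoint expects a JSON array of alert objects."""
    return json.dumps(alerts).encode("utf-8")


def send_alerts(base_url: str, alerts: list) -> int:
    """POST the alerts and return the HTTP status code (needs a running AlertManager)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=build_request_body(alerts),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    alert = build_alert(
        {"alertname": "BackupJobFailed", "severity": "warning"},  # hypothetical labels
        {"summary": "Nightly backup exited with an error"},
        datetime(2020, 1, 9, 12, 0, tzinfo=timezone.utc),
    )
    print(build_request_body([alert]).decode())
    # send_alerts("http://localhost:9093", [alert])  # assumed local AlertManager
```

A tool such as vmalert could emit the same payload, so rule-driven alerts and ad hoc alerts would end up in the same AlertManager.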

gregwebs on 9 Jan 2020

Yes, I meant promxy.

And I'll be happy to send alerts to AlertManager, but I am also in favor of preserving the same rules/records syntax that Prometheus uses, i.e. the way we write alert rules in Prometheus.

For instance, there are VM-specific functions, e.g. the recently introduced Hoeffding bound, which I want to alert on. So far I can only do that in Grafana, but I would prefer a programmatic way to write and test alerts, similar to what we do in Prometheus.

vkuznet on 18 Jan 2020

And if the VM team is in favor of implementing the vmalert tool, do you have any requirement docs? Is there any time frame for implementing it?

vkuznet on 18 Jan 2020

> And if the VM team is in favor of implementing the vmalert tool, do you have any requirement docs? Is there any time frame for implementing it?

There are no formal requirements for the vmalert tool yet. The basic requirements are:

- it must accept Prometheus configs for alerting;
- it must be able to query any service that provides the /api/v1/query* endpoints of the Prometheus querying API;
- it must accept MetricsQL queries.
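
To illustrate what accepting Prometheus alerting configs implies for the evaluator, here is a small sketch of the pending/firing behavior that the `for` field of a Prometheus alerting rule defines. It is an assumption-laden illustration, not vmalert's design: `Rule`, `AlertState`, and `step` are hypothetical names, and the query result is reduced to a single boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    alert: str                 # rule name, as in the `alert:` field
    expr: str                  # PromQL/MetricsQL expression, as in `expr:`
    for_seconds: float = 0.0   # how long expr must keep matching, as in `for:`


@dataclass
class AlertState:
    active_since: Optional[float] = None
    state: str = "inactive"    # "inactive", "pending" or "firing"


def step(rule: Rule, prev: AlertState, has_result: bool, now: float) -> AlertState:
    """Advance the alert state after one evaluation of rule.expr at time `now`."""
    if not has_result:
        # The expression returned no series: the alert resets.
        return AlertState()
    since = prev.active_since if prev.active_since is not None else now
    state = "firing" if now - since >= rule.for_seconds else "pending"
    return AlertState(active_since=since, state=state)


if __name__ == "__main__":
    rule = Rule(alert="InstanceDown", expr="up == 0", for_seconds=60)
    st = AlertState()
    for t, matched in [(0, True), (30, True), (60, True), (90, False)]:
        st = step(rule, st, matched, t)
        print(t, st.state)
```

Only alerts in the firing state would be forwarded to AlertManager; pending ones are still waiting out the `for` duration.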

Unfortunately, there is no ETA for the vmalert tool. We are open to third-party contributions.

valyala on 21 Jan 2020

You can find the initial version of vmalert here: https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmalert

tenmozes on 12 Apr 2020

Another issue with promxy is that it doesn't support MetricsQL. This is a limitation for VM HA as well.