VictoriaMetrics: AlertManager Integration

Created on 24 Jul 2019  ·  14 Comments  ·  Source: VictoriaMetrics/VictoriaMetrics

There is no alerting support in VM because it does not support remote read.

We need a solution that supports Prometheus alert rules and integrates with AlertManager.

Basic requirements:

  • support of the Prometheus protocol

Nice to have:

  • adding new rules on the fly (no annoying HUP signals)
enhancement

All 14 comments

Is the remote read just needed to read back the rules that are stored?

Is the remote read just needed to read back the rules that are stored?

I'm afraid I don't understand the question :(

There are plans to create a dedicated service - vmalert, which would accept Prometheus-compatible alerting configs and evaluate PromQL queries from these configs on the configured remote system via /api/v1/query. The vmalert should work with any system with /api/v1/query support such as VictoriaMetrics, Prometheus, Thanos, Cortex, M3DB, etc.
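To make the idea above concrete, here is a minimal sketch of how an alerting service could evaluate one rule against any backend that exposes /api/v1/query. It is not the vmalert implementation: the helper names `build_query_url` and `firing_alerts`, the `http://localhost:8428` address, and the `InstanceDown` rule are assumptions made for illustration. Only the query endpoint and the standard instant-query response shape are taken from the Prometheus querying API.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, expr):
    """Build an instant-query URL for a Prometheus-compatible backend."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def firing_alerts(alert_name, response_body):
    """Convert an instant-query JSON response into alert label sets.

    Every series returned by the alert expression is treated as one
    firing alert (hypothetical helper, no "for" duration handling).
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")

    alerts = []
    for series in data["result"]:
        labels = dict(series["metric"])
        # The metric name is dropped here, since alert labels
        # conventionally identify the alert, not the source metric.
        labels.pop("__name__", None)
        labels["alertname"] = alert_name
        alerts.append(labels)
    return alerts


# Example with a canned response instead of a live backend.
url = build_query_url("http://localhost:8428", "up == 0")  # assumed address
canned = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "a:9100"},
             "value": [1563969600, "0"]},
        ],
    },
})
print(url)
print(firing_alerts("InstanceDown", canned))
```

Because the sketch only depends on the HTTP query endpoint and its JSON response, the same code would apply to any of the systems listed above that expose /api/v1/query.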

I am still learning about the Prom ecosystem. It seems it should be possible to use an existing open source tool that does this, like Ruler from Cortex?

It looks like Ruler for Cortex speaks a custom gRPC protocol instead of the Prometheus querying API :(

There is Promxy, which can be used as an alerter for VictoriaMetrics.

Why not just use promxy then?

Promxy can be used as an alerter for VictoriaMetrics, as I already mentioned above. But we want to create our own vmalert service dedicated specifically to alerting, since the main purpose of Promxy is slightly different: it merges and de-duplicates data obtained from multiple sources that speak the Prometheus querying API.

Okay, so it seems like the use case is probably for a deployment that is not already running Promxy. I think that to have an HA deployment Promxy will already be required somewhere. But perhaps less after replication or object storage is implemented.

I'm in favor of this request to provide integration with AlertManager, rather than using Grafana or promxy as the alerting system for VM, or building (another) separate vmalert system. Let me clarify the reasons:

  • Grafana is only aware of metrics provided by its data sources (Prometheus, VM, or any other), so writing alerts requires additional care to distinguish a data-source outage from a real (metric) issue. For example, if you set an alert on a metric, you should write additional rules to handle the case where your back-end is down; otherwise you end up with firing alerts just because Grafana does not see the metrics due to the data-source outage.

    • Another reason against using Grafana alerts is that when the number of alerts grows, it is much better to write them like any other program (and use unit tests, etc.) rather than going through individual graphs. At some scale, alert management dictates using a codebase interface instead of a UI.

  • vmproxy is not an alert middleware. Even though it can provide support for alert rules/records, it deals strictly with Prometheus and remote back-ends and does not support firing alerts outside of those sources. For example, I can fire an alert in AlertManager from amtool (meaning I can run it in my job, a shell script, or simply on the command line), and the alert will be fired independently of the data source, yet it will still be a valid alert in the alert system.
  • If VM developers want to develop a custom vmalert tool, it would be yet another middleware to support, and clients who rely on Prometheus/AlertManager will need to support yet another system. In my mind, if VM provides integration with Prometheus, then it is nice to have similar integration with AlertManager as a common infrastructure.
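The point about firing alerts independently of any data source can be illustrated with a short sketch that pushes an alert straight to AlertManager over its v2 HTTP API (POST to /api/v2/alerts with a JSON array of alerts). The helper names `build_alert`, `build_request` and `post_alerts`, the `BackupFailed` alert, and the `http://localhost:9093` address are assumptions for illustration, not something taken from this thread.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_alert(labels, annotations, starts_at, duration):
    """Build one alert object in the shape the v2 alerts endpoint accepts."""
    return {
        "labels": labels,            # must identify the alert, e.g. alertname
        "annotations": annotations,  # free-form human-readable details
        "startsAt": starts_at.isoformat(),
        "endsAt": (starts_at + duration).isoformat(),
    }


def build_request(alertmanager_url, alerts):
    """Prepare (but do not send) the POST request carrying the alerts."""
    return urllib.request.Request(
        f"{alertmanager_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_alerts(alertmanager_url, alerts):
    """Send the alerts and return the HTTP status code."""
    with urllib.request.urlopen(build_request(alertmanager_url, alerts), timeout=10) as resp:
        return resp.status


# Build an alert from a job or script; no metrics backend is involved.
alert = build_alert(
    {"alertname": "BackupFailed", "severity": "warning"},  # hypothetical alert
    {"summary": "nightly backup job failed"},
    datetime.now(timezone.utc),
    timedelta(minutes=5),
)
print(json.dumps([alert], indent=2))
# post_alerts("http://localhost:9093", [alert])  # assumed address; needs a running AlertManager
```

The alert carries its own labels and timestamps, so AlertManager routes and groups it exactly like an alert produced by rule evaluation.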

vmproxy is not an alert middleware

Do you mean promxy?

clients who rely on Prometheus/AlertManager will need to support yet another system

Why can't you still send alerts to AlertManager directly? I think the idea is that vmalert will still be sending alerts to AlertManager.

Yes, I meant promxy.

And I'll be happy to send alerts to AlertManager, but I am also in favor of preserving the same rules/records syntax as Prometheus, i.e. we write alert rules the way we do in Prometheus.

For instance, there are VM-specific functions, e.g. the recently introduced Hoeffding bound, which I want to set alerts on. So far I can only do that in Grafana, but I would prefer a programmatic way to write/test alerts, similar to what we do in Prometheus.

And if the VM team is in favor of implementing the vmalert tool, do you have any requirement docs? Is there any time frame for implementing this?

And if the VM team is in favor of implementing the vmalert tool, do you have any requirement docs? Is there any time frame for implementing this?

There are no formal requirements for the vmalert tool yet. Basic requirements are:

Unfortunately, there is no ETA for the vmalert tool. We are open to third-party contributions.

The initial version of vmalert is available here: https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmalert

Another issue with promxy is that it doesn't support MetricsQL. This is a limitation for VM HA as well.
