Alertmanager: Feature Request: Prometheus/Alertmanager HeartBeats

Created on 22 Oct 2018 · 5 Comments · Source: prometheus/alertmanager

What did you do?
I need a heartbeat metric or a /heartbeat endpoint implemented for Prometheus and Alertmanager.

What did you expect to see?
Similar to /status, I think it would be nice to have a /heartbeat endpoint or an internal heartbeat metric.

What did you see instead? Under which circumstances?
http://localhost:9093/api/v1
/status
/silences
/alerts

Environment
CentOS Linux release 7.5.1804 (Core)

System information:
Linux 3.10.0-862.11.6.el7.x86_64 x86_64

Alertmanager version:
alertmanager, version 0.15.2

Prometheus version:
prometheus, version 2.2.1

fchiorascu

All 5 comments

@fchiorascu I am not entirely sure I understand your feature request. Would you like an endpoint that sends a heartbeat via server-sent events to the requesting client? What is your overall goal?

In case you want to test your monitoring pipeline, I would suggest adding an always-firing alerting rule to Prometheus and routing the alert to the above-mentioned client via Alertmanager.

mxinden on 23 Oct 2018

👍1

Hi Max,

Yes, that is the point of the heartbeat. I had been considering your second suggestion from the beginning, but I still think it would be nice to have a heartbeat.

Alertmanager ==status==> WebHook (Requesting Client). The overall goal is for the requesting client to check the status of Alertmanager at a regular interval, so that a loss of connectivity does not go unnoticed. The webhook client is in fact a tool that routes the alerts received from Alertmanager onward, e.g. to PagerDuty, VictorOps, etc.
In the second case, do you think it is a good idea to keep an alert permanently in the firing state, for example up == 1 for a host? Or do you have another type of alert in mind? My thinking was that it is best to keep the status exposed by Alertmanager separate from the alerting scenarios and to limit the interactions: if you have multiple projects, it would be wise for the requesting client to ask Alertmanager for its /health status through a separate endpoint.
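
A client-side status check of the kind described here could look roughly like the sketch below. It assumes the running Alertmanager exposes the /-/healthy management endpoint that current releases document (not the /heartbeat or /health endpoint proposed in this issue); the function name and the timeout default are hypothetical.

```python
import urllib.error
import urllib.request


def alertmanager_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if Alertmanager answers its health endpoint with HTTP 200.

    Assumption: the server exposes /-/healthy. The function name and the
    default timeout are illustrative, not part of any Alertmanager client.
    """
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections, and timeouts all count
        # as "not healthy" from the requesting client's point of view.
        return False
```

The requesting client would call something like alertmanager_is_healthy("http://localhost:9093") on its own schedule. Note that such a poll only shows that Alertmanager is reachable, not that alerts actually flow from Prometheus through to the client.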

Let me know what you think.

Thank you in advance.
Kind Regards,
Florian

fchiorascu on 23 Oct 2018

Hi @fchiorascu

There are several best practices for meta-monitoring (monitoring the health of your monitoring setup). What we do in the kube-prometheus case is:

Set up Prometheus to constantly fire a particular alert (deadmanswitch)
Send the alert to Alertmanager
Configure Alertmanager to direct the alert to something like the Dead Man's Snitch integration in PagerDuty

This setup verifies the general availability of the whole pipeline: if the alert stops arriving, something between Prometheus and the receiver is broken. In addition, I would configure Prometheus to monitor Alertmanager and add a couple of alerting rules inside Prometheus (e.g. see here).
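
The receiving side of such a dead man's switch can be sketched as follows. This is a minimal illustration, not how PagerDuty or Dead Man's Snitch is implemented: the class, the function, the default alert name "DeadMansSwitch", and the timeout are all hypothetical, and the payload shape assumed is the one Alertmanager's webhook receiver sends (an "alerts" list whose entries carry "status" and "labels").

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Tracks heartbeat alerts and reports when they stop arriving.

    Hypothetical helper for illustration; not part of Prometheus,
    Alertmanager, or any hosted service.
    """

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen: Optional[float] = None

    def record_heartbeat(self) -> None:
        # Called whenever the always-firing alert is delivered.
        self._last_seen = self._clock()

    def is_pipeline_alive(self) -> bool:
        # No heartbeat yet, or the last one is too old: treat the
        # Prometheus -> Alertmanager -> receiver pipeline as broken.
        if self._last_seen is None:
            return False
        return (self._clock() - self._last_seen) <= self.timeout_seconds


def handle_webhook(payload: dict, switch: DeadMansSwitch,
                   alertname: str = "DeadMansSwitch") -> bool:
    """Record a heartbeat if the payload contains the firing heartbeat alert."""
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") == "firing" and labels.get("alertname") == alertname:
            switch.record_heartbeat()
            return True
    return False
```

The logic is inverted compared with a normal alert: the receiver escalates when is_pipeline_alive() turns False, that is, when the expected notification has been absent for longer than the timeout.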

Let me know what you think.

mxinden on 29 Oct 2018

👍3

Hi @mxinden

Thank you for these scenarios, for your support, and for your time.
I'll try to implement this approach.

Thank you,
Florian

fchiorascu on 29 Oct 2018

Closing it for now. @fchiorascu feel free to continue the discussion on the mailing list.