What did you do?
Query multiple Prometheus instances for alertmanager_alerts to get the active alert count for each one.
alertmanager_alerts{state="active"}
What did you expect to see?
I was expecting to see 0 active alerts from Alertmanagers that currently don't have any active, unsilenced alerts, e.g.:
alertmanager_alerts{instance="alertmanager:80",job="alertmanager",state="active"} 0
What did you see instead? Under which circumstances?
From some Alertmanager instances, we get >0 alerts back, even if there are none showing in the web interface.
In this particular case, we did have 2 active but silenced alerts. The numbers still don't make sense - at least to me.
alertmanager_alerts{instance="alertmanager:80",job="alertmanager",state="active"} 3
alertmanager_alerts{instance="alertmanager:80",job="alertmanager",state="suppressed"} 1
If this is how it's supposed to be, can we please also get a metric for the silenced label, so that we could get active - suppressed - silenced = actually active alerts?
Environment
Kubernetes 1.10.5
...
Branch: HEAD
BuildDate: 20180622-11:58:41
BuildUser: root@bec9939eb862
GoVersion: go1.10.3
Revision: 462c969d85cf1a473587754d55e4a3c4a2abc63c
Version: 0.15.0
Version: 2.3.1
Revision: 188ca45bd85ce843071e768d855722a9d9dabe03
Branch: HEAD
BuildUser: root@82ef94f1b8f7
BuildDate: 20180619-15:56:22
GoVersion: go1.10.3
...
...
...
I've seen this discrepancy before too. IIRC the reason is that the metrics are computed from the in-memory store of alerts, which includes the resolved ones. Those are deleted only every 30 minutes (by default). That being said, I agree that it would be nice if the numbers included only firing alerts.
FWIW suppressed is the sum of inhibited and silenced alerts.
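To illustrate the explanation above, here is a minimal sketch, not Alertmanager's actual code. The `Alert` class, the `count_by_state` helper, and the sample alerts are all hypothetical; it only assumes that an alert counts as resolved once its end time has passed, and that resolved alerts stay in the store until the next cleanup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical stand-in for an alert held in the in-memory store.
    name: str
    ends_at: datetime
    suppressed: bool = False  # silenced or inhibited


def count_by_state(store, now, only_firing):
    """Count alerts per state; optionally skip alerts that have already resolved."""
    counts = {"active": 0, "suppressed": 0}
    for alert in store:
        # Assumption: an alert is resolved once its end time is in the past.
        if only_firing and alert.ends_at <= now:
            continue
        counts["suppressed" if alert.suppressed else "active"] += 1
    return counts


# Made-up data: three resolved alerts not yet cleaned up, one firing but suppressed.
now = datetime(2018, 7, 1, 12, 0)
store = [
    Alert("A", now - timedelta(minutes=10)),
    Alert("B", now - timedelta(minutes=5)),
    Alert("C", now - timedelta(minutes=1)),
    Alert("D", now + timedelta(minutes=5), suppressed=True),
]

naive = count_by_state(store, now, only_firing=False)
firing = count_by_state(store, now, only_firing=True)
print(naive)
print(firing)
```

Counting everything in the store reports the resolved alerts as active, while filtering on the end time leaves only what the web interface would show.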
Right, maybe just an additional metric with the label firing that shows the number of alerts that would be seen on the index page. That is probably easier to implement than changing the whole purging logic.
:+1:
Seeing the same issue. We have a 3-node Alertmanager cluster. No alerts are showing in the web interface, but when I query alertmanager_alerts, there are a lot of active alerts. This really confused me.
alertmanager_alerts{instance="alertmanager1:9093",job="alertmanager",state="active"} 8
alertmanager_alerts{instance="alertmanager1:9093",job="alertmanager",state="suppressed"} 2
alertmanager_alerts{instance="alertmanager2:9093",job="alertmanager",state="active"} 63
alertmanager_alerts{instance="alertmanager2:9093",job="alertmanager",state="suppressed"} 2
alertmanager_alerts{instance="alertmanager3:9093",job="alertmanager",state="active"} 10
alertmanager_alerts{instance="alertmanager3:9093",job="alertmanager",state="suppressed"} 3
So, any news on this, or a way to work around it?
Any update on this?