Alertmanager: Duplicate alerts in UI

Created on 25 Feb 2020 · 10 Comments · Source: prometheus/alertmanager

Hello,

Since I added multiple routes (with continue: true), I see my alerts twice in the Alertmanager UI (version 0.20.0).

I found old bug reports about this, but if I understand correctly, it should already be fixed.

Edit: If I remove the continue: true, the alerts are not duplicated.

global:
  resolve_timeout: 5m
  smtp_from: '[email protected]'
  smtp_smarthost: 'smtpintern.example.com:25'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email.all'
  routes:
    - receiver: 'email.all'
      continue: true
    - receiver: 'web.Hangout'
receivers:
- name: 'web.Hangout'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=Prometheusalerts'
- name: 'email.all'
  email_configs:
  - to: '[email protected]'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
component/ui · help wanted

All 10 comments

I agree it's confusing, but at the same time it's kind of expected, since the alerts match both routes and therefore end up in two groups. The UI could be improved to indicate that the groups are for different receivers.
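To make the duplication concrete, here is the routing tree from the original post again, annotated with how a single alert would travel through it. This is only a sketch based on my understanding of the routing rules (child routes are tried in order, a route without matchers matches every alert, and continue: true lets the alert go on to the next sibling). The comments are added for illustration and are not part of the reporter's config.

```yaml
route:
  group_by: ['alertname']
  receiver: 'email.all'        # root fallback; not reached here, because a child route always matches
  routes:
    - receiver: 'email.all'    # no matchers, so every alert matches: first group, for email.all
      continue: true           # keep evaluating the next sibling instead of stopping here
    - receiver: 'web.Hangout'  # no matchers either: second group with the same labels, for web.Hangout
```

Each alert therefore lands in two groups with identical labels, one per receiver, and the UI lists them side by side unless the view is limited to a single receiver.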

Side-note, instead of using continue: true you can configure multiple integrations within one receiver like this:

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'receiverA'
receivers:
- name: 'receiverA'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=Prometheusalerts'
  email_configs:
  - to: '[email protected]'

I could indeed use that, but I have since expanded my route to the following, and I guess grouping the receivers like that would no longer work:

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email.all'
  routes:
    - receiver: 'email.all'
      continue: true
    - receiver: 'web.HangoutPRD'
      match:
        environment: 'PRD'
    - receiver: 'web.Hangout'

You can still avoid continue: true and avoid duplication with YAML anchors.

route:
  receiver: 'notPRD'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  routes:
    - receiver: 'PRD'
      match:
        environment: 'PRD'
receivers:
- name: 'PRD'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=PRD'
  email_configs: &email-all
  - to: '[email protected]'
- name: 'notPRD'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=Prometheusalerts'
  email_configs: *email-all
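For readers unfamiliar with YAML anchors: &email-all names the email_configs list and *email-all reuses it, so after parsing, the receivers section above should be equivalent to the following. This is a sketch only; the address is a placeholder I made up, because the real one is obscured in the thread.

```yaml
receivers:
- name: 'PRD'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=PRD'
  email_configs:
  - to: 'team@example.com'   # placeholder address (assumption)
- name: 'notPRD'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=Prometheusalerts'
  email_configs:
  - to: 'team@example.com'   # same list, expanded from the *email-all alias
```

The anchor keeps the email settings in one place, so each alert is routed to exactly one receiver and no continue: true is needed.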

Thanks

A small remark: it seems that the . in email.all makes the parser unhappy.

@bigon right, example updated :)

As my colleague @joe-elliott just ran into the same issue, I thought a bit about it:

It is confusing that the UI doesn't mention the receiver anywhere. You just see two completely identical groups because of that. Even the UI itself is confused by it: If I click the _Info_ button on one group, _both_ groups expand.

Perhaps it would be good to show the receiver somewhere. Or to dedup completely identical groups.

> Perhaps it would be good to show the receiver somewhere.

It was my idea too.

I discovered the other day that there is a drop-down menu (next to the "Silenced" checkbox) that limits which receivers are shown. This is definitely not obvious.

We just hit this issue attempting to upgrade. The duplication did not happen with the version we are on now (0.12.0) with the "all" receiver selected. I agree that this is very confusing and think the grouping should be merged when the "all" receiver is selected.

I'll add to the choir - just spent _way_ too much time figuring this out. Additionally, the 'Receiver' drop-down doesn't scroll down to show you all of the receivers - but you can type and it will auto-complete.

I recently implemented the routing technique @roidelapluie described in https://promcon.io/2019-munich/talks/improved-alerting-with-prometheus-and-alertmanager/ and this issue led to a lot of head scratching.

I understand _why_ now, but it's not obvious.
