Alertmanager: Alert severity not correctly set on PagerDuty Alert

Created on 25 Jan 2018 · 13 Comments · Source: prometheus/alertmanager

I'm trying to migrate to the PagerDuty Events API v2. I have already switched the configuration to routing_key so that Alertmanager (AM) uses API v2. What I can't figure out is how to pass the severity declared as a label on the alert through to PagerDuty.

curl -XPOST -k https://somehost/api/v1/alerts -d '[{
        "status": "firing",
        "labels": {
                "alertname": "TestAlertWarning-13127",
                "service": "my-service",
                "team": "C Team",
                "severity":"warning",
                "instance": "TestAlertWarning-13127.example.net"
        },
        "annotations": {
                "summary": "High latency is high!"
        },
        "generatorURL": "http://prometheus.int.example.net/<generating_expression>"
}]'

What happens now is that AM sends an "error" severity to PagerDuty; looking at the code, that is the default when no severity is set.

I'm not sure whether this is a code or a documentation issue; I'd appreciate some help.

  • Alertmanager version:

Branch: HEAD
BuildDate: 20180112-10:32:46
BuildUser: root@d83981af1d3d
GoVersion: go1.9.2
Revision: fb713f6d8239b57c646cae30f78e8b4b8861a1aa
Version: 0.13.0

  • Prometheus version:

Version 1.8.2
Revision 5211b96d4d1291c3dd1a569f711d3b301b635ecb
Branch HEAD
BuildUser root@1412e937e4ad
BuildDate 20171104-16:09:14
GoVersion go1.9.2

  • Alertmanager configuration file:
  - receiver: CTeam-pagerduty-alert
    match:
      team: C Team
...
- name: CTeam-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'


Severity is set in pagerduty_configs:

- name: CTeam-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    severity: warning

Sorry, that setting was left over from one of my tests.

If you remove it, you get the behavior I described: the severity label is not passed on, and I can't find in the documentation how to do it.

Ah, I misunderstood, sorry about that. As it is now, the severity can only be defined statically in the receiver.

To pass on the severity in the label, the code needs to be slightly updated here: https://github.com/prometheus/alertmanager/blob/6a3dfaff45fafba1ef8553451e6b1fc0435b6523/notify/impl.go#L509

n.conf.Severity needs to become tmpl(n.conf.Severity). Then you can use the normal templating to access the common severity label.
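A minimal Go sketch of what wrapping the field in tmpl(...) amounts to, assuming the group data exposes a CommonLabels map. groupData and renderField are hypothetical names and only the standard library is used: a plain value such as "warning" renders unchanged, while a template string reads the common label.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// groupData is a hypothetical, trimmed-down stand-in for the data passed to
// notification templates; only CommonLabels is modelled here.
type groupData struct {
	CommonLabels map[string]string
}

// renderField treats a config value as a Go text/template and executes it
// against the group data. A static string contains no actions, so it
// renders unchanged.
func renderField(text string, data groupData) (string, error) {
	t, err := template.New("severity").Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := groupData{CommonLabels: map[string]string{"severity": "warning"}}
	for _, field := range []string{"warning", "{{ .CommonLabels.severity }}"} {
		out, err := renderField(field, data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %q\n", field, out)
	}
}
```

With Go's default template options, a missing map key prints as "<no value>" rather than an empty string; Alertmanager's own template setup may differ, so it is worth checking against your version.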

I would recommend against this: if an alert group has mixed severity labels (one severe, one warning), then there is no common severity label available for templating. I'm not sure how you have your grouping configured, though.
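The caveat about mixed severities follows from how common labels are formed: only label pairs that every alert in the group shares with the same value are kept. A hypothetical Go helper illustrating that rule (a sketch, not Alertmanager's implementation):

```go
package main

import "fmt"

// commonLabels returns the label pairs that every alert in the group shares
// with an identical value. A group mixing severities therefore ends up
// with no "severity" entry.
func commonLabels(alerts []map[string]string) map[string]string {
	common := map[string]string{}
	if len(alerts) == 0 {
		return common
	}
	for k, v := range alerts[0] {
		common[k] = v
	}
	for _, a := range alerts[1:] {
		for k, v := range common {
			// Deleting during range is safe in Go.
			if av, ok := a[k]; !ok || av != v {
				delete(common, k)
			}
		}
	}
	return common
}

func main() {
	group := []map[string]string{
		{"alertname": "HighLatency", "team": "C Team", "severity": "critical"},
		{"alertname": "HighLatency", "team": "C Team", "severity": "warning"},
	}
	cl := commonLabels(group)
	_, ok := cl["severity"]
	fmt.Println("common labels:", cl, "has severity:", ok)
}
```

Here alertname and team survive but severity does not, so a template reading .CommonLabels.severity has nothing to render. One way to avoid mixed groups is to include severity in the route's group_by.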

Instead, I would recommend defining routes that match severity=warning and, separately, severity=critical (or whatever your levels are), each pointing to a different receiver. It's more verbose, but perhaps clearer to others reading the config who don't know how grouping and common labels work.

That having been said, if you want to add templating to the severity in the link above, that would be fine by me (although I would still recommend being explicit).

Thanks for reopening.

In my use of AM, we have around 10 teams and many more services. Each team decides the severity of an alert based on the type of problem and the environment in which it occurs. PagerDuty also uses the severity to determine the urgency of an alert, and from that, how to contact on-call engineers.

So, in my opinion, it would be very useful to have a way to pass the severity configured in the alert directly to PagerDuty, even if that is not the default behavior. That seems to be the alternative you suggested, and I'm very much in favor of it.

Creating routing rules for every team and for all four valid severities in PagerDuty (Info, Warning, Error, and Critical) will be a bit messy. Still, it looks like that is my only option for now.
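Whatever mechanism selects the severity, the value PagerDuty receives has to be one of those four, written in lowercase in the Events API v2 documentation. A minimal Go sketch of normalizing a label value against that set; normalizeSeverity and the fallback choice are hypothetical, not something Alertmanager does for you:

```go
package main

import (
	"fmt"
	"strings"
)

// pdSeverities lists the four severity values accepted by the PagerDuty
// Events API v2.
var pdSeverities = map[string]bool{
	"critical": true,
	"error":    true,
	"warning":  true,
	"info":     true,
}

// normalizeSeverity is a hypothetical helper: it lowercases and trims a
// label value and returns fallback when the result is not a valid
// PagerDuty severity (including when the label is empty).
func normalizeSeverity(label, fallback string) string {
	s := strings.ToLower(strings.TrimSpace(label))
	if pdSeverities[s] {
		return s
	}
	return fallback
}

func main() {
	for _, l := range []string{"Warning", "info", "", "page"} {
		fmt.Printf("%q -> %q\n", l, normalizeSeverity(l, "critical"))
	}
}
```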

I also think offering templating for other fields in the PagerDuty API v2, such as Group and Component, would be useful: we could have a more generic configuration file, with information set dynamically from what is set in the alert.

@stuartnelson3 I went ahead and created the PR with the solution you mentioned in case you agree with integrating it.

The routing solution does work, but it creates a giant config file. Every team would need something like this:

  - match:
      team: C Team
    routes:
    - receiver: CTeam-pagerduty-critical
      match:
        severity: critical
    - receiver: CTeam-pagerduty-warning
      match:
        severity: warning
    - receiver: CTeam-pagerduty-info
      match:
        severity: info
    - receiver: CTeam-pagerduty-error
      match:
        severity: error
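The per-team routing tree above can be modelled as follows to show how an alert's labels select a receiver. This is a hypothetical, heavily simplified Go sketch (equality matchers only, first matching child wins, no continue, no regex matchers, no grouping); real Alertmanager routing has more options.

```go
package main

import "fmt"

// route is a hypothetical, simplified model of an Alertmanager route:
// equality matchers plus child routes.
type route struct {
	receiver string
	match    map[string]string
	routes   []route
}

// matches reports whether every matcher equals the alert's label value.
func (r route) matches(labels map[string]string) bool {
	for k, v := range r.match {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// resolve walks the tree depth-first, taking the first matching child at
// each level. Routes without a receiver inherit their parent's.
func resolve(r route, labels map[string]string, inherited string) (string, bool) {
	if !r.matches(labels) {
		return "", false
	}
	recv := inherited
	if r.receiver != "" {
		recv = r.receiver
	}
	for _, child := range r.routes {
		if got, ok := resolve(child, labels, recv); ok {
			return got, true
		}
	}
	return recv, true
}

func main() {
	team := route{
		match: map[string]string{"team": "C Team"},
		routes: []route{
			{receiver: "CTeam-pagerduty-critical", match: map[string]string{"severity": "critical"}},
			{receiver: "CTeam-pagerduty-warning", match: map[string]string{"severity": "warning"}},
			{receiver: "CTeam-pagerduty-info", match: map[string]string{"severity": "info"}},
			{receiver: "CTeam-pagerduty-error", match: map[string]string{"severity": "error"}},
		},
	}
	root := route{receiver: "default", routes: []route{team}}
	recv, _ := resolve(root, map[string]string{"team": "C Team", "severity": "warning"}, "")
	fmt.Println(recv)
}
```

With around 10 teams and four severities, this pattern means roughly 40 child routes, each pointing to its own receiver, which is the config growth described here.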

In the end, more templatable fields would make the configuration simpler.

@stuartnelson3 Please let me know if this is what you are looking for: https://github.com/prometheus/docs/pull/956

I'm trying to use the templated severity field, but the PD-CEF severity field isn't being passed to PD.

Here's my config. Am I doing something wrong? I'm running AM 0.15.0-rc.1.

receivers:
- name: pagerduty-poc
  pagerduty_configs:
  - send_resolved: true
    service_key: <secret>
    severity: '{{ .Labels.severity }}'

I'm using the same curl request as above.

Oops, I've corrected service_key to routing_key to use the PD v2 API. AM is now getting "unexpected status code 400" from PD. Hard-coding the severity as severity: 'warning' works, so I'm guessing my template config isn't quite right yet.

I'm also running into issues using labels for severity, with AM 0.14.0 and the Events API v2. I tried both {{ .Labels.severity }} and {{$labels.severity}}.

Output:

level=error ts=2018-05-16T18:40:52.740921204Z caller=notify.go:303 component=dispatcher msg="Error on notify" err="cancelling notify retry for \"pagerduty\" due to unrecoverable error: unexpected status code 400"
level=error ts=2018-05-16T18:40:53.005475343Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"pagerduty\" due to unrecoverable error: unexpected status code 400"

Relevant AM config:

- name: nexus
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
    severity: '{{$labels.severity}}'

Alert rule from Prometheus:

alert: GPU Acceleration Disabled On Host
expr: gpu_acceleration < 1
for: 2m
labels:
  service: httppainters
  severity: warning
annotations:
  summary: GPU Acceleration Disabled On Host
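Both templates that failed above can be reproduced with Go's text/template, which Alertmanager templates are built on. In this sketch, groupData is a hypothetical stand-in for the group-level notification data, which has CommonLabels but no top-level Labels field (per-alert labels live under .Alerts). Accessing .Labels is an execution error, and $labels is a parse error because it is a variable Prometheus defines for alerting-rule templates, not one available in Alertmanager. A severity that fails to render could plausibly explain PagerDuty rejecting the event with a 400, though the logs above do not show that directly.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// groupData is a hypothetical stand-in for the group-level template data:
// it has CommonLabels but no top-level Labels field.
type groupData struct {
	CommonLabels map[string]string
}

// render parses and executes text against data, returning any parse or
// execution error.
func render(text string, data groupData) (string, error) {
	t, err := template.New("severity").Parse(text)
	if err != nil {
		return "", err // e.g. undefined variable "$labels"
	}
	var sb strings.Builder
	err = t.Execute(&sb, data) // e.g. can't evaluate field Labels
	return sb.String(), err
}

func main() {
	data := groupData{CommonLabels: map[string]string{"severity": "warning"}}
	for _, text := range []string{
		"{{ .Labels.severity }}",
		"{{$labels.severity}}",
		"{{ .CommonLabels.severity }}",
	} {
		out, err := render(text, data)
		fmt.Printf("%-30s out=%q err=%v\n", text, out, err)
	}
}
```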

@swestcott @mjuarez

What worked for me was ".CommonLabels.severity".

We then added this:

{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}

This lowercases the severity label, which the PagerDuty API seems to prefer. It also forces an alert to critical when no severity is set, which I think is good practice.
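A minimal Go sketch running the severity template from this comment, standard library only. toLower is one of Alertmanager's template functions; here it is registered by hand as strings.ToLower, and groupData is a hypothetical stand-in for the notification data.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// severityTmpl is the template from the comment above.
const severityTmpl = `{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}`

// groupData is a hypothetical stand-in for the group-level data.
type groupData struct {
	CommonLabels map[string]string
}

// renderSeverity executes severityTmpl with toLower mapped to
// strings.ToLower, approximating Alertmanager's function of that name.
func renderSeverity(labels map[string]string) (string, error) {
	t, err := template.New("severity").
		Funcs(template.FuncMap{"toLower": strings.ToLower}).
		Parse(severityTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, groupData{CommonLabels: labels}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, labels := range []map[string]string{
		{"severity": "WARNING"},
		{},
	} {
		out, err := renderSeverity(labels)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%v -> %q\n", labels, out)
	}
}
```

An upper-case WARNING label comes out as warning, and a group with no common severity label falls through to critical.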

Full config for reference; we also added class, component, and group, which are fields supported by the PagerDuty API:

- name: team-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
    severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}'
    class: '{{ .CommonLabels.class }}'
    component: '{{ .CommonLabels.component }}'
    group: '{{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{ end }}{{ if .CommonLabels.region }}.{{ .CommonLabels.region }}{{ end }}{{ if .CommonLabels.service }}.{{ .CommonLabels.service }}{{ end }}'
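The group template joins the environment, region, and service labels, each prefixed with a dot. A small Go sketch (standard library text/template only; groupData is hypothetical) renders it for a sample label set to show what PagerDuty would receive. In the YAML file this value needs quoting like the other fields, since a plain scalar starting with { is parsed as a YAML flow mapping.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// groupTmpl is the group template from the config above, joined onto one line.
const groupTmpl = `{{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{ end }}` +
	`{{ if .CommonLabels.region }}.{{ .CommonLabels.region }}{{ end }}` +
	`{{ if .CommonLabels.service }}.{{ .CommonLabels.service }}{{ end }}`

// groupData is a hypothetical stand-in for the group-level data.
type groupData struct {
	CommonLabels map[string]string
}

// renderGroup executes groupTmpl; absent labels are skipped by the if blocks.
func renderGroup(labels map[string]string) (string, error) {
	t, err := template.New("group").Parse(groupTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, groupData{CommonLabels: labels}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderGroup(map[string]string{
		"environment": "prod",
		"region":      "eu-west-1",
		"service":     "my-service",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints .prod.eu-west-1.my-service
}
```

Because every segment carries its own leading dot, the rendered group always starts with "." when any of the three labels is present.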

For reference on PagerDuty fields: https://v2.developer.pagerduty.com/docs/send-an-event-events-api-v2

@dbonatto Thanks! {{ .CommonLabels.severity }} worked for me.

I have the same issue: I am getting "line 35: did not find expected key".

This is what I have in my config file:

23      - name: 'PagerDuty'
24        pagerduty_configs:
25        - send_resolved: true
26          routing_key: <my_secrete_key>
27          url: https://events.pagerduty.com/v2/enqueue
28          client: '{{ template "pagerduty.default.client" . }}'
29          client_url: '{{ template "pagerduty.default.clientURL" . }}'
30          description: '{{ template "pagerduty.default.description" .}}'
31          details:
32            firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
33            num_firing: '{{ .Alerts.Firing | len }}'
34            num_resolved: '{{ .Alerts.Resolved | len }}'
35            resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
36         severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower
37           }}{{ else }}critical{{ end }}'
38         class: '{{ .CommonLabels.class }}'
39         component: '{{ .CommonLabels.component }}'
40        group: {{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{
41          end }}{{ if .CommonLabels.region }}.{{ .CommonLabels.region }}{{ end }}{{ if
42          .CommonLabels.severity }}.{{ .CommonLabels.severity }}{{ end }}

Did I set it up all wrong?

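@kchaitu4 That's a YAML indentation issue: severity, class, component, and group are not indented at the same level as details and the other keys in that pagerduty_configs entry. The group value should also be quoted like the others, since an unquoted value starting with { is read as a YAML flow mapping.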
