I'm trying to migrate to the PagerDuty API v2. I have already switched to routing_key in the configuration so that AM uses the API v2. What I can't figure out is how to pass on to PagerDuty the severity that is declared as a label on the alert.
curl -XPOST -k https://somehost/api/v1/alerts -d '[{
"status": "firing",
"labels": {
"alertname": "TestAlertWarning-13127",
"service": "my-service",
"team": "C Team",
"severity":"warning",
"instance": "TestAlertWarning-13127.example.net"
},
"annotations": {
"summary": "High latency is high!"
},
"generatorURL": "http://prometheus.int.example.net/<generating_expression>"
}]'
What is happening now is that AM sends an "error" severity to PagerDuty, which, from looking at the code, is the default behavior when no severity is set.
I'm not sure whether this is a code or a documentation issue, and would appreciate some help.
Branch: HEAD
BuildDate: 20180112-10:32:46
BuildUser: root@d83981af1d3d
GoVersion: go1.9.2
Revision: fb713f6d8239b57c646cae30f78e8b4b8861a1aa
Version: 0.13.0
Version 1.8.2
Revision 5211b96d4d1291c3dd1a569f711d3b301b635ecb
Branch HEAD
BuildUser root@1412e937e4ad
BuildDate 20171104-16:09:14
GoVersion go1.9.2
- receiver: CTeam-pagerduty-alert
  match:
    team: C Team
...
- name: CTeam-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
Severity is set in pagerduty_configs:
- name: CTeam-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    severity: warning
Sorry, this was one of the tests I've done.
If you remove it, you get the behavior I described. It's not passing on the severity, and I can't find in the documentation how to do it.
Ah, I misunderstood, sorry about that. As it is now, the severity can only be defined statically in the receiver.
To pass on the severity in the label, the code needs to be slightly updated here: https://github.com/prometheus/alertmanager/blob/6a3dfaff45fafba1ef8553451e6b1fc0435b6523/notify/impl.go#L509
n.conf.Severity needs to become tmpl(n.conf.Severity). Then you can use the normal templating to access the common severity label.
I would recommend against this: if an alert group has mixed severity labels (one is severe, one is warning), then there is no common severity label available for templating. I'm not sure how you have your grouping configured, though.
Instead, I would recommend defining in your routes a match on severity=warning and a separate one on severity=critical (or whatever the levels are), and pointing them at two different receivers. It's more verbose, but perhaps clearer to others reading the config who don't know how grouping and common labels work.
That having been said, if you want to add templating to the severity in the link above, that would be fine by me (although I would still recommend being explicit).
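To illustrate the concern about mixed severities, here is a minimal Go sketch of how a set of common labels can be derived for an alert group. The `commonLabels` function is a hypothetical, simplified stand-in written for this example, not Alertmanager's actual code; it only assumes that a label counts as "common" when every alert in the group carries it with the same value.

```go
package main

import "fmt"

// commonLabels returns the label pairs that every alert in the group
// shares with an identical value. Hypothetical helper for illustration;
// it is not taken from the Alertmanager source.
func commonLabels(alerts []map[string]string) map[string]string {
	common := map[string]string{}
	if len(alerts) == 0 {
		return common
	}
	// Start from the first alert's labels...
	for k, v := range alerts[0] {
		common[k] = v
	}
	// ...and drop every label that is missing or different in another alert.
	for _, labels := range alerts[1:] {
		for k, v := range common {
			if other, ok := labels[k]; !ok || other != v {
				delete(common, k)
			}
		}
	}
	return common
}

func main() {
	group := []map[string]string{
		{"team": "C Team", "severity": "warning"},
		{"team": "C Team", "severity": "critical"},
	}
	common := commonLabels(group)
	_, hasSeverity := common["severity"]
	fmt.Println(common["team"], hasSeverity) // prints "C Team false"
}
```

With one warning and one critical alert in the same group, `team` survives but `severity` does not, so a severity template based on the common labels would have nothing to render. Grouping by severity avoids that situation.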
Thanks for reopening.
In my use of AM, we have around 10 teams and many more services. Each team decides on the severity of an alert based on the type of problem and the environment in which the problem is happening. PagerDuty also uses the severity to determine the urgency of an alert, and with that decides how to contact on-call engineers.
So in my opinion it would be very useful to have a way to pass the severity configured in the alert directly to PagerDuty, even if that is not the default behavior. Looks like that is the alternative you suggested and I'm very much in favor of it.
Creating routing rules for all the teams I have and for all four valid severities in PagerDuty (Info, Warning, Error and Critical) will be a little bit messy. Still, it looks like that is the alternative I have for now.
Also, I think offering a template for other fields provided by the PagerDuty API v2, such as Group and Component might be useful, so we can have a more generic configuration file and have information being dynamically set according to what is set in the alert.
@stuartnelson3 I went ahead and created the PR with the solution you mentioned in case you agree with integrating it.
Also, the routing solution works, but it still creates a giant config file. Every team will need to have something like this:
- match:
    team: C Team
  routes:
  - receiver: CTeam-pagerduty-critical
    match:
      severity: critical
  - receiver: CTeam-pagerduty-warning
    match:
      severity: warning
  - receiver: CTeam-pagerduty-info
    match:
      severity: info
  - receiver: CTeam-pagerduty-error
    match:
      severity: error
In the end, more templates would make the configuration simpler.
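To give a feel for how quickly the static routing approach grows, here is a rough Go sketch that prints the per-severity child routes for one team. The `teamRoutes` helper and its `receiverPrefix` parameter are hypothetical names invented for this example; the receiver naming simply follows the pattern in the snippet above.

```go
package main

import (
	"fmt"
	"strings"
)

// The four severities mentioned in this thread, in the order used above.
var severities = []string{"critical", "warning", "info", "error"}

// teamRoutes renders one team's route stanza with a child route per
// severity. Hypothetical helper; receiver names follow the
// "<prefix>-pagerduty-<severity>" pattern from the example config.
func teamRoutes(team, receiverPrefix string) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "- match:\n    team: %s\n  routes:\n", team)
	for _, sev := range severities {
		fmt.Fprintf(&sb, "  - receiver: %s-pagerduty-%s\n", receiverPrefix, sev)
		fmt.Fprintf(&sb, "    match:\n      severity: %s\n", sev)
	}
	return sb.String()
}

func main() {
	fmt.Print(teamRoutes("C Team", "CTeam"))
}
```

Each team adds 15 lines of routes plus four receivers, so ten teams means 40 near-identical receivers to maintain, which is the duplication a templated severity would avoid.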
@stuartnelson3 Please let me know if this is what you are looking for: https://github.com/prometheus/docs/pull/956
I'm trying to use the templated severity field, but the PD-CEF severity field isn't being passed to PD.
Here's my config, am I doing something wrong? I'm running AM 0.15.0-rc.1.
receivers:
- name: pagerduty-poc
  pagerduty_configs:
  - send_resolved: true
    service_key: <secret>
    severity: '{{ .Labels.severity }}'
I'm using the same curl request as above.
Oops, I've corrected _service_key_ to _routing_key_ to use the PD v2 API. AM is now getting "unexpected status code 400" from PD. Hard-coding the severity to severity: 'warning' works, so I'm guessing my template config isn't quite right yet.
I too am running into issues using labels with severity. I'm using AM 0.14.0 and the Events v2 API, and tried both {{ .Labels.severity }} and {{$labels.severity}}.
Output:
level=error ts=2018-05-16T18:40:52.740921204Z caller=notify.go:303 component=dispatcher msg="Error on notify" err="cancelling notify retry for \"pagerduty\" due to unrecoverable error: unexpected status code 400"
level=error ts=2018-05-16T18:40:53.005475343Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"pagerduty\" due to unrecoverable error: unexpected status code 400"
Relevant AM config as follows:
- name: nexus
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
    severity: '{{$labels.severity}}'
Alert rule from Prometheus:
alert: GPU Acceleration Disabled On Host
expr: gpu_acceleration < 1
for: 2m
labels:
  service: httppainters
  severity: warning
annotations:
  summary: GPU Acceleration Disabled On Host
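As a possible way to see why the two expressions tried above do not resolve, here is a small Go sketch using the standard text/template package, which Alertmanager's templates are based on. The `data` struct is a hypothetical, cut-down stand-in that only carries a `CommonLabels` map; the real notification data has more fields, and how Alertmanager surfaces these errors may differ by version. `$labels` is a variable available in Prometheus alerting rule templates, and nothing defines it in this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// data is a hypothetical, reduced stand-in for the notification data
// passed to receiver templates. It deliberately has no Labels field.
type data struct {
	CommonLabels map[string]string
}

// render parses and executes a template string against d.
func render(text string, d data) (string, error) {
	t, err := template.New("severity").Parse(text)
	if err != nil {
		return "", err // e.g. an undefined variable such as $labels
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err // e.g. a field that does not exist on data
	}
	return sb.String(), nil
}

func main() {
	d := data{CommonLabels: map[string]string{"severity": "warning"}}
	for _, text := range []string{
		"{{ $labels.severity }}",       // fails at parse time
		"{{ .Labels.severity }}",       // fails at execution time
		"{{ .CommonLabels.severity }}", // resolves to the shared label
	} {
		out, err := render(text, d)
		fmt.Printf("%s -> out=%q failed=%v\n", text, out, err != nil)
	}
}
```

In this sketch only the `.CommonLabels` form yields a value; the other two produce an error and no severity string at all.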
@swestcott @mjuarez
What worked for me was ".CommonLabels.severity".
We then added this:
{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}
This makes the severity label lowercase, which the PagerDuty API seems to like more. It also forces an alert to be critical when severity is not set, which I think is good practice.
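The fallback expression can be tried outside Alertmanager with Go's text/template. This is only a sketch: `renderSeverity` is a hypothetical helper, the anonymous struct stands in for the real notification data, and `toLower` is registered here by hand as `strings.ToLower` on the assumption that this matches what Alertmanager's built-in function does.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// The severity expression from the comment above, unchanged.
const severityTmpl = `{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}`

// renderSeverity evaluates severityTmpl against a set of common labels.
// Hypothetical helper for illustration only.
func renderSeverity(commonLabels map[string]string) (string, error) {
	// Assumption: toLower behaves like strings.ToLower.
	funcs := template.FuncMap{"toLower": strings.ToLower}
	t, err := template.New("severity").Funcs(funcs).Parse(severityTmpl)
	if err != nil {
		return "", err
	}
	data := struct{ CommonLabels map[string]string }{commonLabels}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	withLabel, _ := renderSeverity(map[string]string{"severity": "Warning"})
	withoutLabel, _ := renderSeverity(map[string]string{"team": "C Team"})
	fmt.Println(withLabel, withoutLabel) // prints "warning critical"
}
```

A missing `severity` key evaluates as empty in the `if`, so the `else` branch supplies `critical`.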
Full config for reference, where we also added class, component and group, which are fields supported by the pagerduty API.
- name: team-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
    severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}'
    class: '{{ .CommonLabels.class }}'
    component: '{{ .CommonLabels.component }}'
    group: '{{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{ end }}{{ if .CommonLabels.region }}.{{ .CommonLabels.region }}{{ end }}{{ if .CommonLabels.service }}.{{ .CommonLabels.service }}{{ end }}'
For reference on PagerDuty fields: https://v2.developer.pagerduty.com/docs/send-an-event-events-api-v2
@dbonatto Thanks! {{ .CommonLabels.severity }} worked for me.
I have the same issue. I am getting "line 35: did not find expected key".
This is what I have in my config file:
23 - name: 'PagerDuty'
24 pagerduty_configs:
25 - send_resolved: true
26 routing_key: <my_secrete_key>
27 url: https://events.pagerduty.com/v2/enqueue
28 client: '{{ template "pagerduty.default.client" . }}'
29 client_url: '{{ template "pagerduty.default.clientURL" . }}'
30 description: '{{ template "pagerduty.default.description" .}}'
31 details:
32 firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
33 num_firing: '{{ .Alerts.Firing | len }}'
34 num_resolved: '{{ .Alerts.Resolved | len }}'
35 resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
36 severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower
37 }}{{ else }}critical{{ end }}'
38 class: '{{ .CommonLabels.class }}'
39 component: '{{ .CommonLabels.component }}'
40 group: {{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{
41 end }}{{ if .CommonLabels.region }}.{{ .CommonLabels.region }}{{ end }}{{ if
42 .CommonLabels.severity }}.{{ .CommonLabels.severity }}{{ end }}
Did I set it up all wrong?
@kchaitu4 That's because of a YAML indentation issue.