https://github.com/prometheus/alertmanager/blob/master/notify/notify.go#L193
results in messages like
2016-05-30_15:24:49.81027 time="2016-05-30T15:24:49Z" level=warning msg="Notify attempt 8 failed: unexpected status code 500" source="notify.go:193"
which doesn't help identify which notification mechanism failed for which alert.
This is still relevant in v0.5:
2017-04-07_11:40:08.39660 time="2017-04-07T11:40:08Z" level=debug msg="Notify attempt 1 failed: unexpected status code 404" source="notify.go:546"
2017-04-07_11:40:08.39663 time="2017-04-07T11:40:08Z" level=error msg="Error on notify: Cancelling notify retry due to unrecoverable error: unexpected status code 404" source="notify.go:272"
2017-04-07_11:40:08.39666 time="2017-04-07T11:40:08Z" level=error msg="Notify for 1 alerts failed: Cancelling notify retry due to unrecoverable error: unexpected status code 404" source="dispatch.go:265"
(My guess is this particular incident is a Slack notification sent to a non-existent channel. Would be much easier to find out if a meaningful message were logged.)
What should we log, then? There's no upper bound on the number of alerts in a notification, so maybe the grouping labels?
In this case, the URL that triggered the 404 would have been very helpful.
We need to be a little careful with some of the URLs, as they can contain auth tokens.
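One way to log a URL without leaking credentials is to strip the sensitive parts first. Below is a minimal sketch, not Alertmanager's actual code: `redactURL` and its `keepPath` parameter are hypothetical names, and the example URLs are made up. It always drops userinfo, query string, and fragment; the path is kept only on request, because some integrations (Slack incoming webhooks, for example) carry the secret in the path itself.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactURL (hypothetical helper) returns a form of rawURL that is safer to log.
// Userinfo, query string, and fragment are always dropped. The path is kept
// only when keepPath is true, since some services embed tokens in the path.
func redactURL(rawURL string, keepPath bool) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		// Do not echo the raw input back: it may contain the secret.
		return "<unparseable URL>"
	}
	safe := url.URL{Scheme: u.Scheme, Host: u.Host}
	if keepPath {
		safe.Path = u.Path
	}
	return safe.String()
}

func main() {
	// Token in the query string: the path is harmless and useful for debugging a 404.
	fmt.Println(redactURL("https://my.nice.website.com/v2/room/42/notification?auth_token=secret", true))
	// Token in the path: log scheme and host only.
	fmt.Println(redactURL("https://hooks.example.com/services/T0/B0/secret", false))
}
```

Whether the path is safe to keep would have to be decided per integration, which is probably the hard part of doing this properly.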
With --log.level=debug, the logged message currently shows:
level=debug ts=2018-04-05T12:52:10.854506281Z caller=notify.go:629
component=dispatcher msg="Notify attempt failed" attempt=5
integration=webhook receiver=web.hook
err="Post http://127.0.0.1:5001/: dial tcp 127.0.0.1:5001: connect: connection refused"
added in 23f31d7d
@brian-brazil maybe a good solution would be to add an option like --log.urls=true
@stuartnelson3 --log.level=debug doesn't show a URL anymore (with the latest Prometheus v2.3.0 and Alertmanager v0.13.0) :(
alertmanager | level=debug ts=2018-06-20T14:19:18.524718619Z caller=dispatch.go:188 component=dispatcher msg="Received alert" alert=up[4b31f10][active]
alertmanager | level=debug ts=2018-06-20T14:19:18.525358278Z caller=dispatch.go:429 component=dispatcher aggrGroup="{}/{severity=\"hipchat\"}:{alertname=\"up\"}" msg=Flushing alerts=[up[4b31f10][active]]
alertmanager | level=debug ts=2018-06-20T14:19:18.690725554Z caller=notify.go:605 component=dispatcher msg="Notify attempt failed" attempt=1 integration=hipchat receiver=demo-hipchat err="unexpected status code 404"
@tkrishtop the URL is currently only logged for the webhook receiver. As @brian-brazil noted above, logging it could be a concern because some URLs contain tokens, which shouldn't end up in the logs.
@simonpasquier thank you.
For those who come later and might fall into the same trap:
If your HipChat integration curl command is
curl -d '{<body>}' -H 'Content-Type: application/json' https://my.nice.website.com/v2/room/<number>/notification?auth_token=<mynicetoken>
do not include the API version in the HipChat API URL, i.e. write something like this in alertmanager.yml:
hipchat_api_url: 'https://my.nice.website.com/'
and not
hipchat_api_url: 'https://my.nice.website.com/v2/'
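The 404 is easier to understand once the final request URL is spelled out. The sketch below assumes that the integration appends `v2/room/<room>/notification` to the configured base URL; that assumption is consistent with the behavior described above but is not copied from the Alertmanager source. `notificationURL`, the room ID, and the token are hypothetical values for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// notificationURL (hypothetical) builds the request URL from the configured
// base URL, assuming the integration appends the versioned path itself.
func notificationURL(base, room, token string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// Assumption: the API version is part of the appended path.
	u.Path += fmt.Sprintf("v2/room/%s/notification", room)
	q := u.Query()
	q.Set("auth_token", token)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	for _, base := range []string{
		"https://my.nice.website.com/",    // version appended once
		"https://my.nice.website.com/v2/", // version ends up duplicated
	} {
		got, err := notificationURL(base, "42", "mynicetoken")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(got)
	}
}
```

Under that assumption the second base URL yields a path starting with /v2/v2/, which would explain the 404.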