Alertmanager: [Question/Bug] Multiple alerts (of the same kind): resolved alert bug.

Created on 16 Aug 2017 · 6 Comments · Source: prometheus/alertmanager

Hi,
I have a host CPU rule in alert.rules like this:

ALERT HostCPUUsage
  IF (100 - (avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100)) > 2
  FOR 2m
  LABELS {
    severity="critical"
  }
  ANNOTATIONS {
    summary = "{{$labels.instance}}: High CPU usage detected",
    description = "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})",
  }

I run into a problem when several hosts trigger this alert at the same time. The notification then looks like this:

[FIRING:10] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…

But when a single host recovers from the abnormal state, it sends a message like this:

[RESOLVED] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…

I think this does not reflect the real situation. The message should mention only the specific host that recovered.

My question: am I missing something, or does Alertmanager not support this?

Thanks a ton!

component/notify kind/question

All 6 comments

@regardfs I do not understand your question. In case this is a usage question, please reopen it in https://groups.google.com/forum/#!forum/prometheus-users. In case you think you found a bug in Alertmanager or if you want to report a missing feature, please add more details to your question, e.g.:

  • Where are the above blobs copied from?
  • What is your Prometheus and Alertmanager config?

@mxinden Hi, I just want separate alerts in Slack when several alerts fire at the same time.
As you can see in my blobs, the summary and description combine all alerts of the same type.

  • Where are the above blobs copied from?
    I use Slack to receive alerts; these are the Slack alert messages.
  • What is your Prometheus and Alertmanager config?
    prometheus.yml imports the rule files:

    rule_files:
    - "alert-rules/zixin-alert.rules"
    - "alert-rules/host-alert.rules"
    - "alert-rules/rabbitmq-alert.rules"

    alertmanager.yml

global:
resolve_timeout: 15s
route:
receiver: 'slack'
receivers:
- name: 'slack'
slack_configs:
- send_resolved: true
channel: '#alert'
api_url: 'slack-api url'
text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'

@regardfs So if I understand you correctly, you want a separate notification per alert that was sent by Prometheus, right?

Can you post a properly formatted Alertmanager.yaml?
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#code-and-syntax-highlighting

@mxinden You got it, that is exactly what I want: a separate notification per alert!
Sorry for pasting the config YAML file in the wrong format.

config.yaml

global:
    resolve_timeout: 15s
route:
    receiver: 'slack'
receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            channel: '#alert'
            api_url: 'https://hooks.slack.com/services/OOXXOOXX'
            text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'

/etc/alertmanager/templates/alertText.tmpl

{{ define "slack_summary" }}
{{ range .Alerts }}{{ .Annotations.summary }}
{{ end }}
{{ end }}

{{ define "slack_description" }}
{{ range .Alerts }}{{ .Annotations.description }}
{{ end }}
{{ end }}

{{ define "slack.myorg.text" }}summary: {{ template "slack_summary" . }}description: {{ template "slack_description" . }}{{ end }}
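
For readers following along: the template above ranges over every alert in the notification, so the rendered text contains one summary line and one description line per alert in the group. The Python sketch below only imitates that behaviour to show why all hosts appear in a single message. The function name `render_slack_text` and the dictionary layout are hypothetical stand-ins, not Alertmanager's API, and the exact whitespace of the real Go template output may differ.

```python
# Hypothetical stand-in for the data a template receives: one notification
# group holding several alerts, each with its own annotations.
def render_slack_text(alerts):
    """Imitate the template: every alert's summary, then every description."""
    summary = "".join(a["annotations"]["summary"] + "\n" for a in alerts)
    description = "".join(a["annotations"]["description"] + "\n" for a in alerts)
    return f"summary: {summary}description: {description}"


# Two alerts that ended up in the same notification group.
group = [
    {
        "annotations": {
            "summary": f"{host}: High CPU usage detected",
            "description": f"{host}: CPU usage is above 80%",
        }
    }
    for host in ("10.10.0.86:9100", "10.10.0.142:9100")
]

text = render_slack_text(group)
print(text)
```

Because the loop runs over the whole group, the message can only shrink to a single host if the group itself contains a single alert.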

@regardfs That is rather surprising as you don't have any alert grouping configured. Could you try to add group_by: [instance] to your route config?

global:
    resolve_timeout: 15s
route:
    receiver: 'slack'
    group_by: [instance]
receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            channel: '#alert'
            api_url: 'https://hooks.slack.com/services/OOXXOOXX'
            text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'
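
As a rough illustration of what group_by changes, the sketch below partitions alerts by the values of the listed labels. It is a simplified model, not Alertmanager's implementation: the helper `group_alerts` is hypothetical, and it assumes that an empty group_by puts all alerts of a route into one group.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Partition alerts by the values of the group_by labels (simplified model)."""
    groups = defaultdict(list)
    for alert in alerts:
        # The group key is built only from the labels named in group_by.
        key = tuple((name, alert["labels"].get(name, "")) for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "HostCPUUsage", "instance": "10.10.0.86:9100"}},
    {"labels": {"alertname": "HostCPUUsage", "instance": "10.10.0.142:9100"}},
    {"labels": {"alertname": "HostCPUUsage", "instance": "10.10.0.241:9100"}},
]

ungrouped = group_alerts(alerts, [])               # assumption: no group_by, one group
per_instance = group_alerts(alerts, ["instance"])  # one group per host

print(len(ungrouped), len(per_instance))
```

In this model, grouping only by instance would also merge different alert types from the same host into one notification; adding alertname to the list would keep them apart.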

@mxinden
Thanks a ton, I will try it ASAP. I will close this issue now.
