Hi,
I have a host CPU rule in alert.rules like this:
ALERT HostCPUUsage
  IF (100 - (avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100)) > 2
  FOR 2m
  LABELS {
    severity="critical"
  }
  ANNOTATIONS {
    summary = "{{$labels.instance}}: High CPU usage detected",
    description = "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})",
  }
The problem is that when two or more hosts have this issue at the same time, a single combined alert is triggered, like this:
[FIRING:10] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…
But when just one host recovers from the abnormal state, it sends a message like this:
[RESOLVED] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…
So I think this does not reflect the real situation. It should only print the message for the specific host that recovered.
My question is: am I missing something, or does Alertmanager not support this?
Thanks a ton!
@regardfs I do not understand your question. In case this is a usage question, please reopen it in https://groups.google.com/forum/#!forum/prometheus-users. In case you think you found a bug in Alertmanager or if you want to report a missing feature, please add more details to your question, e.g.:
@mxinden, hi. I just want separate alerts in Slack when several alerts fire at the same time.
You can see that in my message, the summary and description combine all alerts of the same type.
What is your Prometheus and Alertmanager config?
prometheus.yml imports the rule files:
rule_files:
  - "alert-rules/zixin-alert.rules"
  - "alert-rules/host-alert.rules"
  - "alert-rules/rabbitmq-alert.rules"
alertmanager.yml
global:
  resolve_timeout: 15s
route:
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        channel: '#alert'
        api_url: 'slack-api url'
        text: '{{ template "slack.myorg.text" . }}'
templates:
  - '/etc/alertmanager/templates/alertText.tmpl'
@regardfs So if I understand you correctly, you want a separate notification per alert that was sent by Prometheus, right?
Can you post a properly formatted Alertmanager.yaml?
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#code-and-syntax-highlighting
@mxinden, you got it. That is exactly what I want: a separate notification per alert!
Sorry for pasting the config YAML file in the wrong format.
config.yaml
global:
  resolve_timeout: 15s
route:
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        channel: '#alert'
        api_url: 'https://hooks.slack.com/services/OOXXOOXX'
        text: '{{ template "slack.myorg.text" . }}'
templates:
  - '/etc/alertmanager/templates/alertText.tmpl'
/etc/alertmanager/templates/alertText.tmpl
{{ define "slack_summary" }}
{{ range .Alerts }}{{ .Annotations.summary }}
{{ end }}
{{ end }}
{{ define "slack_description" }}
{{ range .Alerts }}{{ .Annotations.description }}
{{ end }}
{{ end }}
{{ define "slack.text" }}summary: {{ template "slack_summary" . }}description: {{ template "slack_description" . }}{{ end }}
@regardfs That is rather surprising as you don't have any alert grouping configured. Could you try to add group_by: [instance] to your route config?
global:
  resolve_timeout: 15s
route:
  receiver: 'slack'
  group_by: [instance]
receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        channel: '#alert'
        api_url: 'https://hooks.slack.com/services/OOXXOOXX'
        text: '{{ template "slack.myorg.text" . }}'
templates:
  - '/etc/alertmanager/templates/alertText.tmpl'
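For illustration, here is a slightly fuller sketch of just the route block. It assumes the standard route fields group_by, group_wait, group_interval and repeat_interval. Adding alertname to group_by is an assumption on my side, not something from the config above, and the timing values are only illustrative. Alerts that share the same values for all group_by labels end up in one notification, so each host should get its own FIRING and RESOLVED message:

```yaml
route:
  receiver: 'slack'
  # Alerts with identical values for these labels are batched together.
  # With 'instance' in the list, each host forms its own group.
  # 'alertname' (an assumption, not in the original config) keeps
  # different alerts on the same host in separate notifications.
  group_by: ['alertname', 'instance']
  # How long to wait before sending the first notification for a new group.
  group_wait: 30s
  # How long to wait before notifying about changes to an existing group.
  group_interval: 5m
  # How long to wait before re-sending a notification that is still firing.
  repeat_interval: 4h
```

With per-instance groups, the range over .Alerts in the template would normally iterate over a single alert, so summary and description should list only the affected host.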
@mxinden
Thanks a ton! I will try it ASAP. I will close this issue now.