Alertmanager: GUI: new silence form without matchers

Created on 13 Feb 2018  ·  27 Comments  ·  Source: prometheus/alertmanager

What did you do?

There is an active alert, MySQLReplicationNotRunning:
[screenshot, 2018-02-13 14:54:08]

I'm clicking Silence next to this alert:

http://alertmanager.example.com/#/silences/new?filter=%7Bseverity%3D%22critical%22%2C%20service%3D%22mysqld-exporter-db-centities-s1%22%2C%20prometheus_cluster%3D%22prometheus-ops%22%2C%20pod%3D%22mysqld-exporter-db-centities-s1-86db8744bf-nlxmb%22%2C%20namespace%3D%22ops%22%2C%20master_uuid%3D%228efcb402-5429-11e6-b7aa-00163e3acc15%22%2C%20master_host%3D%22db-centities-s2%22%2C%20job%3D%22db-centities-s1%22%2C%20instance%3D%2210.200.0.125%3A9104%22%2C%20endpoint%3D%22http-metrics%22%2C%20connection_name%3D%22%22%2C%20channel_name%3D%22%22%2C%20alertname%3D%22MySQLReplicationNotRunning%22%7D
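
For readability, the filter parameter in the URL above can be percent-decoded. The snippet below is only an illustrative sketch: it uses a shortened copy of the filter (most labels omitted for brevity) and Python's standard urllib, and the variable names are made up for this example. It shows that two of the labels arrive with empty values.

```python
import re
from urllib.parse import unquote

# Shortened copy of the filter parameter from the URL above
# (most labels are omitted here for brevity).
encoded = (
    "%7Bseverity%3D%22critical%22%2C%20connection_name%3D%22%22%2C%20"
    "channel_name%3D%22%22%2C%20alertname%3D%22MySQLReplicationNotRunning%22%7D"
)

# Percent-decode the filter into the matcher syntax shown in the UI.
decoded = unquote(encoded)
print(decoded)

# Collect the names of labels whose value is the empty string.
empty_valued = re.findall(r'(\w+)=""', decoded)
print(empty_valued)
```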

The new silence form opens with an empty "Matchers" section ("Alerts affected by this silence."):
[screenshot, 2018-02-13 14:57:39]

What did you expect to see?

The new silence form opens with the matchers filled in, e.g.:
[screenshot, 2018-02-13 14:59:17]

What did you see instead? Under which circumstances?

The new silence form opens with an empty "Matchers" section ("Alerts affected by this silence.").

Environment

  • Alertmanager version: 0.14.0
  • Prometheus version: 2.0.0
  • Alertmanager configuration: generated by prometheus-operator
  • Prometheus configuration file: generated by prometheus-operator
component/ui kind/bug

All 27 comments

I cannot reproduce this (also on Alertmanager 0.14.0).

Can you try this in incognito/private browsing mode? It might be an issue with the javascript being cached. We unfortunately don't version our script.js file, which can lead to this problem. This potentially could be fixed using a unique hash on the script name, and updating index.html to have that correct version "baked in".
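
The content-hash idea described above could look roughly like the following. This is a minimal sketch, not what Alertmanager actually does: the file names script.js and index.html come from the comment, while the helper names hashed_name and bake_into_index, the SHA-256 choice, and the 8-character digest length are assumptions made for illustration.

```python
import hashlib
import re

def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Return 'script.<hash>.js' for 'script.js', where <hash> depends on the content."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"

def bake_into_index(index_html: str, filename: str, new_name: str) -> str:
    """Rewrite src attributes that point at the unversioned script."""
    pattern = r'(src=["\'])' + re.escape(filename) + r'(["\'])'
    return re.sub(pattern, lambda m: m.group(1) + new_name + m.group(2), index_html)

# Hypothetical bundle content and index page, for illustration only.
script = b"console.log('alertmanager ui');"
name = hashed_name("script.js", script)
html = bake_into_index('<script src="script.js"></script>', "script.js", name)
print(html)
```

Because the name changes whenever the bundle changes, a browser holding a stale copy would request the new file instead of reusing its cache.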

@w0rm have you done anything like this on the front end? My immediate dumb solution is to do a variable replace on index.html to change the script source.

In incognito mode the result is the same for me - still empty matchers.

@stuartnelson3 this looks like a bug somewhere in url serializing/parsing code.

The error is that you have empty matchers coming from Prometheus (channel_name="", connection_name=""). Label matchers should not have an empty value.

@brian-brazil @grobie @beorn7 @brancz : I'm talking with @w0rm and we're thinking that invalid matchers should be dropped when clicking the "silence" button on the alert page. I think we should still display the empty matchers on the alerts page (so users can hopefully realize something is incorrect with their instrumentation), and allow them to link to create a silence that still matches the alert (minus the invalid matchers). The danger here is potentially supplying users with a silence that covers too many alerts, though, so I'm not sure if this is a "good" idea.

We're using the same matcher parser for the silence creation page as we are in the silence and alert filtering, and there we definitely don't want to allow defining a matcher filter with an empty value. This seems to be the cleanest solution codewise.

How do you think we should handle alerts with label matchers missing values? Should we just display a message on the creation page letting the user know they have supplied invalid matchers, and not auto-fill any of the matchers in the silence create form?

channel_name="" is a valid matcher, it says that that label must not exist and creating such silences should be possible.

What does the alerting rule look like?

Alerting rule:

      - alert: MySQLReplicationNotRunning
        expr: mysql_slave_status_slave_io_running == 0 OR mysql_slave_status_slave_sql_running == 0
        for: 3m
        labels:
          severity: critical
        annotations:
          description: Slave replication (IO or SQL) has been down for more than 3 minutes on {{ $labels.job }}
          summary: Slave replication is not running

Can you share your alertmanager configuration and what this alert looks like on the Prometheus alert status page?

alertmanager config:

global:
  resolve_timeout: 3m
  smtp_require_tls: true
  slack_api_url: <secret>
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: "null"
  group_by:
  - alertname
  - prometheus_cluster
  routes:
  - receiver: "null"
    match:
      alertname: DeadMansSwitch
  - receiver: slack-k8s
    match:
      prometheus_cluster: prometheus-k8s
  - receiver: slack-ops
    match:
      prometheus_cluster: prometheus-ops
  - receiver: slack-services
    match_re:
      prometheus_cluster: ^(?:prometheus-dev|prometheus-prod)$
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
- name: "null"
- name: slack-k8s
  slack_configs:
  - send_resolved: true
    api_url: <secret>
    channel: '#prometheus-alerts'
    username: '{{ template "slack.default.username" . }}'
    color: '{{ if eq .Status "firing" }}{{ if eq .CommonLabels.severity "critical"
      }}danger{{ else if eq .CommonLabels.severity "warning" }}warning{{ else }}danger{{
      end }}{{ else }}good{{ end }}'
    title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing
      | len }}{{ end }}] {{ .CommonLabels.alertname }}'
    title_link: '{{ template "slack.default.titlelink" . }}'
    pretext: '{{ template "slack.default.pretext" . }}'
    text: |-
      *k8s-cluster:* kube-sjc-prod{{ if ne .CommonAnnotations.summary "" }}
      *summary:* {{ .CommonAnnotations.summary }}{{ end }}{{ range .Alerts }}{{ if ne .Annotations.description "" }}
      *description:* {{ .Annotations.description }}{{ end }}{{ end }}
    footer: '{{ template "slack.default.footer" . }}'
    fallback: '{{ template "slack.default.fallback" . }}'
    icon_emoji: ':prom:'
    icon_url: '{{ template "slack.default.iconurl" . }}'
- name: slack-ops
  slack_configs:
  - send_resolved: true
    api_url: <secret>
    channel: '#prometheus-alerts'
    username: '{{ template "slack.default.username" . }}'
    color: '{{ if eq .Status "firing" }}{{ if eq .CommonLabels.severity "critical"
      }}danger{{ else if eq .CommonLabels.severity "warning" }}warning{{ else }}danger{{
      end }}{{ else }}good{{ end }}'
    title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing
      | len }}{{ end }}] {{ .CommonLabels.alertname }}'
    title_link: '{{ template "slack.default.titlelink" . }}'
    pretext: '{{ template "slack.default.pretext" . }}'
    text: |-
      *prom-cluster:* SJC/{{ .CommonLabels.prometheus_cluster }}{{ if ne .CommonAnnotations.summary "" }}
      *summary:* {{ .CommonAnnotations.summary }}{{ end }}{{ range .Alerts }}{{ if ne .Annotations.description "" }}
      *description:* {{ .Annotations.description }}{{ end }}{{ end }}
    footer: '{{ template "slack.default.footer" . }}'
    fallback: '{{ template "slack.default.fallback" . }}'
    icon_emoji: ':prom:'
    icon_url: '{{ template "slack.default.iconurl" . }}'
- name: slack-services
  slack_configs:
  - send_resolved: true
    api_url: <secret>
    channel: '{{ if ne .CommonLabels.slack_channel "" }}{{ .CommonLabels.slack_channel
      }}{{ else }}#services-alerts{{ end }}'
    username: '{{ template "slack.default.username" . }}'
    color: '{{ if eq .Status "firing" }}{{ if eq .CommonLabels.severity "critical"
      }}danger{{ else if eq .CommonLabels.severity "warning" }}warning{{ else }}danger{{
      end }}{{ else }}good{{ end }}'
    title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing
      | len }}{{ end }}] {{ .CommonLabels.alertname }}'
    title_link: '{{ template "slack.default.titlelink" . }}'
    pretext: '{{ template "slack.default.pretext" . }}'
    text: |-
      *cluster:* kube-sjc-prod{{ if ne .CommonAnnotations.summary "" }}
      *summary:* {{ .CommonAnnotations.summary }}{{ end }}{{ range .Alerts }}{{ if ne .Annotations.description "" }}
      *description:* {{ .Annotations.description }}{{ end }}{{ end }}
    footer: '{{ template "slack.default.footer" . }}'
    fallback: '{{ template "slack.default.fallback" . }}'
    icon_emoji: ':prom:'
    icon_url: '{{ template "slack.default.iconurl" . }}'
templates: []

Alert view from Prometheus:
[screenshot, 2018-02-13 23:31:51]

So on the Prometheus side I'm not sure where this empty channel_name label is coming from; it should really be pruned at some point.

On the Alertmanager side it should also be getting pruned when the new alert comes in.
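
Pruning on ingestion amounts to dropping every label whose value is the empty string. A minimal sketch of that step, assuming the labels are available as a plain mapping; prune_empty_labels is a hypothetical name, not a function from Alertmanager:

```python
from typing import Dict

def prune_empty_labels(labels: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the label set without empty-valued labels."""
    return {name: value for name, value in labels.items() if value != ""}

# A subset of the labels from the alert in this issue.
alert_labels = {
    "alertname": "MySQLReplicationNotRunning",
    "severity": "critical",
    "channel_name": "",
    "connection_name": "",
}
print(prune_empty_labels(alert_labels))
```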

It probably comes from mysqld_exporter metrics (https://github.com/prometheus/mysqld_exporter).

mysql_slave_status_slave_sql_running:
mysql_slave_status_slave_sql_running{channel_name="",connection_name="",endpoint="http-metrics",instance="10.200.0.102:9104",job="db-specials-s2",master_host="db-specials-s4",master_uuid="51b79d02-af36-11e7-8eb8-00163ead9a23",namespace="ops",pod="mysqld-exporter-db-specials-s2-58d5488b4f-k8sdv",service="mysqld-exporter-db-specials-s2"}
[...]
mysql_slave_status_slave_io_running:
mysql_slave_status_slave_io_running{channel_name="",connection_name="",endpoint="http-metrics",instance="10.200.0.102:9104",job="db-specials-s2",master_host="db-specials-s4",master_uuid="51b79d02-af36-11e7-8eb8-00163ead9a23",namespace="ops",pod="mysqld-exporter-db-specials-s2-58d5488b4f-k8sdv",service="mysqld-exporter-db-specials-s2"}
[...]

@brian-brazil an exporter must render the same set of labels for a given metric, right? So could these metrics exist in one exporter?

foo{bar="foo"} 1
foo{bar=""} 1

Or should the exporter "lie" and do:

foo{bar="foo"} 1
foo{bar="<nil>"} 1

My point is that there's something wrong here on the Prometheus end, those labels shouldn't be getting this far.

channel_name="" is a valid matcher, it says that that label must not exist and creating such silences should be possible.

@w0rm I guess this should be valid filtering as well for silences then, too. foo="" should filter out any silences that have a label foo, I'll have to check to see if this is the behavior in the backend.

@brian-brazil this is also valid for filtering alerts?

This is valid for filtering alerts. It's a bit weird, but it's valid.

I am also getting this issue with AM 0.13.0 and Prometheus 2.1.0, with basically exactly the same rule. Is this fixed in one of the new versions? It looks like the PR never got accepted. Alternatively, does the new version of prometheus prevent the empty label from coming in?

EDIT: Never mind - it looks like the Prometheus-side fix is in 2.2.0. Meanwhile, I am not going to be able to upgrade Prometheus in our prod environment for a while - is there an AM fix that will be coming? Or should I just patch the source with the closed PR and build from there?

Thanks!

@w0rm do you have any bandwidth to look at this on Friday? We need to accept empty matchers in both silences and alerts:

channel_name="" is a valid matcher, it says that that label must not exist and creating such silences should be possible.

On the Alertmanager side it should also be getting pruned when the new alert comes in.

I can add the backend code to remove empty label matchers on ingestion in the same PR.

@TheyDroppedMe I'm hoping to get this into Alertmanager early next week, and I'll port it to v0.14 for a bug release.

@stuartnelson3 sure! What exactly should we do on the frontend? I can modify the parser to allow empty strings for matchers. What about the silence form? Should we allow empty matchers there?

Yeah, we need to allow both filtering by an empty matcher, as well as defining an empty matcher in the silence form.
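
To make the parser change concrete, here is a rough sketch of a filter parser that accepts empty values. It is not the Elm parser used by the Alertmanager UI; the regular expression, the parse_filter name, and the tuple representation are all assumptions for illustration, and escape sequences inside values are kept as written rather than unescaped.

```python
import re
from typing import List, Tuple

# name, operator, then a double-quoted value that may be empty.
MATCHER_RE = re.compile(
    r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=~|!~|!=|=)\s*"((?:[^"\\]|\\.)*)"\s*'
)

def parse_filter(text: str) -> List[Tuple[str, str, str]]:
    """Parse '{a="b", c=""}' into a list of (name, operator, value) tuples."""
    body = text.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    matchers = []
    pos = 0
    while pos < len(body):
        m = MATCHER_RE.match(body, pos)
        if m is None:
            raise ValueError(f"invalid matcher at offset {pos}: {body[pos:]!r}")
        matchers.append((m.group(1), m.group(2), m.group(3)))
        pos = m.end()
        if pos < len(body):
            # Matchers must be separated by commas.
            if body[pos] != ",":
                raise ValueError(f"expected ',' at offset {pos}")
            pos += 1
    return matchers

parsed = parse_filter('{severity="critical", channel_name="", alertname=~"MySQL.*"}')
print(parsed)
```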

@stuartnelson3 as @brian-brazil mentioned, an empty label is not the same as an empty matcher. As far as I understand, an empty matcher matches something that doesn't have a specific label.

This means we cannot create a silence form from an empty label? What should be the behavior in this case?

Any empty matcher (e.g. foo="") will match a silence that has no label foo.

It was my understanding that creating a silence with an empty matcher will only match alerts that do NOT have that label. So, sticking with foo="", any alert with a label foo will not be matched.

@brian-brazil am I understanding this correctly?

Part of this PR will also include removing empty labels from incoming alerts.

@stuartnelson3 Correct.

Part of this PR will also include removing empty labels from incoming alerts.

That was a bug that was fixed in Prometheus 2.2, though it's probably wise to do so anyway.

@stuartnelson3 I opened a PR that updates the UI part to allow empty matchers.

@brian-brazil clarifying the behavior of other matchers:

  • foo!="" is valid, and matches any label foo
  • foo=~"" is valid, and matches no label foo
  • foo!~"" is valid, and matches any label foo

Confirmed that that's the behavior in Prometheus, so I will implement it as such.
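
The four cases discussed in this thread all follow from one rule: a missing label is treated as if it had the empty string as its value. The sketch below illustrates that rule only. The matches function is hypothetical, and it uses Python's re module with full-string matching as a stand-in for the regular expression engine Prometheus actually uses, so regex syntax details may differ.

```python
import re
from typing import Dict

def matches(name: str, op: str, value: str, labels: Dict[str, str]) -> bool:
    """Evaluate one matcher against a label set; a missing label counts as ''."""
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator: {op}")

with_foo = {"foo": "bar"}
without_foo = {"alertname": "Example"}

for op in ("=", "!=", "=~", "!~"):
    print(op, matches("foo", op, "", with_foo), matches("foo", op, "", without_foo))
```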

closed by #1289

I had the same problem and just raised an issue on GitHub.

Can you suggest anything on this?

https://github.com/helm/charts/issues/14244
