Alertmanager: Allow negative (re) matching in silences

Created on 23 Dec 2018  Â·  21Comments  Â·  Source: prometheus/alertmanager

Similar to https://github.com/prometheus/alertmanager/issues/1023, but for silences.

Currently you can only have = and =~ matchers in silences. I recently wanted to silence all jobs from a given Prometheus server except for some specific ones, and being able to do a negative regex match would have been useful there, especially since RE2 doesn't support negative lookaheads.

areusability componensilences kinenhancement kinfriction

Most helpful comment

@stuartnelson3 A == / != / =~ / !~ dropdown between the label name and label value would make sense to me as a first step. Then you could get rid of the regex checkbox to the right.

All 21 comments

Hi @juliusv I'd like to work on this, if it's not a high priority issue.
Is implementing !~ is what we want here, so that the following command could work?

$ amtool silence add job !~".+exporter" instance="someinstance"

@geekodour Awesome!

I was actually talking about the web UI, but then of course amtool should support it as well. I haven't talked to any of the Alertmanager maintainers yet whether there is a fundamental reason why we don't support this yet (can't think of any), but this would probably be good to check first.

I think getting it implemented at the UI level will be difficult and plenty of discussions, whereas getting it done at amtool level should be straightforward.

I haven't talked to any of the Alertmanager maintainers yet whether there is a fundamental reason why we don't support this yet (can't think of any), but this would probably be good to check first.

I think this is most likely an artifact of the original implementation, which only supported direct or regex matches.

Adding this to amtool should be straightforward, as we already have the !~ and != concept from prometheus. How to handle this in the ui would (to me) be more a question of what makes most sense for selecting it. Maybe a simple dropdown is enough?

@stuartnelson3 A == / != / =~ / !~ dropdown between the label name and label value would make sense to me as a first step. Then you could get rid of the regex checkbox to the right.

@juliusv working on this PR
It seems the following documented syntax does not work too:
https://github.com/prometheus/alertmanager/blob/cfc0d9c55882dd501ec881ebe4606cf8d78bf8b6/cli/silence_add.go#L61-L64

The following works:

$ amtool silence add alertmanager=foo node=bar 

will it be fine if add this into the same PR?

bah, it appears that got dropped at some point. I added it back to amtool alert query in https://github.com/prometheus/alertmanager/pull/1575.

It should (hopefully) a quick change, I guess just make sure it's in a separate commit in the PR.

@stuartnelson3

I added it(~will send the PR soon~), but I think the following documentation does not match with the use:
(this part is repeated in multiple files)
https://github.com/prometheus/alertmanager/blob/cfc0d9c55882dd501ec881ebe4606cf8d78bf8b6/cli/silence_add.go#L63-L64

the following align with the documentation:

$ ./amtool silence query foo
is same as
$ ./amtool silence query alertname=foo

and

$ ./amtool silence query ~foo
is same as
$ ./amtool silence query alertname=~foo

the following does not align with the documentation:

$ ./amtool silence query =~foo
amtool: error: bad matcher format: alertname==~foo

and

$ ./amtool silence query =foo
amtool: error: bad matcher format: alertname==foo

So, even if the first argument does contain = or ~= it will be assumed to be the value of the alertname pair. Also there is no check for "if alertname pair is present or not."

I think the correct documentation will be:

If the first argument does not contain a key then it will be assumed to
be the value of the alertname pair. 

Secondly,
Because we are adding the = suffix to alertname in the assumption,
https://github.com/prometheus/alertmanager/blob/cfc0d9c55882dd501ec881ebe4606cf8d78bf8b6/cli/alert_query.go#L88

We cannot do:

$ ./amtool silence query '!~foo'
amtool: error: bad matcher format: alertname=!~foo (code: 400)

but the following works:
$ ./amtool silence query 'alertmanager!~foo'

whereas we can do:

$ ./amtool silence query ~foo
is the same as:
$ ./amtool silence query alertname=~foo

Making the second change seems simple but will probably a breaking change.

@juliusv
I think for adding negative (re) matching in silences the proto files will need changes. I tried generating the proto files, but it generates a lot of new things in the generated file.

I just wanted to add two new enum values. I am using the protoc version 3.5.1 I've no experience with protocol buffer but I guess the files generated should have been similar except minor changes.

is this correct?

https://github.com/prometheus/alertmanager/blob/7078333202f37577d2549aa7e223f6eb98a2d9d1/cli/utils.go#L159-L173

https://github.com/prometheus/alertmanager/blob/7078333202f37577d2549aa7e223f6eb98a2d9d1/types/match.go#L27-L33

https://github.com/prometheus/alertmanager/blob/7078333202f37577d2549aa7e223f6eb98a2d9d1/api/v1/api.go#L706-L710

https://github.com/prometheus/alertmanager/blob/7078333202f37577d2549aa7e223f6eb98a2d9d1/silence/silencepb/silence.proto#L14-L20

@geekodour What you're doing sounds like the correct approach in general. For the generation of protos, you probably will want to use make proto, which calls scripts/genproto.sh, which sets some custom proto options and generation foo. That hopefully explains the differences you're seeing.

@juliusv I used make proto the diff after running make proto looks like this: http://dpaste.com/2Z03ET4
even without making any changes to the proto files.

@geekodour Hmm, this is what the diff looks like for me when I don't change anything else in Alertmanager: https://gist.github.com/juliusv/033a489a93c118be9184457c44b9cc63

I installed protoc 3.5.1 from https://github.com/protocolbuffers/protobuf/releases/tag/v3.5.1 and used that... not sure where the extra XXX methods and fields are coming from in your case, or whether they would even be a problem, but best to minimize changes of course.

@juliusv the different output of make proto should be fixed by #1707 once it gets merged, can continue working on this.

@geekodour That one is merged now, sounds good!

@juliusv I've added this to amtool but facing some issues with elm ui, once I resolve that will send the PR.

@geekodour let me know if you need help with elm.

Is there any feature missing that prevents the crafting of a new 0.22.0 release of alertmanager supporting negative matching in silences that I could help with?

Oops. This issue was auto-closed, but it should not have. Reopening…

In a review comment, I enumerated what's still missing. To summarize, while the "backend" supports silences with negative matchers now, we still have to amend the following "frontend" components:

  1. The web UI (requires to code Elm).
  2. The v2 API (to actually create and retrieve silences with negative matchers).
  3. amtool (to handle silences with negative matchers with this tool).

Help is appreciated, but whoever wants to help should probably first talk to @vladimiroff because I suspect he is already working on the items above.

Sorry, got carried away. I suspect most of that work outlined above could be cherry-picked from #2448 as labels.Matcher is not that different from the now deleted types.Matcher. Especially when it comes to the API.

I will try to allocate some time for it and prepare a PR tomorrow. You should be fairly suspicious about my changes in the UI there, though. Did the bare minimal to pass the CI checks and nothing more than that. Help there would be highly appreciated.

Was this page helpful?
0 / 5 - 0 ratings