Elastalert: feat. request: granular silence for <rule>.<value of query_key>

Created on 1 May 2020 · 11 Comments · Source: Yelp/elastalert

Hello,

We have been very happily using ElastAlert in our production Kubernetes environment for almost two years.

We've kept our approach very basic: a single catch-all rule alerts whenever a structured log is emitted with the key-value pair level: error.

Recently, noise emitted by a certain workload has led us to look into silencing alerts.

As per https://elastalert.readthedocs.io/en/latest/elastalert_status.html#silence it does not appear that match_body.x.y.z can be used as a parameter to target silences.

Could this please be confirmed? Would the approach then be to break up our catch-all level: error alert into many granular alerts based on other fields of the emitted structured logs?

For reference, our catch-all error rule, which requires granular silencing (via match_body fields), is:

    name: "Error"
    type: any
    index: "*"
    filter:
    - term:
        level: "error"
    alert:
    - "slack"

An example of how an alert could (ideally, if possible) be targeted for silencing:

image

All 11 comments

When a rule has a query_key, silencing can be done per value of that query_key. For example, with the query_key kubernetes.labels.app_kubernetes_io/component, you can silence the rule for one specific value of that field only. This happens for the realert timeframe. If you want to do it by hand, you have to add a document to the Elasticsearch silence index manually, because the --silence option does not support it. The silence_key (stored in the rule_name field of the Elasticsearch document) is then <rule_name>.<query_key_value>.
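To make the key format concrete, here is a minimal Python sketch of how a per-value silence lookup could behave. It is not ElastAlert's actual code: `silence_key`, `is_silenced`, the in-memory `silences` dict, and the service names are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def silence_key(rule_name, query_key_value):
    """Build the per-value key: '<rule_name>.<query_key_value>'."""
    return f"{rule_name}.{query_key_value}"


def is_silenced(silences, rule_name, query_key_value, now):
    """Return True if this rule/value pair has a silence that has not expired yet."""
    until = silences.get(silence_key(rule_name, query_key_value))
    return until is not None and now < until


now = datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical state: only the "noisy-worker" value of the rule "Error" is silenced.
silences = {silence_key("Error", "noisy-worker"): now + timedelta(hours=1)}

print(is_silenced(silences, "Error", "noisy-worker", now))  # True
print(is_silenced(silences, "Error", "billing-api", now))   # False
```

The point of the sketch is that the silence is keyed on the combined string, so other values of the same query_key keep alerting.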

Thanks @JasperJuergensen!

With regard to this, I realized our logger does emit a field called name for the instrumented service (less verbose than a Kubernetes label such as kubernetes.labels.app_kubernetes_io/component), which I will use as the query_key in the catch-all rule:

    name: "Error"
    type: any
    index: "*"
    filter:
    - term:
        level: "error"
    query_key: name
    alert:
    - "slack"

I will then confirm that I can granularly silence alerts depending on the value of their name field, at which point I will resolve this issue.

> If you want to do this by hand, you have to add a document to the elasticsearch silence index manually because this is not supported by the --silence option. The silence_key (in the elasticsearch document saved in the rule_name field) is then <rule_name>.<query_key_value>.

How does one add a document manually?
Would it be feasible to allow --silence to work with <rule_name>.<query_key_value> ?

To add a document manually, you can use the Index API from Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index.html). The document should have the following format:

    {
        "exponent": 0,
        "rule_name": "<rule_name>.<query_key_value>",
        "@timestamp": "<current datetime>",
        "until": "<silence until timestamp>"
    }

I think it would be a good feature to add the possibility to enter a query_key_value to the --silence option.
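For anyone scripting this, a small Python sketch that fills in the document format above might look like the following. The helper name `build_silence_doc`, the example rule and service names, and the exact timestamp format (ISO 8601 in UTC with a `Z` suffix) are assumptions for illustration, not something confirmed in this thread.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout: ISO 8601, UTC, trailing "Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def build_silence_doc(rule_name, query_key_value, duration, now=None):
    """Return a silence document for '<rule_name>.<query_key_value>'."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "exponent": 0,
        "rule_name": f"{rule_name}.{query_key_value}",
        "@timestamp": now.strftime(TS_FORMAT),
        "until": (now + duration).strftime(TS_FORMAT),
    }


# Hypothetical example: silence the "noisy-worker" value of the rule "Error" for 4 hours.
doc = build_silence_doc(
    "Error",
    "noisy-worker",
    timedelta(hours=4),
    now=datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=4))
```

The printed JSON is the request body to send with the Index API. The target index depends on your writeback_index setting and Elasticsearch version, so it is deliberately left out here.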

Thanks @JasperJuergensen.
Realert plus exponential realert on a per-qk basis is worlds better than what we had before. I've also added the instructions for manually creating a document to explicitly silence a single rule.qk to our wiki, just in case (if this is worth adding to the ElastAlert docs, please let me know).

I see that you are already working on qk-based silences! 🚀

Oops accidentally closed, will reopen in case this is used for tracking silence qk feature.

Re: silence qk feature,

--silence_qk_value=foo will have to be used in combination with --rule which points to the rules yaml file?

Would this UX be feasible?

--silence_rule_qk(or a better name)=<rule>.<qk>

> --silence_qk_value=foo will have to be used in combination with --rule which points to the rules yaml file?

Yes

> Would this UX be feasible?
>
> --silence_rule_qk(or a better name)=<rule>.<qk>

I think this would make it a bit more complicated, because internally the --rule option is needed to load the correct rule (and only this rule). Also, the rule name can be very long and contain spaces, while the filename can be autocompleted by the shell :D. And if there is only the --silence_rule_qk option, neither the rule name nor the query key value can contain a dot, because otherwise you can't tell where the rule ends and the qk begins.
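The dot ambiguity can be illustrated with a short Python sketch. The function `possible_splits` and the example names are hypothetical; the sketch only shows why a single combined `<rule>.<qk>` argument cannot be parsed reliably once either part contains a dot.

```python
def possible_splits(combined):
    """Return every (rule_name, query_key_value) pair a dotted string could mean."""
    parts = combined.split(".")
    return [
        (".".join(parts[:i]), ".".join(parts[i:]))
        for i in range(1, len(parts))
    ]


# Hypothetical input: is the rule "Error" with value "payments.api",
# or the rule "Error.payments" with value "api"?
print(possible_splits("Error.payments.api"))
# [('Error', 'payments.api'), ('Error.payments', 'api')]
```

Passing the rule via --rule and the value via a separate option avoids having to guess between these candidates.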

Sounds good, thanks for explaining.
Another UX question:

If you have multiple rules in your yaml file, and want to silence only a subset (or even just one) of these rules for a given qk, but leave the other rules unsilenced for a given qk, would this be possible?

> If you have multiple rules in your yaml file, and want to silence only a subset (or even just one) of these rules for a given qk, but leave the other rules unsilenced for a given qk, would this be possible?

I don't think you can have multiple rules in one yaml file, at least not with the current FileRulesLoader implementation. In general, the --rule option is not just the rule file name but a rule identifier for whichever rules loader is in use. So the --rule option should always identify exactly one rule, which should prevent this problem from arising in the first place.

Thanks @JasperJuergensen. My impression that rules were bundled into the same yaml file was wrong, and due to my lack of knowledge of ElastAlert internals. I just confirmed this (I set up 2 rules):

    /opt/rules # ls
    error.yaml  fatal.yaml

Thanks. Looking forward to the qk silence feature! 🚀
