Elastalert: Alert based on % of a query related to total number of events

Created on 21 Oct 2015 · 7 Comments · Source: Yelp/elastalert

Hello,

I'm trying to set up an alert that sends an email when the access success rate on a website drops below xx%. The data I'm indexing comes from Apache logs, and I'm interested in the field that holds the HTTP status code (200, 404, 500, etc).

I was able to generate a dashboard that shows the access success rate, based on the following queries:
response: [ 0 TO 499 ]
response: [ 500 TO * ]

The graph basically shows which percentage of calls have an HTTP code below 500 compared to the rest of the events. For this I had to select the percentage option on the view.

I'm using the frequency rule type, and the query in my filter currently looks like this:

query:
  query_string:
    query: "response:[500 TO *]"

This counts the number of 500+ requests. The cardinality option would not work either, because it would aggregate distinct HTTP codes (200, 404, 500, etc) rather than the ranges [0 TO 499] and [500 TO *].

My question is whether there is any way to do this with the queries from the Kibana dashboard, so that an alert sends an email when, for example, the success rate over the last 5 minutes drops below 95%.
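To make the target check concrete, here is a minimal sketch of the arithmetic, assuming the hit counts of the two dashboard queries (response:[0 TO 499] and response:[500 TO *]) for the last 5 minutes are already available. The function names and the 95% default are hypothetical and are not part of ElastAlert.

```python
def success_rate(ok_count, error_count):
    """Percentage of requests that did not return a 5xx code."""
    total = ok_count + error_count
    if total == 0:
        return None  # no traffic in the window, so there is no rate to report
    return 100.0 * ok_count / total


def below_threshold(ok_count, error_count, min_success_percent=95.0):
    """True when the success rate exists and is under the threshold."""
    rate = success_rate(ok_count, error_count)
    return rate is not None and rate < min_success_percent


# Example: 940 hits for response:[0 TO 499] and 60 hits for response:[500 TO *]
print(success_rate(940, 60))
print(below_threshold(940, 60))
```

Returning None for an empty window is a design choice here: it keeps "no traffic" from being reported as either 0% or 100% success.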

Thanks for the help

ricardojdsilva87

Most helpful comment

I had the same problem, and I did something like this, following the custom rules part of the docs:

from elastalert.ruletypes import RuleType


class PercentRule(RuleType):
    # Rule option: maximum tolerated percentage of 5xx responses
    required_options = set(['error_ratio'])

    def add_data(self, data):
        # data is the list of documents returned by the rule's query
        if not data:
            return  # avoid dividing by zero on an empty batch
        count = 0.0
        for document in data:
            if document['response'] >= 500:
                count = count + 1
        self.error_ratio = (count / len(data)) * 100
        if self.error_ratio > self.rules['error_ratio']:
            # One match is enough to trigger the alert for this batch
            for document in data:
                if document['response'] >= 500:
                    self.add_match(document)
                    break

    def get_match_str(self, match):
        # Yes I'm french; the original message read
        # "L'index %s a atteint %s pourcents d'erreurs"
        return "Index %s reached %s percent errors" % (self.rules['index'], self.error_ratio)

    def garbage_collect(self, timestamp):
        pass

(I modified my file to match your needs.)
This rule only requires the "error_ratio" parameter. I hope it helps!
It is a simple one: no timeframe (it uses the default one), etc., but it works fine for me.
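Since the rule above has no timeframe of its own, one way to add one is to keep a sliding window of recent events and compute the ratio over that window only. The sketch below is plain Python and does not use any ElastAlert API. The class SlidingErrorRatio and its method names are hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingErrorRatio:
    """Hypothetical helper: tracks the 5xx percentage over a time window."""

    def __init__(self, timeframe):
        self.timeframe = timeframe
        self.events = deque()  # (timestamp, is_error) pairs, oldest first
        self.errors = 0

    def add(self, timestamp, response):
        is_error = response >= 500
        self.events.append((timestamp, is_error))
        self.errors += int(is_error)
        self.prune(timestamp)

    def prune(self, now):
        # Drop events that are older than the window
        while self.events and self.events[0][0] < now - self.timeframe:
            _, was_error = self.events.popleft()
            self.errors -= int(was_error)

    def ratio(self):
        if not self.events:
            return 0.0
        return 100.0 * self.errors / len(self.events)


window = SlidingErrorRatio(timedelta(minutes=5))
start = datetime(2016, 2, 4, 12, 0, 0)
window.add(start, 503)
window.add(start + timedelta(minutes=1), 200)
print(window.ratio())
```

In a custom rule, something like add could be called from add_data and prune from garbage_collect, but that wiring is an assumption and would need checking against the custom rules documentation.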

redhelling21 on 4 Feb 2016


All 7 comments

Unfortunately there are currently no percentage-based rule types. You can either use a fixed threshold with frequency, or alert if the number of 5XX responses increases by X% using spike. The cardinality rule could also do something like "alert if more than 10 users receive 5XX responses within an hour".

I would really like to add this feature for ratios of results of different queries.
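To illustrate how the fixed-threshold and spike options above differ from a ratio, here is a simplified model of the three checks. The parameter names num_events and spike_height mirror the ElastAlert options of the same name, but the functions themselves are hypothetical simplifications and not ElastAlert's implementation.

```python
def frequency_alert(error_count, num_events):
    """Fixed threshold: fire once the error count reaches num_events."""
    return error_count >= num_events


def spike_alert(reference_count, current_count, spike_height):
    """Relative increase: fire when the current window holds at least
    spike_height times the errors of the reference window."""
    if reference_count == 0:
        return False  # nothing to compare against in this simplified model
    return current_count >= reference_count * spike_height


def percent_alert(error_count, total_count, max_error_percent):
    """Ratio: fire when errors exceed a share of all events."""
    if total_count == 0:
        return False
    return 100.0 * error_count / total_count > max_error_percent


# Traffic doubles while the error rate stays at 2%
quiet = {"errors": 20, "total": 1000}
busy = {"errors": 40, "total": 2000}

print(frequency_alert(busy["errors"], num_events=30))
print(spike_alert(quiet["errors"], busy["errors"], spike_height=2))
print(percent_alert(busy["errors"], busy["total"], max_error_percent=5.0))
```

In this example the count-based checks fire because traffic doubled, while the ratio check stays quiet because the share of errors did not change.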

Qmando on 21 Oct 2015

Hello, thanks for the quick reply,

I think I can work with the spike option, it's just a matter of checking the "normal" number of 5XX events to define the threshold.

I think that the % option would be a great improvement to the tool. Keep up the great work.
Should I close the issue, or do you want to keep it open as an enhancement request? I also changed the title to something more related to the question.

Thanks again

ricardojdsilva87 on 22 Oct 2015

@Qmando / @MattyKuzyk / @ricardojdsilva87: Do you know if there has been any progress on this feature? It is something that we would love to use as well ;)

stefanrehm on 17 Jan 2016

Any progress on this feature? I would love to use something like this.

sumitkumarMBK on 4 Feb 2016
