Elastalert: Problem understanding the flatline rule.

Created on 26 Dec 2018  ·  8 Comments  ·  Source: Yelp/elastalert

Following is my rule.yaml, where I want to alert when the number of documents in a period (500 minutes) is less than a threshold (300).

es_host: XXXX
es_port: 9200
run_every:
   minutes: 5
name: No data logs

type: flatline

index: XXXX

timeframe:
  minutes: 500

threshold: 300

#query_key: tags

#use_count_query: true

#doc_type: doc

filter:
- term:
      tags: "rsa"

alert:
- "debug"

The output is

INFO:elastalert:Queried rule No data RSA logs from 2018-12-26 07:34 IST to 2018-12-26 15:59 IST: 147 / 147 hits
elastalert_status - {'hits': 147, 'matches': 0, '@timestamp': datetime.datetime(2018, 12, 26, 10, 29, 44, 765974, tzinfo=tzutc()), 'rule_name': 'No data logs', 'starttime': datetime.datetime(2018, 12, 26, 2, 4, 41, 438924, tzinfo=tzutc()), 'endtime': datetime.datetime(2018, 12, 26, 10, 29, 41, 438924, tzinfo=tzutc()), 'time_taken': 2.361548900604248}

I am not able to figure out why there are no matches in this case. I think it has something to do with the 'run_every', 'timeframe', 'use_count_query', and 'query_key' properties in the configuration.
I would really appreciate the help, as I have been stuck on this for a couple of days now.

All 8 comments

Not sure why flatline is behaving like this, but as a workaround in the meantime, try using the cardinality rule type, setting an appropriate value for the min_cardinality key to emulate the flatline behavior. Make sure to set cardinality_field to something like document_id, because the cardinality rule type works on unique values.
https://elastalert.readthedocs.io/en/latest/ruletypes.html#cardinality

I tried your suggestion as follows

es_host: XXXX
es_port: 9200
run_every:
   minutes: 5
name: No data logs2

type: cardinality

index: XXXX

timeframe:
  minutes: 500

cardinality_field: "_id"
min_cardinality: 300

filter:
- term:
      tags: XXX

alert:
- "debug"

The output is

INFO:elastalert:Queried rule No data logs from 2018-12-26 08:27 IST to 2018-12-26 16:52 IST: 13 / 13 hits

elastalert_status - {'hits': 13, 'matches': 0, '@timestamp': datetime.datetime(2018, 12, 26, 11, 23, 0, 565346, tzinfo=tzutc()), 'rule_name': 'No data logs2', 'starttime': datetime.datetime(2018, 12, 26, 2, 57, 58, 404092, tzinfo=tzutc()), 'endtime': datetime.datetime(2018, 12, 26, 11, 22, 58, 404092, tzinfo=tzutc()), 'time_taken': 1.2378339767456055}

Still there are no matches.

@PraneetKhandelwal

I've run into similar issues, and to figure out what was going on I had to put some debug prints in ruletypes.py. I understood the issue with cardinality; I will also check whether flatline has similar behavior.

In the check_for_match() function of the CardinalityRule class:

time_elapsed = lookup_es_key(event, self.ts_field) - self.first_event.get(key, lookup_es_key(event, self.ts_field))
timeframe_elapsed = time_elapsed > self.timeframe

IMVHO it makes no sense: if your timeframe is, let's say, 10 minutes, and you only look at events that happened in the last 10 minutes, timeframe_elapsed will _always_ evaluate to False:

Successfully loaded just_bugs

INFO:elastalert:Queried rule just_bugs from 2019-01-14 16:43 CET to 2019-01-14 16:54 CET: 12 / 12 hits
('time_elapsed ', '0:00:00')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.013000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.035000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.049000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.062000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.074000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.085000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.097000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.112000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.124000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.135000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:00:00.148000')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')
('time_elapsed ', '0:04:14.686867')
('self.timeframe ', '0:10:00')
('timeframe_elapsed ', 'False')

This in turn makes the min_cardinality branch of the following condition evaluate to False (and therefore produces no match), because of the timeframe_elapsed value:

if (len(self.cardinality_cache[key]) > self.rules.get('max_cardinality', float('inf')) or
                (len(self.cardinality_cache[key]) < self.rules.get('min_cardinality', float('-inf')) and timeframe_elapsed)):
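The behavior described above can be reproduced in isolation. The following is a minimal sketch, not ElastAlert's actual code: min_cardinality_match and its parameters are hypothetical names, the cache expiry is left out, and the number of unique values is assumed to stay fixed. It only models the time_elapsed / timeframe_elapsed logic for a single key.

```python
from datetime import datetime, timedelta


def min_cardinality_match(event_times, unique_values, min_cardinality, timeframe):
    """Simplified model of the quoted check for one key (hypothetical helper).

    event_times: sorted timestamps of the events returned by the query.
    unique_values: number of distinct values seen (assumed constant here).
    """
    first_event = event_times[0]
    for ts in event_times:
        # Same arithmetic as the quoted lines: time since the first event seen
        time_elapsed = ts - first_event
        timeframe_elapsed = time_elapsed > timeframe
        if unique_values < min_cardinality and timeframe_elapsed:
            return True
    return False


start = datetime(2019, 1, 14, 15, 44)
timeframe = timedelta(minutes=10)

# All events fall inside a span no longer than the timeframe: no match possible
within = [start + timedelta(minutes=m) for m in (0, 4, 9)]
# Events span more than the timeframe (for example when querying further back)
spanning = [start + timedelta(minutes=m) for m in (0, 4, 11)]

no_match = min_cardinality_match(within, 3, 100, timeframe)
match = min_cardinality_match(spanning, 3, 100, timeframe)
print(no_match, match)
```

In this model the first call cannot match regardless of how few unique values there are, which mirrors the debug output above; the second call matches only because the events cover more than the timeframe.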

The only way I could get the cardinality rule working was to pass the --start parameter and let elastalert match events older than the timeframe defined in the configuration.

('time_elapsed ', '0:08:57.911815')
('self.timeframe ', '0:05:00')
('timeframe_elapsed ', 'True')
INFO:elastalert:Alert for just_bugs at 2019-01-14T15:58:47.713815Z:
INFO:elastalert:just_bugs

Less than 100 unique values occurred since last alert or between 2019-01-14 16:53 CET and 2019-01-14 16:58 CET

@timestamp: 2019-01-14T15:58:47.713815Z
num_hits: 12
num_matches: 1


Would have written the following documents to writeback index (default is elastalert_status):

silence - {'rule_name': 'just_bugs', '@timestamp': datetime.datetime(2019, 1, 14, 15, 58, 47, 729404, tzinfo=tzutc()), 'exponent': 0, 'until': datetime.datetime(2019, 1, 14, 15, 59, 47, 729396, tzinfo=tzutc())}

elastalert_status - {'hits': 12, 'matches': 1, '@timestamp': datetime.datetime(2019, 1, 14, 15, 58, 47, 731024, tzinfo=tzutc()), 'rule_name': 'just_bugs', 'starttime': datetime.datetime(2019, 1, 14, 15, 47, tzinfo=tzutc()), 'endtime': datetime.datetime(2019, 1, 14, 15, 58, 47, 713815, tzinfo=tzutc()), 'time_taken': 0.013551950454711914}

Some rule types, such as spike and flatline, require a minimum elapsed time before they begin alerting, based on their timeframe. So, if you are testing the rule (using the elastalert-test-rule script), you will not get any matches, although you may get hits.
Documentation: [screenshot of the relevant documentation passage, taken 2019-01-14]

So, as @lc4nt and @abhishekjiitr point out, there is a "minimum elapsed time" before you will get an alert.

Mostly it's useful for flatline with query_key. I wanted to support "Alert when a new value appears and then disappears". If we didn't have an elapsed time check, and our threshold was 2, it would generate an alert immediately after seeing a single event appear.
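To make the threshold-of-2 scenario concrete, here is a small sketch under stated assumptions. flatline_alert and its arguments are hypothetical and do not reflect ElastAlert's internals; the function only models "alert if the count is below the threshold, but not before a full timeframe has passed since the key was first seen".

```python
from datetime import datetime, timedelta


def flatline_alert(first_seen, now, count, threshold, timeframe, require_elapsed=True):
    """Return True if a flatline alert would fire for one query_key value.

    Hypothetical, simplified model: with require_elapsed=True, nothing fires
    until a full timeframe has passed since the key was first seen.
    """
    if require_elapsed and now - first_seen < timeframe:
        return False
    return count < threshold


t0 = datetime(2019, 1, 14, 16, 0)
timeframe = timedelta(minutes=10)

# A new key has just appeared with a single event; the threshold is 2.
immediate_without_check = flatline_alert(t0, t0, 1, 2, timeframe, require_elapsed=False)
immediate_with_check = flatline_alert(t0, t0, 1, 2, timeframe)

# One timeframe later the key has gone quiet (0 events in the window).
later_with_check = flatline_alert(t0, t0 + timedelta(minutes=10), 0, 2, timeframe)

print(immediate_without_check, immediate_with_check, later_with_check)
```

Without the elapsed check the very first event already counts as "below threshold"; with it, the alert is deferred until the key has had a full timeframe to produce events.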

For non-query_key flatlines, it makes less sense, but it also helps keep the code simple. We could drop this check and have each rule scan backwards over its timeframe when it starts up, but then we'd need separate code for frequency and flatline, since for the frequency type this could generate duplicate alerts. So, yes, it could be removed, but I don't think it's an unreasonable feature.

Regarding

time_elapsed = lookup_es_key(event, self.ts_field) - self.first_event.get(key, lookup_es_key(event, self.ts_field))
timeframe_elapsed = time_elapsed > self.timeframe

"it is pretty much obvious that if your timeframe is let's say 10 minutes, and you look for events that happened in the last 10 minutes, timeframe_elapsed will always evaluate to False"

For flatline rules, first_event gets populated with an empty "placeholder" timestamp after the first query is made. So, it's guaranteed to get populated after the first query.

For cardinality rule, you are right, if there are never any hits, min_cardinality rule will never trigger. I think you're right that it doesn't quite make sense in this scenario. I'll try to get it changed at some point.

Hi @Qmando ,

thanks a lot for your explanation!

I believe the use case that both @PraneetKhandelwal and I are trying to address is detecting log lines that disappear all of a sudden (example: devices that were sending logs stop sending data). This doesn't necessarily require two time windows to be compared, because all you need is a query at time T, a time window, and a reference value (threshold) below which you want to fire an alert.

I believe that another possible solution for achieving this could be a configuration option that inverts the any rule type from positive logic (if the query gives any result => alert(s)) to negative logic (if the query gives no result / the document count is below a threshold => alert). Maybe this would be an easier change that doesn't add complexity to flatline and cardinality. Just an idea :)
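The stateless check proposed in this comment (a query at time T, a window, and a threshold) could be sketched roughly as follows. This is an illustration only: below_threshold is a hypothetical helper, not an existing ElastAlert option, and the event timestamps are synthetic values chosen to match the 147 hits, 500-minute timeframe, and threshold of 300 from the original question.

```python
from datetime import datetime, timedelta


def below_threshold(event_times, query_time, window, threshold):
    """Count events in (query_time - window, query_time] and compare to threshold.

    Hypothetical helper: returns (alert, count) with no state kept between runs.
    """
    window_start = query_time - window
    count = sum(1 for ts in event_times if window_start < ts <= query_time)
    return count < threshold, count


query_time = datetime(2018, 12, 26, 10, 29)
window = timedelta(minutes=500)

# Synthetic data: 147 events, one every 3 minutes, all inside the window
events = [query_time - timedelta(minutes=3 * i) for i in range(147)]

alert, count = below_threshold(events, query_time, window, threshold=300)
print(alert, count)
```

Because nothing is remembered between runs, a check like this would alert on the first query, which is the trade-off against the elapsed-time behavior discussed earlier in the thread.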

@lc4nt Yeah, I have been trying to make rules for the example you mentioned. Currently, because I need the rule to run for a long time, I am using the flatline rule, which should start giving me alerts once the minimum elapsed time has passed.
Regarding the problem with the cardinality rule, I raised the same issue, with the variable always evaluating to False, in #2055. Maybe we can merge them. Anyway, thanks a lot @abhishekjiitr @lc4nt @Qmando

Just ran into this problem too. I would suggest mentioning that there is a minimum elapsed time (and how long that is?) in the spike and flatline sections of the docs. Otherwise users will be confused as to why alerts don't get sent out when they run elastalert-test-rule myFlatlineAlert.yaml --alert
