Elastalert: No "MATCH" occurred when it should

Created on 15 Nov 2016  Â·  32Comments  Â·  Source: Yelp/elastalert

I have an issue of " Match" when I running a Frequency rule. It should have a “Match" there but not.
Here is my frequency.yaml

es_host : localhost

es_port : 9200

name: Example rule

type: frequency

index: time

num_events: 4

timeframe:
hours: 0.2

filter:

  • term:
    rslt: "1"

alert:

if the first time "rslt: 1" show up and in that timeframe also the num_events reach 4 times (in my case), the "match" will show up and the alert also would be sent.

if the filter term "resl:1" occurred, but less than 4 times in the timeframe first, after that, any events reach 4 times happened in any timeframe would not trigger "match".

and just ONE "match" show up in the alert even if there are more than once "match" should exist in the dataset in buffering time.

Most helpful comment

When it hit's num_events, the count gets reset. I completely understand why it would make sense to continue for .2 hours after that instead of starting to count again immediately. I've been considering whether to change the behavior, because that does make sense.

All 32 comments

I can't really understand what you mean, do you have logs? Add `--verboseif you haven't already. If you seeIgnoring match for silence rule Example Rule`` you can add

realert:
  minutes: 0

I did add --verbose
python -m elastalert.elastalert --verbose --rule example_frequency.yaml
Let me give my example here:
My data looks like this:

`April 3rd 2016, 21:29:18 rlst: 1`
`April 3rd 2016, 21:30:19 rlst: 1`
`April 3rd 2016, 21:31:18 rlst: 1`
`April 1st 2016, 21:31:10 rlst: 1`
`April 1st 2016, 21:31:12 rlst: 1`
`April 1st 2016, 21:31:13 rlst: 1`
`April 1st 2016, 21:31:14 rlst: 1`
`April 1st 2016, 21:31:15 rlst: 1`
`April 1st 2016, 21:31:18 rlst: 1`
`April 1st 2016, 21:31:19 rlst: 1`
`April 1st 2016, 21:31:20 rlst: 1`
`April 1st 2016, 21:31:21 rlst: 1`
`April 1st 2016, 21:31:22 rlst: 1`
`April 1st 2016, 21:31:23 rlst: 1`
`March 23rd 2016, 21:31:18 rlst: 1`
`March 23rd 2016, 21:31:18 rlst: 1`

If I set the example_frequency.yaml as:

es_host : localhost
es_port : 9200
name: Example rule
type: frequency
index: time

num_events: 4

timeframe:
hours: 0.2

filter:
term:
rslt: "1"
alert:
"email"
email:
"[email protected]"

For the data, there is a match should be ONE "match" returned because there are 11 events ocurred in the timeframe in April 1st, but not. All 16 hits can be recognized. The out put as:

INFO:elastalert:Queried rule Example rule from 2014-11-16 13:58 CST to 2016-11-15 13:58 CST: 16 / 16 hits
INFO:elastalert:Ran Example rule from 2014-11-16 13:58 CST to 2016-11-15 13:58 CST: 16 query hits, 0 matches, 0 alerts sent

If I change to num_event : 3, TWO "match" should be returned for events in April 3rd and April 2nd. But only one match occurred.All 16 hits can be recognized.

INFO:elastalert:Queried rule Example rule from 2014-11-16 13:57 CST to 2016-11-15 13:57 CST: 16 / 16 hits
INFO:elastalert:Sent email to ['[email protected]']
INFO:elastalert:Ran Example rule from 2014-11-16 13:57 CST to 2016-11-15 13:57 CST: 16 query hits, 1 matches, 1 alerts sent

If I change to num_event : 2, Three "match" should be returned for events in April 3rd, April 2nd ,and March 23rd. But only ONE match occurred. All 16 hits also can be recognized.

INFO:elastalert:Queried rule Example rule from 2014-11-16 14:36 CST to 2016-11-15 14:36 CST: 16 / 16 hits
INFO:elastalert:Sent email to ['[email protected]']
INFO:elastalert:Ran Example rule from 2014-11-16 14:36 CST to 2016-11-15 14:36 CST: 16 query hits, 1 matches, 1 alerts sent

It only depends on how the events("rslt=1") enter the timeframe in the very beginning. If the events enter the timeframe as a match, the system will return 1 match, and no matter how many matches happen after the first match, it will still return 1, which is undesirable.
If the events enter the timeframe not as a match (say the event happens less than "num_events" times within a timeframe), the system will return 0 match, and no matter how many matches happen after that, there will still be 0 match.

Whats your config.yaml? And are you using --start 2014-11-16 to get it to query 2 years at a time?

Let me try to recreate this behavior

Can you show the full JSON for one of these documents? Is it possible @timestamp is not the timestamp you are looking at?

I did not use --start 2014-11-16, but I set it at config.yaml

rules_folder: example_rules
run_every:
  minutes: 100
buffer_time:
  minutes: 1051200
es_host: 127.0.0.1
es_port: 9200
es_url_prefix: localhost
writeback_index: elastalert_status
alert_time_limit:
  days: 200

I do change the @timestamp. I changed my @timestamp at logstash.
I need to review the data that I have had, but not the real-time data. I changed the @timestamp to the data index of the dataset. The @timestamp what I have changed is the one that I need.
The date in my data set is the UNIX format. I made the mapping in logstash, to replace my @timestamp.
This is my logstash filter of example.conf

filter {
  csv {
      separator => ","
      columns => ["time","rslt","username"]
  }
  date {
  match => ["time", "UNIX"]
  target => "@timestamp"
  }
}

You can also use

timestamp_format: unix
timestamp_field: time

to make elastalert use those fields without needing to convert

Oh, I know what the issue is! It basically comes from the fact that your buffer_time is so big compared to the timeframe. What elastalert does is

  1. Make query for buffer_time worth of events
  2. Add all events from this query to the eventwindow
  3. Resize event window so that anything older than (latest event - timeframe) is removed
  4. Check to see if eventwindow.size > num_events

In this case, the 3 events on April 3rd get added, then all of the events from April 1 get removed because they are now too old. This is an optimization IF the timeframe is longer than buffer_time, which usually is the case.

The bug was introduced here https://github.com/Yelp/elastalert/commit/af88f5155aa72082c8c5c2e9344127998e393931 line 212.

 +        # Only check for match _once_ after adding all the events discovered in this run
 +        # If the current running count is 1, and the threshold is 10, and we go to add 8 entries
 +        # it should only count as a single match, and not 8 consecutive ones.

Do you mean that, in order to make the buffer time smaller, instead of changing @timestamp in Logstash, I can change the timeframe I used in example_frequency.yaml?

I changed the filter in Logstash, I removed the date{} , so the @timestamp is the time that I configure the logstash:

filter {
  csv {
      separator => ","
      columns => ["time","rslt","username"]
  }
}

Then I reduce the buffer time to 30 mins and my config.yaml is:

rules_folder: example_rules
run_every:
  minutes: 100
buffer_time:
  minutes: 30
es_host: 127.0.0.1
es_port: 9200
es_url_prefix: localhost
writeback_index: elastalert_status
alert_time_limit:
  days: 200

My example_frequency.yaml as:

es_host : localhost
es_port : 9200
name: Example rule
type: frequency
index: norltime

timestamp_format: unix
timestamp_field: time
num_events: 2
timeframe:
hours: 0.5

filter:
term:
rslt: "1"
alert:
"email"
email:
"[email protected]"

After running python -m elastalert.elastalert --verbose --rule example_frequency.yaml, the output is:

user@mylenovo:~/elastalert$ python -m elastalert.elastalert --verbose --rule example_frequency.yaml
INFO:elastalert:Starting up
INFO:elastalert:Queried rule Example rule from 2016-11-16 10:23 CST to 2016-11-16 10:53 CST: 0 / 0 hits
INFO:elastalert:Ran Example rule from 2016-11-16 10:23 CST to 2016-11-16 10:53 CST: 0 query hits, 0 matches, 0 alerts sent
INFO:elastalert:Sleeping for 5999 seconds

I used the same data set. Even the hit cannot be found this time. It should have 16 hits and 3 matches there.

Is there anything I can do for above steps?

Or is there anything I can do without changing the buffer time smaller?
Should I expect a fix to this bug any time soon?

The timestamp thing had nothing to do with it. It's not getting hits because it no long queries 1051200 minutes. In order to hit the documents from April, you need to force a start time.

python -m elastalert.elastalert --verbose --rule example_frequency.yaml --start 2016-03-01 --end 2016-04-04

It works for the dataset which I changed @timestamp in Logstash by adding "--start" and "--end".
However, the problem still exists in my case.
I think you mean that the buffer time cannot be too long, and the timeframe can not be longer than buffer.
But the fixed "--start, --end" does not work for my situation because it has to be done manually, for example, if I can access the dataset weekly, the "--start, --end" time should change every week.

I have two questions,
First, how short the buffer time should be.
I made up a dataset for testing. I first made a data set with @timestamp in 24 hrs and there are 3 or more than 3 time events occurred every hour. So I set config.yaml:

buffer_time:
  minutes: 1440

the example_frequency.yaml:

num_events: 3
timeframe:
  hours: 1

The output also only depends on the events("rslt=1") enter the timeframe in the very beginning. For the case, there is only one match.

Then I change the buffer time smaller,

buffer_time:
  minutes: 120

for the case, two matches should be reported, but only one.

Secondly,
Does the buffer time in config.yaml depends on the @timestamp?
For previous case, I did not replace the @timestamp with "time" column of the dataset in logstash, so
that there is only one @timestamp, which is the time that I configure the logstash.conf. So I think all the data is requested because it just has one @timestamp.
Does "timestample_format:unix, timestamp_field:time" in example_frequency.yaml indicates that we change the procedure to calculate the frequency based on the "time" column but not the @timestamp?

timestamp_format: unix
timestamp_field: time
num_events: 2
timeframe:
hours: 0.5

I really appreciate your help.

First, let me say that this IS a bug that I will fix soon. So this won't matter after I fix it. But, in the meantime:

Change buffer_time to be equal to timeframe or smaller. This should fix all the issues you have.

Secondly:

timestamp_field and timestamp_format don't affect buffer_time or timeframe or anything. That's just how elastalert will format timestamps when making queries. Internally, elastalert uses python datetime objects, and elasticsearch uses DateOptionalTime or some other internal format. If you have both @timestamp AND time fields, it won't make any difference.

Now I got the @timestamp part :) and thank you for fixing this bug :)
If the buffer_time is equal to timeframe or smaller, only ONE match will be returned. For example, buffer_time is one hour, and also we need to find if certain number events occurred in one hour, the return is either 1 match or 0 match (found "match" or not), there is no anther one hour that we can use to find the second match.

@Gracewan this should be fixed by https://github.com/Yelp/elastalert/pull/809

I have update the elastalert to 0.1.4, however the same problem still exist...

You still have a very large buffer_time, it hits the documents, but no alert happens? Can you please give the full output from the run showing 16 query hits AND the JSON from one of the documents (You can remove irrelevant fields) AND the full rule yaml if it has changed from your earlier post?

I did not change anything of my config.yaml and I still have a large buffer time:

rules_folder: example_rules
run_every:
  minutes: 100
buffer_time:
  minutes: 1051200
es_host: 127.0.0.1
es_port: 9200
es_url_prefix: localhost
writeback_index: elastalert_status
alert_time_limit:
  days: 200

example_frequency.yaml

es_host : localhost
es_port : 9200
name: Example rule
type: frequency
index: time

num_events: 2

timeframe:
hours: 0.2

filter:
term:
rslt: "1"
alert:
"email"
email:
"[email protected]"

The output is:

user@mylenovo:~/elastalert$ python -m elastalert.elastalert --verbose --rule example_frequency.yaml
INFO:elastalert:Starting up
INFO:elastalert:Queried rule Example rule from 2014-11-22 13:00 CST to 2016-11-21 13:00 CST: 16 / 16 hits
INFO:elastalert:Sent email to ['[email protected]']
INFO:elastalert:Ran Example rule from 2014-11-22 13:00 CST to 2016-11-21 13:00 CST: 16 query hits, 1 matches, 1 alerts sent

Can you show the JSON of some of the documents? Ideally two pairs that would have triggered it? I want to verify that

`April 1st 2016, 21:31:13 rlst: 1`
`April 1st 2016, 21:31:14 rlst: 1`
`April 1st 2016, 21:31:15 rlst: 1`
`April 1st 2016, 21:31:18 rlst: 1`
`April 1st 2016, 21:31:19 rlst: 1`
`April 1st 2016, 21:31:20 rlst: 1`

are the actual timestamps being used.

As I said, I changed my @timestamp by adding a filer in the logstash.config my data format is csv

filter {
  csv {
      separator => ","
      columns => ["time","rslt","username"]
  }
  date {
  match => ["time", "UNIX"]
  target => "@timestamp"
  }
}

Then you can see that the time column of "time" is UNIX that I changed in logstash.config.

time,          rslt,   usernamer
1479238201, 1,  admin
1479238202,     1,  admin
1479238203, 1,  admin
1479238204, 1,  admin

I just list a piece of it, please let me know if more info is needed.
There should have more 3 Matches for my rule

Can you run it with this patch?

diff --git a/elastalert/ruletypes.py b/elastalert/ruletypes.py
index f53d2f4..62c251e 100644
--- a/elastalert/ruletypes.py
+++ b/elastalert/ruletypes.py
@@ -209,6 +209,7 @@ class FrequencyRule(RuleType):

             # Store the timestamps of recent occurrences, per key
             self.occurrences.setdefault(key, EventWindow(self.rules['timeframe'], getTimestamp=self.get_ts)).append((event, 1))
+            print "Got %s, count: %s" % (event['@timestamp'], self.occurrences['all'].count())
             self.check_for_match(key, end=False)

I found that my ruletypes.py was not updated. And I added it manually, in my local ruletypes.py file.

print "Got %s, count: %s" % (event['@timestamp'], self.occurrences['all'].count())
self.check_for_match(key, end=False)

The output is an error:

user@mylenovo:~/elastalert$ python -m elastalert.elastalert --verbose --rule example_frequency.yaml
INFO:elastalert:Starting up
INFO:elastalert:Queried rule Example rule from 2014-11-22 17:23 CST to 2016-11-21 17:23 CST: 16 / 16 hits
Got 2015-07-08 08:47:34+00:00, count: 1
ERROR:root:Traceback (most recent call last):
  File "/home/yu/elastalert/elastalert/elastalert.py", line 743, in run_all_rules
    num_matches = self.run_rule(rule, endtime, self.starttime)
  File "/home/yu/elastalert/elastalert/elastalert.py", line 532, in run_rule
    if not self.run_query(rule, rule['starttime'], endtime):
  File "/home/yu/elastalert/elastalert/elastalert.py", line 393, in run_query
    rule_inst.add_data(data)
  File "elastalert/ruletypes.py", line 216, in add_data
    self.check_for_match(key, end=False)
TypeError: check_for_match() got an unexpected keyword argument 'end'

ERROR:root:Uncaught exception running rule Example rule: check_for_match() got an unexpected keyword argument 'end'
INFO:elastalert:Rule Example rule disabled
INFO:elastalert:Sleeping for 5999 seconds

I don't think you actually have 0.1.4
That error is because this line doesn't have the changes I made: https://github.com/Yelp/elastalert/blob/master/elastalert/ruletypes.py#L219

I use pip show to check the version:

user@mylenovo:~/elastalert$ pip show elastalert
Name: elastalert
Version: 0.1.4
Summary: Runs custom filters on Elasticsearch and alerts on matches
Home-page: UNKNOWN
Author: Quentin Long
Author-email: [email protected]
License: Copyright 2014 Yelp
Location: /usr/local/lib/python2.7/dist-packages
Requires: argparse, elasticsearch, jira, jsonschema, mock, python-dateutil, PyStaticConfiguration, pyyaml, simplejson, boto, botocore, blist, croniter, configparser, aws-requests-auth, texttable

Is /elastalert a clone of the code? It looks like you have elastalert installed in two places. /elastalert and /usr/local/lib/python2.7/dist-packages. You can either change directory so it will pick up the system package, or run git pull, or delete /elastalert

I added it manually, and the problem is fixed so far!!!
Thank you so much!!!

Hi, I have a new question about this issue. I used the fixed code, my example_frequency.yaml as:

num_events: 3
timeframe:
  hours: 0.2

and the output as below:

user@mylenovo:~/elastalert$  python -m elastalert.elastalert --verbose --rule example_frequency.yaml
INFO:elastalert:Starting up
INFO:elastalert:Queried rule Example rule from 2016-11-09 10:56 CST to 2016-11-29 10:56 CST: 27 / 27 hits
Got 2016-11-15 19:30:01+00:00, count: 1
Got 2016-11-15 19:30:02+00:00, count: 2
Got 2016-11-15 19:30:03+00:00, count: 3
Got 2016-11-15 19:30:04+00:00, count: 1
Got 2016-11-15 19:30:05+00:00, count: 2
Got 2016-11-15 20:32:01+00:00, count: 3
Got 2016-11-15 20:32:02+00:00, count: 1
Got 2016-11-15 20:32:03+00:00, count: 2
Got 2016-11-15 22:32:01+00:00, count: 3
Got 2016-11-15 22:32:02+00:00, count: 1
Got 2016-11-15 22:32:03+00:00, count: 2
Got 2016-11-16 00:32:20+00:00, count: 1
Got 2016-11-16 00:32:21+00:00, count: 2
Got 2016-11-16 00:32:22+00:00, count: 3
Got 2016-11-16 02:32:10+00:00, count: 1
Got 2016-11-16 02:32:11+00:00, count: 2
Got 2016-11-16 10:31:12+00:00, count: 1
Got 2016-11-16 10:31:13+00:00, count: 2
Got 2016-11-16 10:31:14+00:00, count: 3
Got 2016-11-16 10:31:15+00:00, count: 1
Got 2016-11-16 10:31:33+00:00, count: 2
Got 2016-11-16 10:31:34+00:00, count: 3
Got 2016-11-16 10:31:35+00:00, count: 1
Got 2016-11-16 19:52:05+00:00, count: 1
Got 2016-11-16 19:52:06+00:00, count: 2
Got 2016-11-16 19:52:07+00:00, count: 3
Got 2016-11-16 19:52:08+00:00, count: 1
INFO:elastalert:Sent email to ['[email protected]']
INFO:elastalert:Ignoring match for silenced rule Example rule
INFO:elastalert:Ignoring match for silenced rule Example rule
INFO:elastalert:Ignoring match for silenced rule Example rule
INFO:elastalert:Ignoring match for silenced rule Example rule
INFO:elastalert:Ignoring match for silenced rule Example rule
INFO:elastalert:Ignoring match for silenced rule Example rule
INFO:elastalert:Ran Example rule from 2016-11-09 10:56 CST to 2016-11-29 10:56 CST: 27 query hits, 7 matches, 1 alerts sent
INFO:elastalert:Sleeping for 5999 seconds

ALL the 27 hits are listed and my question is about the hits that were found in

2016-11-16 10:31:12

Got 2016-11-16 10:31:12+00:00, count: 1
Got 2016-11-16 10:31:13+00:00, count: 2
Got 2016-11-16 10:31:14+00:00, count: 3
Got 2016-11-16 10:31:15+00:00, count: 1
Got 2016-11-16 10:31:33+00:00, count: 2
Got 2016-11-16 10:31:34+00:00, count: 3
Got 2016-11-16 10:31:35+00:00, count: 1

As the rule I set, I need to find if any hits over three times in 0.2 hour, and a match will be reported. From above we can find that in
2016-11-16 10:31:12
this time period, there are 7 hits, which is over the rule (3 hits), one match should be reported, however two matches are reported because the number of total hits(7 hits) in this time period is more than twice the number of hits I set in the rule (3 hits).

When it hit's num_events, the count gets reset. I completely understand why it would make sense to continue for .2 hours after that instead of starting to count again immediately. I've been considering whether to change the behavior, because that does make sense.

You added --start NOW so it doesn't look back in time? Is that not what you expect to happen?

It was added by accident and I did not review carefully. Thanks for your patience!!!

Hi @Qmando , I do not know if I can keep posting my new problem here. If not please let me know and I will open a new question.

My new question is about the error: AttributeError: 'NoneType' object has no attribute 'access_key'

I check the previous problem, it might come from the ALERT, but I did not use SNS, I use email as usual, but this error occurred

user@tuolenovo:~/elastalert$ python -m elastalert.elastalert --verbose --rule rule_tests/example_any.yaml
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 164, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 74, in _run_code
    exec code in run_globals
  File "/home/user/elastalert/elastalert/elastalert.py", line 1409, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/user/elastalert/elastalert/elastalert.py", line 1404, in main
    client = ElastAlerter(args)
  File "/home/user/elastalert/elastalert/elastalert.py", line 115, in __init__
    self.writeback_es = elasticsearch_client(self.conf)
  File "elastalert/util.py", line 260, in elasticsearch_client
    boto_profile=es_conn_conf['boto_profile'])
  File "elastalert/auth.py", line 40, in __call__
    aws_access_key_id = str(aws_credentials.access_key)
AttributeError: 'NoneType' object has no attribute 'access_key'

My example_any.yaml:

es_host : localhost
es_port : 9200
name: Example rule any for jiangsu01
type: any
index: ip_date_fixed
filter:
- term:
    rslt: "1"
alert:
- "email"
email: "[email protected]"

I am looking forward to seeing your reply

Hmm. This should only happen if aws_region is set. Why don't you apply this diff and tell me what it prints out

diff --git a/elastalert/util.py b/elastalert/util.py
index 7ade3ce..e6c7a7f 100644
--- a/elastalert/util.py
+++ b/elastalert/util.py
@@ -266,6 +266,7 @@ def elasticsearch_client(conf):
     """ returns an Elasticsearch instance configured using an es_conn_config """
     es_conn_conf = build_es_conn_config(conf)
     auth = Auth()
+    print "es_conn_conf:", es_conn_conf
     es_conn_conf['http_auth'] = auth(host=es_conn_conf['es_host'],
                                      username=es_conn_conf['es_username'],

First, thank you for the replying :)
I added print "es_conn_conf:", es_conn_conf in my util.py
then get the print out after running the rule

user@tuolenovo:~/elastalert$ python -m elastalert.elastalert --verbose --rule rule_tests/example_any.yaml
es_conn_conf: {'es_port': 9200, 'http_auth': None, 'aws_region': 'us-west-2', 'es_password': None, 'boto_profile': None, 'es_url_prefix': '', 'es_username': None, 'send_get_body_as': 'GET', 'use_ssl': False, 'es_conn_timeout': 10, 'es_host': '127.0.0.1', 'verify_certs': True}
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 164, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 74, in _run_code
    exec code in run_globals
  File "/home/user/elastalert/elastalert/elastalert.py", line 1409, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/user/elastalert/elastalert/elastalert.py", line 1404, in main
    client = ElastAlerter(args)
  File "/home/user/elastalert/elastalert/elastalert.py", line 115, in __init__
    self.writeback_es = elasticsearch_client(self.conf)
  File "elastalert/util.py", line 261, in elasticsearch_client
    boto_profile=es_conn_conf['boto_profile'])
  File "elastalert/auth.py", line 40, in __call__
    aws_access_key_id = str(aws_credentials.access_key)
AttributeError: 'NoneType' object has no attribute 'access_key'

Do you have a file config.yaml file that has aws_region set in it? That default value isn't coming from elastalert, at least not on the most recent version.

Was this page helpful?
0 / 5 - 0 ratings