Elastalert: Getting 0 query hits in real runs, but getting hits in test run

Created on 5 Dec 2017  路  4Comments  路  Source: Yelp/elastalert

/etc/elastalert/rules2/TestRule.yaml

name: Test Percentage
type: percentage_match

es_host: localhost
es_port: 9200

index: logstash-*
description: "100% of all TestPercentage should be successful"

filter:
- query:
    query_string:
      query: "source_id: 4"

buffer_time:
    hours: 24


match_bucket_filter:
    - query_string:
        query: "execution_status: complete"

min_percentage: 90

alert:
  - "slack"

slack_webhook_url: XXXXX

/etc/elastalert/config2.yaml

rules_folder: /etc/elastalert/rules2

run_every:
  minutes: 15

buffer_time:
  minutes: 30

es_host: localhost
es_port: 9200

writeback_index: elastalert_status

python -m elastalert.test_rule --config /etc/elastalert/config2.yaml /etc/elastalert/rules2/TestRule.yaml

Successfully loaded Test Percentage

Got 1363 hits from the last 1 day

Available terms in first hit:
    source_name
    @timestamp
    ....... (etc)

INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
                To send them but remain verbose, use --verbose instead.
INFO:elastalert:Alert for Test Percentage at 2017-12-05T22:26:32.469972Z:
INFO:elastalert:Test Percentage

Percentage violation, value: 49.4497432135 (min: 90 max : None)

@timestamp: 2017-12-05T22:26:32.469972Z
num_hits: 1363
num_matches: 1
percentage: 49.4497432135


Would have written the following documents to writeback index (default is elastalert_status):

silence - {'rule_name': 'Test Percentage', '@timestamp': datetime.datetime(2017, 12, 5, 22, 26, 32, 483288, tzinfo=tzutc()), 'exponent': 0, 'until': datetime.datetime(2017, 12, 5, 22, 27, 32, 483278, tzinfo=tzutc())}

elastalert_status - {'hits': 1363, 'matches': 1, '@timestamp': datetime.datetime(2017, 12, 5, 22, 26, 32, 483912, tzinfo=tzutc()), 'rule_name': 'Test Percentage', 'starttime': datetime.datetime(2017, 12, 4, 22, 26, 32, 469972, tzinfo=tzutc()), 'endtime': datetime.datetime(2017, 12, 5, 22, 26, 32, 469972, tzinfo=tzutc()), 'time_taken': 0.00942087173461914}

python -m elastalert.elastalert --verbose --config /etc/elastalert/config2.yaml

INFO:elastalert:Starting up
INFO:elastalert:Ran Test Percentage from 2017-12-05 20:00 UTC to 2017-12-05 22:28 UTC: 0 query hits (0 already seen), 0 matches, 0 alerts sent
INFO:elastalert:Sleeping for 899.990209 seconds

Why is it returning 0 query hits when I run it for real and returning a ton of matches when I run the test?
I also can correlate the test matches to those I see in Kibana.
Running ELK 5.6

Most helpful comment

elastalert-test-rule is a special script that does a bunch of things that elastalert doesn't normally. Showing you the count for the last 24 hours is one. As is showing the available fields. It also shows the "logs" it would write back to Elasticsearch, though that can often be a bit confusing.

It's similar to running with --debug rather than --verbose, which will also prevent logs from being uploaded to Elasticsearch, and will always print your alert to stdout rather than use the real alerter.

All 4 comments

Try running it again but forcing it to start 24 hours back:

python -m elastalert.elastalert --verbose --config /etc/elastalert/config2.yaml --start 2017-12-05

It could be that between 20:00 and 22:28, there were no documents. It should normally start with a 24 hour window, but if you run it a little then stop and restart it, it picks up where it last left off.

python -m elastalert.elastalert --verbose --config /etc/elastalert/config2.yaml --start 2017-12-05

INFO:elastalert:Starting up
INFO:elastalert:Ran Test Percentage from 2017-12-05 00:00 UTC to 2017-12-05 22:56 UTC: 0 query hits (0 already seen), 0 matches, 0 alerts sent
INFO:elastalert:Sleeping for 899.985943 seconds
^CINFO:elastalert:SIGINT received, stopping ElastAlert...

:(

Interestingly enough ...
python -m elastalert.elastalert --verbose --config /etc/elastalert/config2.yaml --start 2017-12-04

INFO:elastalert:Starting up
INFO:elastalert:Alert sent to Slack
INFO:elastalert:Ran Test Percentage from 2017-12-04 00:00 UTC to 2017-12-05 22:58 UTC: 1172 query hits (0 already seen), 1 matches, 1 alerts sent
INFO:elastalert:Sleeping for 899.859449 seconds

It does look it's related to the previous time ran perhaps? This is definitely pointing me in the right direction. I still don't understand why the test would show hits while the actual run wouldn't though.

elastalert-test-rule is a special script that does a bunch of things that elastalert doesn't normally. Showing you the count for the last 24 hours is one. As is showing the available fields. It also shows the "logs" it would write back to Elasticsearch, though that can often be a bit confusing.

It's similar to running with --debug rather than --verbose, which will also prevent logs from being uploaded to Elasticsearch, and will always print your alert to stdout rather than use the real alerter.

Was this page helpful?
0 / 5 - 0 ratings