Hi, I'm clearly missing something simple, but here is my rule:
my_rule.yaml:
```
type: flatline
index: index-*
threshold: 1
timeframe:
  hours: 24
use_count_query: true
doc_type: doc
filter:
- query:
    query_string:
      query: "application:\"nonsense\""
```
I would expect `elastalert-test-rule my_rule.yaml` to report something like "An abnormally low number of events ..." because there are obviously no events with the field `application` set to "nonsense", and there never will be. However, if I change the timeframe to `hours: 1`, it strangely does hit the rule and reports "An abnormally low ...". To be sure, I created the original rule and left it running for days, but still got no alerts.
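One explanation consistent with this, sketched under an assumption: as far as I can tell from ElastAlert's source (this is not stated anywhere in the thread), a flatline rule remembers when it first evaluated a key and will not match until a full `timeframe` has elapsed since then. The simplified model below uses hypothetical names, ignores `query_key`, `realert` and the reset after a match, and evaluates every 5 minutes across a 24-hour window that contains zero matching events.

```python
from datetime import datetime, timedelta
from typing import Optional


class FlatlineModel:
    """Hypothetical, simplified flatline check; not ElastAlert's actual class."""

    def __init__(self, timeframe: timedelta, threshold: int) -> None:
        self.timeframe = timeframe
        self.threshold = threshold
        self.first_seen: Optional[datetime] = None  # in memory only, lost on restart

    def check(self, now: datetime, count_in_timeframe: int) -> bool:
        if self.first_seen is None:
            self.first_seen = now
        # Assumed guard: no match until a whole timeframe has been observed.
        if now - self.first_seen < self.timeframe:
            return False
        return count_in_timeframe < self.threshold


def simulate(timeframe_hours: int, window_hours: int = 24, step_minutes: int = 5) -> bool:
    """Evaluate once per step across the window; zero matching events throughout."""
    rule = FlatlineModel(timedelta(hours=timeframe_hours), threshold=1)
    start = datetime(2018, 6, 1)
    fired = False
    # The first evaluation happens at the end of the first query step, not at `start`.
    for i in range(1, window_hours * 60 // step_minutes + 1):
        now = start + timedelta(minutes=i * step_minutes)
        if rule.check(now, count_in_timeframe=0):
            fired = True
    return fired


print(simulate(1))   # True: a 1-hour timeframe fits inside the 24-hour window
print(simulate(24))  # False: only 23h55m is observed after the first evaluation
```

Under this assumption, a one-off test covering a single day can never satisfy a 24-hour timeframe, while a 1-hour (or 23-hour) timeframe can.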
Same issue here. I left it running and still got no alerts unless I set the timeframe to between 1 and 23 hours; with 24 hours, no alerts happen. My Logstash indices use the format logstash-%Y.%m.%d, so it is probably related to that.
Does it trigger an alert immediately if you add --start 2018-05-28, where that date is more than 24 hours ago?
I don't think there is anything special about exactly 24 hours, but I guess it's possible. Are you sure you waited a full 24 hours and there were no matching documents?
But I left that alert running for days and got nothing. Am I missing something?
@Qmando, if I set it to 23 hours it triggers immediately; when I change it to 24 hours it no longer triggers. Otherwise, I waited a whole weekend for it to trigger, without success. I remember this used to work in the past (same version and config) and then it stopped working. I deleted the index and the metadata, but without luck.
Are you on version 0.1.31? I'll try to reproduce this and get back to y'all.
Not the latest version. I will upgrade tomorrow and post my rule configuration, thanks.
@Qmando I upgraded to 0.1.31 and am still having the same problem.
Here is my configuration:
```
type: flatline
index: logstash-%Y.%m.%d
use_strftime_index: true
threshold: 1
timeframe:
  minutes: 1445 # 24 hours + 5 minutes
use_count_query: true
realert:
  days: 3
filter:
- query:
    match:
      type: "audit"
doc_type: "audit"
```
If I set the timeframe to 23 hours, it works right away; with 24 hours, nothing.
I tried removing use_count_query, the realert config and doc_type, and using an index pattern like logstash-*, without any difference.
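With `use_strftime_index: true`, the index pattern is expanded into concrete daily index names covering the queried time range. The sketch below only illustrates that idea: `daily_indices` is a hypothetical helper, and whether ElastAlert converts to UTC first or handles edge cases differently is an assumption, not something confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def daily_indices(pattern: str, start: datetime, end: datetime) -> List[str]:
    """Hypothetical helper: expand a strftime index pattern to one name per UTC day."""
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    names: List[str] = []
    day = start
    # Step one day at a time and compare calendar dates, so every date touched is included.
    while day.date() <= end.date():
        name = day.strftime(pattern)
        if name not in names:
            names.append(name)
        day += timedelta(days=1)
    return names


end = datetime(2018, 6, 1, 14, 6, tzinfo=timezone.utc)
start = end - timedelta(minutes=1445)  # the 24h05m timeframe from the rule above
print(",".join(daily_indices("logstash-%Y.%m.%d", start, end)))
# prints logstash-2018.05.31,logstash-2018.06.01
```

Any window of 24 hours or more necessarily touches at least two calendar dates, so a timeframe of that size always spans at least two daily indices.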
I can't reproduce this using your rule config:
```
$ python -m elastalert.elastalert --rule test.yaml --debug --start 2018-05-29
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
INFO:elastalert:Starting up
INFO:elastalert:Queried rule dffgdfgd from 2018-05-28 17:00 PDT to 2018-05-28 18:00 PDT: 0 hits
.....
INFO:elastalert:Queried rule dffgdfgd from 2018-05-30 11:00 PDT to 2018-05-30 11:08 PDT: 0 hits
INFO:elastalert:Skipping writing to ES: {'rule_name': u'dffgdfgd.all', '@timestamp': '2018-05-30T18:08:43.241762Z', 'exponent': 0, 'until': '2018-06-02T18:08:43.241754Z'}
INFO:elastalert:Alert for dffgdfgd at 2018-05-30T02:00:00Z:
INFO:elastalert:dffgdfgd
An abnormally low number of events occurred around 2018-05-29 19:00 PDT.
Between 2018-05-28 18:55 PDT and 2018-05-29 19:00 PDT, there were less than 1 events.
@timestamp: 2018-05-30T02:00:00Z
count: 0
key: all
num_hits: 0
num_matches: 36
INFO:elastalert:Ignoring match for silenced rule dffgdfgd.all
...
```
I get exactly the same result with 23 hours. Is this not what you are doing? Can you show logs from when your 23-hour timeframe works right away and the 24-hour one doesn't?
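The "Ignoring match for silenced rule" line in the log above follows from `realert`: once an alert is produced, the rule key (here `dffgdfgd.all`) is silenced until the time stored in `until`, which is three days after the wall-clock `@timestamp` in the log, matching `realert: days: 3`. In debug mode the silence is not written to Elasticsearch ("Skipping writing to ES"), but it appears to stay in memory for the rest of the run, so in a single backfill with `--start` only the first of many matches is reported. A minimal sketch of that bookkeeping; the `Silencer` class is hypothetical and ignores details such as `exponential_realert`.

```python
from datetime import datetime, timedelta
from typing import Dict, List


class Silencer:
    """Hypothetical sketch of realert bookkeeping, keyed by '<rule>.<query_key>'."""

    def __init__(self, realert: timedelta) -> None:
        self.realert = realert
        self.until: Dict[str, datetime] = {}

    def handle(self, key: str, now: datetime) -> str:
        # Suppress the match while the silence for this key is still active.
        if key in self.until and now < self.until[key]:
            return "Ignoring match for silenced rule " + key
        # Otherwise alert and silence the key for the realert period (wall-clock time).
        self.until[key] = now + self.realert
        return "Alert for " + key


silencer = Silencer(realert=timedelta(days=3))
now = datetime(2018, 5, 30, 18, 8)  # wall-clock time of the backfill run
log: List[str] = [silencer.handle("dffgdfgd.all", now) for _ in range(3)]
print(log)
# prints ['Alert for dffgdfgd.all', 'Ignoring match for silenced rule dffgdfgd.all', 'Ignoring match for silenced rule dffgdfgd.all']
```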
Hi,
So I was using elastalert-test-rule when the 23-hour timeframe setting succeeded; otherwise, I see it does not work, just like with 24 hours.
1. When I use 23 or 24 hours without specifying --start:
```
INFO:elastalert:Queried rule [int] No xyz Logs arrive from 2018-06-01 14:03 CEST to 2018-06-01 14:06 CEST: 0 / 0 hits
INFO:elastalert:Ran [int] No xyz Logs arrive from 2018-06-01 14:03 CEST to 2018-06-01 14:06 CEST: 0 query hits (0 already seen), 0 matches, 0 alerts sent
```
2. When I specify --start 2018-05-30, it SOMETIMES says the rule is silenced, but only sometimes:
```
....
INFO:elastalert:Ignoring match for silenced rule [int] No xyz Logs arrive.all
INFO:elastalert:Ignoring match for silenced rule [int] No xyz Logs arrive.all
INFO:elastalert:Ran [int] No AdnIDM Logs arrive from 2018-05-30 02:00 CEST to 2018-06-01 14:03 CEST: 0 query hits (0 already seen), 149 matches, 0 alerts sent
```
3. Sometimes (randomly) it reports "no matches" instead of saying the rule is silenced (which it should not be, because there were no alerts whatsoever).
4. When I run "elastalert-test-rule rules/xyz_logs_flatline.yaml --alert --config config.yaml" with 23 hours:
```
Would have written the following documents to writeback index (default is elastalert_status):
silence - {'rule_name': u'[int] No AdnIDM Logs arrive.all', '@timestamp': datetime.datetime(2018, 6, 1, 12, 20, 35, 719367, tzinfo=tzutc()), 'exponent': 0, 'until': datetime.datetime(2018, 6, 1, 15, 20, 35, 719357, tzinfo=tzutc())}
elastalert - {'alert_info': {'type': 'email', 'recipients': ['xyz']}, 'alert_sent': True, 'match_body': {'count': 0, 'num_hits': 0, '@timestamp': '2018-06-01T11:35:35.385220Z', 'key': 'all', 'num_matches': 4}, 'rule_name': '[int] No xyz Logs arrive', 'match_time': '2018-06-01T11:35:35.385220Z', 'alert_time': datetime.datetime(2018, 6, 1, 12, 20, 35, 719526, tzinfo=tzutc())}
elastalert_status - {'hits': 0, 'matches': 4, '@timestamp': datetime.datetime(2018, 6, 1, 12, 20, 36, 358059, tzinfo=tzutc()), 'rule_name': '[int] No xyz Logs arrive', 'starttime': datetime.datetime(2018, 5, 31, 12, 20, 35, 385220, tzinfo=tzutc()), 'endtime': datetime.datetime(2018, 6, 1, 12, 20, 35, 385220, tzinfo=tzutc()), 'time_taken': 0.9462141990661621}
```
5. When I run "elastalert-test-rule rules/xyz_logs_flatline.yaml --alert --config config.yaml" with 24 hours:
```
Would have written the following documents to writeback index (default is elastalert_status):
elastalert_status - {'hits': 0, 'matches': 0, '@timestamp': datetime.datetime(2018, 6, 1, 12, 30, 5, 675743, tzinfo=tzutc()), 'rule_name': '[int] No xyz Logs arrive', 'starttime': datetime.datetime(2018, 5, 31, 12, 30, 5, 353628, tzinfo=tzutc()), 'endtime': datetime.datetime(2018, 6, 1, 12, 30, 5, 353628, tzinfo=tzutc()), 'time_taken': 0.2949259281158447}
```
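The `elastalert_status` documents in items 4 and 5 show that `elastalert-test-rule` queried a window of exactly 24 hours (`starttime` to `endtime`). Comparing that span with each timeframe is a quick sanity check. The helper below is hypothetical and encodes only a necessary condition, assuming (from my reading of ElastAlert, not confirmed here) that a flatline rule needs a full `timeframe` to elapse inside the tested span before it can match.

```python
from datetime import datetime, timedelta

# starttime/endtime copied from the elastalert_status document in item 5
starttime = datetime(2018, 5, 31, 12, 30, 5, 353628)
endtime = datetime(2018, 6, 1, 12, 30, 5, 353628)
span = endtime - starttime


def timeframe_fits(timeframe: timedelta, span: timedelta) -> bool:
    """Hypothetical check: can a full timeframe elapse inside the tested span?

    Assumes the first evaluation happens after the first query, so strictly
    less than the whole span is observed.
    """
    return timeframe < span


print(span)                                       # 1 day, 0:00:00
print(timeframe_fits(timedelta(hours=23), span))  # True
print(timeframe_fits(timedelta(hours=24), span))  # False
```

Under that assumption, passing `--start` more than 24 hours back widens the span enough for a 24-hour timeframe to match in testing.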
Thanks for the info. I'll take a look at this again.
Has anyone found a resolution to this? My rule is set up like so:
```
name: {{ region_env_name }} Number of calls (5 min)
type: flatline
index: callflows-*
threshold: 1
timeframe:
  minutes: 5
use_count_query: true
doc_type: doc
filter:
- query:
    query_string:
      # "OR 7525" is not scoped to CallType here; CallType:(7526 OR 7525) would be
      query: "_exists_:CVPAppName AND CVPAppName:APP AND (CallType:7526 OR 7525)"
alert:
- "sns"
sns_topic_arn: "SNS_ARN"
```
From reading the documentation, I would expect the flatline rule to send an alert because there are no hits matching the query above; however, ElastAlert is not firing.
Can you post logs? Run elastalert with --verbose for at least 5 minutes.
Hi Qmando, it actually ended up working; I guess I just didn't give ElastAlert enough time to process the alert. Thanks for following up.
Same issue here.
```
pip show elastalert
Name: elastalert
Version: 0.0.75
```
Upgrading to version 0.1.29 gives the same result.
```
# --- End Global Rule Configuration ---
scan_entire_timeframe: true
type: flatline
timeframe:
  days: 7
run_every:
  minutes: 10
threshold: 1
use_count_query: true
```
No matches were found, even though the threshold was not reached.
The index pattern we are using is index: index-*.
I have already checked a few things, such as buffer_time.
Leaving the ElastAlert daemon running with the default configuration doesn't alert either. Other flatline rules are working fine.
Update:
`python -m elastalert.elastalert --rule test.yaml --debug --start 2019-06-08` works fine, but the rule doesn't fire when ElastAlert is running as a daemon.
How does --start affect the search? Is it the same as the timeframe set in the config file?
```
timeframe:
  days: 7
```
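`--start` sets where the first query begins; it does not replace `timeframe`. Assuming, as I read ElastAlert's flatline implementation (not confirmed in this thread), that the rule only matches once a full `timeframe` has passed since its first evaluation, a freshly (re)started daemon cannot match before that point, rounded up to the next `run_every` tick. For the 7-day rule above that means a week of uninterrupted running, whereas a backfill with `--start` far enough back covers the span in one go. The function and the start time below are hypothetical, and the 1-minute `run_every` in the second call is an assumed value.

```python
from datetime import datetime, timedelta


def earliest_flatline_alert(first_run: datetime, timeframe: timedelta,
                            run_every: timedelta) -> datetime:
    """Hypothetical: first run_every tick at which a full timeframe has elapsed."""
    ticks_needed = -(-timeframe // run_every)  # ceiling division of two timedeltas
    return first_run + ticks_needed * run_every


first_run = datetime(2019, 6, 10, 9, 0)  # illustrative daemon start time
print(earliest_flatline_alert(first_run, timedelta(days=7), timedelta(minutes=10)))
# prints 2019-06-17 09:00:00
print(earliest_flatline_alert(first_run, timedelta(minutes=5), timedelta(minutes=1)))
# prints 2019-06-10 09:05:00
```

Ceiling division is used so that a timeframe that is not a multiple of `run_every` still waits for the first tick at or after the full timeframe.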
Wow, what a coincidence. We missed an outage this week because our dirt-simple flatline rule refused to fire the way the documentation says it should. Now the client is asking why we missed it.
While I appreciate the author's work and willingness to open-source ElastAlert, this is very frustrating.
I have wasted hours re-testing this with elastalert-test-rule: it only fires if --start is set more than 36 hours back, and it won't fire at all when running in production.
If my lookback is set to 24 hours, my threshold is 3, and nothing has been indexed for the previous 12 hours, then it should have fired, and it should fire again if I erase all the ElastAlert metadata and restart, but it doesn't. At least, that is how the documentation describes it. It should be that simple.
Maybe this rule type should be marked as incomplete or broken, so that people don't rely on it to alert them to production outages until it gets fixed, because it is definitely broken.
Also, the results of running elastalert-test-rule and of actually running in production should be exactly the same.
I know the documentation mentions that the results could differ, but if that is the case, then running elastalert-test-rule is worthless: it is not a valid test unless it produces exactly the same result, just like a unit test.
```
name: "Testing 123 Potential System Outage"
http_post_static_payload:
  alert_name: "Testing 123 Potential System Outage"
  alert_form: "PotentialSystemOutageAlarm"
http_post_all_values: True
verify_certs: False
index: agent-publicipaddress-*
type: flatline
query_key: _index
doc_type: publicipaddress
threshold: 1
timeframe:
  hours: 12
alert:
```
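One detail of the 24-hour/threshold-3 reasoning above is worth checking: twelve quiet hours are not enough on their own, because the count covers the whole trailing 24 hours. If three or more documents arrived 12 to 24 hours earlier, the count is still at or above the threshold and a flatline rule should not match yet. The timestamps below are made up for illustration, and treating the window as `(now - timeframe, now]` is an assumption about boundary handling.

```python
from datetime import datetime, timedelta
from typing import List


def count_in_window(events: List[datetime], now: datetime, timeframe: timedelta) -> int:
    """Count events in the trailing window (now - timeframe, now]."""
    return sum(1 for ts in events if now - timeframe < ts <= now)


now = datetime(2019, 6, 14, 12, 0)
# Illustrative data: the last documents arrived 13, 15 and 20 hours ago, nothing since.
events = [now - timedelta(hours=h) for h in (13, 15, 20)]
threshold = 3

count = count_in_window(events, now, timedelta(hours=24))
print(count, count < threshold)  # 3 False: no flatline match yet

# Seven hours later the 20-hour-old document has aged out of the window.
later = now + timedelta(hours=7)
print(count_in_window(events, later, timedelta(hours=24)) < threshold)  # True
```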