Hi Quentin,
While working with ElastAlert, I ran into the following issue:
According to the ElastAlert documentation, with "run_every" = 2 minutes and "timeframe" = 15 minutes, I expected ElastAlert to query the last 15 minutes (the timeframe) every 2 minutes. Please see the following configuration:
config.yaml
run_every:
  minutes: 2
example.frequency.yaml
type: frequency
num_events: 80
timeframe:
  minutes: 15
However, when I start the service with the command "python -m elastalert.elastalert --verbose", it scans only the last 2 minutes (the value of "run_every") instead of the last 15 minutes (the value of "timeframe") as expected.
There is a similar problem with the cardinality rule:
config.yaml
run_every:
  minutes: 2
example.frequency.yaml
type: cardinality
max_cardinality: 10
timeframe:
  minutes: 15
Instead of querying the last 15 minutes (the timeframe) every 2 minutes, ElastAlert only queries the last 2 minutes, as in the problem described above.
--> As a workaround, I have to set these values equal: "run_every" = "timeframe" for the frequency rule, and "buffer_time" = "timeframe" for the cardinality rule.
By the way, I am not sure whether I understand correctly how these rules are meant to be used, so please correct me or help me solve this issue.
Thanks a lot,
Khanh
This is expected behavior. The 15 minutes of events for the timeframe are stored in memory. This minimizes the bandwidth used and gives us better accuracy and latency. It should also only occur when use_count_query or use_terms_query is set; otherwise each query is buffer_time wide.
Are there any negative effects of filling up the timeframe with multiple queries? Does that workaround _actually_ change the behavior at all?
From http://elastalert.readthedocs.io/en/latest/running_elastalert.html:
buffer_time is the size of the query window, stretching backwards from the time each query is run. This value is ignored for rules where use_count_query or use_terms_query is set to true.
@Qmando, if I understand correctly, the rule keeps 15 minutes of events in memory and keeps updating them: on each run it adds two minutes of new data and removes the oldest two minutes from the 15-minute sliding window of events. Is that what you mean?
In that case, the logs are confusing: looking at them alone, one may think that only the last 2 minutes of data are being queried. Maybe we can improve the documentation and log messages to distinguish between the data the rule is evaluated over and the actual query made to Elasticsearch?
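To make the sliding-window idea concrete, here is a rough Python sketch. It is not ElastAlert's actual code: the SlidingWindow class, the simulate function, and the fake steady stream of 3 events per minute are all made up for illustration. It only shows how narrow 2-minute queries can still fill a 15-minute in-memory window.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingWindow:
    """Hypothetical stand-in for a rule's in-memory event store (not ElastAlert code)."""

    def __init__(self, timeframe):
        self.timeframe = timeframe
        self.events = deque()  # event timestamps, oldest first

    def add_batch(self, timestamps, now):
        # Append the events returned by the latest narrow query.
        self.events.extend(sorted(timestamps))
        # Drop events that have fallen out of the timeframe.
        cutoff = now - self.timeframe
        while self.events and self.events[0] < cutoff:
            self.events.popleft()
        return len(self.events)


def simulate(runs=10, run_every=timedelta(minutes=2),
             timeframe=timedelta(minutes=15), per_minute=3):
    """Return the number of events held in memory after each run."""
    window = SlidingWindow(timeframe)
    start = datetime(2020, 1, 1, 0, 0)
    minutes_per_run = int(run_every.total_seconds() // 60)
    counts = []
    for i in range(1, runs + 1):
        now = start + i * run_every
        query_start = now - run_every
        # Fake query result: only events in [query_start, now) are fetched.
        batch = [query_start + timedelta(minutes=m)
                 for m in range(minutes_per_run)
                 for _ in range(per_minute)]
        counts.append(window.add_batch(batch, now))
    return counts


if __name__ == "__main__":
    print(simulate())
```

With these made-up numbers, each query fetches only 6 events, but the count held in memory grows to 45 (15 minutes x 3 events) and then stays there. A num_events threshold would be compared against that in-memory count, not against the size of a single 2-minute query.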