Elastalert: Monitor if process is down

Created on 15 Jun 2016 · 8 Comments · Source: Yelp/elastalert

I'm using topbeat to gather system information, including the running processes, and storing it in Elasticsearch after parsing it through Logstash.
I'm trying to configure ElastAlert to detect when a specific process, say "nginx", is down on a server. I tried using a flatline rule, but it wasn't working. Below are the relevant parts of the configuration:

name: Process not running
type: flatline
threshold: 1
use_count_query: true
doc_type: process
index: topbeat-*
timeframe:
    minutes: 5
filter:
    - and:
        - term:
            host: "prod-app01"
        - term:
            proc.name: "nginx"

And the debug logs:
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent. To send them, use --verbose.
INFO:elastalert:Starting up
INFO:elastalert:Queried rule Process not running from 2016-06-15 16:03 ART to 2016-06-15 16:08 ART: 1 hits
INFO:elastalert:Skipping writing to ES: {'hits': 1, 'matches': 0, '@timestamp': '2016-06-15T19:08:26.347513Z', 'rule_name': 'Process not running', 'starttime': '2016-06-15T19:03:26.337785Z', 'endtime': '2016-06-15T19:08:26.337785Z', 'time_taken': 0.0097169876098632812}
INFO:elastalert:Ran Process not running from 2016-06-15 16:03 ART to 2016-06-15 16:08 ART: 1 query hits, 0 matches, 0 alerts sent
INFO:elastalert:Sleeping for 299 seconds
INFO:elastalert:Queried rule Process not running from 2016-06-15 16:08 ART to 2016-06-15 16:13 ART: 0 hits
INFO:elastalert:Skipping writing to ES: {'hits': 0, 'matches': 0, '@timestamp': '2016-06-15T19:13:25.450825Z', 'rule_name': 'Process not running', 'starttime': '2016-06-15T19:08:26.337785Z', 'endtime': '2016-06-15T19:13:25.438628Z', 'time_taken': 0.012176036834716797}
INFO:elastalert:Ran Process not running from 2016-06-15 16:08 ART to 2016-06-15 16:13 ART: 0 query hits, 0 matches, 0 alerts sent
INFO:elastalert:Sleeping for 299 seconds

vikas8190

Most helpful comment

@phermann1988: Yes, you can do it.
The disk space statistics are available as a "filesystem" type record from topbeat.
A filter similar to the one below should work for disk space utilization between 90% and 100%.

filter:
    - and:
        - term:
            _type: "filesystem"
        - range:
            fs.used_p:
                from: 0.9
                to: 1

vikas8190 on 30 Jun 2016


All 8 comments

It seems to work after I comment out:

use_count_query: true
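
For reference, the rule from the question with that one line commented out might look like the sketch below. Everything except the `alert` section comes from the snippet in the question; the `alert` and `email` entries, including the address, are placeholders added only so the file is complete, and the Elasticsearch connection settings are assumed to live in the global config.

```yaml
# Sketch of the flatline rule with use_count_query disabled.
name: Process not running
type: flatline
threshold: 1              # alert when fewer than 1 matching document is seen...
timeframe:
    minutes: 5            # ...within a 5-minute window
# use_count_query: true   # commented out, as described above
doc_type: process
index: topbeat-*
filter:
    - and:
        - term:
            host: "prod-app01"
        - term:
            proc.name: "nginx"
# Placeholder alerter, not part of the original post.
alert:
    - "email"
email:
    - "ops@example.com"
```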

vikas8190 on 15 Jun 2016

Hi @vikas8190, thanks for sharing. This is also relevant for my alerting system! Do you think (or even know) whether it's possible to write an ElastAlert rule for topbeat data and get an alarm if a volume's/HDD's space is more than 90% full?

phermann1988 on 30 Jun 2016

filter:
    - and:
        - term:
            _type: "filesystem"
        - range:
            fs.used_p:
                from: 0.9
                to: 1

vikas8190 on 30 Jun 2016


Fantastic!! Thank you @vikas8190. Could you maybe post your whole working "Process not running" rule? If I try it with the snippet you posted above, I always get the following error:

elastalert.util.EAException: Error loading file myrulesprocess_flatline.yaml: Could not parse file myrulesprocess_flatline.yaml: mapping values are not allowed in this context
in "myrulesprocess_flatline.yaml", line 14, column 19

Line 14, column 19 is the ":" before host (..term: host: "server01"..), so maybe something is missing in my rule? (I just copied your code and changed the host and process name.) Thank you :-)

phermann1988 on 30 Jun 2016

Well, it's a YAML file, so you have to format it properly; a direct copy-paste of the above won't work. Format it properly and try again.
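
As an illustration of what "format it properly" likely means here: the error points at a second ":" on the same line, which is what happens when the nested key ends up on the `term` line. A minimal sketch of the filter with the nesting restored, using the "server01" host from the error report and the process name from the question:

```yaml
# Fails to parse: two "key: value" pairs on one line
#   - term: host: "server01"
#
# Parses: the nested key sits on its own line, indented under "term"
filter:
    - and:
        - term:
            host: "server01"
        - term:
            proc.name: "nginx"
```

The exact number of spaces is not important as long as each nested level is indented further than its parent and tabs are not used.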

vikas8190 on 30 Jun 2016

Hi, I want to monitor the data from topbeat so that whenever fs.used_p is in the range from 0.9 to 1, it alerts me with the details, even if only a single event occurs.

Which rule type should I use, and what should my default parameters be?

Pransh20 on 13 Jul 2016

Try adding the parameters below to the configuration file:

aggregation:
    minutes: 1
realert:
    minutes: 1

Change the values as you need. Read about what they are used for in the ElastAlert documentation.
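
A rough sketch of how these two options could sit in a rule file, with comments summarizing what each one does. The values are only illustrative, and the `query_key` line is an assumption that is not part of the suggestion above: it relies on the `host` field used in the original question and makes `realert` apply per host rather than per rule.

```yaml
# Illustrative fragment of a rule file, not a complete rule.
aggregation:
    minutes: 1     # collect matches for 1 minute and send them together as one alert
realert:
    minutes: 30    # after an alert, suppress repeat alerts for 30 minutes
# Assumption: with a query_key set, realert is tracked separately for each host value.
query_key: host
```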

vikas8190 on 13 Jul 2016

@Pransh20: I configured it using an 'any' type rule.
Measuring high disk space usage by percentage:

es_host: localhost
es_port: 9200
name: Disk space monitoring
type: any
index: topbeat-*
filter:
    - range:
        fs.used_p:
            from: 0.9
            to: 1
realert:
    minutes: 30
alert:
- "email"
email:
- "[email protected]"