Elastalert: Elastalert HA

Created on 15 Apr 2016 · 7 Comments · Source: Yelp/elastalert

Hi,

I have already read here that it is not a good idea to run ElastAlert on different servers at the same time. We are looking at different ways to make it highly available, and I was wondering whether any auto-recovery configuration is already in place.
We have installed it as a Windows service with the auto-recovery options enabled, but we are wondering how to handle a second instance that can act as a backup in case auto-recovery is not enough.

Thanks and kind regards,
Ruth

abiruth84

👍12

Most helpful comment

ElastAlert saves all of its state to Elasticsearch, so it can be stopped, moved, restarted, etc. without losing data. There are also some mechanisms to avoid catastrophic failure if some code raises an exception, Elasticsearch goes down, or similar errors occur, but obviously it's not perfect.

Service monitoring, auto-restarting, failover, and testing changes before deploying them are not things ElastAlert is responsible for at this time.

ElastAlert is still very lightweight, and adding its own mechanism for distributed discovery and availability is a large task, though it may support these in the future.

Qmando on 17 Jun 2017

👍6

All 7 comments

This is for sure something I'd also like to know. Relying on a single instance of ElastAlert is not really good enough in my opinion.

mrosterm on 4 May 2016

👍6

Don't suppose anyone has gotten any good responses on this yet?

teranth on 15 Jul 2016

So one idea might be: have the service run on N Elasticsearch nodes, but always check for the elected "master" before committing to an action.

This way, if an Elasticsearch node falls over, the newly elected master could also take over alerting responsibilities.
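A minimal sketch of that check, in Python since ElastAlert is written in Python. It is not part of ElastAlert. It assumes each instance knows the `node.name` of the Elasticsearch node it sits beside (passed in as the hypothetical `local_node_name`) and uses Elasticsearch's `GET /_cat/master?format=json` endpoint to find the elected master. The helper names are made up for illustration.

```python
import json
import urllib.request


def elected_master_name(cat_master_body):
    """Extract the master's node name from a `_cat/master?format=json` body."""
    rows = json.loads(cat_master_body)
    # The cat API returns a JSON list with one row per master;
    # the "node" field holds the elected master's node name.
    return rows[0]["node"] if rows else None


def fetch_cat_master(es_url):
    """Fetch the raw `_cat/master` response from a cluster (needs network access)."""
    url = es_url.rstrip("/") + "/_cat/master?format=json"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def should_alert(local_node_name, cat_master_body):
    """Return True only if this instance sits on the currently elected master."""
    return elected_master_name(cat_master_body) == local_node_name


# Hypothetical usage inside an alerting loop (URL and node name are assumptions):
#   body = fetch_cat_master("http://localhost:9200")
#   if should_alert("es-node-1", body):
#       ...send the alert...
```

There is still a window between the check and the action in which the master can change, so this narrows the chance of duplicate or missed alerts but does not eliminate it.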

mlosapio on 22 Mar 2017

👍1

@Qmando Any suggestions on this?

wittyameta on 16 May 2017

👍1

If you run with a service discovery mechanism, this can be done trivially.
Likewise if you run on an orchestration service.

bearrito on 17 Jun 2017

👍1

"ElastAlert saves all of its state to Elasticsearch so it can be stopped, moved, restarted, etc. without losing data."

I think this depends on the rule being used. Some rule types, such as frequency, keep counts in memory, I think. This data is lost if ElastAlert is restarted. So I think the solution that @mlosapio describes would take this into account.
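To illustrate the concern, here is a small Python sketch of a sliding-window counter that lives only in process memory. It is not ElastAlert's actual implementation: the class `InMemoryFrequencyWindow` and its parameters are hypothetical. It only shows how a count kept this way would start again from zero after a restart.

```python
from collections import deque


class InMemoryFrequencyWindow:
    """Hypothetical counter: matches when num_events fall within timeframe_seconds."""

    def __init__(self, num_events, timeframe_seconds):
        self.num_events = num_events
        self.timeframe = timeframe_seconds
        self.events = deque()  # timestamps held only in memory

    def add_event(self, timestamp):
        """Record an event and return True if the threshold is reached."""
        self.events.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.timeframe:
            self.events.popleft()
        return len(self.events) >= self.num_events


# Without a restart, the third event within 60 seconds triggers a match.
window = InMemoryFrequencyWindow(num_events=3, timeframe_seconds=60)
matches_without_restart = [window.add_event(t) for t in (0, 10, 20)]

# With a "restart" after two events, the in-memory count is gone,
# so the third event no longer triggers a match.
window = InMemoryFrequencyWindow(num_events=3, timeframe_seconds=60)
window.add_event(0)
window.add_event(10)
window = InMemoryFrequencyWindow(num_events=3, timeframe_seconds=60)  # simulated restart
match_after_restart = window.add_event(20)
```

A failover design would need either to persist such counts or to rebuild them by re-querying the window after taking over.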