Elastalert: Elastalert HA

Created on 15 Apr 2016 · 7 Comments · Source: Yelp/elastalert

Hi,

I have already read here that it is not a good idea to run ElastAlert on several servers at the same time. We are looking at different ways to make it highly available, but I was wondering whether any auto-recovery configuration is already in place.
We have installed it as a Windows service with the auto-recovery options, but we are considering how to handle a second instance that can act as a backup in case auto-recovery is not enough.

Thanks and kind regards,
Ruth

All 7 comments

This is for sure something I'd also like to know. Relying on a single instance of ElastAlert is not really good enough in my opinion.

Don't suppose anyone has gotten any good responses on this yet?

So one idea might be: run the service on N Elasticsearch nodes, but always check for the elected "master" before committing to an action.

This way, if an Elasticsearch node falls over, the newly elected master could also resume alerting responsibilities.

@Qmando Any suggestions on this?
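
The elected-master idea above could be sketched roughly as follows. This is only an illustration, not something ElastAlert provides: it assumes each ElastAlert instance runs next to an Elasticsearch node reachable at `http://localhost:9200`, that `GET /_cat/master?format=json` returns a one-element list whose `id` field is the master's node id, and that `GET /_nodes/_local` returns a `nodes` object keyed by the local node's id. The function names and `ES_URL` are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: HTTP address of the co-located node


def fetch_json(path, base_url=ES_URL, timeout=5):
    """GET a JSON document from the Elasticsearch HTTP API."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def local_node_is_master(cat_master, local_nodes):
    """Return True if the local node's id matches the elected master's id.

    cat_master:  parsed response of GET /_cat/master?format=json
    local_nodes: parsed response of GET /_nodes/_local
    """
    if not cat_master:
        return False  # no master reported (or an unexpected response)
    master_id = cat_master[0].get("id")
    return master_id is not None and master_id in local_nodes.get("nodes", {})


def should_alert():
    """Hypothetical gate: only the instance next to the master acts."""
    try:
        cat_master = fetch_json("/_cat/master?format=json")
        local_nodes = fetch_json("/_nodes/_local")
    except OSError:
        return False  # local node unreachable: stay passive
    return local_node_is_master(cat_master, local_nodes)
```

A check like this is not a strict mutual-exclusion guarantee: during a master re-election two instances could briefly both believe they are active, or none could, so alerts may be duplicated or delayed around a failover.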

If you run with a service discovery mechanism, this can be trivially done.
Likewise if you run on an orchestration service.
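
What a service discovery or coordination tool typically contributes here is a lock with a time-to-live: one instance holds it and keeps renewing it, and a standby takes over once it expires. The sketch below shows that lease logic only, using an in-memory stand-in. The `Lease` class and its method names are hypothetical; in a real deployment the holder and expiry would live in the coordination service and be updated atomically there.

```python
import time


class Lease:
    """In-memory stand-in for a TTL lock held in a coordination service."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the logic can be tested
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, instance_id):
        """Acquire or renew the lease; return True if instance_id holds it."""
        now = self.clock()
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == instance_id:
            self.holder = instance_id
            self.expires_at = now + self.ttl
            return True
        return False
```

Each instance would call `try_acquire` periodically with its own id and only run ElastAlert while the call returns True; if the active instance dies and stops renewing, the standby's next call after the TTL succeeds.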

ElastAlert saves all of its state to Elasticsearch so it can be stopped, moved, restarted, etc. without losing data. There are also some mechanisms to avoid catastrophic failure if some code raises an exception, Elasticsearch goes down, or similar errors occur, but obviously it's not perfect.

Service monitoring, auto-restarting, failover, and testing changes before deploying them are not things ElastAlert is responsible for at this time.

ElastAlert is still very lightweight, and adding its own mechanism for distributed discovery and availability is a large task, though potentially it will support these in the future.
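
Since restarting is left to the surrounding environment, an external supervisor is one way to cover it. The following is a minimal sketch of such a watchdog, assuming any exit of the supervised process counts as a failure; `supervise` and its parameters are hypothetical names, and a service manager (such as the Windows service recovery options mentioned in the question) would normally do this job instead.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, backoff_seconds=1.0, sleep=time.sleep):
    """Run cmd, restarting it each time it exits, up to max_restarts times.

    Returns the list of exit codes observed, oldest first.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child process exits, then records its exit code.
        exit_codes.append(subprocess.call(cmd))
        if attempt < max_restarts:
            # Exponential backoff before the next restart.
            sleep(backoff_seconds * (2 ** attempt))
    return exit_codes
```

It might be invoked as `supervise(["python", "-m", "elastalert.elastalert", "--config", "config.yaml"])`, where the exact command line depends on how ElastAlert is installed. Because state is written back to Elasticsearch, a restarted process can pick up from there, subject to the in-memory caveat raised in the reply below.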

"ElastAlert saves all of its state to Elasticsearch so it can be stopped, moved, restarted, etc. without losing data."

I think this depends on the rule being used. Some rule types, such as frequency, keep counts in memory, as far as I know, and that data is lost if ElastAlert is restarted. So I think the solution that @mlosapio describes would take this into account.
