Hi,
I have already read here that it is not a good idea to run ElastAlert on several servers at the same time. We are looking at different ways to make it highly available, and I was wondering whether there is any auto-recovery configuration already in place.
We have installed it as a Windows service with the auto-recovery options enabled, but we are thinking about how to handle a second instance that can act as a backup in case auto-recovery is not enough.
Thanks and kind regards,
Ruth
This is for sure something I'd also like to know. Relying on a single instance of ElastAlert is not really good enough in my opinion.
Don't suppose anyone has gotten any good responses on this yet?
So one idea might be... have the service run on N Elasticsearch nodes, but always check which node is the elected "master" before committing to an action.
This way, if an Elasticsearch node falls over, the newly elected master could also resume alerting responsibilities.
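A minimal sketch of that check, assuming each instance runs next to an Elasticsearch node and knows that node's name. It asks the cluster for the elected master through the `_cat/master` API and only proceeds when the answer matches the local node. The function names, the default URL, and the idea of gating alerts this way are illustrative assumptions; none of this is part of ElastAlert itself.

```python
import json
import urllib.request


def elected_master(cat_master_rows):
    """Return the node name from a parsed `_cat/master?format=json` response.

    The response is a one-element list of objects with the keys
    `id`, `host`, `ip` and `node`; an empty list means no master is known.
    """
    if not cat_master_rows:
        return None
    return cat_master_rows[0].get("node")


def fetch_cat_master(base_url="http://localhost:9200", timeout=5):
    """Query the local Elasticsearch node for the elected master.

    `base_url` is an assumed default; adjust it for the actual cluster.
    """
    url = base_url.rstrip("/") + "/_cat/master?format=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def should_alert(local_node_name, fetch=fetch_cat_master):
    """Return True only if the local node is currently the elected master.

    Any connection or parsing failure is treated as "not master", so an
    instance that cannot reach the cluster stays quiet instead of alerting.
    """
    try:
        rows = fetch()
    except (OSError, ValueError):
        return False
    return elected_master(rows) == local_node_name
```

A wrapper could call `should_alert("es-node-1")` (with the hypothetical name replaced by the local `node.name`) before each run and skip the run when it returns `False`. This does not rule out a short overlap or gap while a new master is being elected, so duplicate or delayed alerts are still possible around a failover.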
@Qmando Any suggestions on this?
If you run with a service discovery mechanism this can be trivially done.
Likewise if you run on an orchestration service.
ElastAlert saves all of its state to Elasticsearch, so it can be stopped, moved, restarted, etc. without losing data. There are also some mechanisms to avoid catastrophic failure if some code raises an exception, Elasticsearch goes down, or similar errors occur, but obviously it's not perfect.
Service monitoring, auto-restarting, failover, and testing changes before deploying them are not things ElastAlert is responsible for at this time.
ElastAlert is still very lightweight, and adding its own mechanism for distributed discovery and availability is a large task, though it may support these in the future.
ElastAlert saves all of its state to Elasticsearch, so it can be stopped, moved, restarted, etc. without losing data.
I think this depends on the rule being used. Some of them, like frequency, I think, keep counts in memory, and that data is lost if ElastAlert is restarted. So I think the solution that @mlosapio describes would have to take this into account.