Hi. I've recently started working with Orleans and would like to clarify a couple of issues:
According to the presentation, it's managed automatically. However, I don't quite get how a silo would be restarted if it completely fails. There is no central monitoring process to manage it, is there?
Implement health checks so that k8s/swarm can restart the container. The silo will then automatically rejoin the cluster and start hosting newly activated grains. While the node is down, actors will be spawned on the silos that are still alive.
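To make the health-check idea concrete, here is a minimal sketch of what a TCP-level liveness check does, written in Python for brevity. It assumes the silo listens on Orleans' default silo-to-silo port 11111; the function name, host, and timeout are illustrative choices, not part of Orleans. In practice the orchestrator usually runs an equivalent check itself (for example, a Kubernetes tcpSocket probe), so this only illustrates the logic.

```python
import socket


def silo_port_is_open(host: str, port: int = 11111, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Port 11111 is assumed to be the silo's endpoint (the Orleans default);
    adjust it to whatever the silo is actually configured with.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use as an exec-style probe (exit code 0 = healthy, 1 = unhealthy):
#   import sys
#   sys.exit(0 if silo_port_is_open("127.0.0.1") else 1)
```

An open port only shows that the process is up and listening; it does not prove that the silo has joined the cluster, so a check like this is best treated as a liveness signal rather than a readiness signal.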
According to the presentation, it's managed automatically.
The Orleans runtime automatically responds to changes in the hosting environment (silos/nodes added or removed) and reconfigures the cluster accordingly. However, it does not try to restart anything, and really cannot.
So, you still need a hosting solution that would ensure that enough nodes are running at any point in time, and that they get restarted in case of a failure. Initially, Orleans was built with Azure Cloud Services in mind. Nowadays, k8s seems to be the most popular choice.
Does it make sense to monitor the health of the grains? I suppose it's not needed as Orleans will automatically deactivate them if they fail and we'll get an error in the logs.
Generally speaking, grains never fail. But the silo where a grain is activated might. In that case, the grain gets automatically reactivated on another silo upon the next call to it. Because of that, instead of monitoring individual grains, people usually monitor the health of their service as a whole, by executing synthetic transactions against it or otherwise.
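A synthetic-transaction monitor along these lines could look roughly like the sketch below, written in Python for brevity. Everything in it is an assumption made for illustration: `transaction` stands for whatever call exercises your service end to end (for example, an HTTP request that ends up invoking a grain), and the latency limit and failure threshold are arbitrary defaults.

```python
import time
from typing import Callable


def run_synthetic_check(transaction: Callable[[], object],
                        expected: object,
                        max_latency_s: float = 1.0) -> bool:
    """Run one synthetic transaction.

    The check passes only if the call returns `expected` without raising
    and completes within `max_latency_s` seconds.
    """
    start = time.monotonic()
    try:
        result = transaction()
    except Exception:
        # Any error from the service counts as a failed check.
        return False
    elapsed = time.monotonic() - start
    return result == expected and elapsed <= max_latency_s


class ServiceHealth:
    """Tracks consecutive failures so a single bad call does not raise an alert."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, ok: bool) -> bool:
        """Record one check result and return the current health status."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.is_healthy

    @property
    def is_healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold
```

A scheduler would call `run_synthetic_check` at a fixed interval and feed the result into `ServiceHealth.record`. Requiring several consecutive failures tolerates the brief window in which a grain is being reactivated on another silo after a node failure.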
@yevhen
@sergeybykov
Thank you very much. I believe the issue can be considered resolved as the questions have been answered.