The reminder service gets started before the bootstrap providers are loaded during silo startup.
From the documentation, I understood that bootstrap providers are meant for pre-start tasks that run before a silo starts up, so I put my dependency registration tasks in a bootstrap provider.
Since the reminder service starts before the bootstrap providers, on a silo restart the grains that have reminders get activated before the bootstrap providers have completed the dependency registration tasks, which results in exceptions.
Any workaround?
This is essentially the same issue I warned about in the past:
https://github.com/dotnet/orleans/issues/1110#issuecomment-162738141
I think we need to separate the Bootstrap provider into 2 calls: before the silo has started and after.
InitBeforeSiloStarted is not allowed to send msgs to grains or use any other runtime services (such as subscribing to streams or to reminders). It can interact with other providers, like a storage provider for example, via direct reference. InitAfterSiloStarted can do anything.
Essentially, it is a chicken and egg problem:
1) In one scenario we need the reminder service to start before bootstrap, since in this scenario we need to create reminders from the bootstrap provider. Or more generally, we need to send msgs from the bootstrap provider. This cannot happen before the silo has become active in the membership.
2) In your scenario we need to inject dependencies from bootstrap before reminders may tick or msgs from other silos arrive. Or more generally, we need to do something in the silo BEFORE ANY msg can arrive, which can happen the moment we call BecomeActive in the membership.
I really see no other way than to separate the Bootstrap provider into 2 calls.
Workarounds before this is fixed/addressed: ugly - put a conditional Task in the grains code to await the Dependency registration task - basically delay the grain msg execution until Dependency is registered. Yes, very ugly.
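To make the "conditional Task" workaround concrete, here is a minimal sketch of the idea in Python with asyncio rather than C#. It is only an illustration of the pattern: `DependencyGate` and `ExampleGrain` are hypothetical names, not Orleans types, and a real grain would await an equivalent Task at the top of each message handler.

```python
import asyncio


class DependencyGate:
    """Hypothetical helper: handlers wait here until registration is done."""

    def __init__(self):
        self._ready = asyncio.Event()

    def mark_registered(self):
        # Called once by the bootstrap code after dependencies are registered.
        self._ready.set()

    async def wait(self):
        await self._ready.wait()


class ExampleGrain:
    """Stand-in for a grain whose reminder may tick before bootstrap ran."""

    def __init__(self, gate, log):
        self._gate = gate
        self._log = log

    async def receive_reminder(self, name):
        # Delay message execution until dependencies are registered.
        await self._gate.wait()
        self._log.append(f"handled {name}")


async def demo():
    log = []
    gate = DependencyGate()
    grain = ExampleGrain(gate, log)
    # The reminder fires first, as in the silo restart scenario.
    pending = asyncio.create_task(grain.receive_reminder("r1"))
    await asyncio.sleep(0)  # let the handler start; it blocks on the gate
    log.append("registering dependencies")
    gate.mark_registered()
    await pending
    return log
```

Running `asyncio.run(demo())` shows the handler completing only after registration, even though the reminder arrived first.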
Yes, for some boot config scenarios (especially DI), I can see where silo "pre-boot" app hooks could be needed / useful.
Maybe call some POCO interface(s) here in SiloHost.StartOrleansSilo?
https://github.com/dotnet/orleans/blob/master/src/OrleansRuntime/Silo/SiloHost.cs#L155
Don't most DI systems have mechanisms to pre-configure and/or auto-wire the dependencies though?
I think the hook point of current "bootstrap providers" [after grain messaging is possible] is still important, but there is also the distinction between silo states "running locally" and "has joined cluster".
Maybe it would be better to expose the full "silo lifecycle model" through a callback listener interface for all these stages?
We already have internal SiloStatus model for these, including: Created, Joining, Active
https://github.com/dotnet/orleans/blob/master/src/Orleans/Runtime/SiloStatus.cs#L6
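As a rough sketch of what such a callback listener model might look like, the following Python snippet runs registered callbacks stage by stage. The stage names Created, Joining and Active come from the SiloStatus model mentioned above; everything else (`LifecycleSketch`, `on`, `start`, the numeric values) is hypothetical and is not an Orleans API.

```python
from enum import Enum


class SiloStatus(Enum):
    # Stage names from the comment above; the numeric values are arbitrary.
    CREATED = 1
    JOINING = 2
    ACTIVE = 3


class LifecycleSketch:
    """Hypothetical listener registry: callbacks run when a stage is reached."""

    def __init__(self):
        self._listeners = {status: [] for status in SiloStatus}

    def on(self, status, callback):
        self._listeners[status].append(callback)

    def start(self):
        # Walk the stages in declaration order and notify listeners of each.
        for status in SiloStatus:
            for callback in self._listeners[status]:
                callback()


def demo():
    log = []
    lifecycle = LifecycleSketch()
    # Registration order does not matter; the stage decides when a hook runs.
    lifecycle.on(SiloStatus.ACTIVE, lambda: log.append("send grain messages"))
    lifecycle.on(SiloStatus.CREATED, lambda: log.append("register dependencies"))
    lifecycle.start()
    return log
```

With hooks like these, DI registration could attach to the earliest stage, and "autoexec.bat"-style work could attach to the stage where the silo has joined the cluster.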
Any more opinions here? @sergeybykov ? @jason-bragg ? @jdom ?
I am of the opinion that bootstrap providers should stay as a moral equivalent to "autoexec.bat" - they run when the system is fully operational, grains can be activated, messages can be sent, etc.
Any system initialization prior to that, I think, should be done through a different mechanism in order to avoid confusion. I don't know if it should be yet another category of a provider or simply hooks around Silo or SiloHost as @jthelin seems to allude to.
I'm afraid if we try to put both modes (and their sub-phases) into a single notion of a bootstrap provider, it will quickly become a mess difficult to figure out for simple "autoexec.bat" app scenarios.
Almost independent of that, I think we need to make DI work in the runtime itself. @jdom is passionate to look into that.
I had no doubt you would object.
Clearly, we have users that need some way to configure the silo prior to it joining the cluster.
@jthelin came up with some good ideas, and I had at least one.
Would you like to outline your alternative concrete proposal? Otherwise, this will not get implemented and the scenario will not be achievable. I raised that issue a month ago, and it would be a shame to get asked about that again in a month.
I happened to have a conversation with @jdom just last Friday about DI. That's why I mentioned him. I expect him and @attilah, who is back on the DI question, to produce a concrete proposal for runtime DI.
I expect that such a proposal will cover these scenarios by configuring DI much earlier than it is done today, so that pieces of the runtime could use injectable components.
@jdom and @attilah don't need a proposal from me, as they are much more experienced with DI than I am.
Got it. Sounds like you got a plan.
@gabikliot wrote:
"Workarounds before this is fixed/addressed: ugly - put a conditional Task in the grains code to await the Dependency registration task - basically delay the grain msg execution until Dependency is registered. Yes, very ugly."
Today I faced the same problem. My workaround for initialization tasks which must take place before silo gets active in membership is to register a "fake storage provider" for simply doing initialization stuff like DI registrations etc. The reason why this works is that storage providers are initialized before silo gets active in membership (see https://github.com/dotnet/orleans/issues/1110#issuecomment-162036097). All other initialization stuff which has to make grain calls is placed into a regular bootstrap provider.
But I have another question about bootstrap providers. Say you have 5 silos which have different startup times (for whatever reason). Say silo 1 is the fastest concerning startup, and the remaining 4 are still "warming up" and not active in membership at that point: what happens if silo 1 instantiates some grains in its bootstrap provider? Are these grains placed all in silo 1?
If so: how to avoid this? How to give all starting silos a chance for a proper "rendezvous" during bootstrap phase?
"what happens if silo 1 instantiates some grains in its bootstrap provider? Are these grains placed all in silo 1?"
Yes
"how to avoid this?"
I know of no 'good' means of handling this, but one of the more common solutions I've seen is by adding a startup delay to the bootstrap action. Say there are 100000 grains your bootstrap program needs to call at startup, and you want them distributed over the entire cluster. Instead of the bootstrap making these calls, the bootstrap makes a single call to a grain that will do this work after a period of time, say 2 minutes. Each silo's bootstrap makes this grain call, but all the calls go to the same grain, so all but the first are ignored. After the initial startup delay (the period of time it should take the cluster to start) the grain timer fires, creating grains across the entire cluster. This is not ideal, as it delays startup time, but it does prevent all the grains from being placed on the first silo that starts.
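The delayed-start pattern can be sketched as follows. This is a Python/asyncio illustration only, with one object standing in for the single shared grain activation; `BootstrapGrain`, `init` and `wait_done` are hypothetical names, and the delay is shortened from minutes to a fraction of a second.

```python
import asyncio


class BootstrapGrain:
    """Stand-in for the one grain that every silo's bootstrap calls."""

    def __init__(self, delay_seconds, work):
        self._delay = delay_seconds
        self._work = work
        self._timer = None

    def init(self):
        # Only the first call schedules the timer; later calls are ignored.
        if self._timer is None:
            self._timer = asyncio.create_task(self._run_later())

    async def _run_later(self):
        # Wait for the rest of the cluster to come up, then do the real work.
        await asyncio.sleep(self._delay)
        await self._work()

    async def wait_done(self):
        await self._timer


async def demo(silo_count=3):
    runs = []

    async def create_grains():
        runs.append("bootstrapped")

    grain = BootstrapGrain(delay_seconds=0.01, work=create_grains)
    for _ in range(silo_count):
        grain.init()  # each silo's bootstrap makes the same call
    await grain.wait_done()
    return runs
```

However many silos call `init()`, the work runs once, after the delay.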
Ok. You mean to create some bootstrap grain (not bootstrap provider) with, say, just an empty Init() method to get the grain activated. In the OnActivateAsync() method I set a timer, and in the timer callback method I do my actual bootstrapping?
Since this grain can only be active in a single silo, the other silos executing the same code some seconds later and calling Init() will not instantiate duplicate grains.
Is this right?
Seems like a good idea. Thank you!
Yes. Something like that is what I was proposing. Something to be mindful of here is silo restarts. Say your cluster starts and everything is happy, but a day later one of the silos cycles for some reason. The new silo will make this same Init() call, but the bootstrap grain will have been deactivated by then, so it will run the startup logic even though the cluster is still up. This means that the startup logic needs to be written in such a way that calling it more than once does no harm.
In environments that use this pattern they test this scenario by having the bootstrap call the bootstrap grain with a random grain id if one is not configured. This causes each silo to run its own bootstrap grain, so the bootstrap logic runs per silo. This of course is only done in test environments to verify that duplicate calls are safe. In production they configure a single bootstrap grain id, which ensures a single run of the logic at cluster startup.
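A small sketch of both points, in Python and with purely hypothetical names (`bootstrap_grain_id`, `run_startup_logic`, a dict standing in for whatever state the startup logic creates):

```python
import uuid


def bootstrap_grain_id(configured_id=None):
    # Production: a single configured id, so all silos hit the same grain.
    # Test: no id configured, so each silo gets its own random bootstrap grain.
    return configured_id if configured_id is not None else uuid.uuid4().hex


def run_startup_logic(registry, wanted_keys):
    """Idempotent startup: only create what is missing, report what was created."""
    created = []
    for key in wanted_keys:
        if key not in registry:
            registry[key] = f"state for {key}"
            created.append(key)
    return created
```

Calling `run_startup_logic` a second time against the same registry creates nothing, which is the property the random-id test setup is meant to verify.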
Thank you for this useful and detailed information, @jason-bragg. There are always some edge cases to be considered, yes. Bootstrapping the way you described it should be idempotent of course.
Addressed by silo lifecycle