Today, timer callbacks interleave with grain methods (at await points) for both reentrant and non-reentrant grains. This is a source of confusion because developers, naturally, do not expect interleaving in non-reentrant grains. We need to make timer callbacks respect reentrancy of grain classes and not interleave for non-reentrant grains.
Heads-up: this is a breaking change to an actively used feature(/bug? :-) )
We are actively relying on timer callback reentrancy. This being said, I agree that the behavior is confusing and should be changed. I think a straightforward migration path would be to call a reentrant grain method (specified using AlwaysInterleave or MayInterleave) from the timer callback.
@cata, thanks for the heads-up. I didn't realize people rely on this behavior.
This timer reentrancy initially surprised me as well, but now our code has also come to rely on it. The non-reentrant methods in our grains often control flags used by the timer callback.
Would be happy to migrate, but this "loophole" is currently the only way to have a partially reentrant grain - a useful feature that would be sad to lose.
Maybe we should make the behavior configurable then, to explicitly allow for both reentrant and non-reentrant execution of timer callbacks?
@sergeybykov - configurable behavior (an extra parameter on a RegisterTimer overload?) would work great for us.
The default behavior could be changed to match the grain's default reentrancy policy, thus making it less surprising.
I think that is exactly what we agreed upon and planned to do a long time ago. That is at least what I remember.
@sergeybykov @gabikliot Please can I clarify if this behavior is applicable to Reminders too?
I ask because the Reminders are implemented via Timers underneath. I'm thinking something I've implemented ontop of GrainServices may be subject to this interleaving which I did not anticipate would be an issue.
If someone can explain the proposed method of resolving this I am happy to have a go.
@jamescarter-le Reminders are different in behavior from timers because their ticks are delivered via true grain calls that follow reentrancy/interleaving rules.
It's very important for timers callback not to interleave. Optional configuration (to allow re-entrant) is also welcomed.
When do you plan releasing it?
Please consider also an option to register a timer as such it wont allow Orleans deactivating the Grain.
Thanks.
We don't have concrete plans for this. But we are open to contributions.
In the meanwhile, what is the correct way to 'self invoke' from a timer method to avoid interleaving?
using this.AsReference<IFoo>() ?
is it possible to achieve this without exposing the invoked method in the interface?
Yes, as reference.
No, it has to be exposed via interface. But you can define a different interface, it can be internal in the grain impl. assembly.
Ok, got it, thanks
To avoid the reentrancy of timers - I've used in my timer this.AsReference<IFoo>() to self-invoke a different method and by that it respects the non-reentrancy definition of the grain.
Now, I've seen that sometimes, when there are other methods which takes long time, then this method is executed multiple times sequentially and not in a 'timed' manner.
The timer only invokes the other method which get its turn scheduled, and sometimes, due to other long methods turns, it can be scheduled sequentially, and by that it doesn't respect the expected 'timer' intervals (i.e. run every x seconds).
Now, I can add a defense to avoid it running sequentially by checking the last time it was executed, but I was wondering if you have a better solution.
FYI - this is the method I used to self-invoke a method with the Grain context scheduler:
public static class GrainExtensions
{
/// <summary>
/// SelfInvokeAfter
/// invoke self method in next 'turn'
/// this is how it's done to avoid interleaving and re-entrant issues
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="grain"></param>
/// <param name="action"></param>
public static void SelfInvokeAfter<T>(this Grain grain, Func<T, Task> action)
where T : IGrain
{
var g = grain.AsReference<T>();
var factory = new TaskFactory(TaskScheduler.Current);
factory.StartNew(async () => await action(g));
}
}
@shlomiw I may be misunderstanding what you're trying to do here, but if the goal is to ensure that work triggered by the timer conforms to grain call semantics (ie, not reentrant) you need only make await the grain call.
public static Task SelfInvokeAfter<T>(this Grain grain, Func<T, Task> action)
where T : IGrain
{
var g = grain.AsReference<T>();
return action(g));
}
Assuming "action" contains a call to the grain, awaiting SelfInvokeAfter in the timer callback will prevent the next timer call from being scheduled until the first one is complete. By calling factory.StartNew(async () => await action(g)) and not waiting the completion of the operation, the timer callback is free to return immediately and begin counting down to the next trigger, before current work is complete.
If I can await the self-invoke from the timer then it sure makes sense (I thought I have to schedule it for next turn, to avoid 'deadlock', but it should be possible since the timer is reentrant!).
Thanks! you perfectly understood me. I'll validate it and let you know.
@jason-bragg - thanks! it works correctly as you suggested :)
@shlomiw, Glad that helped.
That's a common pattern used by those that want to schedule periotic work that respects Orleans call semantics.
It should be noted that since the timer is generating a grain call, assuming the timer period is shorter than the grain collection time, grains using this pattern will never be deactivated due to idleness, because the timer is always generating calls, which is interpreted as activity.
I don't know your service so I won't assume this is an issue for you, but wanted you to be aware of this side effect.
@jason-bragg - I'm aware of it, thanks for bringing that up!
I still think you should let the timers be configurable about their re-entrancy, I would also say that the default should be same as the Grain. This can be very confusing and cause unexpected results.
@jason-bragg -
It should be noted that since the timer is generating a grain call, assuming the timer period is shorter than the grain collection time, grains using this pattern will never be deactivated due to idleness, because the timer is always generating calls, which is interpreted as activity.
Now I have a scenario where I want to use a timer for grain local maintenance, which still respects the re-entrancy (i.e. - non-entrant), but I don't want it to keep the grain alive.
Any idea how?
I am sometimes using await loops with delays:
while (! done)
{
do_maintenance_work();
await Task.Delay(period);
}
As far as I know, this does the right thing if the grain deactivates (i.e. it will no longer resume at the await) but I am not 100% sure.
@sebastianburckhardt - tnx for your reply!
I'm not sure I understand. In what you suggested, if my grain is non-reentrant (as in my case) it would never get outside requests.
btw - instead of using timers, I was thinking about using grain call filter (interceptor) and execute my maintenance after I got a request (i.e. every once in a while).
@gabikliot, @sergeybykov, @jason-bragg,
I just tested timer invocation with the suggested workaround and I still see interleaving calls. I'm using Orleans 2.0*.
The grain implementation
```c#
using System;
using System.Threading.Tasks;
using OrleansGrainInterfaces;
using Orleans;
using Microsoft.Extensions.Logging;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Orleans.Concurrency;
namespace OrleansGrains
{
public class ValueGrain : Grain, IValueGrain
{
private readonly ILogger _logger;
private readonly Random _rnd;
IDisposable _timer;
public ValueGrain(ILogger<ValueGrain> logger) {
_logger = logger;
_rnd = new Random();
}
public async Task<string> Get()
{
Console.WriteLine($"***Start***");
await Task.Delay(3000);
Console.WriteLine($"***Finish***");
return this.GetPrimaryKeyString();
}
public async Task OnTimer(object data)
{
await this.AsReference<IValueGrain>().Update();
}
public async Task Update()
{
Console.WriteLine($"***StartTimer***");
var waitPeriod = _rnd.Next(5000,15000);
await Task.Delay(waitPeriod);
Console.WriteLine($"***FinishTimer***");
}
public override Task OnActivateAsync()
{
_timer = RegisterTimer(this.OnTimer, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(10));
return Task.CompletedTask;
}
public override Task OnDeactivateAsync()
{
_timer?.Dispose();
return Task.CompletedTask;
}
}
}
</details>
Output produced:
Start
StartTimer <-- should not start, since the Get method was not finished yet
Finish
Start <-- new Get invocations during a timer invocation
Finish
Start
FinishTimer
Finish
```
I understand that the cals are (supposedly?) interleaved between await, but the documentation explicitly states (emphasis mine):
Each invocation of asyncCallback is delivered to an activation on a separate turn and will never run concurrently with other turns on the same activation. Note however, asyncCallback invocations are not delivered as messages and are thus not subject to message interleaving semantics. This means that invocations of asyncCallback should be considered to behave as if running on a reentrant grain with respect to other messages to that grain.
I guess I'm not understanding this two sentences correctly, as they seem contradictory to me.
So, timer callbacks will always interleave (based on testing), but the workaround, by using this.AsReference<IValueGrain>().Update() should honor the turn-based method invocation, but it does not in practice.
I'd appreciate an explanation, as I'm probably just misunderstanding some concepts here.
@sergeybykov
I know people is now relying on this _bug_ as previous comments but, it is indeed not the correct behaviour.
We are having issues now with this behaviour...
Will this be ever fixed?
@skyflyer
I am having this same issue after upgrading to Orleans 2.0 and the call chain reentrancy is the issue. Call chain reentrancy will interleave for self calls, A->B->A calls, and stream messages.
Despite the documentation, timers have never respected the grain scheduler and with 2.0 we no longer have a viable workaround.
My only option (unless someone has a better idea) is to disable the call-chain reentrancy so my original workaround functions again. Fortunately, since we wrote that vast majority of our code prior to 2.0 we already accounted for the deadlock issues present without call chain reentrancy.
@skyflyer - wow, actually scarry stuff, this could lead to unpredictable issues in my current implementation. Thanks for the update!
I'm still using orleans 2.0-beta3 (wanted to upgrade soon). I'll test it as well.
@skyflyer
One very ugly hack you can try - use the new HostedClient introduced in 2.1 to call your grain interface from the timer method. It should respect the reentrancy. I don't know about the performance hit implications and if it's relevant for you.
@shlomiw you mean inject IClusterClient (after enable the hosted client implementation with .EnableDirectClient()) on a grain and invoke it?
If that is an yes, then yeah... That is a _VERY_ ugly hack heheheh
We have a lot of Orleans.Storage.InconsistentStateException errors in non reentrancy grains with timer under heavy load.
Looks like the timer interleaving issue
```c#
Error from storage provider AdoNetGrainStorage.Server.Grains.ProjectGrain
during WriteState for grain
Type=Server.Grains.ProjectGrain Pk=grn/9955FAF8/0000000000000000000000000000001703ffffff9955faf8-0xFBF1FF54
Id=GrainReference:grn/9955FAF8/00000017 Error=
Exc level 0: Orleans.Storage.InconsistentStateException: Version conflict (WriteState):
ServiceId=pickpack-app ProviderName=Default GrainType=Server.Grains.ProjectGrain GrainId=23 ETag=185.
at Orleans.Storage.AdoNetGrainStorage.
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Orleans.Core.StateStorageBridge`1.
```
@sergeybykov Please change this interleaving bug to feature (fix documentation ) and add the flag that will make timer callbacks respect reentrancy of grains.
FYI - with orleans 2.0-beta3 the AsReference() trick does respect the re-entrancy. When we'll find time to upgrade to 2.2, I'll test it again.
Until then - I believe it would be better for all that timer will just respect the re-entrancy of the grains as default. I think it's far more dangerous otherwise..
Perhaps this is specific to 2.0-beta3 or perhaps I'm not using it right, but I tried AsReference in callback handler as well as in timer registration and my timer calls still interleave with client calls. I was testing this with 2.1.2.
@skyflyer You need to revert to the old 1.x behavior by disabling call chain reentrancy
.Configure<SchedulingOptions>(options => { options.AllowCallChainReentrancy = false; })
After you do that it will work as expected again using the AsReference trick.
Our team is in a tough spot. Our code has come to rely on call chain reentrancy but we'd like to execute code periodically (time span of seconds). Timers seem like a good fit except for the disrespect for reentrancy. We too would like to have the behavior stated in the original post:
We need to make timer callbacks respect reentrancy of grain classes and not interleave for non-reentrant grains.
As others have noted, the AsReference-trick doesn't work if AllowCallChainReentrancy = true (default).
Is there some trick to execute code from an async timer callback and have it not interleave? (While still keeping AllowCallChainReentrancy = true)
Has there been any progress with this? I'm narrowing down a deadlock and the only thing in common is my timer code gets blocked with something else.
@jonas-johansson Is your fork working? What are the effects of your changes?
@Rohansi: We're using my fork in production and it works for our case. The fork allows the call chain reentrance behavior to be configured so that self-reentrancy is not allowed (A-A), while still allowing call chain reentrancy when calling another grain (A-B-A). This has the desired effect that we in a timer callback can execute this.AsReference<IMyGrain>(...).DoSomething() and DoSomething will be enqueued and not interleave.
Task.Run seems to reset the call chain so to workaround this with AllowCallChainReentrancy = true we just need to use the AsReference hack in Task.Run.
private async Task TimerCallback(object arg)
{
await Task.Run(() => this.AsReference<TGrainInterface>().ActualTimerCallback());
}
The actual call will occur on the proper Orleans worker thread still, not the thread pool.
@Rohansi if this works you made my day. After months wrestling with getting timers to behave correctly, this would be finaly a good enough solution for me.
I'm just not sure how it works and if its reliable enough, considering Task.Run is leaving the orleans context and scheduler entirely and I though grain calls wouldn't be possible in this case.
@Rohansi and @Xanatus: When I invoke the method like you did (from the action in Task.Run) I get the following exception:
System.InvalidOperationException: 'Support for accessing the Orleans runtime from a non-Orleans context is not enabled. Enable support using the ISiloHostBuilder.EnableDirectClient() method.'
If I enable DirectClient the desired behavior is accomplished. That's great, but I need to read up on the implications of DirectClient, and why it's disabled by default.
Have you enabled DirectClient @Rohansi?
@jonas-johansson Yeah, I have it enabled. I think OrleansDashboard enables it if anyone is using it.
@jonas-johansson
System.InvalidOperationException: 'Support for accessing the Orleans runtime from a non-Orleans context is not enabled. Enable support using the ISiloHostBuilder.EnableDirectClient() method.'
You need to invoke the grain reference via Orleans Task Scheduler. Task.Run will invoke it in the .Net ThreadPool Task Scheduler context.
i.e. something like that:
// run this inside Orleans Context
var g = grain.AsReference<T>();
// TaskScheduler.Current is Orleans TaskScheduler
var factory = new TaskFactory(TaskScheduler.Current);
factory.StartNew(async () => await action(g));
Since Orlenas timers are run inside Orleans Context, the only problem they don't respect the reentrancy, so this is what I simply usually do:
// inside OnActivateAsync()
_maintenanceTimer = RegisterTimer(OnMaintenance, null, TimeSpan.FromSeconds(2),
TimeSpan.FromSeconds(2));
// ...
// note - orleans timer has REENTRANT behavior
private async Task OnMaintenance(object arg)
{
// invoke self to avoid REENTRANT
await this.AsReference<IGameGrain>().Maintenance();
}
// this method inherits from IGameGrain
public async Task Maintenance()
{
//...
}
Note that the reason I use the second approach (without the the factory.StartNew() task), is so timers wont be invoked immediately one after another, in case my Maintenance() took too much time.
I'm awaiting it inside OnMaintenance(), so only when it's done the next timer interval will start.
@shlomiw your approach isn't working with AllowCallChainReentrancy, which is what the discussion was about.
@Xanatus - oh yes, sorry, now I see.
After reading the previous comments - is this true that if you use grain AsReference through the ThreadPool context then DirectClient (if enabled) will invoke the call back on Orleans context?
@sergeybykov - can you please reconsider this issue again? let timers respect the reentrancy? (make it configurable)..
As you can see it's a source of a lot of confusion, unexpected behaviour and weird workarounds.
@shlomiw By "reconsider" do you mean "raise the priority" of this issue?
@sergeybykov I use orleans for over 2 years now and for me timers always have been the number 1 source of problems, making development extremely difficult and hacky for something very basic that really shouldn't be so complicated. Especially now that we don't have a proper workaround anymore, I believe this issue would deserves some top priority.
This thread also makes clear that the issue is neither new, nor rare. It just feels like the issue is beeing ignored for the wrong reasons. ;)
@shlomiw By "reconsider" do you mean "raise the priority" of this issue?
@sergeybykov - oh yes please 馃憤
Can't promise anything in the near term as we have quite a bit of work scheduled already. Contributions, of course, are very welcome.
Just reporting in to say that this issue caused me 2 work days of debugging and testing before I found out this thread, and then found out there's no sensible fix without being hacky, to my sorrow.
Hope this gets fixed soon.
I just wanted to give an update: our team is now running with DirectClient enabled and we're using the trick from @Rohansi:
Task.Runseems to reset the call chain so to workaround this withAllowCallChainReentrancy = truewe just need to use theAsReferencehack inTask.Run.private async Task TimerCallback(object arg) { await Task.Run(() => this.AsReference<TGrainInterface>().ActualTimerCallback()); }The actual call will occur on the proper Orleans worker thread still, not the thread pool.
It works great for us!
yo is there any update about this issue ? I still cannot make it working properly with provided solutions :(
I just wanted to give an update: our team is now running with DirectClient enabled and we're using the trick from @Rohansi:
Task.Runseems to reset the call chain so to workaround this withAllowCallChainReentrancy = truewe just need to use theAsReferencehack inTask.Run.private async Task TimerCallback(object arg) { await Task.Run(() => this.AsReference<TGrainInterface>().ActualTimerCallback()); }The actual call will occur on the proper Orleans worker thread still, not the thread pool.
It works great for us!
yea it works but is it really safe from perfomance perspective :) ?
Just leaving a note that some patterns, e.g. reactive caching / reactive polling, depend on timers being re-entrant, so others calls are not blocked by long-running background tasks. While I generally agree that timers should be non-reentrant by default, if this now becomes true, then please allow some opt-in mechanism for the current always-reentrant behaviour. The current behaviour is very useful in the appropriate context, and allows for very sleek asynchronous algorithms.
Most helpful comment
@sergeybykov I use orleans for over 2 years now and for me timers always have been the number 1 source of problems, making development extremely difficult and hacky for something very basic that really shouldn't be so complicated. Especially now that we don't have a proper workaround anymore, I believe this issue would deserves some top priority.
This thread also makes clear that the issue is neither new, nor rare. It just feels like the issue is beeing ignored for the wrong reasons. ;)