Orleans: Invoking a grain's method from 2 threads: does the method need a lock?

Created on 25 Apr 2016 · 26 comments · Source: dotnet/orleans

I have two grains, GrainA and GrainB.

GrainA has a method like this:

public Task addFriend(int id)
{
    .....
    m_friendList.Add(id);
    ......
}

The client invokes the interface like this:

var ga = GrainClient.GrainFactory.GetGrain< IGrainA >(0);
ga.addFriend(111);

GrainB has a thread like this:

void Thread1()
{
    while(true)
    {
        ...
        var ga = GrainFactory.GetGrain< IGrainA >(0);
        ga.addFriend(112);
        ...
    }
}

GrainA and GrainB run in the same silo. Does the method addFriend run on 2 threads or not?

Thank you very much!

liuchangtao

Most helpful comment

For that case you can indeed use a thread in the thread pool. That would require synchronization between the grain code and the thread code, just as you originally asked.
But a much better approach is to break this computation into smaller chunks and do it all in the grain context. See more details in the last link I shared above.

gabikliot on 25 Apr 2016

All 26 comments

Grains don't have threads. A grain can only implement grain methods or subscribe to timer callbacks or stream callbacks.

gabikliot on 25 Apr 2016

http://dotnet.github.io/orleans/Getting-Started-With-Orleans/Grains

gabikliot on 25 Apr 2016

In my grain I have a job list to process. I tried to use a timer, but I found that it blocks the main thread, which increases the response time of other methods called from the client.
So I start a thread when the grain activates and use that thread to do the job.
Can I?

liuchangtao on 25 Apr 2016

Thank you, I see.

liuchangtao on 25 Apr 2016

No, you should not start arbitrary threads in the grain. As long as you use asynchronous I/O everywhere in the grain, you should be fine. The only exception is if you have a long-running (say, milliseconds-long) CPU-intensive operation. Is that your case?

gabikliot on 25 Apr 2016

I have a long CPU operation that may take about 100 ms, so I used sleep(100) to simulate it in a timer with a 50 ms period. It delays the next timer tick, and I guess it delays client calls too.
So I start a new thread to do that heavy work.

liuchangtao on 25 Apr 2016

Never use Thread.Sleep, as it blocks the thread; await Task.Delay instead. I would start without any special tricks and measure/evaluate with a real workload. You might not have any problem. But if you have to, here are the details of how to offload to the thread pool: http://dotnet.github.io/orleans/Advanced-Concepts/External-Tasks-and-Grains

gabikliot on 25 Apr 2016
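
To illustrate the advice above, here is a minimal sketch in plain .NET (no Orleans types) of the two patterns: pausing with an awaited Task.Delay instead of Thread.Sleep, and pushing CPU-bound work to the thread pool with Task.Run. The names OffloadSketch, DoCpuWork, OffloadAsync and WaitWithoutBlockingAsync are hypothetical and do not come from the thread. How the continuation is scheduled once this runs inside a grain is covered by the linked External Tasks and Grains page, and is not shown here.

```csharp
using System;
using System.Threading.Tasks;

public static class OffloadSketch
{
    // Hypothetical stand-in for the CPU-bound work discussed in the thread.
    public static long DoCpuWork(int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            sum += i % 7;
        }
        return sum;
    }

    // Task.Run queues the delegate to the .NET thread pool, so the caller's
    // thread is free while the computation runs.
    public static Task<long> OffloadAsync(int iterations)
    {
        return Task.Run(() => DoCpuWork(iterations));
    }

    // Pauses without blocking a thread: the method returns to its caller at the
    // await and resumes when the delay elapses. Thread.Sleep would instead hold
    // the current thread for the whole duration.
    public static async Task<int> WaitWithoutBlockingAsync(int milliseconds)
    {
        await Task.Delay(milliseconds);
        return milliseconds;
    }
}
```

A caller would write something like `long result = await OffloadSketch.OffloadAsync(1000000);`. Whether offloading is worth it depends on the measured workload, as noted above.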

I mean this:

The grain code looks like this:

class GrainA : Grain, IGrainA
{
    public override async Task OnActivateAsync()
    {
        ...
        // 1000*1000 ticks = 100 ms (one tick is 100 ns)
        RegisterTimer(Timer, this, new TimeSpan(0), new TimeSpan(1000 * 1000));
        ...
    }

    public Task< int > getValue()
    {
        ....
    }

    public Task Timer(object obj)
    {
        ...
        doingsometing(); // needs about 100 ms
        ...
    }
}

The client code looks like this:

var ga = GrainClient.GrainFactory.GetGrain< IGrainA>(0);
var aaa = ga.getValue();

I found that when I call getValue(), I cannot get the response immediately,
so I guess that on the grain side the code runs like this:

{
...
Timer(...);//need 100ms
getValue();
Timer(...);//need 100ms
...
}

So I need a thread to do the work of Timer(). I need the other methods to respond as soon as possible.

liuchangtao on 25 Apr 2016

What we do in order to offload "generic" work in Orleans:

[Unordered]
public interface IOffloadProcessingGrain : IGrainWithIntegerKey
{
    Task Do(Immutable<Func<Task>> taskFunc);
}

[StatelessWorker, Reentrant]
public class OffloadProcessingGrain : Grain, IOffloadProcessingGrain 
{
    public Task Do(Immutable<Func<Task>> taskFunc)
    {
        return taskFunc.Value();
    }
}

Essentially, it queues a Task on the Orleans task scheduler. Keep in mind that when you call it, you don't 'await' the result, and the passed lambda must not throw. So in your case:

public Task Timer(object obj)
{
    ...
    // Fire and forget: the returned task is deliberately not awaited.
    GrainFactory.GetGrain<IOffloadProcessingGrain>(0).Do(new Immutable<Func<Task>>(
        () =>
        {
            doingsometing(); // needs about 100 ms
            return TaskDone.Done;
        }));
    ...
}

I hope this solved your issue.

shayhatsor on 25 Apr 2016

gabikliot on 25 Apr 2016

Re Shay's idea: you do not want to execute any code that can block the CPU for 100 ms on Orleans threads. That will quickly block and exhaust them all. Offload to the thread pool instead.

gabikliot on 25 Apr 2016

To second @gabikliot's point, it is not technically illegal to execute a 100ms CPU intensive task on a grain thread. However, by doing that you are trading responsiveness and stability for simplicity. With 10 requests per second per core, you can easily block all grain threads, and get an unresponsive system.

At the same (or even higher) request rate but with the CPU intensive tasks offloaded to the thread pool the system will stay fully responsive even if you overload the thread pool with too many workitems and some requests start timing out.

Even if today the code runs reasonably well at the request rate it handles, somebody may later change the code, legitimately or by mistake, in a way that increases the task execution time from 100ms to, say, 500ms. With the offload approach everything will be fine, with requests timing out when their rate exceeds the CPU capacity. The system with grain threads executing the long-running tasks will suddenly become unresponsive or unstable.

sergeybykov on 25 Apr 2016

@gabikliot: How would breaking up the work into smaller pieces that run in the grain's context help? The grain would still be busy running all those steps and won't be responsive to other requests.

Allon-Guralnek on 25 Apr 2016

@Allon-Guralnek Between those smaller pieces of execution the grain would give up its thread back to the scheduler, which will allow other requests to be executed in between for better responsiveness.

sergeybykov on 25 Apr 2016

@sergeybykov: From my understanding of the 'await' semantics, if an awaited task is completed synchronously (i.e. there was no I/O), then the thread is not relinquished back to the scheduler, instead, the execution continues on the same thread. This means that if all the smaller pieces were CPU-only work, then other requests wouldn't be allowed to execute in between because the whole thing would be executed synchronously, despite having 'await' boundaries.

Allon-Guralnek on 25 Apr 2016
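
The claim above about 'await' on an already-completed task can be checked with a small plain .NET sketch (no Orleans involved). The names CompletedAwaitSketch and ContinuesSynchronously are hypothetical. The helper reports whether the code after the await had already run by the time the async method handed control back to its caller.

```csharp
using System.Threading.Tasks;

public static class CompletedAwaitSketch
{
    // Returns true if the code after 'await awaited' ran synchronously, that is,
    // before the async method returned control to its caller.
    public static bool ContinuesSynchronously(Task awaited)
    {
        bool continuationRan = false;

        async Task Run()
        {
            await awaited;           // the point under test
            continuationRan = true;  // the "next small piece" of work
        }

        Run(); // returns at the first await that actually suspends, or at the end
        return continuationRan;
    }
}
```

Passing Task.CompletedTask should give true, while passing the Task of a TaskCompletionSource that has not been completed should give false, because the method really suspends at the await in that case.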

@gabikliot and @sergeybykov, I agree with your points. From my understanding of the question, "doingsometing()" doesn't run frequently. We use this solution to run <1ms CPU-bound work, so in our case it yields higher throughput than offloading to the thread pool because of fewer context switches. In this case, and as a rule of thumb, offloading to the thread pool is the right solution.

shayhatsor on 25 Apr 2016

@Allon-Guralnek I don't believe that's how it works with a custom scheduler like ours.

sergeybykov on 25 Apr 2016

@sergeybykov: I don't think you're correct. From what I remember, it has to do with the compiler-generated code and the awaiter. If the task returned is completed, the TaskScheduler is not even involved.

Even if you are correct, no tasks would be inserted between the smaller pieces unless the grain was marked as Reentrant (which is against your recommendation). Other grains would be responsive, but that grain would still appear to stall.

Allon-Guralnek on 25 Apr 2016

@Allon-Guralnek We override TryExecuteTaskInline in https://github.com/dotnet/orleans/blob/master/src/OrleansRuntime/Scheduler/ActivationTaskScheduler.cs#L80, and hence I believe we get called before a task would get executed inline. It would be an interesting test to verify the actual behavior.

> Even if you are correct, no tasks would be inserted between the smaller pieces unless the grain was marked as Reentrant (which is against your recommendation). Other grains would be responsive, but that grain would still appear to stall.

Yes, the immediate goal here is for other grains and system targets to stay responsive.

sergeybykov on 25 Apr 2016

Reentrant is a legitimate setting when needed. A lot of the time it's not needed, and then non-reentrant is better, but when it's needed, no problem.

Our scheduler guarantees FIFO. That is a pretty strong guarantee for any scheduler, BTW. FIFO between any future non-inlined tasks and new calls.

In this case I meant breaking up the timer callback so that it does 1/10 of the work, but 10 times more frequently. Of course not just doing 10 inlined awaits.

gabikliot on 25 Apr 2016
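
The "1/10 of the work, 10 times more frequently" suggestion above can be sketched for the job-list case from the original question. This is only an illustration under assumptions: ChunkedJobQueue and ProcessChunk are hypothetical names, the jobs are plain Action delegates, and the class has no locking because it assumes it is only ever touched from the grain's own calls and timer callbacks.

```csharp
using System;
using System.Collections.Generic;

public sealed class ChunkedJobQueue
{
    private readonly Queue<Action> _jobs = new Queue<Action>();

    // Number of jobs still waiting to run.
    public int Pending
    {
        get { return _jobs.Count; }
    }

    public void Enqueue(Action job)
    {
        _jobs.Enqueue(job);
    }

    // Meant to be called once per timer tick: runs at most maxJobs jobs and then
    // returns, so each tick stays short and other calls can run between ticks.
    public int ProcessChunk(int maxJobs)
    {
        int done = 0;
        while (done < maxJobs && _jobs.Count > 0)
        {
            Action job = _jobs.Dequeue();
            job();
            done++;
        }
        return done;
    }
}
```

A timer callback would then call something like `queue.ProcessChunk(10)` and return a completed task, with the timer period shortened to match. The chunk size is a tuning knob that would need measuring against a real workload.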

@gabikliot: Basically your suggestion is to implement preemptive multitasking by yourself. For a simple algorithm, it might be possible. But for anything even slightly complex it would probably entail converting the algorithm into a state machine, and then the timer would step the state machine to the next step at 10 times the rate, where each step would execute 1/10 of the work. This sounds like an enormous undertaking which probably is not worth it just to stay within the grain's context.

Allon-Guralnek on 25 Apr 2016

@Allon-Guralnek is right. @sergeybykov look here:
https://books.google.co.il/books?id=EvXX03I5iLYC&lpg=PA593&ots=jGhe0DzNA3&dq=compiler%20optimized%20c%23%20await%20completed%20task&pg=PA592#v=onepage&q&f=false

shayhatsor on 25 Apr 2016

@sergeybykov: I just did a small test, and it appears that overriding TryExecuteTaskInline would not help in the case of awaiting a completed task. I took the simple LimitedConcurrencyLevelTaskScheduler from this MSDN page and modified it to show any use of QueueTask and TryExecuteTaskInline (via Console.WriteLine). When awaiting a completed task, neither QueueTask nor TryExecuteTaskInline is called.

I've created a full repro for you: https://gist.github.com/Allon-Guralnek/d4cd6e4a3dc5439dd4d004385461dc90

Furthermore, you can clearly see in the generated IL that the compiler emits code that checks whether the task is completed, and if it is, it doesn't involve the TaskScheduler at all (this is an important optimization introduced by the async/await feature):

IL_001A:  ldstr       "=== Awaiting a non-completed task ==="
IL_001F:  call        System.Console.WriteLine
IL_0024:  nop         
IL_0025:  ldc.i4.s    64 
IL_0027:  call        System.Threading.Tasks.Task.Delay
IL_002C:  callvirt    System.Threading.Tasks.Task.GetAwaiter
IL_0031:  stloc.1     
IL_0032:  ldloca.s    01 
IL_0034:  call        System.Runtime.CompilerServices.TaskAwaiter.get_IsCompleted
IL_0039:  brtrue.s    IL_007E
IL_003B:  ldarg.0     
IL_003C:  ldc.i4.0    
IL_003D:  dup         
IL_003E:  stloc.0     
IL_003F:  stfld       UserQuery+<Do>d__1.<>1__state
IL_0044:  ldarg.0     
IL_0045:  ldloc.1     
IL_0046:  stfld       UserQuery+<Do>d__1.<>u__1
IL_004B:  ldarg.0     
IL_004C:  stloc.2     
IL_004D:  ldarg.0     
IL_004E:  ldflda      UserQuery+<Do>d__1.<>t__builder
IL_0053:  ldloca.s    01 
IL_0055:  ldloca.s    02 
IL_0057:  call        System.Runtime.CompilerServices.AsyncTaskMethodBuilder.AwaitUnsafeOnCompleted
IL_005C:  nop         
IL_005D:  leave       IL_0134
IL_0062:  ldarg.0     
IL_0063:  ldfld       UserQuery+<Do>d__1.<>u__1
IL_0068:  stloc.1     
IL_0069:  ldarg.0     
IL_006A:  ldflda      UserQuery+<Do>d__1.<>u__1
IL_006F:  initobj     System.Runtime.CompilerServices.TaskAwaiter
IL_0075:  ldarg.0     
IL_0076:  ldc.i4.m1   
IL_0077:  dup         
IL_0078:  stloc.0     
IL_0079:  stfld       UserQuery+<Do>d__1.<>1__state
IL_007E:  ldloca.s    01 
IL_0080:  call        System.Runtime.CompilerServices.TaskAwaiter.GetResult
IL_0085:  nop         
IL_0086:  ldloca.s    01 
IL_0088:  initobj     System.Runtime.CompilerServices.TaskAwaiter

As you can see, using the await keyword caused the compiler to emit the above IL. The important part is that it checks whether IsCompleted is true (at IL_0034), and if it is, it jumps past the whole TaskScheduler part (AwaitUnsafeOnCompleted) at IL_0039.

Allon-Guralnek on 25 Apr 2016
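
A test along the lines described above can be sketched as follows. This is not the linked gist and not the Orleans scheduler. CountingScheduler and SchedulerProbe are hypothetical names, and the scheduler simply forwards work to the thread pool while counting QueueTask calls. The Task.Yield variant is an addition for contrast, on the assumption that Task.Yield posts its continuation to the current TaskScheduler when no SynchronizationContext is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler used only to count how often it is asked to queue work.
public sealed class CountingScheduler : TaskScheduler
{
    private int _queued;

    public int Queued
    {
        get { return Volatile.Read(ref _queued); }
    }

    protected override void QueueTask(Task task)
    {
        Interlocked.Increment(ref _queued);
        ThreadPool.QueueUserWorkItem(_ => TryExecuteTask(task));
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return TryExecuteTask(task);
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return Enumerable.Empty<Task>();
    }
}

public static class SchedulerProbe
{
    // Runs 'body' on a fresh CountingScheduler and returns how many times
    // QueueTask was called, including the one call that starts 'body' itself.
    public static int CountQueued(Func<Task> body)
    {
        var scheduler = new CountingScheduler();
        Task.Factory
            .StartNew(body, CancellationToken.None, TaskCreationOptions.None, scheduler)
            .Unwrap()
            .Wait();
        return scheduler.Queued;
    }

    // Three awaits on an already-completed task.
    public static async Task AwaitCompleted()
    {
        for (int i = 0; i < 3; i++)
        {
            await Task.CompletedTask;
        }
    }

    // Three awaits that force a real suspension.
    public static async Task AwaitYield()
    {
        for (int i = 0; i < 3; i++)
        {
            await Task.Yield();
        }
    }
}
```

Comparing `SchedulerProbe.CountQueued(SchedulerProbe.AwaitCompleted)` with `SchedulerProbe.CountQueued(SchedulerProbe.AwaitYield)` shows the difference: the first should report only the initial QueueTask call, while the second should report more. Whether the Orleans scheduler behaves the same way is not something this sketch establishes.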

Not sure what you guys are arguing about. How the Orleans scheduler handles inlined tasks is an interesting but totally unrelated topic to this issue. This issue's author had a problem with long-running CPU-intensive tasks, and I gave him a couple of ways to solve it. Inlined tasks were not part of any of my suggestions.

If you are interested in discussing how inlined tasks work in Orleans, you can open a new issue. Let's not confuse the author of this one.

gabikliot on 25 Apr 2016

@Allon-Guralnek @shayhatsor Thanks for pointing this out and the repro.

@gabikliot You are right - this is tangential at best to the original question of this issue.

sergeybykov on 25 Apr 2016

Thanks a lot, everyone!

liuchangtao on 26 Apr 2016
