Orleans: Add locking support similar to Azure durable functions 2.0

Created on 18 Jul 2019 · 6 Comments · Source: dotnet/orleans

Add locking support similar to the one described here https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-preview#locking-entities-from-orchestrations which I have copied and pasted below.

This would be really helpful for avoiding race conditions and should scale better than going through a single grain to achieve the same.

Locking entities from orchestrations
Orchestrations can lock entities. This capability provides a simple way to prevent unwanted races by using critical sections.

The context object provides the following methods:

LockAsync: acquires locks on one or more entities.
IsLocked: returns true if currently in a critical section, false otherwise.
The critical section ends, and all locks are released, when the orchestration ends. In .NET, LockAsync returns an IDisposable that ends the critical section when disposed, which can be used together with a using clause to get a syntactic representation of the critical section.

For example, consider an orchestration that needs to test whether two players are available, and then assign them both to a game. This task can be implemented using a critical section as follows:

[FunctionName("Orchestrator")]
public static async Task RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext ctx)
{
    EntityId player1 = /* ... */;
    EntityId player2 = /* ... */;

    using (await ctx.LockAsync(player1, player2))
    {
        bool available1 = await ctx.CallEntityAsync<bool>(player1, "is-available");
        bool available2 = await ctx.CallEntityAsync<bool>(player2, "is-available");

        if (available1 && available2)
        {
            Guid gameId = ctx.NewGuid();

            await ctx.CallEntityAsync(player1, "assign-game", gameId);
            await ctx.CallEntityAsync(player2, "assign-game", gameId);
        }
    }
}

Within the critical section, both player entities are locked, which means they are not executing any operations other than the ones that are called from within the critical section. This behavior prevents races with conflicting operations, such as players being assigned to a different game, or signing off.

We impose several restrictions on how critical sections can be used. These restrictions serve to prevent deadlocks and reentrancy.

Critical sections cannot be nested.
Critical sections cannot create suborchestrations.
Critical sections can call only entities they have locked.
Critical sections cannot call the same entity using multiple parallel calls.
Critical sections can signal only entities they have not locked.

wanton7

Most helpful comment

FYI, the DF implementation uses persistent, reliable queues for all messaging. This simplifies the programming model, at the expense of performance - all messages between entities and/or orchestrations are durably persisted by a queue which incurs significant latency and throughput cost. The benefit is that the execution is fully reliable, in terms of both state and messages, and messages can be guaranteed to be delivered exactly-once and in-order.

With this type of reliable execution in place, critical sections are not so hard to implement. Like @sergeybykov says, it just requires a round of coordination at the beginning, i.e. going around and locking all participants, and then sending release messages at the end of the critical section. This is not particularly fast. But it is pretty simple.

I suppose one could do something similar in Orleans, e.g. by using a reliable stream provider to implement exactly-once messaging. However, I would expect that most of the time it would be more appropriate to just use the Orleans support for distributed transactions.
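
To make the "lock all participants, then release at the end" round more concrete, here is a minimal in-process sketch. It is not the Durable Functions implementation and not an Orleans API: `EntityLockTable` and its members are hypothetical names, the locks live in a single process, and nothing is persisted, so it shows only the shape of the protocol and not the reliability guarantees discussed above. Acquiring the locks in a fixed sorted order is an assumption added here to keep two overlapping critical sections from deadlocking.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-process stand-in for a lock coordinator.
// Not part of Orleans or Durable Functions.
public sealed class EntityLockTable
{
    // One binary semaphore per entity id, created on first use.
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    // Acquires every requested lock, then returns a handle that releases them all.
    public async Task<IDisposable> LockAsync(params string[] entityIds)
    {
        // Assumption: a fixed global order (ordinal sort) so that two callers
        // locking overlapping sets cannot wait on each other in a cycle.
        string[] ordered = entityIds
            .Distinct()
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToArray();

        foreach (string id in ordered)
        {
            SemaphoreSlim gate = _locks.GetOrAdd(id, _ => new SemaphoreSlim(1, 1));
            await gate.WaitAsync();
        }

        return new Releaser(this, ordered);
    }

    // True while some critical section holds the lock for this entity.
    public bool IsLocked(string entityId) =>
        _locks.TryGetValue(entityId, out SemaphoreSlim gate) && gate.CurrentCount == 0;

    private sealed class Releaser : IDisposable
    {
        private readonly EntityLockTable _owner;
        private string[] _held;

        public Releaser(EntityLockTable owner, string[] held)
        {
            _owner = owner;
            _held = held;
        }

        public void Dispose()
        {
            // Release each lock exactly once, even if Dispose is called repeatedly.
            string[] held = Interlocked.Exchange(ref _held, null);
            if (held == null) return;

            // Release in reverse acquisition order.
            for (int i = held.Length - 1; i >= 0; i--)
            {
                _owner._locks[held[i]].Release();
            }
        }
    }
}
```

Used with a `using` block, this mirrors the critical section from the issue description: `using (await table.LockAsync("player1", "player2")) { /* check availability, assign game */ }`. In a distributed setting each acquire and release would be a message to another node, which is where the cost and failure handling mentioned in this thread come in.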

sebastianburckhardt on 23 Jul 2019

All 6 comments

I'm not sure I see how this pattern can work well in a distributed setting. If the orchestrator and the two player grains reside on different silos, obtaining a lock on the players would require a round of coordination between the three nodes, with one of them keeping the lock while the operation is in progress and reliably releasing it at the end. I think that would be expensive and complicated to get right, considering all possible failure modes.

sergeybykov on 20 Jul 2019

Does this mean it also doesn't work well for Durable Functions virtual actors? Or is there an implementation difference between Orleans and Azure Durable Functions 2.0 that makes it worse for Orleans?

wanton7 on 20 Jul 2019

I'm not familiar with exactly how DF is implemented, so it's hard for me to tell. If all three entities were held within a single process, then it wouldn't be as much of a problem.

sergeybykov on 21 Jul 2019