Azure-functions-durable-extension: Add support for interfaces in ReadEntityStateAsync()

Created on 3 Oct 2019 · 22 Comments · Source: Azure/azure-functions-durable-extension

Currently the durable API for entities seems to push developers toward a CQRS pattern.
Commands can be signaled (one-way communication), and orchestrators can call entities to update their state, still asynchronously but with two-way communication.

Querying data from entities with two-way, still asynchronous communication (using Task<T>) can be done through orchestrators, or with the ReadEntityStateAsync method from non-orchestrator functions.
With orchestrators we end up with a pull model: someone at the very end needs to check the state of the orchestration result.
The other option is ReadEntityStateAsync, but this method feels like it exposes the internals of an entity. Instead, I would like a way for an entity to provide a method that transforms the state before it is returned.

It feels like this can be achieved today with an additional function (one that reads the entity's state and transforms it before returning it), but I think this could be part of the entity function's definition.

enhancement needs-discussion

All 22 comments

/cc @sebastianburckhardt

Very similar, if not the same, as my ask here https://github.com/Azure/azure-functions-durable-extension/issues/885#issuecomment-519610498 - A way to run some logic and project the response would certainly be desirable, rather than bleeding out the entire state to the caller.

I agree that it seems "wrong" to expose the internal representation of the entity state. It breaks encapsulation, which goes against established software engineering principles.

However, we don't really prevent users from encapsulating the state. For example, consider the counter entity example. One can use a private field for the current value.

public class Counter
{
    [JsonProperty("value")]
    private int CurrentValue { get; set; }

    public void Add(int amount) => this.CurrentValue += amount;

    public Task<int> Get() => Task.FromResult(this.CurrentValue);

    [FunctionName(nameof(Counter))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<Counter>();
}

Now, on the client, we have to use the Get() method to be able to read the current value of the counter:

var counter = await client.ReadEntityStateAsync<Counter>(entityId);
return counter.Get();

What we don't currently have is support for using just the interface, as in:

var counter = await client.ReadEntityStateAsync<ICounter>(entityKey);
return counter.Get();

where the interface is defined as

public class Counter : ICounter {
   ...
}

public interface ICounter {
    void Add(int amount);
    Task<int> Get();
}

Perhaps it would be worth adding support for this since it provides stronger encapsulation (via interface) than the private field.
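
To make the idea concrete, here is a minimal, dependency-free sketch of what an interface-typed read could look like on the caller side. It is not the extension's API: ReadAs, ICounterReader and the JSON payload are hypothetical names invented for illustration, and System.Text.Json stands in for the Newtonsoft.Json serializer used in this thread (which is also why the property is public here). The serializer still needs the concrete type to materialize the state, so the helper only narrows what the caller can see.

```csharp
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical read-only view of the counter (not part of the extension).
public interface ICounterReader
{
    Task<int> Get();
}

public class Counter : ICounterReader
{
    // Public only because System.Text.Json ignores private properties.
    public int Value { get; set; }

    public void Add(int amount) => Value += amount;

    public Task<int> Get() => Task.FromResult(Value);
}

public static class InterfaceReadSketch
{
    // Hypothetical helper: deserialize the stored state as the concrete
    // type TState, but hand it to the caller typed as TInterface only.
    public static TInterface ReadAs<TInterface, TState>(string storedJson)
        where TInterface : class
        where TState : class, TInterface
    {
        return JsonSerializer.Deserialize<TState>(storedJson);
    }
}
```

A caller would then write something like `ICounterReader reader = InterfaceReadSketch.ReadAs<ICounterReader, Counter>(json);` and only see Get(). The whole state is still deserialized on the caller's side; the interface restricts access, it does not reduce the amount of data transferred.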

I would be happy with that approach. And as I understand it, this would be executed on the caller side, which is something to pay attention to.

@sebastianburckhardt I like the solution and it would be a great start!

I have a use-case where the entity state size could be considerable. Imagine a List<string> with a few thousand items and growing, but in a certain scenario I only need to expose the last entry in the List to the caller through a GetLastEntry() method.

I'm assuming this method you propose would serialise the entire state to the caller?

Yes, the entire state is transmitted every time.

Perhaps it is worth noting here that working with large entity states can be problematic (from a performance viewpoint) not just on clients, but in general: our current implementation is not optimized to deal with large states. In particular, what happens is that every time a batch of operations is ready to process, the state gets loaded/deserialized (before running the operations) and then serialized/stored (after running the operations).
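
As a rough illustration of that cost, the following sketch simulates a List<string> state that is stored as one JSON document. The names are made up for the example and System.Text.Json is used only to keep it dependency-free; the real backend and serializer differ. The point is simply that reading the last entry still requires materializing the whole list.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class LargeStateSketch
{
    // Simulates "load state, run GetLastEntry()" for a list-shaped state.
    public static (int payloadChars, string lastEntry) ReadLastEntry(int entryCount)
    {
        // The entity state: a list that keeps growing.
        List<string> state = Enumerable.Range(0, entryCount)
            .Select(i => $"entry-{i}")
            .ToList();

        // The state is persisted as a single serialized document.
        string payload = JsonSerializer.Serialize(state);

        // Whoever runs GetLastEntry() has to deserialize everything first.
        List<string> copy = JsonSerializer.Deserialize<List<string>>(payload);

        return (payload.Length, copy[copy.Count - 1]);
    }
}
```

The payload size grows with the number of entries even though only one entry is ever returned, which is why load testing with realistic state sizes is worthwhile.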

So that means even if GetLastEntry() would run on an entity side, the whole state would be serialized/deserialized by the entity. So actually 'transmitting' the state to the call side is more efficient with one deserialize less?

On the other side, is there any optimization to use some sort of a local cache/file on the "serverless" machine running the entity? Is there a machine affinity to run the same functions on the same machine?

So actually 'transmitting' the state to the call side is more efficient with one deserialize less?

Yes, exactly. This is definitely true for the current Azure.Storage backend.

is there any optimization to use some sort of a local cache?

Extended sessions provide some level of caching; but I think there is quite a bit of room for improvement. I believe for entities in particular, there is still a fair amount of storage traffic and serialization/deserialization that could be optimized.

@sebastianburckhardt Thank you for the warning on large entity states. This is certainly something that I shall be load testing for suitability when you GA.

What are the concurrency characteristics of Entities in the current implementation? I'm comfortable that I understand how Orchestrations achieve a level of concurrency through the partitionCount property. Do the same concurrency characteristics apply to Entities?

Do the same concurrency characteristics apply to Entities?

Yes, it uses the exact same mechanism. In particular, for scaling out, you may want to increase the partitionCount.

Is there a way to ensure that read operations do not modify the entity state? Consider an entity:

public class Counter
{
    [JsonProperty("value")]
    private int CurrentValue { get; set; }

    public Task<int> Get() => Task.FromResult(++this.CurrentValue);

    [FunctionName(nameof(Counter))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<Counter>();
}

Calling the get from another function:

var counter = await client.ReadEntityStateAsync<Counter>(entityKey);
return counter.Get();

This would not update the state of the Counter entity (or at least it would not persist), right?

I seem to end up creating Orchestrator functions so that I can coordinate two entities. This seems to make my functions extra chatty; I would really like to have an entity make 2-way calls. This also makes me write some boilerplate code.

To give some context:
I have two different Entity types (Entity1 and Entity2), and an update operation which would belong to Entity2. The update operation should call Entity1.Update() and, based on its result, update its own state.

In my function I am using an Orchestrator which will call Entity2.Update1(p1) and Entity2.Update2(p2, p3), but I also need to define new types for operation parameters etc.

Yes, what you get on the client is not the actual entity, just a snapshot of some past state of the entity. If you are modifying that state, you are just modifying the object in memory, it has no other effect.
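
The snapshot behavior can be reproduced without the extension. In the sketch below a JSON string plays the role of the persisted entity state; CounterState and SnapshotDemo are illustrative names, and System.Text.Json is an assumption made to keep the example self-contained.

```csharp
using System.Text.Json;

public class CounterState
{
    public int Value { get; set; }

    // A "read" that also mutates, like the Get() in the question above.
    public int Increment() => ++Value;
}

public static class SnapshotDemo
{
    public static (int returned, int storedValue) Run()
    {
        // Stand-in for the state held in storage.
        string stored = JsonSerializer.Serialize(new CounterState { Value = 5 });

        // What the client receives: a deserialized copy of some past state.
        CounterState snapshot = JsonSerializer.Deserialize<CounterState>(stored);
        int returned = snapshot.Increment(); // changes the copy only

        // Reading again shows the stored state was never touched.
        CounterState reread = JsonSerializer.Deserialize<CounterState>(stored);
        return (returned, reread.Value);
    }
}
```

Run() returns (6, 5): the mutation is visible on the in-memory object and nowhere else.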

By design, what you can do inside clients and entities is very limited. You really need to use orchestrators to do multi-step or multi-object things. This is a design invariant; though it may be possible to figure out ways to reduce the boilerplate overhead of defining orchestrations.

BTW, you may want to use C# tuples for operation arguments to reduce the boilerplate. It helps a little.

1. Is there a way for Entities to expose GET operations (the Read in CRUD) through an HTTP API instead of the current ReadEntityStateAsync?

2. This might be worth its own GitHub question: is there a way to replace Newtonsoft.Json serialization with a custom serializer or, say, System.Text.Json? (I mean for Entities only, not for the whole Durable Extension.)

I would also like if there was a proxy when calling an entity. Currently it is

var entityKey = new EntityId(nameof(MyEntityActor), actorId.ToString());
await _context.CallEntityAsync(entityKey, nameof(MyEntityActor.Operation), optionalArguments);

and a proxy could be

var entityKey = new EntityId(nameof(MyEntityActor), actorId.ToString());
IMyEntityActor proxy = _context.CreateProxy<IMyEntityActor>(entityKey);
await proxy.Operation(optionalArguments);

This could clean up input arguments and code as well (at least at the call site). What do you think @sebastianburckhardt ?

We already support the use of entity proxies from within orchestrations. Are you specifically asking for the same functionality on clients? As stated, this is not currently possible - we know it is a desirable feature, but the implementation simply does not have the internal wiring required to send response messages to the client. It would require somewhat substantial work to add that, so it probably won't be supported in the near future.

If you want the top experience (full consistency & type-checked access via interfaces for both reading and updating) you will need to use orchestrations to access the entities, for now.

Ah sorry I meant the last example with the proxy from orchestrator functions.
Ah nice, just seeing it:

context.CreateEntityProxy<T>()

Supporting more than one input argument seems doable if the proxy wraps the input args and DispatchAsync unwraps them. Is that something you are looking into? (Or is it left for us to extend the code ourselves?)

I would recommend just using a C# tuple type for the input. Then the C# compiler already does all the work. Note that the tuple can have named and typed components, mimicking multiple arguments. For example:

public void SomeOperation((int a, int b) input)
{
    var sum = input.a + input.b;
}

and you can call it like this:

proxy.SomeOperation((5, 8));

Yep, using that currently.

When the MyEntity type has a constructor and uses constructor dependency injection, ReadEntityStateAsync does not perform the dependency injection:
ReadEntityStateAsync<MyEntity>(new EntityId(nameof(MyEntity), id));

Is there a suggested workaround?

I worked around this by moving my state properties to a base class and using ReadEntityStateAsync<MyEntityState>
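
A minimal sketch of that workaround, under some assumptions: IClock and FixedClock are hypothetical stand-ins for an injected service, and System.Text.Json replaces the real serializer so the example runs without the extension. The state lives in a base class with a parameterless constructor, so it can be materialized without any dependencies; with the real client the equivalent read would be ReadEntityStateAsync<MyEntityState>(...).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Plain state: no constructor arguments, safe to deserialize anywhere.
public class MyEntityState
{
    public List<string> Entries { get; set; } = new List<string>();
}

// Hypothetical injected dependency.
public interface IClock
{
    DateTime UtcNow { get; }
}

public class FixedClock : IClock
{
    public DateTime UtcNow => new DateTime(2019, 10, 3, 0, 0, 0, DateTimeKind.Utc);
}

// The entity adds behavior and dependencies on top of the state.
public class MyEntity : MyEntityState
{
    private readonly IClock clock;

    public MyEntity(IClock clock) => this.clock = clock;

    public void AddEntry(string text) => Entries.Add($"{clock.UtcNow:O} {text}");
}

public static class StateOnlyReadSketch
{
    public static MyEntityState Run()
    {
        var entity = new MyEntity(new FixedClock());
        entity.AddEntry("created");

        // Persisted form of the entity (only the public state is written).
        string stored = JsonSerializer.Serialize(entity);

        // The reader asks for the base type, so no IClock is required.
        return JsonSerializer.Deserialize<MyEntityState>(stored);
    }
}
```

The trade-off is that the reader only gets data, not the entity's methods, which fits the snapshot semantics of reads discussed earlier in the thread.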

@ladeak

I'm going to limit this issue to talking about allowing ReadEntityStateAsync with interfaces, as that solution seems to address the initial question.

I am opening a separate issue for the dependency injection with ReadEntityStateAsync. I'm not positive that this is something we could enable support for, but at the very least we should document it better.
