Azure-functions-durable-extension: Include batch position of operation on Entity context

Created on 31 Aug 2020 · 16 Comments · Source: Azure/azure-functions-durable-extension

I have a scenario where I know there is a high potential for multiple operations to occur within the same small time window, so they are likely to be invoked within the same entity batch.

Right now, after each operation I check the state and make an external API call if the condition passes. One workaround would be to always send a signal to myself to allow the other messages to complete, but it would be great if there were an event I could subscribe to after all the operations (batched or not) have been processed:

Entity.Current.OnBeforeDeactivated(e => {});

Thanks!

enhancement


@mpaul31, to clarify your ask, you would essentially like to be able to execute some code after an entire batch is completed (including batches of 1 operation), and our current programming model makes that rather difficult to detect as you only have visibility into operations, and not our batching. Do I understand that correctly?

@sebastianburckhardt, what do you think about this proposed extension to the entity programming model?

@ConnorMcMahon Yes that is correct.

@mpaul31

hey, can you provide a little info on what your problem is?

For example, is the problem that the operations are happening so frequently (because they're happening in a single batch) that it's putting pressure on the external API, which in turn causes back-pressure on the Entity itself, resulting in slowdowns or timeouts on the batch?

Or are you looking for a way to distribute these API calls over a wider time frame in order to allow other operations on the Entity (or other Entities) to interleave?

@olitomlinson Your second example. We have a Service Bus trigger that is extremely chatty and not session-aware, so I'm trying to avoid sending multiple API requests within seconds (even milliseconds) of each other.

@mpaul31 sorry, is the SB trigger what is feeding signals to your Entity? Or is the Entity operation itself pushing messages onto SB, which is then in turn making calls to the external API?

Yes, it's feeding the signals.

@mpaul31

Just so I'm crystal clear, is your objective either or both of these:

  • to call the external API less frequently because it's causing problems for the API
  • to call the external API less frequently to free up bandwidth in the DF app for processing other Entity Operations for this Entity, or other Entities

@olitomlinson It's kind of both. We are currently in pilot mode, and the third-party vendor has not said this is a problem, but they did bring it up during a call about the metrics they are seeing, so I want to be prepared in case this gets escalated.

You could also consider this a premature optimization, but I was thinking that some sort of buffer/delay would be an easier solution to transition to than implementing Service Bus batching (managing the completion checkpoints is kind of a pain) and exposing batch operations on my entities.

Having a hook on the durable entity lifecycle events would be more of a low-level solution, but it would handle the majority of cases.

Does this make sense?

While I understand the desire for this feature, I am not sure Entity.Current.OnBeforeDeactivated is the way to go. It seems slightly complicated and there is a risk of misunderstanding the semantics (e.g. what exactly are the guarantees on this call, such as under failures? What if I call this multiple times?).

What I would suggest is that we simply expose the current position of the batching loop, which then allows you to implement your own mechanism. For example, we can add two fields, Entity.Current.BatchSize and Entity.Current.BatchPosition. Then you can check whether the current operation is the last one in the batch:

if (Entity.Current.BatchPosition == Entity.Current.BatchSize - 1)
{
    // perform this only for the last operation of the batch
}
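To make the proposed check concrete outside of Azure Functions, here is a minimal, self-contained C# sketch. It does not use the Durable Functions library: `FakeEntityContext`, `CounterEntity` and `BatchSimulator` are hypothetical stand-ins that only mimic a batch loop exposing `BatchSize` and `BatchPosition` the way `Entity.Current` is proposed to. Every operation updates state, but only the last operation of each batch performs the (simulated) external API call.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for Entity.Current; it only mimics the two proposed fields.
public sealed class FakeEntityContext
{
    public int BatchSize { get; set; }
    public int BatchPosition { get; set; }
}

// Hypothetical entity: accumulates a value and "reports" it to an external API.
public sealed class CounterEntity
{
    public int Value { get; private set; }
    public int ExternalCalls { get; private set; }
    public List<int> ReportedValues { get; } = new List<int>();

    public void Add(FakeEntityContext ctx, int amount)
    {
        // State is updated for every operation in the batch.
        Value += amount;

        // Only the last operation of the batch calls the external API.
        if (ctx.BatchPosition == ctx.BatchSize - 1)
        {
            ExternalCalls++;
            ReportedValues.Add(Value);
        }
    }
}

public static class BatchSimulator
{
    // Runs each batch against the entity, setting the position before each operation.
    public static void Run(CounterEntity entity, IEnumerable<int[]> batches)
    {
        var ctx = new FakeEntityContext();
        foreach (var batch in batches)
        {
            ctx.BatchSize = batch.Length;
            for (int i = 0; i < batch.Length; i++)
            {
                ctx.BatchPosition = i;
                entity.Add(ctx, batch[i]);
            }
        }
    }
}
```

For batches `{1, 2, 3}` and `{4}`, the entity ends with `Value == 10` and makes two simulated API calls, reporting 6 and then 10. The number of calls depends entirely on how the runtime groups operations into batches, which is the implicit coupling raised later in the thread.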

Would this address your situation?

@sebastianburckhardt Yes, that would be very beneficial for scenarios like mine.

Thanks!

@olitomlinson would still love to hear your thoughts on this because I'm pretty sure you were going somewhere with this :)

@mpaul31 I like the solution which @sebastianburckhardt provided.

My only concern is that if the underlying implementation changed and made batches smaller due to optimisations, would that have a knock-on effect for you? It seems like throttling requests to the 3rd party is one of your biggest concerns, so using this mechanism would be a pragmatic but implicit solution.

If you truly want to protect that 3rd-party API from too many requests in a given interval, then you may have to hand-roll an explicit solution, which may be difficult without blocking the entity.

(I've used a singleton RateGate with great success to prevent flooding 3rd-party APIs, but it would block any other signals queuing up on your entity, which is probably not a good idea.)
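A minimal sketch of a blocking rate gate helps show why it would stall the entity. `SimpleRateGate` is a hypothetical sliding-window limiter written only for this illustration; it is not the `RateGate` class mentioned above, and its API is assumed. `TryAcquire` takes an explicit timestamp so the window logic can be checked deterministically, while `WaitToProceed` sleeps the calling thread until a slot frees up.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sliding-window gate: at most `limit` calls within any `window`.
public sealed class SimpleRateGate
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Queue<DateTime> _recent = new Queue<DateTime>();
    private readonly object _lock = new object();

    public SimpleRateGate(int limit, TimeSpan window)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit));
        _limit = limit;
        _window = window;
    }

    // Non-blocking check against an explicit UTC timestamp.
    public bool TryAcquire(DateTime nowUtc)
    {
        lock (_lock)
        {
            // Drop calls that have fallen out of the window.
            while (_recent.Count > 0 && nowUtc - _recent.Peek() >= _window)
                _recent.Dequeue();

            if (_recent.Count >= _limit) return false;

            _recent.Enqueue(nowUtc);
            return true;
        }
    }

    // Blocking variant: the calling thread sleeps until a slot is free.
    public void WaitToProceed()
    {
        while (!TryAcquire(DateTime.UtcNow))
            Thread.Sleep(10);
    }
}
```

Because an entity processes its operations one at a time, calling `WaitToProceed` from inside an entity operation would hold up every signal queued behind it, which is the drawback described above.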

Ideally you need a pull model, which runs on a schedule and can grab batches of work. This could be implemented using entity self-scheduling, but if your use case needs to call this API at very specific, short intervals, then Entities may not be able to provide that at scale.

Does your Entity depend on the result and/or response body from the API, or is it a 202 API?

If it's a 202 API, this gives you a bit more freedom to offload these requests to something that doesn't run in the context of the Entity, where you can do your batching without blocking the Entity. If the 3rd-party API imposes throughput limits scoped to some logical key, then I can recommend the Service Bus Demultiplex pattern for shaping requests to 3rd parties with this in mind.

@olitomlinson This gives me some other great ideas, thanks for the feedback!

One other solution I am tinkering with, since what @sebastianburckhardt recommended is not baked in yet, is to send a self-scheduled message with a small time delay after I update the entity state and increment a version. When the scheduled message is received, I can do a quick check to see whether the expected version matches the version currently stored in the state. If it does, I fire off the call to the 3rd-party API; otherwise I no-op, knowing there are trailing messages right behind.

This does add one extra message and a small bit of latency to the flow, but it seems straightforward.
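The version check described above can be modelled without any Azure dependencies. In this sketch, `DebouncedEntity` and `DebounceSimulator` are hypothetical names: every incoming signal increments `Version` and schedules a delayed self-check carrying that version, and the check only calls the (simulated) third-party API if no newer update has arrived in the meantime. In a real entity the delayed self-signal would presumably be sent with one of the scheduled `SignalEntity` overloads on `Entity.Current`; the simulator just orders updates and checks by time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the "self-scheduled version check" workaround.
public sealed class DebouncedEntity
{
    public int Version { get; private set; }
    public int ExternalCalls { get; private set; }

    // Handles an incoming signal (e.g. from the Service Bus trigger).
    public void Update() => Version++;

    // Handles the delayed self-signal that carries the version it was scheduled for.
    public void CheckAndCall(int expectedVersion)
    {
        if (expectedVersion == Version)
            ExternalCalls++; // no newer update arrived: call the 3rd-party API
        // Otherwise no-op: a newer update has scheduled its own check.
    }
}

public static class DebounceSimulator
{
    // updateTimesMs: arrival times of signals; delayMs: delay of each self-signal.
    public static DebouncedEntity Run(int[] updateTimesMs, int delayMs)
    {
        var entity = new DebouncedEntity();
        var times = (int[])updateTimesMs.Clone();
        Array.Sort(times);

        // Kind 0 = incoming update, kind 1 = delayed self-check.
        // The update at sorted index i produces Version i + 1, so its check carries i + 1.
        var events = new List<(int Time, int Kind, int Version)>();
        for (int i = 0; i < times.Length; i++)
        {
            events.Add((times[i], 0, 0));
            events.Add((times[i] + delayMs, 1, i + 1));
        }

        // Process in time order; on ties, updates run before checks.
        events.Sort((a, b) => a.Time != b.Time ? a.Time.CompareTo(b.Time) : a.Kind.CompareTo(b.Kind));

        foreach (var e in events)
        {
            if (e.Kind == 0) entity.Update();
            else entity.CheckAndCall(e.Version);
        }
        return entity;
    }
}
```

A burst of updates at 0, 50 and 100 ms with a 200 ms delay results in a single API call (only the check for version 3 matches), while updates at 0 and 1000 ms result in two. The price is the one extra message per update and the added delay mentioned above.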

@mpaul31 Yes, this would work as long as your use case doesn't demand short intervals between calls to the 3rd-party API :)

I'm going to change the name of the ticket to reflect the current proposal.

This is now implemented.
