Azure-functions-durable-extension: [Feature Request] Add support for pausing orchestration functions

Created on 31 Jul 2018 · 2 Comments · Source: Azure/azure-functions-durable-extension

It would be awesome if there was a built-in way of pausing long-running orchestration functions.

Our orchestrations consist of a series of activity functions that represent steps in a workflow. Currently we listen for a "Pause" external event when executing each of these activities, and then we wait for a "Continue" external event if the "Pause" is received before the activity finishes running. This works, but there is a lot of additional code to implement these custom events and to report the state of the workflow (e.g. paused versus executing) to the caller.
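The workaround described above can be approximated with a small in-memory sketch. This is only an analogue, assuming `asyncio.Event` objects stand in for the "Pause" and "Continue" external events and a plain coroutine stands in for an activity function; `run_step` is a hypothetical helper name and none of this is the Durable Functions API.

```python
import asyncio


async def run_step(activity, pause_event, continue_event):
    """Run one workflow step; if "Pause" arrives first, wait for "Continue" afterwards."""
    activity_task = asyncio.ensure_future(activity())
    pause_task = asyncio.ensure_future(pause_event.wait())

    # Race the activity against the "Pause" signal.
    done, _ = await asyncio.wait(
        {activity_task, pause_task}, return_when=asyncio.FIRST_COMPLETED
    )

    # The activity is never interrupted; it always runs to completion.
    result = await activity_task

    if pause_task in done:
        # "Pause" was received before the activity finished: hold here
        # until the caller sends "Continue", then reset both signals.
        pause_event.clear()
        await continue_event.wait()
        continue_event.clear()
    else:
        # No pause requested during this step; stop listening.
        pause_task.cancel()
    return result
```

Every step of the workflow has to be wrapped this way, and the paused-versus-executing state still has to be reported to the caller separately, which is the extra code the request is asking to avoid.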

It would be great if pausing workflows came out of the box, including a "Paused" run-time status. This would be helpful for use-cases like ours where there is a long-running workflow that other services or users may want to pause and resume at a later point (e.g. low-pri data backup, auditing, safety checks, etc.).

Thank you!

breaking-change design-proposed enhancement help wanted

All 2 comments

Thanks Daniel for the suggestion. I agree this would be extremely useful. Unfortunately the Durable Task Framework (DTFx) doesn't really have a concept for this, so implementing it would either require us to make some big changes to DTFx or we would have to introduce a new abstraction which basically does the same thing you're currently doing under the covers (and we'd love to know more about your manual solution for this if you're willing to share).

I don't think we'll be able to fit this in in the next few months, but we'll definitely keep this on the backlog as I can see more people will likely ask for something like this in the near future.

FYI @jeffhollan, I've received requests for this kind of feature internally as well. Some design notes on how I think this could be implemented (would love it if we could get this as a community contribution):

Overview

Orchestrations will support two new management operations, Pause and Resume (note: I like _Resume_ better than _Continue_ because _Continue_ sounds too passive IMO and could be confusing). For simplicity and ease of maintenance, I recommend that this feature be implemented in the Durable Extension layer (i.e. this GitHub repo) rather than in the Durable Task Framework.

Pausing and Resuming

Pausing and resuming are done using new management APIs that will be exposed by IDurableOrchestrationClient and as HTTP APIs. Orchestrations can also pause themselves directly via IDurableOrchestrationContext.PauseAsync. A reason parameter can be specified, and that text data is written to the tracking logs that go to Application Insights.

Internally, pausing and resuming actions are implemented using special external events. These events could be distinguished from normal events using some naming convention, e.g. events named reserved:PauseOrchestration and reserved:ResumeOrchestration. This is potentially a breaking change if any customers are already using these names today. Note that the Durable Task Framework will not know the difference between user-defined external events and these new "system"-defined external events.
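A minimal sketch of how the extension layer might tell these events apart, assuming the example names from the notes above and an exact, case-sensitive match. The function `classify_event` and the constant names are hypothetical, not part of any existing API.

```python
# Example names taken from the design notes; the final convention is undecided.
RESERVED_PREFIX = "reserved:"
PAUSE_EVENT = RESERVED_PREFIX + "PauseOrchestration"
RESUME_EVENT = RESERVED_PREFIX + "ResumeOrchestration"


def classify_event(name):
    """Return "pause", "resume", or "user" for an incoming external event name."""
    if name == PAUSE_EVENT:
        return "pause"
    if name == RESUME_EVENT:
        return "resume"
    # Everything else, including other "reserved:" names, is passed through
    # to the orchestration as an ordinary user-defined event.
    return "user"
```

Because the check happens above DTFx, a user event that already uses one of these names would be intercepted by the extension, which is the compatibility risk noted above.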

Runtime Behavior

While paused, orchestrations will continue to execute replays whenever a message is received. However, the orchestration will not take any _new_ actions until it is resumed. Specifically, any call to an async method (e.g. CallActivityAsync) will be internally gated by an AsyncManualResetEvent. When the orchestration is paused, these gates will asynchronously block calls to the underlying DTFx APIs. When a _resume_ event is received, all blocked async calls will be resumed. This is done by signaling the AsyncManualResetEvent from the TaskOrchestrationShim.
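The gating idea can be sketched outside of .NET with `asyncio.Event`, which behaves much like an async manual-reset event: it stays signaled until cleared, and setting it releases every waiter at once. The class `PausableContext` and its methods are hypothetical stand-ins for the orchestration context, not the extension's actual implementation.

```python
import asyncio


class PausableContext:
    """Toy orchestration context whose async calls are gated while paused."""

    def __init__(self):
        # Gate is "set" while running and "cleared" while paused.
        self._gate = asyncio.Event()
        self._gate.set()

    @property
    def is_paused(self):
        return not self._gate.is_set()

    def pause(self):
        self._gate.clear()

    def resume(self):
        # A single signal releases every call blocked on the gate.
        self._gate.set()

    async def call_activity(self, activity, *args):
        # Block asynchronously while paused, then forward to the real call.
        await self._gate.wait()
        return await activity(*args)
```

For example, if three `call_activity` calls are started while the context is paused, none of the activities runs until `resume()` is called, at which point all three proceed. Create the context inside a running event loop so the gate is bound to that loop on older Python versions.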

Note that the internal implementation of WaitForExternalEventAsync<T> behaves in a similar way, except that it uses TaskCompletionSource<T> instead of an AsyncManualResetEvent. The AsyncManualResetEvent is necessary here so that multiple async actions can be resumed by a single _Resume_ event.

Runtime Status

A new Paused runtime status will be added to indicate that an orchestration is paused. Since the Durable Task Framework is not aware of this status, it will need to be saved as part of the CustomStatus field. This is a breaking change because CustomStatus will need to be overloaded to support both user-defined custom status and system-defined (e.g. the Durable extension) custom status.

When querying for status, users would see the user-defined custom status but not the system-defined custom status. This will work for both single instance query and multiple instance query. The Durable CLI commands would also need to be updated to understand this custom protocol (the Durable CLI commands currently call the DTFx storage layer directly).
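One way to picture the overloaded CustomStatus field is as a small JSON envelope that the extension unwraps before returning query results. The notes do not specify a format, so the envelope keys, the function names, and the rule for deriving the Paused status below are all assumptions made for illustration.

```python
import json

# Hypothetical envelope keys; the design notes do not define a wire format.
USER_KEY = "user"
SYSTEM_KEY = "durableSystem"


def pack_custom_status(user_status, paused):
    """Store user-defined and system-defined status in one CustomStatus string."""
    return json.dumps({USER_KEY: user_status, SYSTEM_KEY: {"paused": paused}})


def read_status(stored, dtfx_runtime_status):
    """Return what a status query would show: user status plus derived runtime status."""
    value = json.loads(stored) if stored else None
    # Values written before the change have no envelope and are treated as user-only.
    is_envelope = isinstance(value, dict) and SYSTEM_KEY in value
    user_status = value.get(USER_KEY) if is_envelope else value
    paused = bool(value[SYSTEM_KEY].get("paused")) if is_envelope else False
    # Assumption: "Paused" only overrides a status that DTFx reports as "Running".
    if paused and dtfx_runtime_status == "Running":
        runtime = "Paused"
    else:
        runtime = dtfx_runtime_status
    return {"runtimeStatus": runtime, "customStatus": user_status}
```

Any tool that reads the storage layer directly, such as the Durable CLI commands, would need the same unwrapping logic, and a user status that happens to contain the system key would be misread, which illustrates why this is a breaking change.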
