Azure-functions-durable-extension: Recover from accidental deletion of Task Hub components

Created on 16 Nov 2020 · 3 Comments · Source: Azure/azure-functions-durable-extension

We've been using Durable Functions to build human interaction workflows into our products, and it works wonderfully. We have recently started paying more attention to disaster recovery and had our first successful practice run last week. We are now looking into recovery scenarios caused by human error.

Let's say we have a lot of in-flight orchestrations waiting for human approval, and we accidentally delete some components of the Task Hub logical container, including the queues, the tables, and the blobs. Would it be possible to recover from that? I think the real question is how to protect against resource deletion. We have been doing that by enabling soft delete on the blob container and applying a resource lock, but it seems individual queues or tables can still be deleted.

If my question is out of place please feel free to redirect me to a better place to ask.

Thanks,

question

anhhnguyen206

Most helpful comment

Unfortunately, I don't know of any "soft-delete" capability for Azure Storage queues or tables, but I will ask around to see if I can get any more concrete answers regarding that.

In this case, I would follow @olitomlinson's recommendation of trying to prevent these human errors as opposed to trying to recover once they have happened.

That being said, if your orchestrations are entirely idempotent, you could theoretically store each orchestration's inputs as a blob as the first activity in your orchestration, and only delete them as the last activity of your orchestration. Then, in a catastrophic human error event, you could iterate through all of these blobs and start a new orchestration instance with the same inputs.
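
A minimal sketch of that idea in Python, under stated assumptions: `InMemoryInputStore` is a hypothetical stand-in for a blob container, and `start_new` is a hypothetical callback that, in a real function app, would wrap whatever call starts a new orchestration instance. Neither name comes from the Durable Functions API, and the sketch only holds if the orchestrations are idempotent.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


class InMemoryInputStore:
    """Hypothetical stand-in for a blob container: instance ID -> serialized input."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def save(self, instance_id: str, orchestration_input: Any) -> None:
        # Would run as the first activity of the orchestration.
        self._blobs[instance_id] = json.dumps(orchestration_input)

    def delete(self, instance_id: str) -> None:
        # Would run as the last activity of the orchestration.
        self._blobs.pop(instance_id, None)

    def items(self) -> List[Tuple[str, Any]]:
        return [(key, json.loads(value)) for key, value in self._blobs.items()]


def replay_saved_inputs(
    store: InMemoryInputStore,
    start_new: Callable[[str, Any], None],
) -> int:
    """Start a fresh orchestration for every input still present in the store."""
    restarted = 0
    for instance_id, orchestration_input in store.items():
        # Only unfinished orchestrations still have a saved input.
        start_new(instance_id, orchestration_input)
        restarted += 1
    return restarted
```

Because completed orchestrations delete their own blob, whatever remains in the store after an accidental deletion is the set of instances that were still in flight.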

ConnorMcMahon on 16 Nov 2020

👍2

All 3 comments

hey @anhhnguyen206

I think this is something that you need to solve at the Azure Resource level. A possible solution is to use RBAC in your environments, in order to minimise the footprint of users who are able to run delete operations against the queues.

https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad#azure-built-in-roles-for-blobs-and-queues

olitomlinson on 16 Nov 2020

👍2

@olitomlinson @ConnorMcMahon thanks for all the good suggestions.

I think I'm going with @olitomlinson's suggestion to reduce the number of people who have a high level of access to those critical storage resources.

@ConnorMcMahon, we're currently doing that too. Basically, we take a snapshot of the workflow state every time it moves to a new state and save that snapshot to a blob, then purge it once the workflow completes. We can then restart the workflow and get it back to its current state by using the snapshot.
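
The snapshot-and-resume approach described above could look roughly like the following Python sketch. Everything here is an illustrative assumption rather than the actual implementation: the `STATES` list, the `snapshots` dictionary standing in for blob storage, and the `stop_before` parameter, which exists only to simulate losing the Task Hub mid-flight.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical workflow states, in the order they are entered.
STATES: List[str] = ["submitted", "awaiting_approval", "approved", "fulfilled"]


def run_workflow(
    workflow_id: str,
    snapshots: Dict[str, str],
    on_enter: Callable[[str], None],
    stop_before: Optional[str] = None,
) -> None:
    """Advance through STATES, resuming after the last snapshotted state, if any."""
    last = snapshots.get(workflow_id)
    start = STATES.index(last) + 1 if last is not None else 0
    for state in STATES[start:]:
        if state == stop_before:
            return  # simulated interruption; the snapshot is left in place
        on_enter(state)
        snapshots[workflow_id] = state  # snapshot after every transition
    snapshots.pop(workflow_id, None)  # purge once the workflow completes
```

Calling `run_workflow` a second time with the same `snapshots` skips the states already recorded and continues from the next one.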

anhhnguyen206 on 17 Nov 2020
