Azure-functions-durable-extension: Expected behavior with multiple eternal instances

Created on 19 Dec 2019 · 5 Comments · Source: Azure/azure-functions-durable-extension

Description

I have certain events in my system that have to be executed periodically at a specific time. I create an eternal orchestrator instance for each such event. Each instance does what it has to do with that event with various activity functions and then I create a timer and put the instance to sleep based on the frequency with which that event has to be executed. This continues indefinitely until some events are terminated or completed.

My question is about scaling. Suppose I have ten thousand, or even a million, instances of such an orchestrator function, each sleeping more than 99% of the time. Does a sleeping function (one awaiting a timer) impact performance in the executing container? Is some state about the sleeping instances held in the memory of the executing container at all times? Will the GC eventually clean up the information about the sleeping instances until it's their time to run again? Are CPU cycles wasted when an instance is sleeping?

Is this usage of eternal orchestrators an unintended anti-pattern?

Relevant source code snippets

// Get event from DB
var foobar = await context.CallActivityWithRetryAsync<Foobar>("GetFoobar", retryOptions, foobarId);

// Execute Foobar event
await context.CallActivityWithRetryAsync("ExecuteFoobar", retryOptions, foobarId);

// Get next run time of the event
var nextRunTime = await context.CallActivityWithRetryAsync<DateTime>("GetNextExecutionTime", retryOptions, foobarId);

// Sleep orchestrator instance until it's time to run again
await context.CreateTimer<object>(nextRunTime, null, default);

context.ContinueAsNew(foobarId);

All 5 comments

Hi @underscoreHao, great questions.

An orchestration that is awaiting on _any_ durable task, whether it is a timer (your case), an external event, or a response from an activity function call is completely removed from memory and does not occupy any CPU*. Having millions of these in an awaiting state should result in zero overhead, as long as they don't all start executing at the same time. :) Your current usage of eternal orchestrators is perfectly valid and encouraged.

The durable timer is implemented as a scheduled (invisible) message in Azure Storage. The message becomes visible at the time you specified (nextRunTime). Only when it is visible do we load your orchestration again and start using CPU and memory.

*_The slight exception to this is when you use extended sessions, which will keep an orchestration around in memory for a limited period of time before unloading it._

Let me know if this answers your question, or if you have any other questions about this.

Hi @cgillum that answers my question, thanks a lot!

@underscoreHao maybe this discussion is important for you to know, since you plan to use timers: https://github.com/Azure/azure-functions-durable-extension/issues/14

It seems that, for now, you will be constrained to a maximum of 72 hours (or 7 days, I'm not sure) for each new timer. Maybe I'm wrong and there is an internal workaround for this already.

https://github.com/Azure/azure-functions-durable-extension/pull/1100 would close this issue but there were some problems that prevented it going further

@ltouro I'm aware of the limitation and there's a workaround/hack provided by Mark Heath. In my case the timers can go well beyond 7 days.

Essentially, if an event has to be executed more than 7 days in the future, I wake up the function and create a new timer. As an example, if I have an event that has to happen every month, the instance will wake up 4 or 5 times (depending) during the month and create the desired timer. I do calculations so I don't overshoot my desired NextRun.
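For readers who want to see the shape of that calculation, here is a minimal sketch of how the intermediate wake-up time might be computed. It is not Mark Heath's code or anything from the extension itself: the `LongTimer` class, the `NextWakeUp` method, and the 6-day `MaxTimerSpan` cap are all hypothetical names and values chosen for illustration, on the assumption that a single timer must stay under roughly 7 days.

```csharp
using System;

// Hypothetical helper (not part of the Durable Functions API).
public static class LongTimer
{
    // Assumed per-timer cap, kept safely below the ~7 day limit mentioned in the thread.
    public static readonly TimeSpan MaxTimerSpan = TimeSpan.FromDays(6);

    // Returns the time the next timer should fire:
    // - the target itself when it is within the cap,
    // - otherwise an intermediate wake-up exactly MaxTimerSpan from now,
    //   so the result never overshoots the target.
    public static DateTime NextWakeUp(DateTime nowUtc, DateTime targetUtc)
    {
        if (targetUtc <= nowUtc)
        {
            // Target already reached or passed: no further waiting needed.
            return nowUtc;
        }

        TimeSpan remaining = targetUtc - nowUtc;
        return remaining <= MaxTimerSpan ? targetUtc : nowUtc + MaxTimerSpan;
    }
}
```

In an orchestrator this would presumably sit in a loop that runs while `context.CurrentUtcDateTime` is earlier than the desired run time, creating a timer for `LongTimer.NextWakeUp(context.CurrentUtcDateTime, nextRunTime)` on each pass. Using the orchestration context's clock rather than `DateTime.UtcNow` keeps the calculation deterministic on replay.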

Is there anything to look out for with this when testing it locally? We have a solution that could have multiple orchestrations in a running state at the same time. When testing this concurrency locally, the latest call seems to affect the first and the first stays in a running state and never continues.
