Azure-functions-durable-extension: Non-Deterministic workflow detected: TaskScheduledEvent: 0 TaskScheduled CopyIModelToProject

Created on 14 Nov 2018  路  13Comments  路  Source: Azure/azure-functions-durable-extension

I have recently run into an issue with the "Non-Deterministic workflow detected" message while using Durable functions with Node.js + Typescript. (Note: I am on a consumption plan). The workflow that leads to this exception involves a few components:

  • A Http Endpoint function 'SampleIModelProjects' with GET, POST, PATCH, & DELETE
  • A Http Orchestration Starter (Starts Orchestration Client -> Orchestration Trigger)
  • An Orchestrator (Activated by Orchestration Trigger -> Activity Trigger)
  • An Activity Function 'CopyIModelToProject' (Long running, Memory/CPU intensive, with async I/O)(Activated by Activity Trigger) (Usually takes 1-5 minutes)
  • An Http Orchestration Status Endpoint

When POST is called on the 'SampleIModelProjects' it calls a few APIs, posts to start the Http Orchestration Starter, and returns some data regarding the created project and the status URI for the Orchestration that was started. This works fine in most cases when the load scales slowly, but when the load scales to quickly (lets say 5 workflows in 10 seconds) is when I get the exception "Non-Deterministic workflow detected" and the CopyIModelToProject fails. See below for more details, any suggestions are welcome.

This is what the Status endpoint then returns with 500 when this happens:
Error: StatusCodeError: 500 - {"instanceId":"2ed8b845-b2fc-42c4-9a54-f27c46e4eae3","runtimeStatus":"Failed","input":{"targetProjectName":"IMODEL_DEV_SERVICES_LOAD_TEST","srcProjectName":"Test Project","srcIModelName":"Test IModel","userEmails":["Regular.[email protected]"],"targetProjectId":"f7c35bdd-eb24-4450-8273-e2594162c1e5"},"customStatus":null,"output":"Non-Deterministic workflow detected: TaskScheduledEvent: 0 TaskScheduled CopyIModelToProject","createdTime":"2018-11-14T17:45:51Z","lastUpdatedTime":"2018-11-14T17:48:11Z"}

This is what is logged in App Insights:

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.CopyIModelToProject ---> Microsoft.Azure.WebJobs.Script.Rpc.RpcException: Result: Failure
Exception: Error Copying IModel
Stack: 
   at Microsoft.Azure.WebJobs.Script.Description.WorkerLanguageInvoker.InvokeCore(Object[] parameters, FunctionInvocationContext context) in C:\azure-webjobs-sdk-script\src\WebJobs.Script\Description\Rpc\WorkerLanguageInvoker.cs:line 74
   at Microsoft.Azure.WebJobs.Script.Description.FunctionInvokerBase.Invoke(Object[] parameters) in C:\azure-webjobs-sdk-script\src\WebJobs.Script\Description\FunctionInvokerBase.cs:line 84
   at Microsoft.Azure.WebJobs.Script.Description.FunctionGenerator.Coerce[T](Task`1 src) in C:\azure-webjobs-sdk-script\src\WebJobs.Script\Description\FunctionGenerator.cs:line 225
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`2.InvokeAsync(Object instance, Object[] arguments) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionInvoker.cs:line 63
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.InvokeAsync(IFunctionInvoker invoker, ParameterHelper parameterHelper, CancellationTokenSource timeoutTokenSource, CancellationTokenSource functionCancellationTokenSource, Boolean throwOnTimeout, TimeSpan timerInterval, IFunctionInstance instance) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 556
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithWatchersAsync(IFunctionInstance instance, ParameterHelper parameterHelper, ILogger logger, CancellationTokenSource functionCancellationTokenSource) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 503
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstance instance, ParameterHelper parameterHelper, IFunctionOutputDefinition outputDefinition, ILogger logger, CancellationTokenSource functionCancellationTokenSource) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 439
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstance instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 249
   --- End of inner exception stack trace ---
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstance instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 293
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.TryExecuteAsync(IFunctionInstance functionInstance, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 89

Here is my host.json:
json { "version": "2.0", "functionTimeout": "00:10:00", "extensions": { "http": { "routePrefix": "api", "maxOutstandingRequests": 100, "maxConcurrentRequests": 50, "dynamicThrottlesEnabled": false }, "queues": { "maxPollingInterval": 60000, "visibilityTimeout" : "00:00:30" }, "durableTask": { "HubName": "DurableTaskHub", "extendedSessionsEnabled": true, "ControlQueueBatchSize": 32, "PartitionCount": 4, "ControlQueueVisibilityTimeout": "00:01:00", "WorkItemQueueVisibilityTimeout": "00:01:00", "MaxConcurrentActivityFunctions": 3, "MaxConcurrentOrchestratorFunctions": 3 } }, "logger": { "categoryFilter": { "defaultLevel": "Information", "categoryLevels": { "Host": "Error", "Function": "Error", "Host.Aggregator": "Information" } }, "applicationInsights": { "sampling": { "isEnabled": true, "maxTelemetryItemsPerSecond" : 5 } } } }

bug

Most helpful comment

The fix for this issue has been merged and will be available in the next release.

All 13 comments

@SpicySyntax Can you share your orchestrator code? This error often indicates a bug in the orchestrator function implementation.

@SpicySyntax Is "Error Copying IModel" an exception message you've defined? If so, in what circumstances is it thrown? +1 on the orchestrator code; that would be very helpful.

@cgillum @kashimiz. That was something I am logging. I noticed an error with one of my client libraries I was using from the orchestration. I have seemed to prevent the error by just increasing the http timeout duration. This was likely what was causing the Non-Deterministic behavior.
However, before closing this. Can either of you see any potential downfalls to the setup described above and below?
Thanks

(Note: using [email protected] and [email protected])

Http Orchestration Starter

index.ts

import { HttpRequest, Context } from "azure-functions-ts-essentials";
import { Guid, Logger } from "@bentley/bentleyjs-core";
import { BaseIndex } from "../shared/BaseIndex";
import { IModelDevServicesConfig } from "../shared/Config";

export = (context: Context, req: HttpRequest) => {
    BaseIndex.initialize();
    const _id = Guid.createValue();
    const startArgs = [{
        FunctionName: req!.params!.functionName,
        Input: req.body,
        InstanceId: _id,
    }];
    Logger.logInfo(IModelDevServicesConfig.loggingCategory, `HttpOrchestrator called targeting: ${req!.params!.functionName}`);
    context.bindings.starter = startArgs;

    const hostUrl = context.req!.originalUrl!.split("/api/")[0]!;
    const _statusQueryGetUri = `${hostUrl}/api/v0/task-status?instanceId=${_id}`;
    context.done(undefined, {status: 202, body: {id: _id, statusQueryGetUri: _statusQueryGetUri}});
};

host.json

{
    "bindings": [
      {
        "authLevel": "anonymous",
        "name": "req",
        "type": "httpTrigger",
        "direction": "in",
        "route": "orchestrator/{functionName}",
        "methods": ["post"]
      },
      {
        "name": "$return",
        "type": "http",
        "direction": "out"
      },
      {
        "name": "starter",
        "type": "orchestrationClient",
        "direction": "out"
      }
    ],
    "disabled": false
  }

Sample IModel Project Orchestrator

index.ts

import * as df from "durable-functions";
export = df.orchestrator(function* (context) {
    const resp = yield context.df.callActivity("CopyIModelToProject", context.bindings);
    return resp;
});

host.json

{
    "bindings": [
      {
        "name": "context",
        "type": "orchestrationTrigger",
        "direction": "in"
      }
    ],
    "disabled": false
  }

Your setup looks reasonable to me. The only question I have is about the _statusQueryGetUri your starter constructs - it doesn't look like our webhook format. Is this another function you've defined? If so, can you share the implementation? Nothing here jumps out at me as something that should cause a non-deterministic workflow exception, and that has me concerned.

@kashimiz yes I defined another function that can be called by external clients. It serves as a proxy, and just targests the webhook endpoint with the Azure Function Code in the request query. This way the external clients won't have to worry about the Code Level Auth imposed by the webhook status endpoint.

Orchestration Status

index.ts

import { Logger } from "@bentley/bentleyjs-core";
import { IModelDevServicesConfig } from "../shared/Config";
import { HttpRequestHandler } from "../shared/HttpRequestHandler";
import { BaseIndex } from "../shared/BaseIndex";
import { HttpRequest, Context, HttpStatusCode, HttpMethod, HttpResponse } from "azure-functions-ts-essentials";
export const index = async (context: Context, req: HttpRequest) => {
    let res: HttpResponse;
    BaseIndex.initialize();
    // TODO: add user level auth
    try {
        switch (req.method) {
            case HttpMethod.Get:
                const taskInstanceId = req.query!.instanceId!;
                Logger.logInfo(IModelDevServicesConfig.loggingCategory, `Fetching status for task: ${taskInstanceId}`);
                const hostUrl = req!.originalUrl!.split("/api/")[0]!;
                const httpRequestHandler = new HttpRequestHandler();
                const options = {
                    method: "GET",
                    json: true,
                    uri: `${hostUrl}/runtime/webhooks/durabletask/instances/${taskInstanceId}?code=${IModelDevServicesConfig.azureFunctionCode}`,
                };

                context.res = {
                    status: HttpStatusCode.Accepted,
                    body: await httpRequestHandler.getRequestResponse(options),
                };
                context.done(undefined, context.res);
                break;
            default:
                res = {
                    status: HttpStatusCode.MethodNotAllowed,
                    body: {
                        error: {
                            type: "not_supported",
                            message: `Method ${req.method} not supported.`,
                        },
                    },
                };
                context.res = res;
                context.done(undefined, context.res);
        }
    } catch (error) {
        Logger.logError(IModelDevServicesConfig.loggingCategory, `Error ${error}`);
        res = {
            status: HttpStatusCode.InternalServerError,
            body: { message: "Internal Server Error" },
        };
        context.res = res;
        context.done(undefined, context.res);
    }
};

function.json

{
  "disabled": false,
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get"
      ],
      "route": "v0/task-status"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "res"
    }
  ]
}

@SpicySyntax Ah, makes sense! Thank you for sharing.

it's still unclear what caused you to see a non-deterministic workflow exception from the status endpoint rather than the activity-level exception revealed by App Insights, but I suspect it may be a flaw in our exception propagation logic that slipped through the test suite rather than anything on your end. Thank you for bringing it to my attention.

@SpicySyntax we've identified the issue that is causing this, and will work on making a fix. It turns out that failed JavaScript orchestrations are incorrectly being reported as non-deterministic. We'll try to have a fix out by next week. Thanks for reporting this!

The fix for this issue has been merged and will be available in the next release.

Hi @cgillum, is there a way to tell if this has been released already? Is this in https://www.nuget.org/packages/Microsoft.Azure.WebJobs.Extensions.DurableTask package version 1.7.1?

@cliffkoh I see that this issue was added to the v1.7.0 release milestone, so it should be in nuget packages 1.7.0 and above.

image

Thank you so much :)

I am having this issue appear on a sub orchestrator with an activity that throws an error.

Flow for the durable function is:

  • [X] Event Hub Client

    • [X] PrimaryOrchestrator

    • [X] Activity1

    • [ ] SubOrch1

    • [ ] SubOrch2

    • [X] SubOrch3



      • [X] Activity1 <- Error thrown



Once the error bubbles back up to PrimaryOrchestrator, it no longer contains the original error, but the following error:

"Error: Non-Deterministic workflow detected: 
TaskScheduledEvent: 0 
TaskScheduled JobProcessorUpdateActivity \n    
at Orchestrator.callSubOrchestrator (/Users/rosteele/dev/pcc/TaskApi-AzFn/node_modules/durable-functions/lib/src/orchestrator.js:162:164)\n    
at Orchestrator.jobOrchestrator [as fn] (/Users/rosteele/dev/pcc/TaskApi-AzFn/JobProcessorOrchestrator/jobOrchestrator.js:48:43)\n    
at jobOrchestrator.next (<anonymous>)\n   
at Orchestrator.<anonymous> (/Users/rosteele/dev/pcc/TaskApi-AzFn/node_modules/durable-functions/lib/src/orchestrator.js:96:29)\n    
at Generator.next (<anonymous>)\n    at /Users/rosteele/dev/pcc/TaskApi-AzFn/node_modules/durable-functions/lib/src/orchestrator.js:7:71\n    at new Promise (<anonymous>)\n    
at __awaiter (/Users/rosteele/dev/pcc/TaskApi-AzFn/node_modules/durable-functions/lib/src/orchestrator.js:3:12)\n    
at Orchestrator.handle (/Users/rosteele/dev/pcc/TaskApi-AzFn/node_modules/durable-functions/lib/src/orchestrator.js:22:16)\n    
at /Users/rosteele/dev/pcc/TaskApi-AzFn/node_modules/durable-functions/lib/src/shim.js:7:9"

I was in process of moving everything in our DF to use context.done(error, null) where there are errors to bubble the error to the main orchestrator, but this seems to be a blocking issue to do so.

The call to the sub orchestrator and any activity has :0 appended to the end of it from another bug.

I'm using JS and having this issue with any unhandled exception whatsoever, even in simplest cases.

Microsoft.Azure.WebJobs.Extensions.DurableTask version 1.8.3 (tried with 1.7.0 and 2.0.0-beta1 too with no luck)

Orchestrator:

durableFunctions.orchestrator(function*(context) {
  const args = context.df.getInput()
  yield context.df.callActivity('activity', args)
})

Activity:

function(context) {
  throw new Error('Test error')
  context.done()
}

Results in Non-Deterministic workflow detected: TaskScheduledEvent: 0 TaskScheduled activity

the only way to get rid of it - always put try/catch for entire orchestration code.
InstanceId 2ebb9d195cfc4b08946efe3ac5f5ed6a

Was this page helpful?
0 / 5 - 0 ratings