This enhancement allows orchestrator and activity functions to send large argument or event payloads when invoking other functions. By default all function call payloads are serialized into a queue message. If the size of the payload exceeds the queue message limit, then the payload will instead be stored in blob storage. This happens automatically and the function code does not need to be aware of it.
Many users will have scenarios where they need to deliver large payloads to other functions. Having support for this would help save developers a lot of time and effort, especially when they may not be aware of the size limitations of queues (or may not be aware that some payloads will exceed those sizes).
Hi,
We're running into this issue (I've read your SO posts etc. on this).
What patterns do you recommend to work around this problem while still being able to use durable functions? Our scenario involves getting a large (1 MB) amount of data and then processing a large amount of data from Redis (essentially one action per Redis key). Right now, there's no way to get the data back to the orchestrator function (for it to spin up activity functions to process the data).
Do you have the option of compressing the data? For example, could you have the activity function return a string, and write code that serializes the payload, compresses it using GZip, and then returns the base64-encoded version?
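A minimal sketch of that compress-then-base64 idea, for anyone who wants to try it. `PayloadCompression` and its method names are hypothetical and not part of Durable Functions; the sketch assumes the payload has already been serialized to a string (JSON, for example).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; not part of the Durable Functions API.
public static class PayloadCompression
{
    // GZip-compress an already-serialized payload and return it as base64 text.
    public static string Compress(string payload)
    {
        byte[] raw = Encoding.UTF8.GetBytes(payload);
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer so the final
            // block and the GZip footer are written.
            using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }

    // Reverse of Compress: base64 text back to the original string.
    public static string Decompress(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The activity would return `PayloadCompression.Compress(json)` and the caller would run `Decompress` on the result. Keep in mind that base64 encoding adds roughly a third to the compressed size, so this only helps when the data compresses well.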
We do, but I'm not sure it's going to help in our particular scenario. We're retrieving a large number of keys from Redis (tens of thousands) and even compressed, we'll likely be over the 64 KB limit. Unfortunately Redis also doesn't seem to have a way for us to get keys in a partitioned manner (so that we could spin up activity functions per partition).
This is somewhat urgent so I went back to vanilla functions and queues in the meantime.
OK, thanks for letting me know, and sorry that this was a blocker for you. :(
Anytime. And no worries; these are early days still for durable functions and I'm excited for what we'll be able to do once the platform matures a bit.
Thanks to you and the team for all your hard work on it.
This would also be very useful when fanning out. I've already had to minimize chaining a bit as I can't pass messages around (too big), but now I'm looking at the orchestrator doing more work (loading blobs directly) than it should to manage fanning.
Will this be fixed for GA? I currently need to use blob path outputs from some of my activities just to be safe. Removing that overhead and having the framework manage it would be interesting for us.
Yes - this will be in for GA. I've already started some preliminary work on this. Step 1 is to introduce compression so that we can deal with larger payloads without involving blobs. Step 2 is to start moving message payloads into blob storage. Step 3 would be to optimize our use of blobs to ensure we're not refetching too often during replays, but that likely plays into our discussion for #82.
@cgillum does this only apply to the function arguments or is it also the return values that are size limited?
Won't adding compression add overhead/pressure to the orchestrator replays, which need to be lightweight?
Yes, this would apply both to arguments and return values.
And yes, compression would definitely add CPU overhead. We would only apply compression if the payload is larger than 64 KB so that smaller payloads are not penalized.
Longer term I want to eliminate replays where possible so that this becomes less of an issue. That's being tracked separately.
@cgillum which strategy will be included for the RC: blob or compression?
We're targeting having both for RC. The general approach is this:
We ended up with a simpler solution than what I mentioned earlier, which is to either do the inline JSON like we do today, or do compressed blob storage. No intermediary compressed storage in the table. This large message support will go out as part of the next release.
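To make the inline-or-blob decision concrete, here is a rough sketch of how such routing could look. It is not the extension's actual implementation. `LargeMessageRouter`, the `blob:` prefix, and the in-memory `FakeBlobStore` are all hypothetical, and measuring the payload as UTF-8 bytes against a 64 KB threshold is an assumption based on the limit mentioned in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Illustrative only: every name here is hypothetical.
public static class LargeMessageRouter
{
    // Assumed threshold: the 64 KB limit discussed in this thread.
    public const int MaxInlineBytes = 64 * 1024;

    // In-memory stand-in for blob storage so the sketch runs without Azure.
    public static readonly Dictionary<string, byte[]> FakeBlobStore =
        new Dictionary<string, byte[]>();

    // Returns what would go into the queue message: either the JSON itself
    // or a short reference to a compressed "blob".
    public static string Prepare(string json)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(json);
        if (utf8.Length <= MaxInlineBytes)
        {
            return json; // small payload: stays inline, no compression cost
        }

        string blobName = Guid.NewGuid().ToString("N") + ".json.gz";
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal))
            {
                gzip.Write(utf8, 0, utf8.Length);
            }
            FakeBlobStore[blobName] = buffer.ToArray();
        }
        return "blob:" + blobName;
    }
}
```

The receiving side would check for the reference, fetch the blob, and decompress it before deserializing. Small payloads never touch the compression or storage path, which matches the goal of not penalizing them.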
@cgillum I am still getting this error while setting the custom status object. We are sending back some data from an Excel file that fails validation in the custom status object. Whenever the size of the invalid records goes beyond 16 KB, we get the following error:
"Orchestrator function 'ProcessExcelOrchestrator' failed: The UTF-16 size of the JSON-serialized payload must not exceed 16 KB. The current payload size is 61 KB."
We are using Azure Functions v2, Microsoft.Azure.WebJobs.Extensions.Storage 3.0.0 and Microsoft.Azure.WebJobs.Extensions.DurableTask 1.7.0.
@rahuldobriyal Custom status objects are limited to 16 KB. We mention this in our documentation here: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-diagnostics#custom-status
The custom status payload is limited to 16 KB of UTF-16 JSON text because it needs to be able to fit in an Azure Table Storage column. You can use external storage if you need a larger payload.
Large message support is intended for function inputs and outputs. We don't currently have plans to increase the limit for custom status values. I suggest instead manually uploading large status messages to blob storage and then exposing the blob URL as the custom status value.
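One way to apply that suggestion is to measure the status before setting it and fall back to a URL when it is too large. The sketch below is only an illustration: `CustomStatusHelper` and the `uploadJson` delegate are hypothetical, and it uses `System.Text.Json` to estimate the size, which may differ slightly from what the extension's own serializer produces.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper; the upload delegate stands in for real blob storage code.
public static class CustomStatusHelper
{
    // 16 KB limit on the UTF-16 JSON text, as described in this thread.
    public const int MaxStatusBytes = 16 * 1024;

    // Estimated UTF-16 size of the JSON-serialized status, in bytes.
    public static int GetUtf16Size(object status)
    {
        string json = JsonSerializer.Serialize(status);
        return Encoding.Unicode.GetByteCount(json);
    }

    // Returns the original status when it fits; otherwise uploads the JSON via
    // the supplied delegate and returns a small object holding only the URL.
    public static object FitOrOffload(object status, Func<string, Uri> uploadJson)
    {
        string json = JsonSerializer.Serialize(status);
        if (Encoding.Unicode.GetByteCount(json) <= MaxStatusBytes)
        {
            return status;
        }

        Uri blobUrl = uploadJson(json);
        return new { detailsUrl = blobUrl.ToString() };
    }
}
```

The returned value is what you would pass to `SetCustomStatus`. Since orchestrator code should not perform I/O directly, the actual upload would most likely belong in an activity function that returns the blob URL.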
I am having an issue returning a large list of objects from an activity function to an orchestrator function. I have a function that downloads a 180 MB file and parses it. This file will produce a list of objects with over 962K entries. Each object has about 70 properties but only about 20% of them are populated. When I run the function, the code successfully downloads and parses the file into the list, but when the list is returned, an exception is raised with the following information:
Exception: "Exception while executing function: #######" - Source: "System.Private.CoreLib"
Inner exception: "Error while handling parameter $return after function returned." - Source: "Microsoft.Azure.WebJobs.Host"
Inner / Inner exception: "Exception of type 'System.OutOfMemoryException' was thrown." - Source: "System.Private.CoreLib"
The innermost exception shows the Newtonsoft.Json package as the caller that triggers the out-of-memory error. I am including the full stack trace for this exception at the end.
Is there anything I need to do to get this working? This is the code and the exception is raised at the return:
```csharp
[FunctionName("GetDataFromSource")]
public static IEnumerable<DataDetail> GetDataFromSource([ActivityTrigger] ISource source, ILogger logger)
{
    try
    {
        string importSettings = Environment.GetEnvironmentVariable(source.SettingsKey);
        if (string.IsNullOrWhiteSpace(importSettings))
        {
            logger.LogError($"No settings key information found for the {source.SourceId} data source");
        }
        else
        {
            // Materialize the de-duplicated list; this is the value returned to the orchestrator.
            List<DataDetail> vinData = source.GetVinData().Distinct().ToList();
            return vinData;
        }
    }
    catch (Exception ex)
    {
        logger.LogCritical($"Error processing the {source.SourceId} Vin data source. *** Exception: {ex}");
    }

    // Fall back to an empty list when the settings are missing or processing fails.
    return new List<DataDetail>();
}
```
This is the stack trace for the innermost exception:
```
at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
at System.Text.StringBuilder.Append(Char value, Int32 repeatCount)
at System.Text.StringBuilder.Append(Char value)
at System.IO.StringWriter.Write(Char value)
at Newtonsoft.Json.JsonTextWriter.WritePropertyName(String name, Boolean escape)
at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeObject(JsonWriter writer, Object value, JsonObjectContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeValue(JsonWriter writer, Object value, JsonContract valueContract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerProperty)
at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeList(JsonWriter writer, IEnumerable values, JsonArrayContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeValue(JsonWriter writer, Object value, JsonContract valueContract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerProperty)
at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType)
at Newtonsoft.Json.JsonSerializer.SerializeInternal(JsonWriter jsonWriter, Object value, Type objectType)
at DurableTask.Core.Serializing.JsonDataConverter.Serialize(Object value, Boolean formatted)
at Microsoft.Azure.WebJobs.Extensions.DurableTask.MessagePayloadDataConverter.Serialize(Object value, Int32 maxSizeInKB) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\MessagePayloadDataConverter.cs:line 55
at Microsoft.Azure.WebJobs.Extensions.DurableTask.MessagePayloadDataConverter.Serialize(Object value) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\MessagePayloadDataConverter.cs:line 43
at Microsoft.Azure.WebJobs.DurableActivityContext.SetOutput(Object output) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\DurableActivityContext.cs:line 136
at Microsoft.Azure.WebJobs.Extensions.DurableTask.ActivityTriggerAttributeBindingProvider.ActivityTriggerBinding.ActivityTriggerReturnValueBinder.SetValueAsync(Object value, CancellationToken cancellationToken) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Bindings\ActivityTriggerAttributeBindingProvider.cs:line 213
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ParameterHelper.ProcessOutputParameters(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 972
```