Tracking item to investigate delays in queue triggers starting with v2.0.12050
Reported by @odd-rune
So far, there are delays right after any deployment either via deployment script or when functions runtime rolls out which triggers the app restart.
cc: @davidebbo
@NickDarvey - Can you please share your app name either here or privately?
Is your app using Java functions?
Thanks for opening this. Our app is transformbind3mbovetps52rq and using dotnet.
We see this deployed in Azure, but our easiest repro for this is from our end-to-end tests which run using Docker containers (mcr.microsoft.com/azure-functions/dotnet:2.0 and arafato/azurite:2.6.5) and test a chain of our Functions with queues in between. They were consistently running at ~1 minute on 2.0.11961-alpha but now are always >5 minutes on 2.0.12134. If we monitor the queues during the test, messages are sitting for sometimes minutes. This despite the listener Function host running (we added a test timer just to tick to indicate the host was running fine).
This is our queue trigger configuration in host.json:
{
"version": "2.0",
"logging": {
},
"extensions": {
"queues": {
"maxPollingInterval": 5000,
"visibilityTimeout": "00:00:30",
"batchSize": 8,
"maxDequeueCount": 5,
"newBatchThreshold": 8
}
}
}
cc: @maiqbal11 @paulbatum
Here is an instance of a queue message sitting for >30 minutes.
Environment
Name: transformutil3mbovetps52rq
Queue message ID: b6274923-6e9a-4dbb-9a11-a110829e8c55
Queue message insertion timestamp: 20181015T11:21:33+11:00
Logs
Query: union * | where cloud_RoleName == "transformutil3mbovetps52rq"
Timespan: Last two days
Result: 5 of these, all from around 20181015T11:45+11:00
Host Status: {
"id": "transformutil3mbovetps52rq",
"state": "Running",
"version": "2.0.12134.0",
"versionDetails": "2.0.12134.0 Commit hash: d32ea0a0be66ce1208ca677190c54a8f8838ad82"
}
Mitigation
Restarting Function app transformutil3mbovetps52rq caused queue message b6274923-6e9a-4dbb-9a11-a110829e8c55 to be popped within a minute.
Today we noticed that a queue had over 350 messages.
It had not been process since 2018-10-17 00:50:15.644.
Disabled and enabled the function.
The processing then resumed from 2018-10-19T11:30:07.890.
cc: @mhoeger (current on call)- Can you please take a look?
Same experience for us with .NET Core + 2.0.12134.0, 1+ hour as we speak.
You're hitting the issue described here: https://github.com/Azure/azure-webjobs-sdk/pull/1988.
The fix is to change your MaxQueuePollingInterval from an integer 5000 to a timespan "00:00:05".
The documentation for maxPollingInterval still uses integers which makes this confusing.
https://docs.microsoft.com/en-us/azure/azure-functions/functions-host-json
@NoelOConnell you linked to the v1 wiki doc. I've already fixed the v2 wiki doc with a correct sample here: https://github.com/Azure/azure-functions-host/wiki/host.json-(v2).
Also, we have a doc bug to fix up the non wiki official doc: https://github.com/MicrosoftDocs/azure-docs/issues/16263
@mathewc that doc page is for both v1 and v2, but only mentions maxPollingInterval in the v1 part. So I'd suggest also adding maxPollingInterval to the v2 part, with the correct syntax sample.
@jeffhollan is making the doc updates and had a PR out I believe: https://github.com/jeffhollan/azure-docs-pr/commit/4903aaf7716644b62843954822ab95beeabb3d6c. Jeff can you fix the doc here?
Yes doc fix out and going out today.
Can you please fix / append to the docs everywhere? Don't see a mention of the change in format here
https://docs.microsoft.com/de-de/azure/azure-functions/functions-bindings-storage-queue#trigger---hostjson-properties
I had this wrong for weeks now and had no clue I was using the wrong format. Some validation + warning would be nice. Really frustrating 馃憥
Yes update here - doesn't look like it's been localized yet though
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue#hostjson-settings
That said this is a good callout as I didn't realize this trigger doc had a seperate host.json already. Will clean that up.