Azure-webjobs-sdk: Per Function Queue message Visibility Timeout configuration

Created on 28 Feb 2017 · 19 comments · Source: Azure/azure-webjobs-sdk

The visibility timeout of a message could be configured via a dedicated attribute or a new property of the QueueTrigger attribute. I suspect that the timeout currently used is not the default one, and there is no way to define a custom timeout. I have in mind something like EstimatedTimeToProcessMessage (an int in minutes or milliseconds).

Just a feature request.

Discuss improvement


fabiomaulo



What problem are you having? What visibility timeout are you referring to exactly?

mathewc on 1 Mar 2017

Hi Math.
I have messages that can be processed in ~30 seconds and may fail at most twice. Those messages have to be processed ASAP or fail ASAP. The problem comes when a message fails: it seems to be re-processed ~10 minutes after the first failure.
I would like something that lets me define an EstimatedTimeToProcessMessage, or something the SDK can learn by itself, to establish a more accurate visibility timeout.
When EstimatedTimeToProcessMessage is defined, the SDK could use a function of it (for example, EstimatedTimeToProcessMessage * 2) as the default timeout for that specific queue.
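A minimal sketch of the proposal, assuming a hypothetical EstimatedTimeToProcessMessage attribute (nothing like it exists in the SDK) that takes the estimate in seconds, and the suggested rule timeout = estimate * 2. The attribute, VisibilityTimeoutPolicy and Functions are illustrative names only.

```csharp
using System;
using System.Reflection;

// HYPOTHETICAL attribute, not part of the WebJobs SDK: a per-function hint
// for how long one message is expected to take.
[AttributeUsage(AttributeTargets.Method)]
public sealed class EstimatedTimeToProcessMessageAttribute : Attribute
{
    public EstimatedTimeToProcessMessageAttribute(int seconds)
    {
        if (seconds <= 0) throw new ArgumentOutOfRangeException(nameof(seconds));
        Estimate = TimeSpan.FromSeconds(seconds);
    }

    public TimeSpan Estimate { get; }
}

// HYPOTHETICAL policy: derive a per-function visibility timeout from the hint.
public static class VisibilityTimeoutPolicy
{
    // Safety factor from the proposal ("EstimatedTimeToProcessMessage * 2").
    public const int Factor = 2;

    // Returns Factor * estimate when the function carries the attribute,
    // otherwise falls back to the host-wide value.
    public static TimeSpan For(MethodInfo function, TimeSpan hostDefault)
    {
        var hint = function.GetCustomAttribute<EstimatedTimeToProcessMessageAttribute>();
        return hint == null
            ? hostDefault
            : TimeSpan.FromTicks(hint.Estimate.Ticks * Factor);
    }
}

// Illustrative functions: one annotated (~30 s per message), one not.
public static class Functions
{
    [EstimatedTimeToProcessMessage(30)]
    public static void ProcessUrgent(string message) { }

    public static void ProcessDefault(string message) { }
}
```

The point of reading the hint via reflection is that the value lives on the same line as the function that consumes the queue, rather than in a central factory keyed by queue name.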

fabiomaulo on 1 Mar 2017

I'm also evaluating Azure WebJobs for a new project and just read this comment by @mathewc, which states that the default queue visibility timeout is 10 minutes. This value is highly dependent on project context and should be configurable.

brunoklein99 on 8 Mar 2017


Note that this 10 minute timeout will only occur in rare cases, say if the host dies, etc. During regular processing, if an invocation fails, there is a different configurable timeout that is used. See the code here. You can configure that via JobHostConfiguration.Queues.VisibilityTimeout. I believe that is what you are looking for. In regular processing while the host remains up and running, there is no 10 minute delay.
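For reference, a sketch of setting this host-wide knob, assuming the 2.x JobHostConfiguration / JobHost entry point. The values are illustrative (roughly the ~30-second, two-attempt scenario described earlier), and they apply to every queue-triggered function in the host.

```csharp
using System;
using Microsoft.Azure.WebJobs;

public static class Program
{
    public static void Main()
    {
        var config = new JobHostConfiguration();

        // Host-wide: delay before a message whose invocation failed becomes
        // visible again (the knob introduced in 2.0.0). Illustrative value.
        config.Queues.VisibilityTimeout = TimeSpan.FromSeconds(30);

        // After this many dequeues the message is moved to the poison queue.
        config.Queues.MaxDequeueCount = 2;

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```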

We could also make that initial 10 minute timeout configurable if we wanted - do you require that?

mathewc on 8 Mar 2017


Thank you, Mathew, for the rapid response.

The configuration you provided is enough for me. Although I don't NEED it, for my specific project, in case of the host dying, a lower timeout would be desirable. I think it's a valid feature for the SDK.

Thank you.

brunoklein99 on 8 Mar 2017

@fabiomaulo can you confirm that this existing knob also meets your needs? Feel free to log a feature request for the other timeout config if it turns out you need that. But in all the years of this project, I haven't heard people having problems with that timeout.

Note that this configuration knob is new in 2.0.0, which we released last week, so upgrade if you need it.

mathewc on 8 Mar 2017


Sorry for the delay.
Math, it doesn't.
I know about the configurable timeout (which applies to all queues managed inside the same worker), and even about the option of using an IQueueProcessorFactory to get a specific configuration per queue.
In fact I could write a specific IQueueProcessorFactory implementation to configure each QueueProcessor, but why implement n classes, with one of them (the factory) checking the queue name by string comparison, when the queue name, the storage account connection string, and the estimated time could all be specified on the same line where the message is consumed?
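A sketch of the factory workaround described here, assuming the 2.x IQueueProcessorFactory / QueueProcessorFactoryContext API with settable VisibilityTimeout and MaxDequeueCount on the context. The queue name "urgent-orders" and the values are hypothetical. It shows exactly the queue-name string comparison this comment objects to.

```csharp
using System;
using Microsoft.Azure.WebJobs.Host.Queues;

// Per-queue overrides through the factory hook (WebJobs SDK 2.x).
public class PerQueueProcessorFactory : IQueueProcessorFactory
{
    public QueueProcessor Create(QueueProcessorFactoryContext context)
    {
        // Settings are selected by comparing the queue name as a string.
        // "urgent-orders" is a hypothetical queue name.
        if (context.Queue.Name == "urgent-orders")
        {
            context.VisibilityTimeout = TimeSpan.FromSeconds(30);
            context.MaxDequeueCount = 2;
        }

        // Every other queue keeps the host-wide values already in the context.
        return new QueueProcessor(context);
    }
}
```

It would be registered once for the host with `config.Queues.QueueProcessorFactory = new PerQueueProcessorFactory();`, which is why the queue-specific values end up far from the function that actually reads the queue.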

fabiomaulo on 8 Mar 2017

Reopening. @fabiomaulo I want to be sure I understand exactly what you're asking for. You're saying that the _initial_ timeout we use with the 10 minute delay is causing you issues? Again you should only see that delay in play if the host died unexpectedly, which should happen rarely. That timeout is here in the code. How specifically is this causing you issues in practice - are you really seeing 10 minute delays often?

mathewc on 9 Mar 2017

@mathewc what happens when the job fails? What is the time between the first failure and the second dequeue?
Processing a message may fail more than once (that is why we have MaxDequeueCount).

fabiomaulo on 9 Mar 2017

When the job function fails, the JobHostConfiguration.Queues.VisibilityTimeout setting I mentioned above governs. I think this is all you are looking for; it's already there.

mathewc on 10 Mar 2017

That is right, but JobHostConfiguration.Queues.VisibilityTimeout applies to all the queues managed in a WebJob.
Perhaps it is a matter of philosophy. Let me lay out a hypothesis to understand better:

a WebJob (app) runs inside a WebApp
a WebApp runs on the hardware defined by its AppPlan and is billed through that AppPlan
having x WebApps, each with y WebJobs, each with z queue triggers, all running in the same AppPlan has no impact on cost.
So...
we can have one WebJob, with its own configuration, per queue trigger.

If that is the philosophy, a single VisibilityTimeout for all queues managed in a WebJob is acceptable, although it should be made clear to everybody.

If instead the WebJobs SDK lets us group queue triggers however we need (as it does so far) without creating a WebJob project per queue, then we should have finer-grained configuration per queue, without implementing a "custom" QueueProcessor just to configure each one.

That is my opinion.

fabiomaulo on 11 Mar 2017

This makes sense, but it's pretty big. We'll need to see some more folks suggest this before we can justify tackling it over other features.

christopheranderson on 24 Apr 2017

OK, no problem.
By the way, the code to implement it is already there:
https://github.com/Azure/azure-webjobs-sdk/blob/61aa42461696de855f0780aafa52ca386027f62e/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs

In its constructor, QueueProcessor copies the configuration into its own state, so each QueueProcessor can work independently of the others.
QueueProcessorFactoryContext already has all the needed properties too.
The only missing piece is reading the queue-specific configuration in the same place where the queue name is declared... ;)

fabiomaulo on 24 Apr 2017

@mathewc I am having a similar problem. I think two separate settings are needed:

My WebJob function runs for 10 minutes, so I don't want the message to reappear in the queue after 5 minutes.
If the WebJob function throws an exception, I would like the message to reappear in the queue fairly soon.

I set config.Queues.VisibilityTimeout = new TimeSpan(0, 0, 15, 0);

But now, when there is an exception, it takes 15 minutes for the message to appear in the queue again.

How do I solve this problem?

suhu on 11 May 2017

I am also seeing repeatable behaviour in which the code that is supposed to renew the visibility timeout seems not to run. One outcome is that the queue message is processed twice. How? The original message is still in memory waiting to be processed, and the same queue message becomes visible on the queue again.

This only seems to happen when I stress test my application and there is a backlog of thousands of queue messages. I'm assuming that competition for resources causes the renew task to fail or not run, but I can't be 100% sure. Maybe the application is running out of threads?

When I increase the message's initial visibility timeout to 6 hours (using a locally built version of the SDK with that value changed at https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/Listeners/QueueListener.cs#L79 ), this behaviour stops.

I think I saw something similar mentioned in a different thread. Is there a known workaround to increase the default visibility timeout without a custom code fix? Or is there another solution to this problem, such as increasing processing power?
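To make the failure mode concrete, here is a simplified model (a sketch, not the SDK's code) of a message's visibility while its function runs. Assumptions: each renewal that actually runs hides the message for another full timeout from that moment, renewals are attempted every timeout / 2, and a starved renewal simply does not happen. VisibilityModel and ReappearsAt are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of in-flight message visibility (ASSUMPTIONS above).
public static class VisibilityModel
{
    // Returns when the message becomes visible again while its function is
    // still running, or null if it stays hidden until the function completes.
    // skippedRenewals holds the 1-based numbers of renewals that never ran.
    public static TimeSpan? ReappearsAt(
        TimeSpan timeout, TimeSpan processingTime, ISet<int> skippedRenewals)
    {
        if (timeout < TimeSpan.FromSeconds(1))
            throw new ArgumentOutOfRangeException(nameof(timeout));

        TimeSpan interval = TimeSpan.FromTicks(timeout.Ticks / 2);
        TimeSpan visibleAt = timeout; // hidden until t = timeout after dequeue

        for (int i = 1; ; i++)
        {
            TimeSpan renewalTime = TimeSpan.FromTicks(interval.Ticks * i);
            if (renewalTime >= processingTime) break; // function finished first
            if (renewalTime >= visibleAt) break;      // too late: already visible
            if (!skippedRenewals.Contains(i))
                visibleAt = renewalTime + timeout;    // renewal ran on schedule
        }

        return visibleAt < processingTime ? visibleAt : (TimeSpan?)null;
    }
}
```

Under these assumptions, a 10-minute timeout with a 30-minute function whose first two renewals are starved makes the message visible again at 10 minutes while the function is still running, which is the duplicate processing described above. With a 6-hour initial timeout the same starvation has no visible effect, consistent with the observation that raising it to 6 hours stops the behaviour.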

gorillapower on 8 Nov 2017

Why not just let users configure the visibility timeout through output bindings?

For example, change the call to:

// hypothetical: message.VisibilityTimeout would be a nullable (TimeSpan?) per-message override
await ReleaseMessageAsync(message, result, message.VisibilityTimeout ?? VisibilityTimeout, cancellationToken);

in
https://github.com/Azure/azure-webjobs-sdk/blob/bd5891d622c06bbed5141beb7332c9b1b1ab6a93/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs#L123

This would let users plug in a custom delay strategy (e.g. exponential backoff).
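A sketch of such a pluggable strategy: exponential backoff keyed on the message's dequeue count (CloudQueueMessage does expose DequeueCount). The SDK offers no such hook today; ExponentialBackoff, baseDelay and maxDelay are hypothetical.

```csharp
using System;

// HYPOTHETICAL delay strategy, not part of the WebJobs SDK.
public static class ExponentialBackoff
{
    // dequeueCount is 1 on the first attempt; the delay doubles with each
    // further attempt and is capped at maxDelay.
    public static TimeSpan Delay(int dequeueCount, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (dequeueCount < 1) throw new ArgumentOutOfRangeException(nameof(dequeueCount));

        // Clamp the exponent so the multiplication stays finite.
        int exponent = Math.Min(dequeueCount - 1, 30);
        double ticks = baseDelay.Ticks * Math.Pow(2, exponent);

        return ticks >= maxDelay.Ticks ? maxDelay : TimeSpan.FromTicks((long)ticks);
    }
}
```

With baseDelay = 10 s and maxDelay = 5 min, successive retries would be delayed 10 s, 20 s, 40 s, 80 s, 160 s, and then 5 minutes from the sixth dequeue on.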

gunzip on 26 Dec 2017

@gorillapower I have the same problem. When we "attack" the queue with thousands of messages in a short period of time, we start receiving exceptions because the storage itself cannot handle the load. Once we resolved that, we started seeing function executions that never end, or that wait (marked as "Never finished"), and messages being processed multiple times.
@mathewc @christopheranderson if the function hangs and the visibility timeout expires, returning the message to the queue, does the dequeue count change? How can we resolve this issue? It happens rarely, under heavy load, but it still happens. The function ends up idling in that weird state, neither throwing exceptions nor succeeding, so the configurable VisibilityTimeout setting never kicks in. As @gorillapower said, the problem would probably be resolved if we could set the initial timeout to a higher value.

nixa333 on 6 Jun 2018

@gorillapower's analysis above is correct: under extreme load, a host instance's background visibility renewal threads may not get a chance to run. We've seen this come up in other situations as well (e.g. Singleton locks, which rely on background renewals of blob leases). If you're running into issues like this (e.g. you're maxing out CPU, memory, etc.), then you need to either scale up/out or throttle your instance concurrency down using the queue config settings (BatchSize/NewBatchThreshold).
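A sketch of the throttling settings mentioned here, assuming the 2.x JobHostConfiguration; the numbers are illustrative. Per the SDK documentation, at most BatchSize + NewBatchThreshold messages are processed concurrently per queue-triggered function on each instance, so lowering both leaves more headroom for background work such as the visibility renewals.

```csharp
using Microsoft.Azure.WebJobs;

public static class ThrottlingConfig
{
    // Illustrative values; the intent is to cap per-instance concurrency.
    public static void Apply(JobHostConfiguration config)
    {
        // Messages fetched in one poll per queue (default 16, maximum 32).
        config.Queues.BatchSize = 4;

        // Fetch the next batch only when in-flight messages drop to this number,
        // so at most BatchSize + NewBatchThreshold (here 6) run per function.
        config.Queues.NewBatchThreshold = 2;
    }
}
```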

@nixa333 Yes, when messages fail processing due to visibility timeout expiry, Azure Storage will increment the dequeue count the next time that message is fetched. You CAN set the initial visibility timeout to a higher value via JobHostQueuesConfiguration.VisibilityTimeout.

Anyhow, the issues that are being discussed now are not the same as the original issue that this item remains open for - the request to allow the visibility timeout to be declaratively configured per function, as opposed to the current host level knob that applies to all functions.

mathewc on 6 Jun 2018

@mathewc I meant the initial 10-minute visibility delay, which cannot be altered with the property you mentioned, as that property only affects failed calls. I would like to be able to change this value and set it to, for example, 2-3 hours.

nixa333 on 6 Jun 2018
