The visibility timeout of a message could be configured via a specific attribute or a new property of the QueueTrigger attribute. As far as I can tell, the timeout is currently fixed at a default value and there is no way to define a custom one. I mean something like EstimatedTimeToProcessMessage (an int expressed in minutes or milliseconds).
Just a feature request.
What problem are you having? What visibility timeout are you referring to exactly?
Hi Math.
I have messages that can be processed in about 30 seconds and can fail at most 2 times. Those messages have to be processed ASAP or fail ASAP. The problem comes when a message fails: it seems to be reprocessed about 10 minutes after the first failure.
I would like something that allows me to define an EstimatedTimeToProcessMessage, or something the SDK can learn by itself, to establish a more accurate visibility timeout.
When EstimatedTimeToProcessMessage is defined, the SDK could use a function of it (for example, EstimatedTimeToProcessMessage * 2) to define the default timeout for a specific queue.
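A minimal sketch of the proposal, written in Python for brevity. EstimatedTimeToProcessMessage is the setting being requested here, not an existing SDK property. The queue names, the estimates, and the helper `visibility_timeout_for` are all hypothetical.

```python
from datetime import timedelta

# Hypothetical per-queue processing estimates (the proposed
# EstimatedTimeToProcessMessage); these are not real SDK settings.
ESTIMATES = {
    "orders": timedelta(seconds=30),
    "reports": timedelta(minutes=5),
}

DEFAULT_TIMEOUT = timedelta(minutes=10)  # the fixed initial value discussed in this thread
SAFETY_FACTOR = 2  # the "EstimatedTimeToProcessMessage * 2" example


def visibility_timeout_for(queue_name: str) -> timedelta:
    """Derive a per-queue visibility timeout from its processing estimate."""
    estimate = ESTIMATES.get(queue_name)
    if estimate is None:
        # No estimate declared: fall back to the host-wide default.
        return DEFAULT_TIMEOUT
    return estimate * SAFETY_FACTOR
```

With these numbers, a 30-second job would be hidden for only 1 minute. Any queue without an estimate keeps the 10-minute default.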
I'm also looking into Azure WebJobs for a new project and just read this comment by @mathewc, which states that the default queue visibility timeout is set to 10 minutes. That value depends heavily on project context and should be configurable.
Note that this 10-minute timeout only occurs in rare cases, such as when the host dies. During regular processing, if an invocation fails, a different, configurable timeout is used. See the code here. You can configure it via JobHostConfiguration.Queues.VisibilityTimeout. I believe that is what you are looking for. During regular processing, while the host remains up and running, there is no 10-minute delay.
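The two timeouts can be modelled as follows. This is a simplified sketch, not SDK code. `next_visible_time` and its `outcome` values are hypothetical. `retry_timeout` stands in for JobHostConfiguration.Queues.VisibilityTimeout. `hidden_since` is the last time the message's visibility was set, either at dequeue or by a background renewal.

```python
from datetime import datetime, timedelta

INITIAL_VISIBILITY = timedelta(minutes=10)  # value described in this thread


def next_visible_time(outcome, hidden_since, failed_at=None,
                      retry_timeout=timedelta(0)):
    """Return when a dequeued message becomes visible again (simplified model).

    "failed":    the function threw while the host was healthy, so the host
                 releases the message using the configurable retry timeout.
    "host_died": nobody releases or renews the message, so it only reappears
                 once the initial visibility window runs out.
    """
    if outcome == "failed":
        return failed_at + retry_timeout
    if outcome == "host_died":
        return hidden_since + INITIAL_VISIBILITY
    raise ValueError(f"unknown outcome: {outcome}")
```

In this model a failed invocation is retried after `retry_timeout`, which is seconds if you configure it that way. Only a crashed host leaves the message waiting out the full 10-minute window.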
We could also make that initial 10 minute timeout configurable if we wanted - do you require that?
Thank you, Mathew, for the rapid response.
The configuration you provided is enough for me. Although I don't NEED it, for my specific project, in case of the host dying, a lower timeout would be desirable. I think it's a valid feature for the SDK.
Thank you.
@fabiomaulo can you confirm that this existing knob also meets your needs? Feel free to log a feature request for the other timeout config if it turns out you need that. But in all the years of this project, I haven't heard people having problems with that timeout.
Note that the configuration knob I mentioned is new in 2.0.0, which we released last week, so upgrade if you need it.
Sorry for the delay...
Math, it doesn't.
I know about the configurable timeout (which applies to all queues managed inside the same worker), and even about the possibility of using the IQueueStorageProcessorFactory to have a specific configuration per queue.
In fact, I could write a specific implementation of IQueueStorageProcessorFactory to configure each QueueProcessor. But why implement n classes, one of which (the factory) has to choose the configuration by string comparison on the queue name, when queueName, storageAccountConnectionString, and estimatedTime could all be specified on the same line where the message is consumed?
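This is what the factory-based workaround amounts to. The sketch uses one lookup table keyed by queue name instead of a chain of string comparisons. It is a Python model, not the SDK's interface. `QueueSettings`, `PerQueueProcessorFactory`, and the placeholder host values are hypothetical. `QueueProcessor` only mimics the copy-at-construction behaviour described later in this thread.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class QueueSettings:
    # Hypothetical subset of per-queue options.
    visibility_timeout: timedelta
    max_dequeue_count: int


# Placeholder host-level values used when a queue has no override.
HOST_DEFAULTS = QueueSettings(timedelta(0), 5)

# One table keyed by queue name, instead of string comparisons in the factory.
PER_QUEUE = {
    "orders": QueueSettings(timedelta(seconds=30), 2),
}


class QueueProcessor:
    def __init__(self, queue_name: str, settings: QueueSettings):
        # Copy the settings at construction so each processor works
        # independently of the others.
        self.queue_name = queue_name
        self.settings = settings


class PerQueueProcessorFactory:
    """Hypothetical stand-in for a custom queue processor factory."""

    def create(self, queue_name: str) -> QueueProcessor:
        settings = PER_QUEUE.get(queue_name, HOST_DEFAULTS)
        return QueueProcessor(queue_name, settings)
```

This works, but the per-queue values live far from the function that consumes the queue, which is the objection raised above.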
Reopening. @fabiomaulo I want to be sure I understand exactly what you're asking for. You're saying that the _initial_ timeout we use with the 10 minute delay is causing you issues? Again you should only see that delay in play if the host died unexpectedly, which should happen rarely. That timeout is here in the code. How specifically is this causing you issues in practice - are you really seeing 10 minute delays often?
@mathewc what happens when the job fails? What is the time between the first failure and the second dequeue?
Processing a message may fail more than once (that is why we have MaxDequeueCount).
When the job function fails, the aforementioned JobHostConfiguration.Queues.VisibilityTimeout governs, as I mentioned above. I think this is all you are looking for; it's already there.
That is right, but JobHostConfiguration.Queues.VisibilityTimeout applies to all queues managed in a WebJob (plural).
Perhaps it is a matter of philosophy. Let me hypothesize to understand better:
If this is the philosophy, a single VisibilityTimeout for all queues managed in a WebJob is acceptable, although that should be made clear to everybody.
If the WebJobs SDK lets us group QueueTriggers however we need (as it does so far) without creating a WebJob project for each queue, we should have more fine-grained configuration per queue, without implementing a "custom" QueueProcessor just to configure each one.
That is my opinion.
This makes sense, but it's pretty big. We'll need to see some more folks suggest this before we can justify tackling it over other features.
Ok, no problem.
By the way, the code to implement it is already there:
https://github.com/Azure/azure-webjobs-sdk/blob/61aa42461696de855f0780aafa52ca386027f62e/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs
In its constructor, the QueueProcessor copies the configuration into its own state, so each QueueProcessor can work independently of the others.
Even the QueueProcessorFactoryContext has all the needed properties.
The only missing piece is reading all the queue-specific configuration in the same place where the queue name is declared... ;)
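The declarative alternative requested here would keep the options next to the queue name, much as a new QueueTrigger property would. Below is a Python sketch of that idea using a decorator as the analogue of an attribute. `queue_trigger`, `REGISTRY`, the connection name, and `process_order` are hypothetical illustrations, not part of any SDK.

```python
from datetime import timedelta

# Filled in by the decorator: queue name -> (handler, per-queue options).
REGISTRY = {}


def queue_trigger(queue_name, *, connection=None, visibility_timeout=None):
    """Hypothetical decorator: declare per-queue options where the queue is named."""
    def register(func):
        REGISTRY[queue_name] = (func, {
            "connection": connection,
            "visibility_timeout": visibility_timeout,
        })
        return func
    return register


@queue_trigger("orders", connection="OrdersStorage",
               visibility_timeout=timedelta(seconds=60))
def process_order(message: str) -> str:
    return message.upper()
```

A host could read `REGISTRY` when it builds each queue processor, so no string comparisons or extra factory classes would be needed.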
@mathewc I am having a similar problem. I think there are 2 settings.
I set config.Queues.VisibilityTimeout = new TimeSpan(0, 0, 15, 0);
But now, when there is an exception, it takes 15 minutes for the message to appear in the queue again.
How do I solve this problem?
I am also experiencing repeatable behaviour in which the code that is supposed to be 'renewing' the visibility timeout seems not to get executed. One outcome is that the queue message is processed twice. How? The original message is still in memory waiting to be processed while the same queue message becomes visible again on the queue.
This only seems to happen when I stress test my application and there is a backlog of thousands of queue messages. I'm assuming the competition for resources is causing the 'renew' task to fail or not get executed, but I can't be 100% sure. Maybe the application is running out of threads?
When I increase the message's visibility timeout to 6 hours (using a locally built version of the SDK with the value at https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/Listeners/QueueListener.cs#L79 changed), this behaviour stops.
I think I saw something similar mentioned in a different thread. Is there a known workaround to increase the default visibility timeout without a custom code fix? Or perhaps another solution to this problem, such as increasing the processing power?
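The duplicate-processing mechanism described above can be reproduced with a small discrete-time model. The assumptions are loud ones. One tick is one minute. A background renewer re-hides the message every half timeout, but only when it gets to run. Whenever the message becomes visible before processing finishes, a competing instance picks it up. `duplicate_pickups` is a hypothetical helper, not SDK code.

```python
def duplicate_pickups(processing_minutes, visibility_minutes, renewals_run):
    """Count extra pickups of a message that is still being processed.

    Simplified model: the message starts hidden for visibility_minutes.
    If renewals_run is True, a background renewer extends the window every
    visibility_minutes // 2 minutes. If the window lapses before processing
    ends, another instance dequeues the message and hides it again.
    """
    hidden_until = visibility_minutes
    renew_every = max(1, visibility_minutes // 2)
    next_renewal = renew_every
    extra = 0
    for t in range(1, processing_minutes):
        if renewals_run and t >= next_renewal:
            hidden_until = t + visibility_minutes
            next_renewal = t + renew_every
        if t >= hidden_until:
            extra += 1  # a competing instance processes the same message
            hidden_until = t + visibility_minutes
    return extra
```

With a 10-minute window and a 25-minute job, starved renewals yield two duplicate pickups, while working renewals yield none. Raising the window to 360 minutes (6 hours) also hides the problem, which matches the observation above.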
Why not just let the visibilityTimeout be configured through output bindings?
For example:
`await ReleaseMessageAsync(message, result, message.VisibilityTimeout ?? VisibilityTimeout, cancellationToken);`
This would let the user plug in a custom delay strategy (e.g., exponential backoff).
See also https://github.com/Azure/azure-webjobs-sdk-script/issues/1465
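Here is a sketch of such a delay strategy, together with the fallback the snippet above expresses. Per-message visibility timeouts are a proposal in this thread, not an existing SDK feature. `backoff_timeout`, `release_timeout`, and the base and cap values are hypothetical.

```python
from datetime import timedelta


def backoff_timeout(dequeue_count: int,
                    base: timedelta = timedelta(seconds=5),
                    cap: timedelta = timedelta(minutes=10)) -> timedelta:
    """Exponential backoff: base * 2^(attempt - 1), never longer than cap."""
    if dequeue_count < 1:
        raise ValueError("dequeue_count starts at 1")
    return min(base * (2 ** (dequeue_count - 1)), cap)


def release_timeout(message_timeout, host_timeout):
    """Python equivalent of `message.VisibilityTimeout ?? VisibilityTimeout`."""
    return message_timeout if message_timeout is not None else host_timeout
```

A function could set its per-message timeout from `backoff_timeout(dequeue_count)`. Messages that set nothing fall back to the host-wide value.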
@gorillapower I have the same problem. When we "attack" the queue with thousands of messages in a short period of time, we start receiving exceptions because the storage itself cannot handle the load. Now that we've resolved that, we're seeing function executions that never end or that hang (marked as "Never finished"), and messages being processed multiple times.
@mathewc @christopheranderson if the function hangs and the visibility timeout expires, returning the message to the queue, does the dequeue count change? How can we resolve this issue? It happens rarely, under heavy load, but it still happens. The function ends up idling in that weird state, neither throwing exceptions nor succeeding, so the configurable VisibilityTimeout setting never kicks in. As @gorillapower said, the problem would probably be resolved if we could set the initial timeout to a higher value.
@gorillapower's comments above, that host instances under extreme load can leave background visibility renewal threads unable to run, are correct. We've seen this come up in other situations as well (e.g. Singleton locks, which rely on background renewals of blob leases). If you're running into issues like this (e.g. you're maxing out CPU or memory), then you need to either scale up/out or throttle your instance concurrency down using the queue config settings (BatchSize/NewBatchThreshold).
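Here is how BatchSize and NewBatchThreshold bound the load, in a simplified model of one function's fetch loop. A new batch is fetched whenever the in-flight count has dropped to NewBatchThreshold or below, and one message completes per step. `peak_in_flight` is a hypothetical helper, not the SDK's listener.

```python
def peak_in_flight(batch_size: int, new_batch_threshold: int, backlog: int) -> int:
    """Peak number of concurrently processed messages in a simplified fetch loop."""
    in_flight = 0
    peak = 0
    while backlog > 0 or in_flight > 0:
        # Fetch another batch once in-flight work drains to the threshold.
        if in_flight <= new_batch_threshold and backlog > 0:
            fetched = min(batch_size, backlog)
            backlog -= fetched
            in_flight += fetched
        peak = max(peak, in_flight)
        # One message finishes per step.
        if in_flight > 0:
            in_flight -= 1
    return peak
```

In this model the peak is BatchSize + NewBatchThreshold once the backlog is large. Lowering either setting directly reduces how much work competes with the renewal threads.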
@nixa333 Yes, when messages fail processing due to visibility timeout expiry, Azure Storage will increment the dequeue count the next time that message is fetched. You CAN set the initial visibility timeout to a higher value via JobHostQueuesConfiguration.VisibilityTimeout.
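The dequeue-count behaviour described here can be modelled in a few lines. Every fetch increments the count, whether the previous attempt threw or simply timed out. Once the count reaches the maximum and the attempt still fails, the message is given up on. `Message`, `run_until_settled`, and the default of 5 are assumptions of this sketch, not SDK code.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    dequeue_count: int = 0


def run_until_settled(message, handler, max_dequeue_count=5):
    """Retry a message until it succeeds or is poisoned (simplified model).

    Returns "succeeded" or "poisoned".
    """
    while True:
        # The storage service increments the count on every fetch,
        # including fetches after a visibility timeout simply expired.
        message.dequeue_count += 1
        try:
            handler(message)
            return "succeeded"
        except Exception:
            if message.dequeue_count >= max_dequeue_count:
                # The SDK would move the message to a poison queue here.
                return "poisoned"
```

A hung function whose visibility timeout expires therefore still consumes one attempt. It moves the message closer to the poison threshold even though no exception was thrown.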
Anyhow, the issues that are being discussed now are not the same as the original issue that this item remains open for - the request to allow the visibility timeout to be declaratively configured per function, as opposed to the current host level knob that applies to all functions.
@mathewc I meant the initial 10-minute visibility delay. That cannot be changed with the property you mentioned, because it only affects failed calls. I would like to be able to change this value and set it to, for example, 2-3 hours.