Azure-webjobs-sdk: Consider Redesign of Control Queue monitoring

Created on 15 Jul 2016 · 15 Comments · Source: Azure/azure-webjobs-sdk

The JobHost monitors host-instance-specific control queues for Dashboard communication (Abort messages, etc.). Since each _instance_ of the host has a new ID, this means a new queue for each instance. These queues are only created if the Dashboard actually adds a message (e.g. the user clicks the "Abort" button). However, it does mean that the host is constantly polling nonexistent queues while waiting for a message. This results in storage 404s.

While these aren't errors per se, we have had lots of customers complain about this, since it fills their storage logs, App Insights logs, etc. with tons of 404s, producing a lot of noise.

See these related issues:

feature-monitoring improvement

All 15 comments

:+1:
It took me a while to figure out what was going on here. Is there a reason the SDK doesn't create the Queue -- either on startup or another point -- similar to the behavior for poison queues?

What would be the preferred approach here?

Moving this back to triage, since people are using App Insights more and more, causing these 404s to become more visible. See https://github.com/Azure/azure-webjobs-sdk/issues/770

@hrboyceiii Note that these queues are function invocation instance specific, so they can't be created on startup. The instance ID is only known once an invocation has started.

@christopheranderson - this will impact monitoring improvements you're looking at

I see some serious problems with all this 404 garbage being generated by WebJobs SDK:

  • User confusion - a working program is not supposed to generate errors.
  • Noise - the failures generated by WebJobs SDK may mask real program errors.
  • Quota expenditure - we are on the free plan for Application Insights and are constantly hitting the cap.
  • Frustrates WebJobs debugging - we have had an issue with the WebJobs dashboard since the beginning, but were never able to solve it. These constant 404 errors make it hard to debug problems with the WebJobs SDK.

@mathewc - since we're not planning on investing in the dashboard going forward, would it make sense to give folks the option of disabling these control queues, thus avoiding the 404 noise?

@christopheranderson Does that mean that the dashboard will be left as is, or would it be deprecated in favor of other logging systems?

Even if there isn't investment in the dashboard, right now every user runs into this the first time they write a WebJob. I've explained this 4 times this week. One person spent hours trying to track down the error (they had a queue job, and so thought it was coming from their own code).

I think we should address this. A simple change we can make is to perform an existence check on the queue prior to trying to dequeue on it. We're currently doing an optimistic dequeue and handling failures. That doesn't make sense for a queue that will rarely ever exist.
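To make the proposal above concrete, here is a minimal sketch of "check existence, then dequeue" for a queue that will rarely exist. It is not the SDK's actual implementation: `IControlQueue`, `FakeControlQueue`, and `ControlQueuePoller` are hypothetical names invented for illustration, and the in-memory fake stands in for a real storage queue so the example runs without a storage account.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical abstraction over a storage queue (an assumption, not an SDK type).
public interface IControlQueue
{
    Task<bool> ExistsAsync();
    Task<IReadOnlyList<string>> GetMessagesAsync(int maxCount);
}

// In-memory stand-in: the queue only comes into being when a message is first added,
// which mimics the Dashboard creating the control queue on demand.
public sealed class FakeControlQueue : IControlQueue
{
    private readonly Queue<string> _messages = new Queue<string>();

    public bool Created { get; private set; }
    public int DequeueAttempts { get; private set; }

    public void Add(string message)
    {
        Created = true;
        _messages.Enqueue(message);
    }

    public Task<bool> ExistsAsync() => Task.FromResult(Created);

    public Task<IReadOnlyList<string>> GetMessagesAsync(int maxCount)
    {
        DequeueAttempts++;
        if (!Created)
        {
            // Stands in for the "queue does not exist" failure of an optimistic dequeue.
            throw new InvalidOperationException("The specified queue does not exist.");
        }

        var batch = new List<string>();
        while (batch.Count < maxCount && _messages.Count > 0)
        {
            batch.Add(_messages.Dequeue());
        }
        return Task.FromResult<IReadOnlyList<string>>(batch);
    }
}

public static class ControlQueuePoller
{
    // Polls once: skips the dequeue entirely when the queue has not been created yet.
    public static async Task<IReadOnlyList<string>> PollOnceAsync(IControlQueue queue, int maxCount = 16)
    {
        if (!await queue.ExistsAsync())
        {
            return Array.Empty<string>();
        }
        return await queue.GetMessagesAsync(maxCount);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var queue = new FakeControlQueue();

        // No message has been added yet, so no dequeue is attempted and nothing throws.
        var first = await ControlQueuePoller.PollOnceAsync(queue);
        Console.WriteLine($"{first.Count} message(s), {queue.DequeueAttempts} dequeue attempt(s)");

        // Once a message is added (e.g. an Abort request), the poller picks it up.
        queue.Add("abort");
        var second = await ControlQueuePoller.PollOnceAsync(queue);
        Console.WriteLine($"{second.Count} message(s), {queue.DequeueAttempts} dequeue attempt(s)");
    }
}
```

A real listener would presumably still need to handle the queue disappearing between the check and the dequeue, so the existence check reduces the failure path rather than replacing it.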

I'm using Microsoft.Azure.WebJobs.Core version 3.0.0-beta5 against netcoreapp2.1.

In the debug console I'm flooded with these __"The specified queue does not exist"__ messages:

Exception has occurred: CLR/Microsoft.WindowsAzure.Storage.StorageException
Exception thrown: 'Microsoft.WindowsAzure.Storage.StorageException' in System.Private.CoreLib.dll: 'The specified queue does not exist.'
   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.<ExecuteAsyncInternal>d__4`1.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.<>c__DisplayClass83_0.<<GetMessagesAsync>b__0>d.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at Microsoft.Azure.WebJobs.Host.Storage.Queue.StorageQueue.<GetMessagesAsyncCore>d__16.MoveNext() in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Storage\Queue\StorageQueue.cs:line 106
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at Microsoft.Azure.WebJobs.Host.Queues.Listeners.QueueListener.<ExecuteAsync>d__24.MoveNext() in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Queues\Listeners\QueueListener.cs:line 145

I was expecting that this bug was already fixed via PR #1572??

@JohannesHoppe you're right - https://github.com/Azure/azure-webjobs-sdk/pull/1572 did address this. The problem is that the fix was never ported to our dev branch! I'll do that now, and the fix will be in our next release. Thanks for reporting :)

PR: https://github.com/Azure/azure-webjobs-sdk/pull/1821

You're welcome! 😄

In which version was this fix released? Did it ever end up in version 2.x?

@staal-it from the merge date and the NuGet DLL creation dates, I would infer that v3+ is when it made it in.

I have wrapped all my 'CreateIfNotExists' calls like this:

  if(!qRef.Exists()) qRef.CreateIfNotExists();