Azure-webjobs-sdk: Investigate moving away from using StorageException Extended Error Info in Exception helpers

Created on 23 Nov 2016 · 4 Comments · Source: Azure/azure-webjobs-sdk

From very early on, the SDK has used a set of StorageException extension methods to inspect and classify Azure Storage exceptions and to make decisions based on that info.

However, we've seen some instances that suggest that this information might not be reliable. We should investigate our use of this info for any important decisions and move away from it.

For example, we've received the following stack trace from a customer:

OnExceptionReceived got unhandled exception: Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (404) Not Found. ---> 
System.Net.WebException: The remote server returned an error: (404) Not Found.
at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpResponseParsers.ProcessExpectedStatusCodeNoException[T](HttpStatusCode expectedStatusCode, HttpStatusCode actualStatusCode, T retVal, StorageCommandBase`1 cmd, Exception ex) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\Common\Shared\Protocol\HttpResponseParsers.Common.cs:line 50
at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.<DeleteBlobImpl>b__29(RESTCommand`1 cmd, HttpWebResponse resp, Exception ex, OperationContext ctx) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Blob\CloudBlob.cs:line 3118
at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndGetResponse[T](IAsyncResult getResponseResult) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Executor\Executor.cs:line 299 --- End of inner exception stack trace --- Server stack trace:
at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndExecuteAsync[T](IAsyncResult result) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Executor\Executor.cs:line 50
at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.EndDelete(IAsyncResult asyncResult) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Blob\CloudBlob.cs:line 1496
at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass4.<CreateCallbackVoid>b__3(IAsyncResult ar) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Util\AsyncExtensions.cs:line 114 --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.Host.Protocols.PersistentQueueWriter`1.<DeleteAsync>d__6.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.Host.Loggers.CompositeFunctionInstanceLogger.<DeleteLogFunctionStartedAsync>d__e.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__1.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.Host.Executors.TriggeredFunctionExecutor`1.<TryExecuteAsync>d__0.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.ServiceBus.Listeners.ServiceBusTriggerExecutor.<ExecuteAsync>d__1.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.ServiceBus.Listeners.ServiceBusListener.<ProcessMessageAsync>d__3.MoveNext() Exception rethrown
at [0]:
at Microsoft.ServiceBus.Common.AsyncResult.End[TAsyncResult](IAsyncResult result)
at Microsoft.ServiceBus.Messaging.MessageReceivePump.EndTaskCallback(IAsyncResult asyncResult)
at Microsoft.ServiceBus.Messaging.MessageReceivePump.DispatchAsyncResult.<GetAsyncSteps>b__46(DispatchAsyncResult thisPtr, IAsyncResult r)
at Microsoft.ServiceBus.Messaging.IteratorAsyncResult`1.StepCallback(IAsyncResult result) Request Information RequestID:a2ac46a7-0001-013d-31c3-438b1b000000 RequestDate:Mon, 21 Nov 2016 06:51:32 GMT StatusMessage:The specified blob does not exist.

This does seem to indicate that we got a 404 exception for a blob that doesn't exist, and that the extended error info was null (based on their ToString implementation).
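To make the failure mode concrete, here is a minimal sketch of the kind of classification such a helper performs. It is not the SDK's actual extension-method code: `NotFoundKind` and `NotFoundClassifier` are hypothetical names, and the helper takes plain values so it can run without the Storage SDK. In real code the inputs would presumably come from `exception.RequestInformation.HttpStatusCode` and `exception.RequestInformation.ExtendedErrorInformation?.ErrorCode`.

```csharp
// Hypothetical classification result; not part of the WebJobs SDK.
enum NotFoundKind
{
    NotA404,        // the status code is not 404 at all
    BlobNotFound,   // 404 and the service reported the BlobNotFound error code
    OtherNotFound,  // 404 with some other error code (for example ContainerNotFound)
    Unknown         // 404 but no extended error info, as in the stack trace above
}

static class NotFoundClassifier
{
    // Classifies a storage failure from its HTTP status code and the
    // (possibly null) error code taken from the extended error information.
    public static NotFoundKind Classify(int httpStatusCode, string extendedErrorCode)
    {
        if (httpStatusCode != 404)
        {
            return NotFoundKind.NotA404;
        }

        if (extendedErrorCode == null)
        {
            // The status code alone cannot tell a missing blob from a missing container.
            return NotFoundKind.Unknown;
        }

        return extendedErrorCode == "BlobNotFound"
            ? NotFoundKind.BlobNotFound
            : NotFoundKind.OtherNotFound;
    }
}
```

The explicit `Unknown` case is the point of the sketch: a helper that only compares the error code would treat a null extended error info the same as "not a blob-not-found error", which is the silent misclassification this issue is concerned about.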

improvement

All 4 comments

The information coming across the wire (in the HTTP response body) is reliable. But as you've discovered, there's currently a bug in the storage SDK that makes it unreliable there in certain edge cases:
https://github.com/Azure/azure-storage-net/issues/271
However, I believe the only case in which it's a problem is when the deployment is seriously broken anyway. Specifically, this situation can occur if some table-related assemblies can't be loaded (either because they're not present or because binding redirect configuration is broken). The bug is basically that the Storage SDK silently returns null extended error info rather than allowing the exception indicating the underlying broken deployment to propagate.

If I recall correctly, it could be rather risky to move away from using extended error information, as knowing the difference between these cases may be crucial for correctness of some of the concurrency & retry logic.

I'd suggest instead having JobHost check for this condition at startup and provide a good error message - letting the user know that the deployment is broken seems safer than trying to run when some required binaries are missing anyway.

Here's one way (goes through basically the same path as the real bug; a bit ugly due to the dependency on the hard-to-test HttpWebResponse):
```c#
// Namespaces this snippet relies on.
using System;
using System.Net;
using System.Runtime.Serialization;
using Microsoft.WindowsAzure.Storage;

static void FailEarlyIfAzureStorageDeploymentIsBroken()
{
    if (IsAzureStorageDeploymentBroken())
    {
        throw new InvalidOperationException("Microsoft.WindowsAzure.Storage is deployed incorrectly. Are you missing Microsoft.Data.Services.Client.dll or a related binding redirect?");
    }
}

static bool IsAzureStorageDeploymentBroken()
{
    // Translate a fake web exception through the Storage SDK. The parse callback
    // always supplies extended error info, so a null result points at the bug.
    StorageException exception = StorageException.TranslateException(
        new WebException(null, null, WebExceptionStatus.ConnectFailure, new FakeHttpWebResponse()),
        new RequestResult(),
        (_) => new StorageExtendedErrorInformation());
    return exception.RequestInformation == null ||
        exception.RequestInformation.ExtendedErrorInformation == null;
}

// Minimal HttpWebResponse created through its obsolete serialization constructor.
class FakeHttpWebResponse : HttpWebResponse
{
    public FakeHttpWebResponse()
#pragma warning disable 618
        : base(CreateSerializationInfo(), new StreamingContext())
#pragma warning restore 618
    {
    }

    static SerializationInfo CreateSerializationInfo()
    {
        SerializationInfo info = new SerializationInfo(typeof(FakeHttpWebResponse), new FormatterConverter());
        info.AddValue("m_HttpResponseHeaders", null);
        info.AddValue("m_Uri", null);
        info.AddValue("m_Certificate", null);
        info.AddValue("m_Version", new Version());
        info.AddValue("m_StatusCode", 0);
        info.AddValue("m_ContentLength", 0);
        info.AddValue("m_Verb", string.Empty);
        info.AddValue("m_StatusDescription", string.Empty);
        info.AddValue("m_MediaType", string.Empty);
        return info;
    }
}
```

And here's one more approach - goes through a different path but validates more of the stack with less ugly testing stuff:
```c#
// Namespaces this snippet relies on.
using System;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.DataServices;

static void FailEarlyIfAzureStorageDeploymentIsBroken()
{
    if (IsAzureStorageDeploymentBroken())
    {
        throw new InvalidOperationException("Microsoft.WindowsAzure.Storage is deployed incorrectly. Are you missing a table assembly (Microsoft.Data.Services.Client, Microsoft.Data.OData or Microsoft.Data.Edm) or a related binding redirect?");
    }
}

static bool IsAzureStorageDeploymentBroken()
{
    try
    {
        ReferencePossiblyBrokenAssembly();
        return false;
    }
    catch (Exception)
    {
        // Any failure here is treated as a broken deployment.
        return true;
    }
}

// Kept in its own method so that a failure to load the table assemblies
// surfaces when this method is called, inside the try block above.
static void ReferencePossiblyBrokenAssembly()
{
#pragma warning disable 618
    TableServiceContext ignore = new TableServiceContext(new CloudTableClient(new Uri("http://test.core.windows.net"), null));
#pragma warning restore 618
}
```
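Both checks above answer only "broken or not". As a rough complement, the sketch below tries to load each table-related assembly by simple name and reports which ones fail, which could make the startup error message more specific. `DeploymentProbe` and `FindUnloadableAssemblies` are hypothetical names, not part of the WebJobs or Storage SDKs. The probe is also weaker than the checks above in one respect: loading by simple name does not exercise binding redirects, so it would only catch the "assembly not present" case.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical helper; the assembly names are the ones listed in the error message above.
static class DeploymentProbe
{
    public static readonly string[] TableAssemblyNames =
    {
        "Microsoft.Data.Services.Client",
        "Microsoft.Data.OData",
        "Microsoft.Data.Edm",
    };

    // Returns the simple names of the assemblies that could not be loaded.
    public static List<string> FindUnloadableAssemblies(IEnumerable<string> simpleNames)
    {
        var unloadable = new List<string>();
        foreach (string name in simpleNames)
        {
            try
            {
                Assembly.Load(new AssemblyName(name));
            }
            catch (Exception ex) when (ex is FileNotFoundException
                                       || ex is FileLoadException
                                       || ex is BadImageFormatException)
            {
                // Missing, blocked, or corrupt assembly: record it and keep probing.
                unloadable.Add(name);
            }
        }
        return unloadable;
    }
}
```

A startup check could then call `DeploymentProbe.FindUnloadableAssemblies(DeploymentProbe.TableAssemblyNames)` and include any returned names in the exception message.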

Thanks for all the details @davidmatson! This is an issue that has been bugging me for some time now, but we never tracked it down. I was just able to repro this simply in a WebJobs SDK console app using your code above. With the Microsoft.Data.Services.Client.dll present, those deployment checks return False (deployment not broken) and the job host starts successfully.

If I delete Microsoft.Data.Services.Client.dll, the checks return True (deployment is broken), but the job host still runs correctly, which shows that regular storage operations still work.
