Azure-sdk-for-net: Can't create session when the connection is closing

Created on 13 Nov 2019 · 34 comments · Source: Azure/azure-sdk-for-net

While receiving events using Event Processor Host, from time to time, I'm getting partition receiver exceptions:

System.InvalidOperationException: Can't create session when the connection is closing.
   at Microsoft.Azure.Amqp.AmqpConnection.AddSession(AmqpSession session, Nullable`1 channel)
   at Microsoft.Azure.Amqp.AmqpCbsLink.OpenCbsRequestResponseLinkAsyncResult.GetAsyncSteps()+MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.Azure.Amqp.AmqpCbsLink.EndCreateCbsLink(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.TaskHelpers.EndAsyncResult(IAsyncResult asyncResult)
   at Microsoft.Azure.Amqp.IteratorAsyncResult`1.StepCallback(IAsyncResult result)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.Azure.Amqp.AmqpCbsLink.<>c__DisplayClass4_0.<SendTokenAsync>b__1(IAsyncResult a)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.CreateLinkAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.PartitionReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.ReceivePumpAsync(CancellationToken cancellationToken, Boolean invokeWhenNoEvents)

There is an open issue about this in the azure-amqp GitHub repo (https://github.com/Azure/azure-amqp/issues/140), but one of the team members suggests that:

This exception is expected when a session is to be created but the connection is closing. Typically the session creation is a result of an API call from the upper SDK and should be handled by the SDK as a communication error. Please report the error to the SDKs you are using so it can be handled correctly by the retry policy in the SDKs.

Labels: Client, Event Hubs, Service Attention, bug

All 34 comments

Sorry, I missed this issue. Is it still happening? If so, how often are you seeing the failures? Can you also check that your code isn't unregistering the host anywhere? Receivers are only closed during the unregister call.

I had the same error last night in multiple microservices running in k8s.
Two errors to be precise:

fail: ConfirmService[0]
      Message handler encountered an exception.Exception context for troubleshooting:
      - Endpoint: some-app-test.servicebus.windows.net
      - Entity Path: test-confirm
      - Executing Action: Receive

System.InvalidOperationException: Can't create session when the connection is closing.
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass64_0.<<ReceiveAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__11_0()

And:

fail: CacheInvalidateHostedService[0]
      Message handler encountered an exception.Exception context for troubleshooting:
      - Endpoint: some-app-test.servicebus.windows.net
      - Entity Path: test-cache-invalidate/Subscriptions/CacheInvalidateHostedService
      - Executing Action: Receive

System.ObjectDisposedException: Cannot access a disposed object.
Object name: '$cbs'.
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass64_0.<<ReceiveAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__11_0()

They happened from 5:00 AM to 7:30 AM.
I've only been working with Service Bus for a few days, so I can't tell whether this was a one-off or will recur, for example every week.

This is my class: https://gist.github.com/stefankip/8ba745894018c3d0313ceae3633f8eef


No, I'm not unregistering a host anywhere in the code.
It happens very rarely; I haven't seen this error since I reported the issue.

We experience the same issue from time to time.

Microsoft.Azure.EventHubs.ServiceFabricProcessor, Version=0.5.4.0, Microsoft.Azure.EventHubs.ServiceFabricProcessor.ServiceFabricProcessor+<InnerRunAsync>d__32.MoveNext - Can't create session when the connection is closing.

Microsoft.Azure.Amqp, Version=2.4.0.0, Microsoft.Azure.Amqp.AmqpConnection.AddSession - Can't create session when the connection is closing.

[{"parsedStack":[{"assembly":"Microsoft.Azure.EventHubs.ServiceFabricProcessor, Version=0.5.4.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c","method":"Microsoft.Azure.EventHubs.ServiceFabricProcessor.ServiceFabricProcessor+<InnerRunAsync>d__32.MoveNext","level":0,"line":0},{"assembly":"System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw","level":1,"line":0},{"assembly":"System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess","level":2,"line":0},{"assembly":"System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification","level":3,"line":0},{"assembly":"Microsoft.Azure.EventHubs.ServiceFabricProcessor, Version=0.5.4.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c","method":"Microsoft.Azure.EventHubs.ServiceFabricProcessor.ServiceFabricProcessor+<RunAsync>d__31.MoveNext","level":4,"line":0}],"outerId":"0","message":"Can't create session when the connection is closing.","type":"System.InvalidOperationException","id":"49907794"}]

Any updates on this? We have also been hitting the same issue over the last few weeks. As far as I can see in Application Insights, it happens overnight.


Which SDK and version are you using?

I get the same errors in k8s. My project uses Microsoft.Azure.ServiceBus 4.1.1, and this is in production. Please help troubleshoot.

System.InvalidOperationException: Can't create session when the connection is closing.
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass64_0.<<ReceiveAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__11_0()

System.ObjectDisposedException: Cannot access a disposed object.
Object name: '$cbs'.
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass64_0.<<ReceiveAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__11_0()

It seems like they happen about every 18 minutes:

  • 19 Jan 2020 13:38:33.075 System.ObjectDisposedException: Cannot access a disposed object.
  • 19 Jan 2020 13:56:23.724 System.ObjectDisposedException: Cannot access a disposed object.
  • 19 Jan 2020 14:14:14.666 System.ObjectDisposedException: Cannot access a disposed object.
  • 19 Jan 2020 14:39:41.858 System.InvalidOperationException: Can't create session when the connection is closing.
  • 19 Jan 2020 14:41:39.546 System.InvalidOperationException: Can't create session when the connection is closing.
  • 19 Jan 2020 15:09:19.221 System.InvalidOperationException: Can't create session when the connection is closing.
  • 19 Jan 2020 15:09:19.221 System.InvalidOperationException: Can't create session when the connection is closing.
  • 19 Jan 2020 15:09:19.273 System.ObjectDisposedException: Cannot access a disposed object.
  • 19 Jan 2020 15:09:19.952 System.InvalidOperationException: Can't create session when the connection is closing.
  • 19 Jan 2020 15:27:44.498 System.ObjectDisposedException: Cannot access a disposed object.
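To check the "every 18 minutes" observation, the gaps between consecutive timestamps can be computed directly. This is a small C# sketch (top-level statements); it treats the times as offsets within a single day, because the time zone of the log isn't given.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Failure timestamps copied from the list above (all on 19 Jan 2020).
string[] raw =
{
    "13:38:33.075", "13:56:23.724", "14:14:14.666", "14:39:41.858", "14:41:39.546",
    "15:09:19.221", "15:09:19.221", "15:09:19.273", "15:09:19.952", "15:27:44.498",
};

// Parse each time of day as a TimeSpan (hours:minutes:seconds.milliseconds).
TimeSpan[] times = raw
    .Select(s => TimeSpan.ParseExact(s, @"hh\:mm\:ss\.fff", CultureInfo.InvariantCulture))
    .ToArray();

// Gap between each failure and the one before it.
TimeSpan[] gaps = times.Zip(times.Skip(1), (prev, next) => next - prev).ToArray();

foreach (var gap in gaps)
    Console.WriteLine(gap.ToString(@"mm\:ss\.fff"));
```

The first two gaps come out at about 17m51s and the last at about 18m25s, while the others are irregular (including four failures within one second at 15:09:19), so the 18-minute rhythm holds only for part of the window.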

There are 2 cases where you can observe this failure.

  1. The client is closed while an operation such as send or receive is still in progress.
  2. The underlying TCP connection is faulted.

I will talk to the AMQP layer devs about distinguishing between those two cases, so we can tell which one is actually causing the error.

In the meantime, can you confirm that option 1 doesn't apply to you? In other words, is the client being closed while operations are still pending?
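One way to rule out option 1 on the application side is to not close the client until every operation started on it has finished. The C# sketch below is a hypothetical helper, not part of Microsoft.Azure.EventHubs or Microsoft.Azure.ServiceBus: `Track` and `CloseWhenIdleAsync` are illustrative names, and `closeClient` stands in for whatever close call your client exposes. It only waits for operations that were already started; it does not block new ones that begin after the close is requested.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch: the set of send/receive tasks that are still running.
var pending = new ConcurrentDictionary<Task, byte>();

// Start an operation and remember it until it completes (successfully or not).
Task Track(Func<Task> operation)
{
    Task task = operation();
    pending.TryAdd(task, 0);
    task.ContinueWith(t => { pending.TryRemove(t, out _); }, TaskScheduler.Default);
    return task;
}

// Wait for every tracked operation to finish, then close the client.
async Task CloseWhenIdleAsync(Func<Task> closeClient)
{
    try
    {
        await Task.WhenAll(pending.Keys);
    }
    catch
    {
        // Failures belong to the callers that awaited those operations;
        // here we only need them to be finished before closing.
    }
    await closeClient();
}
```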

Thank you @serkantkaraca for the feedback. I am sure option 1 is not my case.

We observed the same exception on multiple IoT Hub connections between 2020-02-20T01:36:54 UTC and 2020-02-20T01:56:55 UTC. All IoT Hub instances run in the EU West region.
Multiple but not all partitions of the same IoT Hub are affected.

  • IoTHub A

    • Partition 0 failed at 1:44:54 UTC

    • Partition 1 failed at 1:56:55 UTC

    • Partition 2 ok

    • Partition 3 ok

  • IoTHub B

    • Partition 0 failed at 1:52:51 UTC

    • Partition 1 failed at 1:40:49 UTC

    • Partition 2 ok

    • Partition 3 failed at 1:48:52 UTC

  • IoTHub C

    • Partition 0 ok

    • Partition 1 failed at 1:44:54 UTC

    • Partition 2 ok

    • Partition 3 failed at 1:36:54 UTC

There was no shutdown request for the services at that time. Even so, I would expect that if we handle the cancellation token in the EventProcessor class, the SDK would gracefully handle option 1 mentioned by @serkantkaraca. We only receive events this way and do not use any other receive or send mechanism in parallel.

We use the following package versions:
Microsoft.Azure.EventHubs.ServiceFabricProcessor 0.5.4,
which uses
Microsoft.Azure.EventHubs 4.1.0,
where we think the error should be handled.

Exception Stacktrace:

System.InvalidOperationException: Can't create session when the connection is closing.
   at Microsoft.Azure.Amqp.AmqpConnection.AddSession (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.Azure.Amqp.AmqpCbsLink+OpenCbsRequestResponseLinkAsyncResult+<GetAsyncSteps>d__7.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.AsyncResult.End (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.Azure.Amqp.AmqpCbsLink.EndCreateCbsLink (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1+<OnCreateAsync>d__6.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.TaskHelpers.EndAsyncResult (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.Azure.Amqp.IteratorAsyncResult`1.StepCallback (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.AsyncResult.End (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.Azure.Amqp.AmqpCbsLink+<>c__DisplayClass4_0.<SendTokenAsync>b__1 (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<CreateLinkAsync>d__15.MoveNext (Microsoft.Azure.EventHubs, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1+<OnCreateAsync>d__6.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<OnReceiveAsync>d__13.MoveNext (Microsoft.Azure.EventHubs, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<OnReceiveAsync>d__13.MoveNext (Microsoft.Azure.EventHubs, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.PartitionReceiver+<ReceiveAsync>d__30.MoveNext (Microsoft.Azure.EventHubs, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<ReceivePumpAsync>d__18.MoveNext (Microsoft.Azure.EventHubs, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)

I hope this information helps you solve the issue. We only observed the issue once, but it blocked all our production instances because we stopped receiving events from our sensors.

As a first step, I will improve the exception contract to distinguish case 1 from case 2.

Any update on this?

The fix didn't make it into 4.2.0.

4.2.1 will include it and should be released in 3-4 weeks.

Same problem here in NorthEurope.

Can't create session when the connection is closing at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<OnReceiveAsync>d__86.MoveNext()

on 13th April between 22h15 and 22h50 CET
on 23rd April between 05h20 and 07h25 CET

Microsoft.Azure.ServiceBus 3.3.0
Microsoft.Azure.Management.ServiceBus 2.0.1

Still seeing this from time to time. Any ideas?

I'm seeing a callstack similar to the original description of this issue.
What seems to happen in our case is that we get a timeout exception, which is not unusual since we use some aggressive timeouts, and then the next call on the partition receiver fails with this error. My hunch is that the timeout triggers some cleanup that doesn't complete before the exception surfaces to the caller; when the caller then retries, the InvalidOperationException is thrown.

MSFT FTE here. Ping me and I can provide more details. I have a fairly consistent repro in our service.

Here is the abridged version of what I see from my logging breakpoints.
We call ReceiveAsync for Partition ID 20 with PartitionReceiver120, but we never see this call return (otherwise there would be a log line after the one below saying the call completed).

Receiving messages... ClientId: "PartitionReceiver120(***,$Default,20)" hash: 403058 ThreadId: 26004

Exception is thrown for Partition ID 20 on thread ID 17120

System.TimeoutException: The operation did not complete within the allocated time 00:00:02.7197845 for object receiver691.
   at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.Azure.Amqp.AmqpObject.OpenAsyncResult.End(IAsyncResult result)
   at Microsoft.Azure.Amqp.AmqpObject.EndOpen(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.CreateLinkAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.PartitionReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)

The next time we call into PartitionReceiver120 ReceiveAsync, we get the InvalidOperationException for PartitionReceiver120.

Receiving messages... ClientId: "PartitionReceiver120(***,$Default,20)" hash: 403058 ThreadId: 21748
System.InvalidOperationException: Can't create session when the connection is closing.
   at Microsoft.Azure.Amqp.AmqpConnection.AddSession(AmqpSession session, Nullable`1 channel)
   at Microsoft.Azure.Amqp.AmqpConnection.CreateSession(AmqpSessionSettings sessionSettings)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.CreateLinkAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)
   at Microsoft.Azure.EventHubs.PartitionReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan waitTime)

We are using Microsoft.Azure.EventHubs v3.0.0.
This seems to happen consistently for any PartitionReceiver that gets a timeout exception.
Hope this helps. Again, feel free to ping me for more details.

Edit: I just noticed the note about the improved exception contract in 4.2.1. I'll see if we can upgrade and treat this as a retryable exception.

Edit2: Looks like upgrading to 4.2.0 has greatly reduced/eliminated the number of hits we are seeing when debugging locally.
@serkantkaraca, would you mind linking the PR for the exception contract change, so we can plan to treat this as a retryable exception once 4.2.1 is released? Thanks!

The PR is under review. Please reactivate this issue if you still hit it with the 4.3.0 release. https://github.com/Azure/azure-sdk-for-net/pull/14030

@serkantkaraca
I'm still facing this issue when our API is under high load.

Initially I used Microsoft.Azure.EventHubs 3.0.0, and I have since updated to the latest version, 4.3.0.

System.InvalidOperationException. Details: Can't create session when the connection is closing..
   at Microsoft.Azure.Amqp.AmqpConnection.AddSession(AmqpSession session, Nullable`1 channel)
   at Microsoft.Azure.Amqp.AmqpConnection.CreateSession(AmqpSessionSettings sessionSettings)
   at Microsoft.Azure.EventHubs.Amqp.AmqpEventDataSender.CreateLinkAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.EventHubs.Amqp.AmqpEventDataSender.OnSendAsync(IEnumerable`1 eventDatas, String partitionKey)
   at Microsoft.Azure.EventHubs.Amqp.AmqpEventDataSender.OnSendAsync(IEnumerable`1 eventDatas, String partitionKey)
   at Microsoft.Azure.EventHubs.EventDataSender.SendAsync(IEnumerable`1 eventDatas, String partitionKey)
   at Microsoft.Azure.EventHubs.EventHubClient.SendAsync(IEnumerable`1 eventDatas, String partitionKey)


The fix was to handle error cases better when the client is closed. If you are still getting this failure repeatedly with 4.3.0, then the underlying network channel is apparently failing. Two questions:

  1. How often do you see the failures with 4.3.0?
  2. Does it recover on its own?

Hello @serkantkaraca

  1. From what I see, the frequency depends on the load on our API. We have an API that receives requests and enqueues them to Event Hubs.
  2. I don't think it recovers for the specific event being enqueued when this exception is thrown. We currently only implement retry logic for EventHubsException, and this failure surfaces as an InvalidOperationException, so that event is lost. That is critical for us because we don't want to lose any customer data. So my question is: should I also implement retry logic for this, or should I ignore it when it happens?

https://github.com/Azure/azure-sdk-for-net/issues/15514#issuecomment-700409400

You won't lose any data as long as you retry. This should be a transient failure and should recover if retried.
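Retrying the send, as suggested here, can be done with a small wrapper. This is a minimal C# sketch under stated assumptions, not the SDK's built-in retry policy: `RetryAsync`, the attempt count, and the 200 ms starting delay are illustrative, and the caller supplies the predicate that decides which exceptions count as transient.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical retry helper with exponential backoff (200 ms, 400 ms, 800 ms, ...).
async Task RetryAsync(Func<Task> operation, Func<Exception, bool> isTransient,
                      int maxAttempts = 5, TimeSpan? initialDelay = null)
{
    TimeSpan delay = initialDelay ?? TimeSpan.FromMilliseconds(200);
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await operation();
            return;
        }
        catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
        {
            // Transient failure with attempts left: wait, then double the delay.
            await Task.Delay(delay);
            delay += delay;
        }
    }
}
```

Because of the `when` filter, a non-transient exception, or a transient one on the last attempt, propagates unchanged to the caller instead of being swallowed, so the event is not silently lost.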

Are you able to build a standalone repro, like a console app, that you can share? If we can reproduce the failures in a controlled manner, it will be easier to pinpoint the root cause.

As I said, our API currently only implements retry logic for EventHubsException, not InvalidOperationException.
Do you suggest that we also retry when InvalidOperationException occurs?

I could try to reproduce it in a console application, but of course that will take some time.

try
{
    await _queueSender.SendAsync(eventData, partitionKey);
}
catch (EventHubsException ex)
{
    if (ex.IsTransient)
    {
        // Currently we only implement retry for this case: rethrow to the caller.
        // "throw;" keeps the original stack trace ("throw ex;" resets it).
        throw;
    }

    _logger.LogError($"MessagingException occurred but is not transient. {ex.Message}");
    return;
}
catch (Exception ex)
{
    if (ex is TimeoutException || ex is UnauthorizedAccessException)
    {
        throw; // rethrow to the caller, preserving the stack trace
    }

    // Everything else, including the InvalidOperationException discussed here,
    // is only logged and not retried, so the event is dropped.
    var trace = string.IsNullOrEmpty(ex.StackTrace) ? "No stack trace" : ex.StackTrace;
    _logger.LogError($"Failed to send event due to {ex.GetType()}. Details: {ex.Message}. {trace}");
    return;
}

@serkantkaraca, here is our code path that enqueues events to Event Hubs.

From my perspective, it would be better to throw EventHubsException rather than InvalidOperationException in this case. Please correct me if I have misunderstood anything.

I will find out if that is possible.

Great, thanks. Then I will implement a retry for that specific error message as a workaround for now.
Please let me know if there is any new information.
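A message-based check like the workaround described could look like the following hypothetical C# predicate (the name `IsTransientSendFailure` is illustrative). It assumes the exception text stays exactly as it appears in the traces above; matching on message text is brittle, which is one reason a typed, retryable error from the SDK is preferable.

```csharp
using System;

// Hypothetical workaround: treat the client-side "Can't create session" failure as
// transient, alongside TimeoutException, which the existing code already rethrows.
bool IsTransientSendFailure(Exception ex) => ex switch
{
    TimeoutException => true,
    InvalidOperationException ioe
        when ioe.Message.StartsWith("Can't create session when the connection is closing",
                                    StringComparison.Ordinal) => true,
    // Any other InvalidOperationException (for example one generated by the service)
    // and every other exception type are treated as non-transient.
    _ => false,
};
```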

@bebeo92 Any updates on the results? How did the change work?

@serkantkaraca, after retrying, everything works fine.
But again, this doesn't match the Microsoft documentation:
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-messaging-exceptions#exception-types

You are right: the client's behavior doesn't match the documentation, because in this case the InvalidOperationException is generated on the client side. InvalidOperationExceptions generated on the service side are not retryable. Unfortunately, the exception alone doesn't tell where it was generated, which makes this a challenging issue. I still want to fix the API experience, so I will send a PR to convert "cannot create a session" into a retryable error.

Do you see errors are recovering after retry?

Yes, I can see the error recover after a retry.

I have sent a PR to convert this exception into a retryable error.

Great thanks

Hi, I'm having this issue as well. We are using Microsoft.Azure.EventHubs.Processor version 4.3.0, but I am only subscribing to events.
Which version will that PR (https://github.com/Azure/azure-sdk-for-net/pull/15984#issue-503663997) be available in? Would it affect my scenario?

The fix will ship soon in the 4.3.1 release.
