Azure-sdk-for-js: MessagingError [OperationTimeoutError]: Unable to create the amqp session due to operation timeout

Created on 11 May 2020  ·  8 Comments  ·  Source: Azure/azure-sdk-for-js

Describe the bug
We have a service that creates a Service Bus listener to receive messages from a single dedicated queue. Each of our environments runs its own queue listener.
It works perfectly for days, but then it stops receiving messages from the queue. The messages stay in the queue, unread and unnoticed, until someone spots the problem and restarts the service.

To Reproduce
Steps to reproduce the behavior:

  1. We have not found a way to reproduce it quickly; we just start the services and wait a few days. After a random number of days, the listener stops receiving messages.

Expected behavior
We expect it to run without interruption. For the past 2-3 months we have had to restart the service every few days, which is frustrating.

Screenshots

I added console.log() in errorMessageHandler, and our service logs show this error at the point where the service stops receiving messages.
image

Additional context
Here is the code snippet:

  const successMessageHandler = async (message) => {
    // here we are handling the message (operation to perform internally within the service after message is received)
  };

  const errorMessageHandler = (error) => {
    // Handle error
    logger.error('SERVICE BUS : ERROR DURING MESSAGE HANDLING', error);
  };

  receiver.registerMessageHandler(successMessageHandler, errorMessageHandler);

@ramya-rao-a or @chradek, please help me with this; upgrading the version didn't help either.

Labels: Client, Service Bus, bug, customer-reported

All 8 comments

Also, I checked with Azure support over the past week, and according to them there was no update or unexpected behavior in the given time frame.

FYI @ramya-rao-a

Also, I see you mentioned here that another receiver is created when the link is broken, but I am not sure whether this is actually done.

image

Thanks for reporting @kks010

  • Does your error handler log anything else before the OperationTimeoutError?
  • Do you have an estimate of the time difference between the last message that was received and this error being thrown? Perhaps you can check the Azure portal to see the time of the last outgoing message.
  • I am assuming you have already created a support ticket, based on your comment in https://github.com/Azure/azure-sdk-for-js/issues/8835#issuecomment-626471434. The support team should have reached out to you by now with an update. If you can share the ticket number, I'll try to follow up. One of the things the support team can help answer is whether there were service upgrades or any service-side issues during this time.

Also, can you check which version of the rhea-promise package is being used in your application? This is a package that the Service Bus package depends on. The error you are seeing originates from this package when it is not able to establish an AMQP session with the service within 60 seconds. Interestingly, it is marked as a non-retryable error, but it should have been a retryable one.

In the case of non-retryable errors, we do not recover the receiver link.
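
Since the receiver link is not recovered after a non-retryable error, one possible application-side workaround is to replace the receiver from the error handler. The following is only a minimal sketch, assuming the error object carries a boolean `retryable` property and that the receiver exposes `registerMessageHandler` and `close` as in the snippet from the issue. `makeResilientListener` and `createReceiver` are hypothetical names for illustration, not SDK functions.

```javascript
// Sketch only. `createReceiver` is a hypothetical factory supplied by the caller
// that returns a fresh receiver; it is not an SDK function.
function makeResilientListener(createReceiver, onMessage, logger = console) {
  let receiver;
  let restarts = 0;
  let restarting = false;

  const start = () => {
    receiver = createReceiver();
    receiver.registerMessageHandler(onMessage, onError);
  };

  const restart = async () => {
    if (restarting) return; // a restart is already in progress
    restarting = true;
    restarts += 1;
    try {
      await receiver.close(); // release the broken link before replacing it
    } catch (closeError) {
      logger.error('SERVICE BUS : ERROR WHILE CLOSING RECEIVER', closeError);
    }
    try {
      start();
    } finally {
      restarting = false;
    }
  };

  const onError = (error) => {
    logger.error('SERVICE BUS : ERROR DURING MESSAGE HANDLING', error);
    // Assumption: retryable errors are recovered by the SDK itself,
    // so only errors explicitly marked non-retryable trigger a new receiver.
    if (error && error.retryable === false) {
      restart().catch((restartError) => {
        logger.error('SERVICE BUS : RECEIVER RESTART FAILED', restartError);
      });
    }
  };

  start();
  return { restartCount: () => restarts };
}
```

The factory keeps the sketch independent of how the receiver is obtained; in a real service it would wrap whatever call currently creates `receiver`.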

Hi @ramya-rao-a

Thanks for replying. Here are the answers to your questions:

  1. No, only the error message I shared above is logged before it stops receiving Service Bus messages.
  2. The last message was received on 9th May at 2:26 am, and the error was thrown at 2020-05-09T02:25:30.595Z.
  3. Yes, but it didn't help, so I have created another one with high severity.
    Here is the ticket number: 120051223001696

The version of the rhea-promise package is as follows:
image

Also, please share more details on how we can make it a retryable error.

Also, I had a discussion with the Microsoft support team and found that there was an update from 12:03 am to 5:00 am UTC on 9th May.

But according to them, if retry logic had been in place, the connection would have been re-established. I am not sure whether this has to be part of our own code.
As far as I can see, the hanging issue was resolved in this pull request: https://github.com/Azure/azure-sdk-for-js/pull/8401

If not, please help me understand.
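
For reference, application-side retry logic of the kind the support team described could look roughly like the sketch below. It is a generic retry loop with exponential backoff, not something taken from the SDK. `backoffDelayMs`, `sleep`, and `retryWithBackoff` are hypothetical helper names, and the default delays and attempt count are arbitrary assumptions.

```javascript
// Hypothetical helpers for illustration; none of them is part of @azure/service-bus.

// Delay before the next attempt: doubles each time, capped at maxDelayMs.
function backoffDelayMs(attempt, baseDelayMs = 1000, maxDelayMs = 60000) {
  return Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds or maxAttempts is reached.
async function retryWithBackoff(operation, maxAttempts = 5, baseDelayMs = 1000) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      if (attempt >= maxAttempts) throw error; // give up and surface the last error
      await sleep(backoffDelayMs(attempt, baseDelayMs));
    }
  }
}
```

Here `operation` would be whatever re-creates the receiver and registers the handlers again; the cap on the delay keeps a long outage from pushing the wait time up indefinitely.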

Also, I would appreciate your help in making this error retryable.

Thanks for your patience @kks010

We will look into why OperationTimeoutError is not retryable and make it so ASAP.
We will post a comment once we have an update.

We have released version 1.1.7 of the @azure/service-bus package, in which OperationTimeoutError is treated as a retryable error.

For more, see the changelog for 1.1.7

And thanks so much for reporting the issue!

Thanks @ramya-rao-a
