See: https://github.com/Azure/azure-event-hubs-node/issues/30
This issue still exists.
Thanks @jtjk for reporting this issue and testing against the new SDK. We have triaged the issue and will look into it.
@jtjk There have been a lot of changes in the Event Hubs SDK since the issue you linked was logged. Can you please provide some more information that we can go on?
@ramya-rao-a
I can confirm that this issue is still present. After a random amount of time (could range from minutes to days, rarely weeks), the hub subscription will permanently stop delivering messages. There are no calls to error callbacks, and nothing gets logged (with default logging options).
Yes, it is consistently reproducible. Given enough time (usually 1-3 days), it will happen.
Dockerfile
FROM node:lts-alpine
package.json
"@azure/event-hubs": "2.1.1",
Code with unnecessary parts (e.g. docs, comments, logging output) removed:
const { EventHubClient, EventPosition } = require('@azure/event-hubs');

async function subscribe(connectionString, consumerGroup, messageHandler, errorHandler) {
  const client = await EventHubClient.createFromIotHubConnectionString(connectionString);
  const partitions = await client.getPartitionIds();
  // Start one receiver per partition, reading only events that arrive from now on.
  partitions.forEach((partition) => client.receive(partition, messageHandler, errorHandler, {
    consumerGroup,
    eventPosition: EventPosition.fromEnd(),
    name: 'REDACTED'
  }));
}
Functions given as messageHandler (onEventData) and errorHandler (onErrorDuringReceive):
onEventData(eventData) {
  // logger.silly(`Message from IoT Hub: ${JSON.stringify(eventData)}`);
  this.onMessage(eventData.body);
}
onMessage(message) {
  // examine message data and potentially make a REST call which will callback to handle its results
}
onErrorDuringReceive(error) {
  logger.error(`Could not receive REDACTED: ${JSON.stringify(error)}`);
}
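A side note on the error handler above: `JSON.stringify` skips the non-enumerable `message` and `stack` fields of a standard `Error`, so the logged line can lose most of the detail. A minimal sketch of a more informative formatter follows. The `formatError` name is hypothetical and is not part of `@azure/event-hubs`.

```javascript
// Hypothetical helper (not part of @azure/event-hubs): serialize an Error
// together with the standard fields that JSON.stringify would otherwise skip.
function formatError(error) {
  if (!(error instanceof Error)) {
    return JSON.stringify(error);
  }
  // Copy enumerable own properties (for example custom fields set by a
  // library), then add name, message and stack explicitly.
  return JSON.stringify({
    ...error,
    name: error.name,
    message: error.message,
    stack: error.stack
  });
}

const sample = new Error('connection timed out');
console.log(JSON.stringify(sample)); // prints {}
console.log(formatError(sample));    // includes name, message and stack
```

With such a helper, the handler could log `formatError(error)` instead of `JSON.stringify(error)`.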
Note: the docker container that runs the code briefly outlined above is running on an Azure Kubernetes Service instance, if that is of any importance.
@samerdokas Since this is consistently reproducible for you, can you please enable the logs so that we can get a clearer picture of the sequence of events?
At a minimum, you can set the env variable DEBUG to azure:event-hubs:error
For more verbose logging you can set it to azure:event-hubs*
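If setting the variable in the container environment is inconvenient, it can also be set from code. The sketch below assumes that the SDK's logging is driven by the `debug` package, which reads `DEBUG` when it is first loaded, so the variable has to be set before the SDK is required. The `enableEventHubsLogging` helper is hypothetical.

```javascript
// Hypothetical helper: select one of the DEBUG namespaces named in this thread.
// Assumption: it must run before '@azure/event-hubs' is first loaded, because
// the `debug` package reads process.env.DEBUG at load time.
function enableEventHubsLogging(verbose) {
  const namespaces = verbose ? 'azure:event-hubs*' : 'azure:event-hubs:error';
  // Keep any namespaces that were already set by the environment.
  process.env.DEBUG = process.env.DEBUG
    ? `${process.env.DEBUG},${namespaces}`
    : namespaces;
  return process.env.DEBUG;
}

enableEventHubsLogging(false);
// Load the SDK only after this point, for example:
// const { EventHubClient } = require('@azure/event-hubs');
```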
@ramya-rao-a Confirming that the suggested debugging features were enabled as of today, in the two containers that were most affected by this issue; one had the DEBUG env var set to azure:event-hubs:error, the other to azure:event-hubs*.
I'll post an update when the issue manifests again.
Unsure if it's related, but it seems that we experienced something similar to this in the Event Hub in Azure itself today... After a Spring App restart, the Event hub responded briefly, but then fully stopped all message egress. Only switching to a different event hub made the flow run again. Azure metrics show continual message ingress, but zero egress.
@bhoogter Please log a support ticket with Azure Event Hubs. From their server side logging, they should be able to determine what went wrong if you can provide a timeframe and details of your event hub instance.
We are facing a similar issue with event-processor-host 2.1.0 which uses event-hubs 2.1.1 as a sub dependency.
We were able to reproduce the error with debug logs: event-hubs-error-log.pdf. There we can see that there was a connection problem (OperationTimeoutError) to the event hub on 11/22/19 17:12:53.619. From that moment on, we never received a message from partition 0 again. There were no retry attempts in the following days. In addition, the error was swallowed silently; in particular, we did not receive any error that we could handle in our service. There was no other consumer in the same consumer group during the whole observation period. Only after a manual service restart did connection-2 (partition 0) connect again, and we received all missing events within our event hub retention time.
My expected behavior would be that the event-hubs lib retries establishing the AMQP connection on a TimeoutError. If it is not able to reconnect after a few attempts, it should throw an error.
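The expected behavior described here (retry a few times, then surface the error) can be sketched generically. This is an illustration only, not the SDK's retry logic: `withRetry`, `maxAttempts`, and `delayMs` are hypothetical names, and the operation passed in stands for whatever call establishes the connection.

```javascript
// Hypothetical helper, not an SDK API: retry an async operation a fixed
// number of times, then rethrow the last error so the caller can handle it.
async function withRetry(operation, maxAttempts = 3, delayMs = 1000) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

The important property is the last line: once the attempts are exhausted, the error reaches the caller instead of being swallowed.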
Thanks for the logs @moritz-tr
While we look at the logs, please consider using version 2.1.3 of the @azure/event-hubs library.
We have made a few improvements around the error handling scenarios which should help.
Thanks for the feedback @ramya-rao-a. We have now installed the latest @azure/event-hubs and will wait until the next OperationTimeoutError occurs. We'll keep you posted!
We've released @azure/event-hubs 2.1.4, and @azure/event-processor-host 2.1.1 (which sets a minimum version for event-hubs to be the one above). This update allows the SDK to detect when the connection has gone idle (no data or heartbeat received from service) after 60 seconds so it can attempt to reconnect.
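For services that want their own check on top of this, an application-level idle detector is one option. The sketch below does not reproduce the SDK's internal detection; `IdleDetector`, `touch`, and `isIdle` are hypothetical names, and the 60 second default simply mirrors the figure mentioned above.

```javascript
// Hypothetical application-level watchdog. The clock is injectable so the
// logic can be exercised without waiting for real time to pass.
class IdleDetector {
  constructor(idleTimeoutMs = 60000, now = () => Date.now()) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.now = now;
    this.lastActivity = now();
  }

  // Call this from the message handler whenever an event arrives.
  touch() {
    this.lastActivity = this.now();
  }

  // True when no activity has been recorded within the timeout.
  isIdle() {
    return this.now() - this.lastActivity >= this.idleTimeoutMs;
  }
}

// Possible usage (commented out so the sketch does not keep the process alive):
// const detector = new IdleDetector();
// setInterval(() => { if (detector.isIdle()) { /* alert or restart */ } }, 10000);
```

Note that a quiet partition looks the same as a stalled one to a check like this, so the timeout would need to reflect the expected message rate.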
I'm going to close this issue due to the improvements made from versions 2.1.1-2.1.4 of event hubs. If you see any problems please open a new issue.