OS type and version: 4.9.125-linuxkit GNU/Linux
Python version and virtual environment information: Python 2.7.16
google-cloud-pubsub package version: 0.41.0
publish(), _before_ the grpc publish call in the batch thread returns
Hanging for 10 minutes is surprising behaviour for an asynchronous API.
See this gist for code and instructions on how to reproduce this issue.
This is wild conjecture that I have no supporting evidence for.
That being said, I think the issue starts when the batch thread gets stuck in this call to grpc publish. At this point it is holding the lock _state_lock and will continue to hold it for 10 minutes, until the call to grpc publish times out.
When the client application calls publish() in the main thread for the second time, it will try to acquire the same lock _state_lock. As this lock is already being held by the batch thread, the main thread hangs and doesn't return from the call to publish().
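To illustrate the suspected contention, here is a minimal sketch that does not use the library at all. ToyBatch, commit(), and time_second_publish() are hypothetical names made up for this example, and a short time.sleep() stands in for the stuck grpc publish call. It only shows the general pattern (a worker thread holding a lock across a slow call blocks any other thread that needs the same lock), not the actual google-cloud-pubsub internals.

```python
import threading
import time


class ToyBatch(object):
    """Hypothetical stand-in for the batch object, not the library's class."""

    def __init__(self, rpc_seconds):
        self._state_lock = threading.Lock()
        self._rpc_seconds = rpc_seconds
        self._messages = []

    def commit(self, started):
        # Suspected problem: the lock is held for the whole slow call.
        with self._state_lock:
            started.set()
            time.sleep(self._rpc_seconds)  # stands in for the stuck grpc publish

    def publish(self, message):
        # Needs the same lock, so it waits until commit() releases it.
        with self._state_lock:
            self._messages.append(message)


def time_second_publish(rpc_seconds=0.5):
    """Return how long the second publish() call blocks, in seconds."""
    batch = ToyBatch(rpc_seconds)
    batch.publish(b"first")

    started = threading.Event()
    worker = threading.Thread(target=batch.commit, args=(started,))
    worker.start()
    started.wait()  # make sure the batch thread already holds the lock

    begin = time.time()
    batch.publish(b"second")  # blocks for roughly rpc_seconds
    elapsed = time.time() - begin

    worker.join()
    return elapsed
```

With rpc_seconds standing in for the 10 minute timeout, the second publish() call blocks for about as long as the simulated grpc call takes.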
@asnr Thank you for the effort and the detailed steps to reproduce the issue.
I can confirm that the issue is reproducible, either by using the linked Docker application, or by simply disabling the internet connection and running the test publisher script (without creating the topic and subscription, that is).
The cause of the long delay is that the lock in the underlying batch (an object that batches publish requests) is held for too long. It also turned out that the fix for it is essentially the same as #7686.
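As a rough sketch of the general idea, assuming the fix amounts to not holding the lock while the blocking network call is in progress: the lock is taken only long enough to snapshot the pending messages, and the slow call runs after it has been released. ToyBatchFixed and time_second_publish_fixed() are hypothetical names for this example, time.sleep() stands in for the grpc publish call, and none of this is the actual code from the PR.

```python
import threading
import time


class ToyBatchFixed(object):
    """Hypothetical stand-in for the batch object, not the library's class."""

    def __init__(self, rpc_seconds):
        self._state_lock = threading.Lock()
        self._rpc_seconds = rpc_seconds
        self._messages = []
        self.sent = None

    def commit(self, started):
        # Hold the lock only while taking a snapshot of the pending messages.
        with self._state_lock:
            pending = self._messages
            self._messages = []
        started.set()
        # The slow call now runs without the lock held.
        time.sleep(self._rpc_seconds)  # stands in for the grpc publish call
        self.sent = pending

    def publish(self, message):
        with self._state_lock:
            self._messages.append(message)


def time_second_publish_fixed(rpc_seconds=0.5):
    """Return (seconds the second publish() blocked, messages the commit sent)."""
    batch = ToyBatchFixed(rpc_seconds)
    batch.publish(b"first")

    started = threading.Event()
    worker = threading.Thread(target=batch.commit, args=(started,))
    worker.start()
    started.wait()  # the slow call is about to start, lock already released

    begin = time.time()
    batch.publish(b"second")  # returns right away
    elapsed = time.time() - begin

    worker.join()
    return elapsed, batch.sent
```

In this toy version the second publish() returns immediately even though the simulated call is still in flight. What the real batch should do with messages that arrive during a commit is a separate question that this sketch does not address.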
I will open a follow-up PR that also includes tests, and mention the creators of the original PR as co-authors.
@Dan4London Just FYI, the pull request for this issue that you reported in the other thread has been created.