Google-cloud-python: Pubsub 0.35.3 stop consuming messages - ConsumeBidirectionalStream exited RPC

Created on 6 Jun 2018  Â·  16Comments  Â·  Source: googleapis/google-cloud-python

After an ~hour, the rpc connection is closed:

[2018-06-06 11:02:36,258 - INFO - Thread-14 - streaming_pull_manager] Observed recoverable stream error 504 Deadline Exceeded
[2018-06-06 11:02:36,259 - INFO - Thread-ConsumeBidirectionalStream - streaming_pull_manager] Observed recoverable stream error 504 Deadline Exceeded
[2018-06-06 11:02:36,260 - INFO - Thread-14 - streaming_pull_manager] Observed recoverable stream error 504 Deadline Exceeded
[2018-06-06 11:02:36,260 - **ERROR - Thread-ConsumeBidirectionalStream - bidi] The bidirectional RPC unexpectedly exited. <google.api_core.grpc_helpers._StreamingResponseIterator object at 0x7fb758946090>** 

I need to cancel and start the subscriber again.

OS type and version - "Debian GNU/Linux 8 (jessie)"
Version - Python 2.7.9

  1. pip freeze | grep pubsub
    google-cloud-pubsub==0.35.3

Code example:

        def subscriber_handler(message):
            path = message.attributes.get("type")
            self.handler_function(path=path, body=message.data)  
            message.ack()
        self.listening_subscriber = self.subscriber.subscribe(subscription_path, callback=subscriber_handler, flow_control=flow_control)

All the other info is the same as in https://github.com/GoogleCloudPlatform/google-cloud-python/issues/5325.

bug pubsub p1 investigating

Most helpful comment

I upgraded to 35.4 and i am still having the issue where my subscriber stops pulling from a subscription. My environment is in kubernetes and should be pulling 40-50 messages a second. Looking at the logs i get the following:

Jun 26, 2018, 12:58:14 PM |
INFO:google.cloud.pubsub_v1.subscriber._protocol.leaser:Thread-LeaseMaintainer exiting.
INFO:google.cloud.pubsub_v1.subscriber._protocol.heartbeater:Thread-Heartbeater exiting.

INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed recoverable stream error 504 Deadline Exceeded

After this it seems to go into an endless loop of:

INFO:google.cloud.pubsub_v1.subscriber._protocol.bidi:Re-established stream

INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed recoverable stream error 504 Deadline Exceeded

Every ten minutes after that it re-establishes the stream then immediately gets the 504 error without pulling any more messages from the subscription. Any ideas?

All 16 comments

Thanks, @shomratalon! Can you provide me a bit more information?

  1. What does your callback do?
  2. How many messages per second are you processing?
  3. How are you authenticating?
  4. What are your flow-control settings?

Okay, I think I may have identified one potential cause of this issue. I've yet to be able to reproduce this so if y'all could try this out and let me know that would be really awesome.

You can install the dev version by either:

pip install --upgrade http://temp.theadora.io/google_cloud_pubsub-0.35.3-py2.py3-none-any.whl

or

git clone https://github.com/GoogleCloudPlatform/google-cloud-python.git
cd google-cloud-python
git checkout pubsub-lock-active
pip install --upgrade .

I would highly recommend doing this in a fresh virtualenv.

I reproduced the bug

DEBUG | 19:16:04 | google.cloud.pubsub_v1.subscriber._protocol.bidi | waiting for recv.
DEBUG | 19:16:04 | google.cloud.pubsub_v1.subscriber._protocol.dispatcher | Handling 14 batched requests
DEBUG | 19:16:04 | google.cloud.pubsub_v1.subscriber._protocol.bidi | recved response.
DEBUG | 19:16:04 | google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager | Scheduling callbacks for 2 messages.
INFO | 19:16:04 | google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager | Observed recoverable stream error 504 Deadline Exceeded
DEBUG | 19:16:05 | google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager | Sent request(s) over unary RPC.
DEBUG | 19:16:05 | google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager | Sent request(s) over unary RPC.
ERROR | 19:16:05 | google.cloud.pubsub_v1.subscriber._protocol.bidi | The bidirectional RPC unexpectedly exited. <google.api_core.grpc_helpers._StreamingResponseIterator object at 0x7fd536b71908>
INFO | 19:16:05 | google.cloud.pubsub_v1.subscriber._protocol.bidi | Thread-ConsumeBidirectionalStream exiting
DEBUG | 19:16:05 | google.cloud.pubsub_v1.subscriber._protocol.bidi | Empty queue and inactive call, exiting request generator.
INFO | 19:16:06 | google.cloud.pubsub_v1.subscriber._protocol.heartbeater | Thread-Heartbeater exiting.
INFO | 19:16:08 | google.cloud.pubsub_v1.subscriber._protocol.leaser | Thread-LeaseMaintainer exiting.

and then it was hanging.

I made sure the @property you added is being reached... :(

Try it once more the the latest copy of that branch/file. I had to switch a Lock with an RLock.

I tried again, I can still reproduce the bug, unfortunately. It looks like with 120 r/s and info logging level I am able to reproduce it quite consistently. But I have to wait each time 10 minutes.

@jam182 is it hanging or is it the "the bidirectional RPC unexpectedly closed"?

Update: I was able to reproduce this myself. Investigating now.

I seem to have fixed this with #5446 as far as I can tell. Stand by for release.

0.35.4 has been released which resolves this as far as I'm able to test. Please let me know if you're able to reproduce this with that version and we'll continue digging.

awesome thanks!

I can confirm the issue is fixed. I repeated the same test that I was using the reproduce the bug before, and it did not happen once.

One thing I am a bit concerned of, is that the subscriber seems a bit slower now. It might be related to these fixes because it was not happening with previous versions (namely with v0.35.2).

I am gonna attach a graph from stackdriver to show what I am saying:
image

  • blue line is published messages
  • red line is unacknowledged messages

In previous tests the red line would never go above the blue line unless there is a tiny spike. Now we see those big anomalies.

Just to give some context:
that is on one subscription. The subscriber, saves messages to file and then sends the ack. It does file rotation every minute. It does not process the message or anything, just receive and save. Also it receives always the same message. So the behaviour should be quite predictable. Anyways, the code has been the same across all tests.

Request per second are still ~120

we will keep testing to see if those are just glitches anyways.

nvm, turns out it was just a glitch. we rebuilt the docker image from scratch and tested for few hours now. All seems fine :+1: :1st_place_medal:

@jam182 Awesome!

@jam182 that's great news! Thank you so much for taking the time to test all of this and thank you so much for your patience and feedback. :)

I upgraded to 35.4 and i am still having the issue where my subscriber stops pulling from a subscription. My environment is in kubernetes and should be pulling 40-50 messages a second. Looking at the logs i get the following:

Jun 26, 2018, 12:58:14 PM |
INFO:google.cloud.pubsub_v1.subscriber._protocol.leaser:Thread-LeaseMaintainer exiting.
INFO:google.cloud.pubsub_v1.subscriber._protocol.heartbeater:Thread-Heartbeater exiting.

INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed recoverable stream error 504 Deadline Exceeded

After this it seems to go into an endless loop of:

INFO:google.cloud.pubsub_v1.subscriber._protocol.bidi:Re-established stream

INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed recoverable stream error 504 Deadline Exceeded

Every ten minutes after that it re-establishes the stream then immediately gets the 504 error without pulling any more messages from the subscription. Any ideas?

Can you file a new issue? Please include a complete debug-level log that
covers at least 2 minutes before and 1 minute after you see the consumer
stop. We really can't debug without that. The "observed recoverable error"
messages as totally normal, so they don't tell us much.

On Tue, Jun 26, 2018, 4:08 PM seanmt13 notifications@github.com wrote:

I upgraded to 35.4 and i am still having the issue where my subscriber
stops pulling from a subscription. My environment is in kubernetes and
should be pulling 40-50 messages a second. Looking at the logs i get the
following:
Jun 26, 2018, 12:58:14 PM |
INFO:google.cloud.pubsub_v1.subscriber._protocol.leaser:Thread-LeaseMaintainer
exiting.
INFO:google.cloud.pubsub_v1.subscriber._protocol.heartbeater:Thread-Heartbeater
exiting.
INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed
recoverable stream error 504 Deadline Exceeded

After this it seems to go into an endless loop of:
INFO:google.cloud.pubsub_v1.subscriber._protocol.bidi:Re-established stream
INFO:google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager:Observed
recoverable stream error 504 Deadline Exceeded

Every hour on the hour after that it re-establishes the stream then
immediately gets the 504 error without pulling any more messages from the
subscription. Any ideas?

—
You are receiving this because you modified the open/close state.

Reply to this email directly, view it on GitHub
https://github.com/GoogleCloudPlatform/google-cloud-python/issues/5444#issuecomment-400491080,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAPUc_3lYGFZj4MOr_drGQRx4Zdn8B05ks5uAr76gaJpZM4UceSu
.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

tweeter0830 picture tweeter0830  Â·  4Comments

jannaspam picture jannaspam  Â·  3Comments

VikramTiwari picture VikramTiwari  Â·  4Comments

andrewsmartin picture andrewsmartin  Â·  3Comments

pongad picture pongad  Â·  4Comments