Google-cloud-node: Pubsub messages sit in queue until GKE pod with subscriber gets reset

Created on 1 Oct 2017 · 19 Comments · Source: googleapis/google-cloud-node

Environment details

  • OS: Debian GNU/Linux 8.9 (jessie) [K8s pod based on dockerfile gcr.io/google_appengine/base]
  • Node.js version: 6.11.3
  • npm version: 5.4.2
  • google-cloud/pubsub version: 0.14.2

Steps to reproduce

  1. Spin up nodejs pubsub publisher to topic1 in GKE pod 1
  2. Spin up nodejs pubsub subscriber to subscription to topic1 in GKE pod 2
  3. Publish messages to topic1

I am facing an intermittent issue where pubsub messages are sitting in the queue and not being delivered to the subscriber in GKE pod 2. Only when I delete the GKE pod 2 subscriber and restart the pod does the message get delivered.

pubsub

All 19 comments

We've seen a number of reports of messages not being delivered in k8s. I believe this issue is being investigated internally, although I do not know the status. @lukesneeringer, have we heard any news regarding this?

I'm suffering from the same issue at the moment :(

Suffering from the same issue too. Please provide a fix, as these tools are used in production.

@callmehiphop (@lukesneeringer) hey, any update? As mentioned, these tools (k8s and pubsub) are used in production.

I don't have any official updates, but a new patch release was made this morning that might resolve the issues you're seeing.

@callmehiphop I have done some preliminary testing with the google-cloud/pubsub patch version 0.14.3 released this morning, and it looks promising so far.

I have not been able to reproduce the issue yet; however, I will need to run full end-to-end tests to confirm.

@ShahNewazKhan that's great, please keep us posted! 😃

@callmehiphop I have been able to replicate the issue with google-cloud/pubsub patch 0.14.3 in a slightly different use case.

Environment details

OS: Debian GNU/Linux 8.9 (jessie) [K8s pod based on dockerfile gcr.io/google_appengine/base]
Node.js version: 6.11.3
npm version: 5.4.2
google-cloud/pubsub version: 0.14.3

Steps to reproduce

  1. Spin up nodejs pubsub publisher to topic1 in GKE pod 1
  2. Spin up nodejs pubsub subscriber to subscription to topic1 in GKE pod 2
  3. Reset GKE pod 1 [pubsub publisher app]
  4. Publish messages to topic1

At this point the message remains stuck in the pubsub queue until I reset the GKE pod 2 [pubsub subscriber app]

Just checking in for updates on this issue.

@ShahNewazKhan We believe this is a GKE issue, and because of that I can't comment on whether it's being worked on or when it will be fixed. I'm really sorry for the inconvenience.

We may be having similar issues, not sure. @ShahNewazKhan what version of GKE are you on?

@ehacke

GKE: 1.6.10-gke.1
Kubernetes: 1.5.6

Question for those who'd reported this: is there any chance you had no messages published or delivered for 10 minutes or longer before you started publishing and accumulating them in the backlog?

@kir-titievsky I can confirm that the published messages sit in the subscription queue only when the publisher has been inactive longer than 10 minutes.

Thanks @ShahNewazKhan. My guess here is this: by default, GCE suspends inactive connections after 10 minutes [1]. Since Pub/Sub relies on a persistent streamingPull connection, this connection would get suspended if no messages flowed for 10 minutes. This condition was not properly detected by Pub/Sub. This was fixed as of 2017-10-20 by shutting down affected streamingPull connections. The server-initiated shutdown should now trigger the client library to rebuild the connection.

Can those of you affected check if the issue persists?

[1] https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet
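
For anyone who wants a client-side stopgap while this is sorted out, the idle-connection theory above suggests one possible workaround: tear down and re-create the subscriber whenever nothing has arrived for a bit less than 10 minutes. The following is only a minimal sketch of that idea, not the library's own fix. `createIdleWatchdog`, `createSubscriber`, `onMessage`, `idleMs`, and `now` are all hypothetical names introduced for illustration, and the sketch assumes the subscriber object exposes `on('message', fn)` and `close()`.

```javascript
'use strict';

// Hypothetical helper (not part of any library): re-creates a subscriber
// when no message has been seen for at least `idleMs` milliseconds.
//
// Assumptions:
//   - createSubscriber() returns an object with on('message', fn) and close().
//   - now() returns the current time in milliseconds (injectable for testing).
function createIdleWatchdog({ createSubscriber, onMessage, idleMs, now = Date.now }) {
  let subscriber;
  let lastActivity;
  let restarts = 0;

  // Open a fresh subscriber and record the moment it was opened.
  function start() {
    subscriber = createSubscriber();
    lastActivity = now();
    subscriber.on('message', (message) => {
      lastActivity = now(); // any delivery counts as activity
      onMessage(message);
    });
  }

  // Call periodically. Returns true if the subscriber was restarted.
  function check() {
    if (now() - lastActivity < idleMs) {
      return false;
    }
    subscriber.close();
    restarts += 1;
    start();
    return true;
  }

  start();
  return { check, restartCount: () => restarts };
}
```

One way to use it would be `setInterval(() => watchdog.check(), 60 * 1000)` with `idleMs` set to something like 9 minutes. Note that this also restarts the subscriber during genuinely quiet periods, which should be harmless but is wasted work. Whether `pubsub.subscription(name)` can be passed straight in as the `createSubscriber` result depends on the client version, so treat that as an assumption to verify.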

@kir-titievsky Can you clarify what you mean by 'server-initiated shutdown'? Does this mean that inactive Pub/Sub streamingPull connections are now being shut down instead of being suspended by GCE?

I have still noticed messages sitting in the queue intermittently. Do I have to update the Pub/Sub client to the latest version to handle the streamingPull connection rebuilds?

Thanks in advance!

I'm marking this as blocked, since it sounds like GKE is the party responsible for any progress on this. @callmehiphop does this sound right?

@stephenplusplus I believe it does!

This issue was moved to googleapis/nodejs-pubsub#11
