I am facing an intermittent issue where pubsub messages are sitting in the queue and not being delivered to the subscriber in GKE pod 2. Only when I delete the GKE pod 2 subscriber and restart the pod does the message get delivered.
We've seen a number of reports of messages not being delivered in k8s. I believe this issue is being investigated internally, although I do not know the status. @lukesneeringer have we heard any news regarding this?
I'm suffering from the same issue at the moment :(
Suffering from the same issue too. Please provide a fix, as these tools are used in production.
@callmehiphop (@lukesneeringer) Hey, any update? As mentioned, these tools (k8s and pubsub) are used in production.
I don't have any official updates, but a new patch release was made this morning that might resolve the issues you're seeing.
@callmehiphop I have done some preliminary testing with the google-cloud/pubsub patch release 0.14.3 from this morning, and it looks promising so far.
I have not been able to reproduce the issue yet; however, I will need to run full end-to-end tests to confirm.
@ShahNewazKhan that's great, please keep us posted! 😃
@callmehiphop I have been able to replicate the issue with google-cloud/pubsub patch 0.14.3 in a slightly different use case.
Environment details
OS: Debian GNU/Linux 8.9 (jessie) [K8s pod based on dockerfile gcr.io/google_appengine/base]
Node.js version: 6.11.3
npm version: 5.4.2
google-cloud/pubsub version: 0.14.3
Steps to reproduce
At this point the message remains stuck in the pubsub queue until I reset the GKE pod 2 [pubsub subscriber app]
Just checking in for updates on this issue.
@ShahNewazKhan We believe this is a GKE issue, and because of that I can't comment on whether it's being worked on or when it will be fixed. I'm really sorry for the inconvenience.
We may be having similar issues, not sure. @ShahNewazKhan what version of GKE are you on?
@ehacke
GKE: 1.6.10-gke.1
Kubernetes: 1.5.6
Question for those who'd reported this: is there any chance you had no messages published or delivered for 10 minutes or longer before you started publishing and accumulating them in the backlog?
@kir-titievsky I can confirm that the published messages sit in the subscription queue only when the publisher has been inactive longer than 10 minutes.
Thanks @ShahNewazKhan. My guess here is this: by default, GCE suspends inactive connections after 10 minutes [1]. Since Pub/Sub relies on a persistent streamingPull connection, this connection would get suspended if no messages flow for 10 minutes. This condition was not properly detected by Pub/Sub. This was fixed as of 2017-10-20 by shutting down affected streamingPull connections. The server-initiated shutdown should now trigger the client library to rebuild the connection.
Can those of you affected check if the issue persists?
[1] https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet
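For anyone who wants to guard against the idle window on the client side in the meantime, the idea above can be sketched as a small idle watchdog. This is only an illustration under assumptions, not part of google-cloud/pubsub: `IdleWatchdog`, `touch`, `check`, and the `rebuild` callback are hypothetical names, and the 9-minute threshold in the usage note is an assumed safety margin under the 10-minute limit described above.

```javascript
// Hypothetical idle watchdog. None of these names come from google-cloud/pubsub.
class IdleWatchdog {
  /**
   * @param {number} maxIdleMs  Idle time after which the stream is treated as stale.
   * @param {Function} rebuild  Caller-supplied function that reopens the subscription.
   * @param {Function} now      Clock returning milliseconds; injectable for testing.
   */
  constructor(maxIdleMs, rebuild, now = Date.now) {
    this.maxIdleMs = maxIdleMs;
    this.rebuild = rebuild;
    this.now = now;
    this.lastActivity = now();
  }

  // Call whenever a message is received, to reset the idle window.
  touch() {
    this.lastActivity = this.now();
  }

  // Call periodically (for example from a timer).
  // Returns true if the idle limit was reached and rebuild() was invoked.
  check() {
    if (this.now() - this.lastActivity < this.maxIdleMs) {
      return false;
    }
    this.rebuild();
    this.lastActivity = this.now(); // start a fresh idle window after rebuilding
    return true;
  }
}
```

In an application, one would presumably create the watchdog with something like `9 * 60 * 1000` as `maxIdleMs`, call `touch()` from the subscriber's message handler, and call `check()` from a `setInterval` timer, with `rebuild` closing and reopening the subscription. Whether this is still needed after the server-side fix described above is not something this sketch can confirm.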
@kir-titievsky Can you clarify what you mean by 'server-initiated shutdown'? Does this mean that the inactive Pub/Sub streamingPull connections are now being shut down instead of being suspended by GCE?
I have still noticed messages sitting in the queue intermittently. Do I have to update the Pub/Sub client to the latest version to handle the streamingPull connection rebuilds?
Thanks in advance!
I'm marking this as blocked, since it sounds like GKE is the party responsible for any progress on this. @callmehiphop does this sound right?
@stephenplusplus I believe it does!
This issue was moved to googleapis/nodejs-pubsub#11