Google-cloud-go: Proposal: Add Delayed Retry Functionality

Created on 19 Jun 2019  路  7Comments  路  Source: googleapis/google-cloud-go

Proposal

It would be useful if there were a Pubsub-native means of delaying message retry. Currently, the only way to implement something similar is to hold the message outstanding, but this obviously has a negative effect on throughput, etc.

The changes discussed in this issue and introduced from version 0.19.0 were breaking for some users who were not calling Ack or Nack upon return from Receive. Prior to these changes, the message was released from flow control upon callback from Receive (i.e. not counted toward MaxOutstandingMessages/MaxOutstandingBytes). This allowed users to implement a rudimentary, fixed-time retry delay by setting MaxExtension to the desired retry period and returning without an Ack or Nack call.

From v0.19.0 onwards, messages are not released from flow control until Ack, meaning they still count toward the max messages/bytes limit if you attempt the above approach. As far as I can see, this means that users have no option but to disable flow control and implement their own semaphore if they wish to maintain a retry delay.

The proposed new addition would allow such users an alternative, officially-supported means of implementing the same functionality, with more precision/control.

Calling Nack over and over in these cases is not an ideal solution. A good problem statement can be seen in this comment on an issue thread in the Java library

Sample Implementation

I forked the github mirror and implemented a version of the functionality I am looking for as a basis for discussion. The proposed changes can be seen here: https://github.com/alyosha/google-cloud-go/pull/1

Explanation of Proposed Changes

  • Add a new method Delay which can be called on a Message pointer receiver
  • The new method takes an argument of time.Duration, representing the desired period of time to wait before retrying delivery of the message

    • Example: msg.Delay(30*time.Second)

  • Calling Delay simply sets the unexported retryDelay delay field of the Message struct to the desired wait period
  • Upon return from Receive, the value of the message's retryDelay field is checked

    • If non-zero, the message is released from flow control and has its ackDeadline modified based on the provided retryDelay value

    • The retry value is capped at the maxAckDeadline value of 10 minutes, regardless of the provided duration

  • Because the delay duration is simply an argument, each client is free to implement their own backoff logic (with the caveat that it will be capped at 10m max)

    Example use case for this new functionality:

  • A message is received which cannot be processed immediately (e.g. database is down, and may be down for an extended period)

    • This message needs to be retried later, but it may take some time until subsequent attempts can be successfully processed
    • Rather than calling Nack repeatedly and having the message retry over and over, it would be preferable to delay the message retry by a certain period of time and only try again once that specified duration has passed.

Related issue threads

pubsub question

All 7 comments

cc: @jadekler 馃檹

Apologies for the delay! After some discussions with other people on the Pub/Sub team, we aren't looking to add delay functionality in the client libraries as of yet. The reasoning is consistent with the discussions in the issues you linked, but more clearly addressed here and in #1147. Essentially, we want this feature to be native to Pub/Sub and not dependent on the client library. This particular issue has been brought up a number of times, so it's definitely on our radar. We keep track of common feature requests like this so they are properly prioritized.

In the meantime, you could consider offloading the burden of retrying to another service, like Cloud Tasks. Thanks!

@hongalex What about at least adding functionality to release these messages from flow control without explicit Ack? That change was definitely breaking and I still haven't seen a good justification for why it was necessary.

Looking at this thread, it almost seems like it was made based on one strange use case where a user was acking at the beginning of processing.

To be clear, what I'm suggesting is allowing users an option to manually release individual messages from flow control. Maybe a Release method on the Message receiver or something, so that we don't have to implement our own semaphores just to bump the library and maintain functionality

reference: https://github.com/googleapis/google-cloud-go/issues/870#issuecomment-501749395

Will look more into Cloud Tasks possibility though, thanks for your response. If I can ask one more question, why add retry to PubSub if configurable retry is already possible in CloudTasks?

@hongalex Is there somewhere I can subscribe to see updates on this? Is this feature available natively in Pub/Sub nowadays?

@alyosha Cloud Tasks is unsuitable for interactive applications as the delay may be too high - this is even explicitly noted in their docs. Also, Cloud Tasks is not available in a pull configuration - only push (where Tasks HTTP POSTs to your service).

@orishoshan @alyosha Sorry for the lack of a response. We have been working on several other new features, so there hasn't been any updates on this yet. In cases like these, we recommend filing a new issue on our Pub/Sub issue tracker, which also keeps track of vote count so it's easier to justify bumping issues up in priority if a lot of people are asking for it

Was this page helpful?
0 / 5 - 0 ratings