Google-cloud-go: storage: Add exponential backoff for REFUSED_STREAM errors

Created on 20 Jul 2017 · 7 Comments · Source: googleapis/google-cloud-go

We're currently facing issues with GCS in production returning REFUSED_STREAM errors sporadically.
Would it be feasible to add the already existing exponential backoff logic (on 5xx and 429 errors) to REFUSED_STREAM errors as well?
It's currently in discussion for Go to add official support for this in https://github.com/golang/go/issues/20985, but until then it'd be awesome if the Google Cloud client libraries could already implement this manually.

What do you think? 🙂

storage p1 bug

All 7 comments

Go won't have this until at least Go 1.10 so it makes sense to do it in the GCS library temporarily for 12-18 months until most Go users have the automatic version.

@jba Seems like we could just drop the check for whether the error is a *googleapi.Error, no? Why is that there anyway? https://github.com/GoogleCloudPlatform/google-cloud-go/blob/master/storage/invoke.go#L32-L35

@jba By the way, is there a recommended strategy somewhere for handling this "issue" with GCS?
I'm wondering whether pooling more connections as a preventive measure might be a better option than retrying, since latency would likely be lower that way.

@lhecker I was planning on following the same logic that will be in Go 1.10.

@shadams I'd rather we be conservative in what we retry. If the error is permanent the client would hang, which is harder to diagnose than if it fails fast.

@lhecker I actually just added the error to the list we normally retry. Turned out to be the simplest fix.

@jba Thanks for the fix! We'll soon try it out again in production. 🙂
I do wonder, though, whether this really fixes the underlying usability problem with GCS. Could it actually (or additionally) be a server-side issue, where streams aren't load-balanced properly? With proper balancing, REFUSED_STREAM errors might not happen at all. I strongly suspect that this error is HTTP/2-only and doesn't occur with HTTP/1.1, where you "naturally" use multiple TCP connections.

You may be right; I don't know what's happening at the server.
