Google-cloud-go: often receiving "rpc error: code = 13 desc = transport is closing" when fetching or putting data in datastore

Created on 5 Jan 2017 · 16 Comments · Source: googleapis/google-cloud-go

I'm constantly getting "rpc error: code = 13 desc = transport is closing"

It usually happens after my connection has been idle for a few minutes. I even tried running a no-op against datastore every minute (deleting a key that will never exist), but to no avail.

What is this error (seems Docker related according to Google search)? Why am I getting it?

p2 bug

All 16 comments

I am seeing this as well against bigtable. I had assumed it would retry as code 13 is listed here:

https://github.com/GoogleCloudPlatform/google-cloud-go/blob/master/bigtable/bigtable.go#L76

13 is codes.Internal

Any reason why this error is propagating up? Do I have to explicitly turn on retries?

Also very minor note, looks like that file I linked to above is not go fmt'ed

Retrying on internal errors was added very recently to Bigtable. Are you using the latest code? Also, what ops are you seeing fail without retries?

Last week we updated our vendoring as we saw the retries were added. We were hoping it would eliminate all our errors as they seemed retryable (we were seeing code 14 as well). It actually eliminated a number of code 14 errors for us but we are still occasionally seeing these code 13 transport is closing errors.

We have seen this on ApplyBulk and ReadRows.

Thanks

Can you provide any of the logging that you're seeing? As of the current bigtable client code, every error 13 and 14 (INTERNAL and UNAVAILABLE) should be retried for ReadRows. ApplyBulk will also be retried unless the mutations have the timestamp set to ServerTime and therefore aren't idempotent. You should see logging like "Retryable error: <error details>". If you see this but the error doesn't bubble up to your application, OR you see some retries but the request deadline is reached before a retry is successful, then from the client's perspective it's working ok. If you see that these errors aren't being retried at all then something strange is going on.

I am seeing this while using the datastore package. If I don't use the opened datastore.Client for a while and send a query after a long time (I haven't measured) being idle, I get this error.

It causes calls to methods like datastore.Get/GetAll to return this error. So if the caller code is not doing retries around these things, I don't think it's being retried in the datastore package.

Our logs are also full of these nowadays, and I'm trying to figure out whether we need to build our own retry or not. It would be nice to know the future of these internal, clearly transient errors: are they retried by the datastore package internally or not?

Edit: I've traced the error to the speech api. I've opened a new issue.

I'm also encountering an rpc error: code = 13 desc = transport is closing error.

Here is some Stackdriver log info that might help:

app_engine_release=1.9.48

insertId: "58bd97d8000bb8414ec9ff53" 

requestId: "58bd97d300ff0b4697d8a690700001737e666176656c6166756e000131000100" 

instanceId: "00c61b117c29b9d1145756713c24440d3dd19d089bf89b045e642984b25cd0fca50f7c10118f1430"   
  line: [
   0: {
    time: "2017-03-06T17:09:44.710185Z"     
    severity: "ERROR"     
    logMessage: "rpc error: code = 13 desc = transport is closing"     
   }
  ] 

@GregorioDiStefano @ahmetb @teelahti Are you still experiencing this against datastore?

By the way, the datastore docs specifically say not to retry INTERNAL errors (code 13).

@jba Not anymore it seems, I just see a lot of this now:

2017/06/11 21:00:07 transport: http2Client.notifyError got notified that the client transport was broken EOF.

but not sure if this is related or not.

That error is harmless and either will go away or has already.

Closing due to lack of activity.

@jba I am still seeing this against datastore.

related, but not exactly the same: i'm getting 2-3 such errors per hour

rpc error: code = Internal desc = transport: oauth2/google: incomplete token received from metadata

the line of code is doing client.Get with a datastore.NameKey. the vast majority of the same calls work, so i don't suppose there's a real oauth config issue

A couple of things are clear: 1. these errors aren't coming from the datastore client, and 2. the only remedy that the client could take is to retry. Besides the warning in the docs against retrying INTERNAL errors, there is the issue that adding retry when the error is permanent could result in a hanging RPC if no deadline has been set on the context. Since datastore is heavily used, I don't think it wise to introduce that bad behavior in the hope that it will smooth over some of these rare problems.

I suggest that datastore users experiencing INTERNAL errors on idempotent calls should implement their own retry logic that gives up after a certain amount of time or number of tries.

@jba it seems this error happened to other langs' clients too. https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2039

we are still getting this a few times per day with the latest grpc and cloud-go. (bigtable)

@uschen Note: if you are still seeing these kinds of errors, we welcome you to file a new issue so that we can triage.
