Google-cloud-go: cloudtasks: transient failures to schedule task on App Engine: "rpc error: code = Unavailable desc = transport is closing"

Created on 12 Nov 2020 · 6 Comments · Source: googleapis/google-cloud-go

Client

CloudTasks (cloud.google.com/go/cloudtasks/apiv2)

Environment

Standard go111 runtime on App Engine

Go Environment

Go 1.11 (see https://cloud.google.com/appengine/docs/standard/go111/runtime)

Code

https://github.com/web-platform-tests/wpt.fyi/blob/91067d062f8137d541c2ba1fe9d5147f3006b906/shared/appengine.go#L245-L248

Clients.cloudtasks.CreateTask(a.ctx, req)

More background: we create a new client at the startup of the process using the background context and close it right before exit. We then reuse the same client in each request to schedule tasks.
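The setup described above can be sketched roughly as follows. This is only an illustration, not the code from the linked repository: `taskClient` and `newTaskClient` are hypothetical stand-ins for `*cloudtasks.Client` and its constructor, and `dialCount` exists only to show that the constructor runs once.

```go
package main

import (
	"context"
	"sync"
)

// taskClient is a hypothetical stand-in for *cloudtasks.Client.
type taskClient struct{ id int }

var (
	clientOnce sync.Once
	client     *taskClient
	clientErr  error
	dialCount  int // illustration only: counts constructor calls
)

// newTaskClient is a placeholder for the real client constructor.
func newTaskClient(ctx context.Context) (*taskClient, error) {
	dialCount++
	return &taskClient{id: dialCount}, nil
}

// sharedClient returns the one process-wide client, creating it on first use.
// It uses a background context because the client outlives any single request.
func sharedClient() (*taskClient, error) {
	clientOnce.Do(func() {
		client, clientErr = newTaskClient(context.Background())
	})
	return client, clientErr
}
```

Every request handler would call `sharedClient()` and pass its own request context to the individual RPC. One caveat of this sketch: if the first construction fails, `sync.Once` keeps returning the same error for the rest of the process.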

Expected behavior

The task is scheduled, reliably.

Actual behavior

Sometimes (roughly once a day across a few instances) we got rpc error: code = Unavailable desc = transport is closing.

I've chatted with Cloud Tasks team internally; they've confirmed that the task in question never reached Cloud Tasks front-end.

Additional questions

What's the best practice for using the client, specifically on App Engine? Should we create and reuse a single client for the whole lifetime of a process (instance)? Or should we create (and close) a new client for each request? If the former is recommended, do we need to specify special gRPC options to keep it alive?

cloudtasks question

All 6 comments

Sometimes (roughly once a day across a few instances) we got rpc error: code = Unavailable desc = transport is closing.

These errors can occur for many reasons. Here are a couple of examples: grpc-go RPC error.
One thing you might try is playing with keep-alive settings and pooling:

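    // Pass this option to cloudtasks.NewClient. It needs the packages
    // google.golang.org/api/option, google.golang.org/grpc, and
    // google.golang.org/grpc/keepalive.
    option.WithGRPCDialOption(grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time: 5 * time.Minute, // ping the server after 5 minutes without activity
    }))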

What's the best practice to use the client, specifically on App Engine?

It can depend on the use case, but in general I would recommend reusing a single client. However, if there are long periods of time where you are not handling requests, these errors may happen more consistently because the server closes inactive connections. Are you retrying these errors today?

Thanks for the reply!

Our use case is both simple and hard to characterize. It's "simple" because we run on the AppEngine standard runtime and schedule a task for each API request, without any special setup. It's hard to describe in more detail because we ask AppEngine to automatically manage and scale our instances; we don't know how long each instance is alive, how many requests an instance serves at one time, or the gap between two requests on an instance. We do reuse a single client in an instance now. If there are "large periods of time where you are not handling requests", I'd expect the instance to be shut down by AppEngine, but I'm not sure whether AppEngine and gRPC agree on the definition of "large period of time".

This was never a concern when we used the legacy API. And we really wish the new CloudTasks API would similarly work out of the box (or at least with a set of parameters recommended on AppEngine standard).

This was never a concern when we used the legacy API.

I am not familiar with the legacy API, but looking very quickly at some snippets it appears to be HTTP based, so you would not run into the kind of persistent-connection issues that gRPC can bring. In our sister repo we do provide an HTTP-based cloudtasks client. Maybe that would offer you a more familiar experience? In general, though, we recommend the clients in this repository over that one, as the generated code can be friendlier to work with.

looking very quickly at some snippets it appears to be HTTP based

Not sure about that. The legacy AppEngine runtime doesn't actually allow arbitrary HTTP requests, but that's a bit off-topic.

https://pkg.go.dev/google.golang.org/api/cloudtasks/v2beta3 says "this package is DEPRECATED". And indeed the generated code is less elegant, so I very much want to keep using this client library.

Regarding your earlier suggestion on keep-alive, I'm happy to try it out! One question though, according to the docs:

Make sure these parameters are set in coordination with the keepalive policy on the server, as incompatible settings can result in closing of connection.

So is 5 minutes a good value that's compatible with the CloudTasks server policy?

"this package is DEPRECATED".

That deprecation warning is incorrect, thanks for pointing that out. I will get the docs updated.

So is 5 minutes a good value that's compatible with the CloudTasks server policy?

I am unsure what the Cloud Tasks server policy is, but I think 5 minutes seems reasonable. You might need to tune the value to your use case and to how much downtime your connection might have between requests. Here is what gRPC says the default values are. Also from grpc-go: what will happen.

Closing this issue for now, please report back if this did not work for you.
