Google-cloud-python: RST_STREAM error from grpc via Bidi in Firestore Client Library

Created on 26 Nov 2019 · 9 Comments · Source: googleapis/google-cloud-python

After implementing a watch/listen in Firestore and waiting a while (reports vary: 30, 60, and 120 minutes), the client raises an unhandled RST_STREAM error. We should investigate whether this can be handled.

Related to https://github.com/firebase/firebase-admin-python/issues/282

google-api-core==1.14.3
google-auth==1.7.1
google-cloud-core==1.0.3
google-cloud-firestore==1.6.0
googleapis-common-protos==1.6.0
grpcio==1.25.0

Output and repro code at: https://gist.github.com/crwilcox/0ed944c7afa70a6a78dd8e8e76ebf1fd

bug p1 firestore

All 9 comments

In Go we retry on far more error codes:
https://github.com/googleapis/google-cloud-go/blob/be07a624a2e4501df135ab7f731b7cfb408d9df2/firestore/watch.go#L507

codes.Unknown, codes.DeadlineExceeded, codes.ResourceExhausted, codes.Internal, codes.Unavailable, codes.Unauthenticated
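A minimal sketch of what the wider retry predicate could look like on the Python side, assuming the stream error is identified by its upper-case gRPC status-code name. The set mirrors the Go list above; the helper name `should_resume` is hypothetical, and plain strings are used instead of `grpc.StatusCode` so the snippet runs without grpcio installed.

```python
# Status codes on which the Go watch implementation resumes the stream
# (mirrors the Go list above, spelled with gRPC's canonical code names).
RETRYABLE_CODES = frozenset({
    "UNKNOWN",
    "DEADLINE_EXCEEDED",
    "RESOURCE_EXHAUSTED",
    "INTERNAL",
    "UNAVAILABLE",
    "UNAUTHENTICATED",
})


def should_resume(code_name: str) -> bool:
    """Return True if a watch stream that failed with `code_name` should be
    re-established instead of surfacing the error to the caller.

    Hypothetical helper: `code_name` is assumed to be a gRPC status name
    such as "UNAVAILABLE" (case-insensitive here).
    """
    return code_name.upper() in RETRYABLE_CODES


if __name__ == "__main__":
    for code in ("INTERNAL", "UNAVAILABLE", "PERMISSION_DENIED", "NOT_FOUND"):
        print(code, should_resume(code))
```

With grpcio, an `RpcError` raised by invoking an RPC also implements `grpc.Call`, so `exc.code().name` yields a string in this form.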

Does retry mean that documents will be re-downloaded, and do I get charged for that?

In this case, retry/recover has more to do with the channel being closed after it exceeds its allowed lifetime, which seems to be around an hour.

I am not certain of this, but because we don't know how long the reconnect took or what changed between the disconnect and the reconnect, I think we do need to resynchronize the local model, so we will likely have to re-download the document references under watch.

I get it, but this doesn't resolve the actual problem with Firestore. Running a thread that checks Watch status and restarts it when necessary is not a big deal. Paying for tons of requests caused by a bug in Firestore, plus the additional workload on our own server, is.
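For reference, the workaround described here (a thread that polls the watch and restarts it) might be sketched as below. This is an assumption-laden illustration: `WatchSupervisor`, `start_watch`, and the `WatchHandle` interface (`is_active`, `unsubscribe()`) are hypothetical names, not the Firestore library's API. Each restart opens a brand-new listener, so the initial snapshot is likely delivered (and billed) again, which is exactly the cost being objected to.

```python
import threading
from typing import Callable, Protocol


class WatchHandle(Protocol):
    """Assumed interface of a running watch (hypothetical, for illustration)."""

    @property
    def is_active(self) -> bool: ...

    def unsubscribe(self) -> None: ...


class WatchSupervisor:
    """Poll a watch periodically and restart it if it has stopped.

    Hypothetical helper: `start_watch` is any callable that starts a
    listener and returns a WatchHandle.
    """

    def __init__(self, start_watch: Callable[[], WatchHandle], interval: float = 30.0):
        self._start_watch = start_watch
        self._interval = interval
        self._stop = threading.Event()
        self.restarts = 0
        self.watch = start_watch()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def check_once(self) -> None:
        # Restart only when the watch reports it is no longer running.
        # A restart is a fresh listen, not a resume.
        if not self.watch.is_active:
            self.watch = self._start_watch()
            self.restarts += 1

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets it.
        while not self._stop.wait(self._interval):
            self.check_once()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        self.watch.unsubscribe()
```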

@linevych the plan isn't to restart the stream, so in theory this should require the minimum amount of work. I ran it for a couple of days and it didn't need to re-stream all the data, just notify that nothing had changed. That seems to be what you want; I was just being cautious about stating it before I was sure.

The fix in https://github.com/googleapis/google-cloud-python/pull/9995 should resolve this.

https://pypi.org/project/google-cloud-firestore/1.6.1/ is released and contains the change to widen retries.

@crwilcox hi, it looks like the stream does get restarted. Here are the logs and example source code: https://gist.github.com/linevych/3480c133d6a839f2b04ee2dd4caa8a93

As you can see from the graph we are still getting billed for this.

https://i.imgur.com/9i4LNrW.png

cc: @schmidt-sebastian

The Python library should use the resume token from the previous session when it re-establishes the stream. Any further document updates should be merged with the state of the Snapshot at the time of the resume token. If properly implemented, documents that don't change will not be re-read.
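A toy model of the resume behavior described above, assuming a monotonically increasing integer stands in for the opaque resume token. `FakeListenBackend` and `WatchState` are hypothetical and do not reflect the real Listen wire protocol; the `docs_received` counter is only a rough proxy for billed document reads.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeListenBackend:
    """Toy stand-in for the Listen backend (not the real wire protocol).

    Each document is stored with the logical time of its last write; a
    value of None marks a deletion. The integer clock plays the role of
    the opaque resume token.
    """
    docs: Dict[str, Tuple[int, Optional[dict]]] = field(default_factory=dict)
    clock: int = 0

    def write(self, doc_id: str, data: Optional[dict]) -> None:
        self.clock += 1
        self.docs[doc_id] = (self.clock, data)

    def listen(self, resume_token: Optional[int]) -> Tuple[List[Tuple[str, Optional[dict]]], int]:
        if resume_token is None:
            # New session: send every live document (the full initial snapshot).
            changes = [(d, data) for d, (_, data) in self.docs.items() if data is not None]
        else:
            # Resumed session: send only documents written after the token,
            # including deletions.
            changes = [(d, data) for d, (ver, data) in self.docs.items() if ver > resume_token]
        return changes, self.clock


@dataclass
class WatchState:
    """Client-side snapshot plus the token needed to resume it."""
    snapshot: Dict[str, dict] = field(default_factory=dict)
    resume_token: Optional[int] = None
    docs_received: int = 0  # rough proxy for billed document reads

    def connect(self, backend: FakeListenBackend) -> None:
        changes, token = backend.listen(self.resume_token)
        for doc_id, data in changes:
            self.docs_received += 1
            if data is None:
                self.snapshot.pop(doc_id, None)
            else:
                self.snapshot[doc_id] = data  # merge into the existing snapshot
        self.resume_token = token
```

After an initial sync of three documents and one later update, reconnecting with the stored token delivers only the changed document, while a fresh `WatchState` (like a restarted stream) re-delivers every live document.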
