Google-cloud-python: Service Unavailable (503) gRPC error from datastore occurs every 30 minutes

Created on 21 Dec 2016  ·  11 Comments  ·  Source: googleapis/google-cloud-python

  1. OS type and version
    Running on docker centos image on GCE

  2. Python version and virtual environment information `python --version`
    Python 3.5.2

  3. google-cloud-python version `pip show google-cloud`, `pip show google-<service>` or `pip freeze`
    google-cloud==0.21.1

  4. Stacktrace if available
    ```
    INFO:werkzeug:10.36.0.1 - - [21/Dec/2016 05:54:19] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.36.0.1 - - [21/Dec/2016 06:24:20] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.132.0.6 - - [21/Dec/2016 06:54:21] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.132.0.6 - - [21/Dec/2016 07:24:22] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.36.0.1 - - [21/Dec/2016 07:54:23] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.36.0.1 - - [21/Dec/2016 08:24:25] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.132.0.6 - - [21/Dec/2016 08:54:27] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.132.0.6 - - [21/Dec/2016 09:24:28] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.132.0.6 - - [21/Dec/2016 09:54:29] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.36.0.1 - - [21/Dec/2016 10:24:31] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.36.0.1 - - [21/Dec/2016 10:54:33] "GET /endpoint HTTP/1.1" 500 -
    INFO:werkzeug:10.132.0.6 - - [21/Dec/2016 11:24:34] "GET /endpoint HTTP/1.1" 500 -

Stacktrace from one of the errors:

Traceback (most recent call last):
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 253, in _grpc_catch_rendezvous
    yield
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 321, in run_query
    return self._stub.RunQuery(request_pb)
  File "/opt/python3/lib/python3.5/site-packages/grpc/_channel.py", line 481, in __call__
    return _end_unary_response_blocking(state, False, deadline)
  File "/opt/python3/lib/python3.5/site-packages/grpc/_channel.py", line 432, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1482319474.777747614","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1482319474.777714092","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/src/main.py", line 18, in getIPs
    black_ips = get_blacklisted_ip_addresses()
  File "/code/src/service.py", line 11, in get_blacklisted_ip_addresses
    return self.datastore_client.get_all_keys_of_kind(ENTITY_KIND_BLACKLIST)
  File "/code/src/datastore_client.py", line 15, in get_all_keys_of_kind
    for entity in query_iter:
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/iterator.py", line 210, in _items_iter
    for page in self._page_iter(increment=False):
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/iterator.py", line 239, in _page_iter
    page = self._next_page()
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/datastore/query.py", line 499, in _next_page
    transaction_id=transaction and transaction.id,
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 574, in run_query
    response = self._datastore_api.run_query(project, request)
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 321, in run_query
    return self._stub.RunQuery(request_pb)
  File "/opt/python3/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/opt/python3/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 260, in _grpc_catch_rendezvous
    raise error_class(exc.details())
google.cloud.exceptions.ServiceUnavailable: 503 {"created":"@1482319474.777747614","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1482319474.777714092","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}

    ```

  5. Steps to reproduce
    Occurs regularly every 30 minutes.

  6. Code example
    ```
    from google.cloud import datastore

    self.client = datastore.Client(project=project)
    query = self.client.query(kind=kind)
    query_iter = query.fetch()
    ```
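
The 30-minute cadence can be checked directly against the werkzeug timestamps in the report. The sketch below is only an illustration: `STAMPS` holds the first six timestamps copied from the log above, and `gaps_in_seconds` is a hypothetical helper name. It assumes an English locale so that `%b` matches `Dec`.

```python
from datetime import datetime

# First six timestamps copied from the werkzeug log lines in the report.
STAMPS = [
    "21/Dec/2016 05:54:19",
    "21/Dec/2016 06:24:20",
    "21/Dec/2016 06:54:21",
    "21/Dec/2016 07:24:22",
    "21/Dec/2016 07:54:23",
    "21/Dec/2016 08:24:25",
]


def gaps_in_seconds(stamps):
    """Return the number of seconds between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, "%d/%b/%Y %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_in_seconds(STAMPS))  # [1801, 1801, 1801, 1801, 1802]
```

Each failure lands one or two seconds past the 1800-second mark, which matches the "every 30 minutes" description.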

datastore grpc

All 11 comments

@quom thanks for the report!

The latest version of google-cloud (0.22.0) uses a new auth module that may help with this.

Could you update google-cloud and let us know if the issue persists?

Thanks - I've updated the library and so far it looks to have worked.

Reopened - after a few hours of looking stable, the exact same issue has reoccurred.

Sorry to hear that @quom.
Is this a long lived process that uses query_iter?
If this happens you may have to create a new client or you could try refreshing the credentials.
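
For anyone wanting to try the "create a new client" suggestion, here is a rough sketch of one way to do it, not something the library provides. `ReconnectingClient` and `FakeClient` are hypothetical names made up for illustration; `FakeClient` only simulates a client whose first instance has a dead connection.

```python
class ReconnectingClient:
    """Rebuilds the wrapped client once when a call fails, then retries."""

    def __init__(self, factory, retryable=(Exception,)):
        self._factory = factory      # zero-argument callable returning a new client
        self._retryable = retryable  # exception types that trigger a rebuild
        self._client = factory()

    def call(self, method_name, *args, **kwargs):
        try:
            return getattr(self._client, method_name)(*args, **kwargs)
        except self._retryable:
            # Drop the old client (and its connection) and try once more.
            self._client = self._factory()
            return getattr(self._client, method_name)(*args, **kwargs)


class FakeClient:
    """Hypothetical stand-in: only the first instance created fails."""

    created = 0

    def __init__(self):
        FakeClient.created += 1
        self._broken = FakeClient.created == 1

    def get(self, key):
        if self._broken:
            raise ConnectionError("503 Secure read failed")
        return {"key": key}
```

With the real library the factory would presumably be something like `lambda: datastore.Client(project=project)` and `retryable` would be `(ServiceUnavailable,)`. Note that the traceback in the report shows the error surfacing while iterating the query results, so the wrapped call needs to consume the results for the error to be caught here.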

@dhermes or @jonparrott, I'm guessing the token is expiring? If so, what's the right way to refresh the token? (client.credentials.refresh()?)

I don't see any issues with Bigtable on https://status.cloud.google.com.

@quom if you immediately retry the action, does it work the second time?

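from google.cloud import datastore
from google.cloud.exceptions import ServiceUnavailable

self.client = datastore.Client(project=project)
query = self.client.query(kind=kind)
try:
    # fetch() is lazy: per the traceback, the RPC runs while iterating,
    # so the results have to be consumed inside the try block.
    entities = list(query.fetch())
except ServiceUnavailable:
    # Retry once, immediately.
    entities = list(query.fetch())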

@daspecster re-trying immediately worked for me.

I got this error while trying to update an existing entity.

>>> entity['test'] = 0
>>> 
>>> client.put(entity)
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 253, in _grpc_catch_rendezvous
    yield
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 356, in commit
    return self._stub.Commit(request_pb)
  File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 481, in __call__
    return _end_unary_response_blocking(state, False, deadline)
  File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 432, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1482491988.272167000","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1482491988.272010000","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/client.py", line 335, in put
    self.put_multi(entities=[entity])
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/client.py", line 362, in put_multi
    current.commit()
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/batch.py", line 265, in commit
    self._commit()
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/batch.py", line 242, in _commit
    self.project, self._commit_request, self._id)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 628, in commit
    response = self._datastore_api.commit(project, request)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 356, in commit
    return self._stub.Commit(request_pb)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/google/cloud/datastore/_http.py", line 260, in _grpc_catch_rendezvous
    raise error_class(exc.details())
google.cloud.exceptions.ServiceUnavailable: 503 {"created":"@1482491988.272167000","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1482491988.272010000","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}

@daspecster Yep, it does work if you immediately retry, but this logic should ideally be in the client library.

@quom, I agree. I added this issue to #2694 as another example. I'm not sure when retry logic will get added but there is support to do so.

I'm glad retrying is working for you. It sounds like adding retry logic is all that's left for this issue and since we're tracking that in #2694, is it ok if we close this issue?

Same error seen here, using the library in a Django app via mod_wsgi and Apache 2.

The end of the trace I get is:

  File "/usr/lib/python2.7/site-packages/google/cloud/datastore/client.py", line 335, in put
    self.put_multi(entities=[entity])
  File "/usr/lib/python2.7/site-packages/google/cloud/datastore/client.py", line 362, in put_multi
    current.commit()
  File "/usr/lib/python2.7/site-packages/google/cloud/datastore/batch.py", line 265, in commit
    self._commit()
  File "/usr/lib/python2.7/site-packages/google/cloud/datastore/batch.py", line 242, in _commit
    self.project, self._commit_request, self._id)
  File "/usr/lib/python2.7/site-packages/google/cloud/datastore/_http.py", line 627, in commit
    response = self._datastore_api.commit(project, request)
  File "/usr/lib/python2.7/site-packages/google/cloud/datastore/_http.py", line 356, in commit
    return self._stub.Commit(request_pb)
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/google/cloud/datastore/_http.py", line 260, in _grpc_catch_rendezvous
    raise error_class(exc.details())
ServiceUnavailable: 503 {"created":"@1484640492.298628143","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@xxxxx.xxx","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}

I'm seeing the error as well - though I imagine there's nothing this library can do about it other than add retry logic.

@hir3npatel you can handle this by retrying with exponential backoff yourself since the library isn't doing it for us.
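
A minimal sketch of what "retrying with exponential backoff yourself" could look like, under some assumptions. `retry_with_backoff`, `make_flaky` and `TransientError` are hypothetical names used only for this example (`TransientError` stands in for `google.cloud.exceptions.ServiceUnavailable` so the snippet runs without the library), and the delay values are arbitrary defaults, not recommendations from the maintainers.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for google.cloud.exceptions.ServiceUnavailable."""


def retry_with_backoff(func, retryable=(TransientError,), max_attempts=5,
                       base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    The upper bound of the wait doubles on each attempt (capped at max_delay)
    and the actual wait is a random value below that bound (jitter).
    The last error is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))


def make_flaky(failures):
    """Return a function that raises TransientError `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("503 Secure read failed")
        return state["calls"]

    return flaky
```

With the real library, the call would look roughly like `retry_with_backoff(lambda: list(query.fetch()), retryable=(ServiceUnavailable,))`. The `list(...)` matters because the traceback in the report shows the error being raised during iteration rather than by `fetch()` itself. For writes such as `client.put(entity)`, check that repeating the call is safe before wrapping it.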

We are seeing this error quite a bit too, especially since increasing the threads that the worker has to interact with datastore:

image

I'm using exponential backoff, and in this particular case I had overlooked a simple "get" to retrieve one entity that resulted in the exception:

image

I added an additional retry to this function, and I hope this resolves the issue! But +1 from me that it would be great if this retry could be built into the client.
