Using the current codebase from master branch (e1fbb6b), with GRPC, we sometimes (0.5% of requests, approximately) see the following exception:
AbortionError(code=StatusCode.UNAVAILABLE, details="{"created":"@1478255129.468798425","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1478255129.468756939","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}"))
Retrying this seems to always succeed.
Should application code have to care about this kind of error and retry? Or is this a bug in google-cloud-pubsub code?
Package versions installed:
gapic-google-logging-v2==0.10.1
gapic-google-pubsub-v1==0.10.1
google-api-python-client==1.5.4
google-cloud==0.20.0
google-cloud-bigquery==0.20.0
google-cloud-bigtable==0.20.0
google-cloud-core==0.20.0
google-cloud-datastore==0.20.1
google-cloud-dns==0.20.0
google-cloud-error-reporting==0.20.0
google-cloud-language==0.20.0
google-cloud-logging==0.20.0
google-cloud-monitoring==0.20.0
google-cloud-pubsub==0.20.0
google-cloud-resource-manager==0.20.0
google-cloud-storage==0.20.0
google-cloud-translate==0.20.0
google-cloud-vision==0.20.0
google-gax==0.14.1
googleapis-common-protos==1.3.5
grpc-google-iam-v1==0.10.1
grpc-google-logging-v2==0.10.1
grpc-google-pubsub-v1==0.10.1
grpcio==1.0.0
Note: Everything google-cloud* comes from git master.
This is on Python 2.7.3
Traceback:
File "ospdatasubmit/pubsub.py", line 308, in _flush
publish_response = self.pubsub_client.Publish(publish_request, self._publish_timeout)
File "grpc/beta/_client_adaptations.py", line 305, in __call__
self._request_serializer, self._response_deserializer)
File "grpc/beta/_client_adaptations.py", line 203, in _blocking_unary_unary
raise _abortion_error(rpc_error_call)
@forsberg As you can see from the stack trace, this comes from grpc.beta (the beta interface). We haven't used the beta interface for some time. How are you installing the library?
@nathanielmanistaatgoogle I can consistently reproduce an Unavailable error when a connection goes stale. Is there any way to avoid this, short of retrying on failures?
That library installation seems to have gone horribly wrong. We had an earlier version that used the grpc.beta interface, and I guess something went wrong when installing this one into the same virtualenv. Will investigate that on Monday.
@forsberg grpc 1.0 still supports the beta interface, but none of google-cloud-python (or its dependencies) use that interface any longer. So it'd be your google-cloud-python install that's b0rked rather than your grpc install.
@nathanielmanistaatgoogle See #2693 and #2699. What is the recommended way to deal with this for stale connections?
Small update: We have fixed our borked google-cloud-python install so that it actually uses e1fbb6bc, but we're still seeing roughly the same number of UNAVAILABLE errors - retrying always works on the first attempt.
@nathanielmanistaatgoogle Bump
@nathanielmanistaatgoogle Bump
/cc @geigerj This is the issue I was referring to about GAPIC retry strategies
@dhermes You can configure this on the GAPIC layer, see comment here for details. It actually looks like we already retry by default on UNAVAILABLE for Pub/Sub Publish, but you can override the default settings to extend the timeout on retry if that's the issue?
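For reference, a rough sketch of what overriding those settings could look like when calling the generated GAPIC publisher client directly (this assumes the google-gax 0.15.x surface; the backoff numbers, the `publisher_api` variable, and `topic_path`/`messages` are illustrative, not prescribed defaults):

```python
from google.gax import BackoffSettings, CallOptions, RetryOptions
from grpc import StatusCode

# Illustrative backoff: keep retrying UNAVAILABLE for up to 10 minutes total.
backoff = BackoffSettings(
    initial_retry_delay_millis=100,
    retry_delay_multiplier=1.3,
    max_retry_delay_millis=60000,
    initial_rpc_timeout_millis=60000,
    rpc_timeout_multiplier=1.0,
    max_rpc_timeout_millis=60000,
    total_timeout_millis=600000,
)
retry = RetryOptions(
    retry_codes=[StatusCode.UNAVAILABLE, StatusCode.DEADLINE_EXCEEDED],
    backoff_settings=backoff,
)

# publisher_api is assumed to be an instance of the generated
# google.cloud.gapic.pubsub.v1.publisher_client.PublisherClient.
response = publisher_api.publish(topic_path, messages, options=CallOptions(retry=retry))
```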
@geigerj I believe the correct link to "retry by default on UNAVAILABLE for Pub/Sub Publish" is this one, because the one you've provided no longer works.
The bug affects us too. When the connection is stale, we get exactly the same error as here (a grpc._channel._Rendezvous, but caused by a Pub/Sub publish). Once we retry, it works.
We're using:
gapic-google-pubsub-v1==0.11.1
google-cloud-core==0.21.0
google-cloud-pubsub==0.21.0
google-gax==0.15.0
googleapis-common-protos==1.5.0
grpc-google-iam-v1==0.11.1
grpc-google-pubsub-v1==0.11.1
What seems strange is that the default retry you mentioned doesn't seem to work, and I have checked that the file publisher_client_config.json is present with correct values. I get the unhandled exception much sooner than 60s, almost immediately (I haven't measured it precisely).
Updated:
It seems that I was wrong about the dependency versions, but publisher_client_config.json is the same. I don't think it will change anything, but I will switch to the newest versions and report back.
Actual versions:
gapic-google-pubsub-v1==0.10.1
google-cloud-core==0.21.0
google-cloud-pubsub==0.21.0
google-gax==0.14.1
googleapis-common-protos==1.5.0
grpc-google-iam-v1==0.10.1
grpc-google-pubsub-v1==0.10.1
grpcio==1.0.1
Updated 2:
Newest versions do not fix this issue.
I'm still getting
GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1480777925.720435842","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1480777925.720399286","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>)
when using a stale connection. Retrying fixes the issue.
Package versions:
gapic-google-cloud-pubsub-v1==0.14.0
google-cloud-core==0.21.0
git+https://github.com/GoogleCloudPlatform/google-cloud-python.git@a02ac500548cf9fc37f4d81033e696a0efb53f99#egg=google-cloud-pubsub&subdirectory=pubsub
google-gax==0.15.0
googleapis-common-protos==1.5.0
grpc-google-cloud-pubsub-v1==0.14.0
grpc-google-iam-v1==0.11.1
grpcio==1.0.1
(google-cloud-pubsub installed from Git master)
Just wanted to add to the above that disabling gRPC "fixed" the issue (`$ export GOOGLE_CLOUD_DISABLE_GRPC=true`) - there's no need for manual retrying and there are no errors with stale connections.
@dhermes: does that code of yours hit a particular host? If so, are you able to reproduce the problem against any other host? If you're able to hit that host with unauthenticated RPCs, are you able to reproduce the defect in the absence of authentication? If you are able to observe the traffic at a low level (with Wireshark or something like it), is there anything obviously the matter? Obviously the expected behavior is that if you hold a grpc.Channel and don't use it for a matter of minutes, it should still be able to make RPCs. They may take slightly longer if the underlying TCP connection has been taken down, but they shouldn't fail and then immediately succeed when reattempted.
@dhermes: when I run this code of yours I get StatusCode.PERMISSION_DENIED, so there's more to the reproduction of the problem than merely running that, right? Something has to happen server-side? Possibly something else has to happen locally?
I think I'm seeing a similar issue.
My goal is to have a connection to speech.googleapis.com always open so that whenever a user wants to say something, they can enter a 'y' through the terminal and then speak instantly. Otherwise, establishing the connection seems to take about 4 seconds on our architecture.
However, it seems that the connection closes after a while. Would this issue be the cause?
I have taken google's streaming python example code and modified it for my purposes.
def main():
    # Open channel to Google Speech and keep it open indefinitely.
    with cloud_speech.beta_create_Speech_stub(
            make_channel('speech.googleapis.com', 443)) as service:
        answer = ""
        while True:
            # If we're not retrying from a failed attempt,
            # wait for the user to send 'y' to start recording
            if answer != "retry":
                answer = raw_input("Do you want to record? y/n: ")
            # pass through raw_input block
            # in an attempt to retry the streaming
            # request.
            else:
                answer = "y"
            if answer == "y":
                print("Received the Y")
                # For streaming audio from the microphone, there are three threads.
                # First, a thread that collects audio data as it comes in
                with record_audio(RATE, CHUNK) as buffered_audio_data:
                    # Second, a thread that sends requests with that data
                    requests = request_stream(buffered_audio_data, RATE)
                    # Third, a thread that listens for transcription responses
                    recognize_stream = service.StreamingRecognize(
                        requests, DEADLINE_SECS)
                    try:
                        listen_print_loop(recognize_stream)
                        recognize_stream.cancel()
                    except face.CancellationError:
                        pass
                    except face.AbortionError:
                        print("ABORTION ERROR RECEIVED")
                        answer = "retry"
@dakrawczyk I think you might be running into the 1 minute limit for streaming.
See: https://cloud.google.com/speech/limits#content
You said...
establishing the connection seems to take about 4 seconds on our architecture.
Do you know if that connection overhead is on the google-cloud-python side or has something to do with your architecture?
@daspecster I don't think it's the 1 minute limit for streaming - I know what you're talking about, but I'm not actually streaming until the user enters 'y' and the record/request/response streams are started and used. My understanding is that I'm only creating the channel to begin with, and that doesn't count against the 1 minute timeout. Also, I only end up streaming for about 10 seconds at a time.
I am building an embedded system using a Samsung ARTIK 710 running Debian.
When I run
with cloud_speech.beta_create_Speech_stub(
        make_channel('speech.googleapis.com', 443)) as service:
on my MacBook it is basically instant.
When it runs on my embedded architecture it takes about 4 seconds.
Ok, good to know!
I just realized that your code is actually not using this library. You're using the gRPC library directly.
I don't know that this is the best issue to discuss this in, as it might get kind of drawn out.
If you want you can ping me on https://googlecloud-community.slack.com. I've spent some time in Speech so I might be able to help get you going.
@dakrawczyk Hello! Looks like you're using the sample that I wrote, so I'm here to take blame / responsibility ^_^;
From the symptoms you describe (error happens in time span > streaming limits, re-starting the stream a lot, some auth thing mentioned above), my guess is that the access token is expiring.
The make_channel function grabs an access token the first time it's run (i.e. when it creates the channel), but doesn't refresh it, so if you keep the channel open long enough, the access token's validity period expires. I imagine each time you start a new stream, grpc re-sends the auth headers, so eventually you'll start getting auth errors.
If that's the case, you might be able to fix this by modifying make_channel so that it calls get_access_token() on the fly, instead of just getting it upon channel creation.
Let me know how that works. I haven't looked at the google-cloud-python code, but perhaps it's a similar issue?
Also - the sample has since been updated to use the google-auth package, which should also fix that issue.
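For anyone landing here, a minimal sketch of a make_channel along those lines, built on the google-auth package so the access token is refreshed on the fly rather than captured once at channel creation (the scope constant and function body are illustrative, not the sample's exact code):

```python
import google.auth
import google.auth.transport.grpc
import google.auth.transport.requests

# Illustrative scope; adjust to whatever your application actually needs.
SPEECH_SCOPE = 'https://www.googleapis.com/auth/cloud-platform'


def make_channel(host, port):
    """Create a secure channel whose credentials are refreshed automatically."""
    credentials, _ = google.auth.default(scopes=[SPEECH_SCOPE])
    http_request = google.auth.transport.requests.Request()
    target = '{}:{}'.format(host, port)
    # secure_authorized_channel attaches call credentials that google-auth
    # refreshes when the current token expires, instead of a one-shot token.
    return google.auth.transport.grpc.secure_authorized_channel(
        credentials, http_request, target)
```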
@jerjou Thank you! Trying this out now :]
@jerjou I've updated to the newer sample code that uses the google-auth package.
I am still under the impression that the channel closes after a certain amount of time, probably due to the validity expiring.
Here is the error I receive when trying to send/receive data over the channel after some time, once the channel has closed.
Traceback (most recent call last):
File "transcribe_streaming.py", line 323, in <module>
main()
File "transcribe_streaming.py", line 310, in main
listen_print_loop(recognize_stream)
File "transcribe_streaming.py", line 239, in listen_print_loop
for resp in recognize_stream:
File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 344, in next
return self._next()
File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 335, in _next
raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1482091441.761674000","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1482091441.761631000","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>
A couple of questions:
I can't just reopen the channel when the user wants to record, because that adds a delay of a few seconds, and that's not the experience we're going for.
How do I join https://googlecloud-community.slack.com? Is it a closed group?
@jerjou thanks. I just joined. What is the channel you are in? I could not find a cloud speech channel there. In fact there is one, but it is inactive.
Hey guys, I've read this thread and still haven't come up with a solution.
When trying
GOOGLE_CLOUD_DISABLE_GRPC=true
I'm getting a connection timeout because it's trying to connect to localhost:8499.
What am I missing?
@ohadperry, some of the emulators run on 84xx ports. I'm not sure what your system configuration is though.
@daspecster hi. I understand that it's supposed to connect to an emulator.
My question is how to get it to connect to my project's production Pub/Sub.
I don't care whether it's RPC or HTTPS.
By default, it shouldn't try to connect to the emulator. My guess is that there's probably something in your configuration/environment that's redirecting it.
You could check if the PUBSUB_EMULATOR_HOST environment variable is set. If so then you'll want to unset it.
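If it's easier to check from Python, an equivalent sketch using only the standard library:

```python
import os

# Should print None when no emulator override is configured.
print(os.environ.get('PUBSUB_EMULATOR_HOST'))

# Same effect as `unset PUBSUB_EMULATOR_HOST`, for the current process only.
os.environ.pop('PUBSUB_EMULATOR_HOST', None)
```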
@daspecster thanks, unsetting PUBSUB_EMULATOR_HOST in my environment variables worked!!
Any comments as to why the RPC transport wouldn't work? RPC should increase the push/pull rate, shouldn't it? If yes, then I'm really interested in why it's not working for me.
gRPC may increase performance depending on your application.
google-cloud defaults to using gRPC. So unless there was an error during installation, I would guess that it's using PubSub over gRPC already.
```
$ pip freeze
grpc-google-cloud-pubsub-v1==0.14.0
grpcio==1.0.4
```
If you have those two libraries installed then I think you're probably all set.
You can check after you instantiate your `Client` with something like...
```python
client = pubsub.Client()
print(client._use_gax)
```
If `_use_gax` is `True` then the library is using gRPC.
@daspecster, yes I know. I meant I'm getting gRPC StatusCode.UNAVAILABLE errors when using gRPC. Here are my pips:
grpc-google-iam-v1==0.10.1
grpc-google-pubsub-v1==0.10.1
grpcio==1.0.1
Those are fairly old. You might want to try updating them, but I can't say that updating will solve the UNAVAILABLE issue. Mostly I think you'll want to code in retries on that event.
The exception thrown is fairly internal. Could the cloud-pubsub library catch the error and raise something more descriptive? Also, can the retries be documented?
Upgraded my stack to google-cloud-pubsub==0.22.0. Error is still present, traceback/error message is slightly different. Here's a fresh one:
ERROR 2017-02-22 08:16:31,484
Traceback (most recent call last):
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/ospgcptools/pubsub/__init__.py", line 327, in flush
self.pubsub_topic.publish(data)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/cloud/pubsub/topic.py", line 253, in publish
message_ids = api.topic_publish(self.full_name, [message_data])
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/cloud/pubsub/_gax.py", line 173, in topic_publish
options=options)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/cloud/gapic/pubsub/v1/publisher_client.py", line 290, in publish
return self._publish(request, options)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 442, in inner
return api_caller(api_call, this_settings, request)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 70, in inner
return a_func(request, **kwargs)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 395, in inner
gax.errors.create_error('RPC failed', cause=exception))
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/api_callable.py", line 391, in inner
return a_func(*args, **kwargs)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/google/gax/retry.py", line 67, in inner
return a_func(*updated_args, **kwargs)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/grpc/_channel.py", line 511, in __call__
return _end_unary_response_blocking(state, False, deadline)
File "/opt/ospdatasubmit/virtualenvs/ospsubmit.opera.com/v1/local/lib/python2.7/site-packages/grpc/_channel.py", line 459, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1487751391.483882744","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1487751391.483832140","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":166,"referenced_errors":[{"created":"@1487751391.483828339","description":"Socket closed","fd":67,"file":"src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv6:[2a00:1450:400e:807::200a]:443"}]}]})>)
Some package versions:
pip freeze|egrep 'grpc|pubsub|google-cloud-core|grep protobuf'
gapic-google-cloud-pubsub-v1==0.14.1
google-cloud-core==0.22.1
google-cloud-pubsub==0.22.0
grpc-google-cloud-pubsub-v1==0.14.0
grpc-google-iam-v1==0.11.1
grpc-google-pubsub-v1==0.10.1
grpcio==1.1.0
Timestamp in UTC if some googler wants to look on the other side. Let me know if there's something I can add to my logs to aid in debugging.
In most cases, an immediate retry will fix the problem. Sometimes we have to retry 2 or 3 times (we give up after 3 times and drop the message).
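For what it's worth, the retry we do in application code looks roughly like this (a sketch only; the helper name and attempt count are ours, not part of google-cloud-pubsub, and it assumes the GAX-based `topic.publish` surface that wraps the underlying RPC error in a GaxError):

```python
from google.gax.errors import GaxError
from grpc import StatusCode


def publish_with_retry(topic, data, attempts=3):
    """Publish data, retrying immediately on UNAVAILABLE up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return topic.publish(data)
        except GaxError as exc:
            cause = getattr(exc, 'cause', None)
            code = cause.code() if cause is not None else None
            if code != StatusCode.UNAVAILABLE or attempt == attempts:
                raise
            # Stale connection: an immediate retry almost always succeeds.
```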
Also seeing this issue. Retrying within our own code seems to work around it; we also only retry a max of 3 times. Usually the second try fixes it.
We were on 0.18 and just upped to 0.23.
We run Python 3.6.
$ pip freeze|egrep 'grpc|pubsub|google-cloud-core|grep protobuf'
gapic-google-cloud-pubsub-v1==0.15.0
google-cloud-core==0.23.1
google-cloud-pubsub==0.23.0
grpc-google-cloud-logging-v2==0.90.0
grpc-google-iam-v1==0.11.1
grpcio==1.1.3
proto-google-cloud-pubsub-v1==0.15.1
Traceback (most recent call last):
File "/opt/app/psutils.py", line 129, in _pub_topic
return topic.publish(bytes(json.dumps(data), 'utf-8'))
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/cloud/pubsub/topic.py", line 255, in publish
message_ids = api.topic_publish(self.full_name, [message_data])
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/cloud/pubsub/_gax.py", line 174, in topic_publish
options=options)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/cloud/gapic/pubsub/v1/publisher_client.py", line 320, in publish
return self._publish(request, options)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 419, in inner
return api_caller(api_call, this_settings, request)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 67, in inner
return a_func(request, **kwargs)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 372, in inner
gax.errors.create_error('RPC failed', cause=exception))
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/future/utils/__init__.py", line 419, in raise_with_traceback
raise exc.with_traceback(traceback)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/api_callable.py", line 368, in inner
return a_func(*args, **kwargs)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/google/gax/retry.py", line 68, in inner
return a_func(*updated_args, **kwargs)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/grpc/_channel.py", line 507, in __call__
return _end_unary_response_blocking(state, False, deadline)
File "/home/pythonapp/.pyenv/versions/3.6.0/envs/venv/lib/python3.6/site-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
google.gax.errors.GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1488959268.808794184","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1488959268.808696874","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":166,"referenced_errors":[{"created":"@1488959268.808693234","description":"Socket closed","fd":55,"file":"src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:173.194.74.95:443"}]}]})>)
I really think my problem is related to this. We have a Node.js client connecting to a Python server using gRPC, and we frequently receive this:
Critical: gRPC server raised an error.
Error: {"created":"@1489090385.752311821","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1489090385.752305292","description":"TCP Read failed","file":"../src/core/lib/iomgr/tcp_uv.c","file_line":170}]} { Error: {"created":"@1489090385.752311821","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1489090385.752305292","description":"TCP Read failed","file":"../src/core/lib/iomgr/tcp_uv.c","file_line":170}]}
at /app/node_modules/grpc/src/node/src/client.js:434:17
cause:
{ Error: {"created":"@1489090385.752311821","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1489090385.752305292","description":"TCP Read failed","file":"../src/core/lib/iomgr/tcp_uv.c","file_line":170}]}
at /app/node_modules/grpc/src/node/src/client.js:434:17 code: 14, metadata: Metadata { _internal_repr: {} } },
isOperational: true,
code: 14,
metadata: Metadata { _internal_repr: {} } }
Sometimes, the same request on the same server works without any problem.
@barroca This looks like a Node.js failure?
Might be; we are creating a client in another language to isolate it.
I just needed to share my frustration and see if someone had the same problem. Anyway, it's very strange, since it's an intermittent error with no pattern: it sometimes happens within seconds, sometimes within minutes. :(
@barroca I had the same problem. In my case, if my Node.js process runs for a while without any requests, this error occurs, and making the request again gets a normal response.
Really need help
{ Error: {"created":"@1490271131.819044969","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1490271131.819031343","description":"Socket closed","fd":16,"file":"../src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:172.16.250.137:8980"}]}
2017-03-23T12:12:11.842630309Z at /usr/local/wongnai/node_modules/grpc/src/node/src/client.js:434:17
2017-03-23T12:12:11.842638153Z cause:
2017-03-23T12:12:11.842643446Z { Error: {"created":"@1490271131.819044969","description":"Endpoint read failed","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1851,"grpc_status":14,"occurred_during_write":0,"referenced_errors":[{"created":"@1490271131.819031343","description":"Socket closed","fd":16,"file":"../src/core/lib/iomgr/tcp_posix.c","file_line":249,"target_address":"ipv4:172.16.250.137:8980"}]}
2017-03-23T12:12:11.842652712Z at /usr/local/wongnai/node_modules/grpc/src/node/src/client.js:434:17 code: 14, metadata: Metadata { _internal_repr: {} } },
isOperational: true,
code: 14,
metadata: Metadata { _internal_repr: {} } }
I confirm the intermittent errors when working with the Bigtable API.

python 3.5.2
google-cloud==0.23.0
(storm) ➜ storm git:(develop) ✗ vi requirements/base.txt
(storm) ➜ storm git:(develop) ✗ pip freeze G google
21:gapic-google-cloud-datastore-v1==0.15.3
22:gapic-google-cloud-error-reporting-v1beta1==0.15.3
23:gapic-google-cloud-logging-v2==0.91.3
24:gapic-google-cloud-pubsub-v1==0.15.3
25:gapic-google-cloud-spanner-admin-database-v1==0.15.3
26:gapic-google-cloud-spanner-admin-instance-v1==0.15.3
27:gapic-google-cloud-spanner-v1==0.15.3
28:gapic-google-cloud-speech-v1beta1==0.15.3
29:gapic-google-cloud-vision-v1==0.90.3
31:google-auth==0.10.0
32:google-auth-httplib2==0.0.2
33:google-cloud==0.23.0
34:google-cloud-bigquery==0.23.0
35:google-cloud-bigtable==0.23.1
36:google-cloud-core==0.23.1
37:google-cloud-datastore==0.23.0
38:google-cloud-dns==0.23.0
39:google-cloud-error-reporting==0.23.2
40:google-cloud-language==0.23.1
41:google-cloud-logging==0.23.1
42:google-cloud-monitoring==0.23.0
43:google-cloud-pubsub==0.23.0
44:google-cloud-resource-manager==0.23.0
45:google-cloud-runtimeconfig==0.23.0
46:google-cloud-spanner==0.23.1
47:google-cloud-speech==0.23.0
48:google-cloud-storage==0.23.1
49:google-cloud-translate==0.23.0
50:google-cloud-vision==0.23.3
51:google-gax==0.15.8
52:googleapis-common-protos==1.5.2
55:grpc-google-iam-v1==0.11.1
80:proto-google-cloud-datastore-v1==0.90.3
81:proto-google-cloud-error-reporting-v1beta1==0.15.3
82:proto-google-cloud-logging-v2==0.91.3
83:proto-google-cloud-pubsub-v1==0.15.3
84:proto-google-cloud-spanner-admin-database-v1==0.15.3
85:proto-google-cloud-spanner-admin-instance-v1==0.15.3
86:proto-google-cloud-spanner-v1==0.15.3
87:proto-google-cloud-speech-v1beta1==0.15.3
88:proto-google-cloud-vision-v1==0.90.3
(storm) ➜ storm git:(develop) ✗ pip freeze G grpc
55:grpc-google-iam-v1==0.11.1
56:grpcio==1.2.1
(storm) ➜ storm git:(develop) ✗
@nathanielmanistaatgoogle Can you weigh in / can we all have a pow-wow about this?
I have the same error with:
Client:
Server:
Client error message:
File "/home/project/venv/lib/python3.6/site-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Endpoint read failed)>
I updated to [email protected] and I'm waiting to see if the error occurs again.
Notes: I'm not sure if "Endpoint read failed" is the same as "Secure read failed". I think I had both errors from time to time.
@dhermes: my apologies for the silence; this is now properly being recognized as a problem in gRPC Core affecting all Core-using languages and will be fixed in Core. In the meantime I don't know that the options are any good: if your RPC is idempotent you can make application-level retries until it succeeds, but if it isn't idempotent you may have to awkwardly work around the issue in a problem-specific way.
So... yay that it's now properly being recognized as a defect. Boo to the rest. Yes to a sync if you'd still like one.
Thanks @nathanielmanistaatgoogle, is there a tracking issue somewhere?
@dhermes: there is now; please add any details (pretty please a deterministic reproduction?).
@nathanielmanistaatgoogle I already gave a deterministic reproduction. I am happy to chat with you off the thread about how to set up the credentials needed for this or we could work together (I'll need your expertise) to create a gRPC service that doesn't require auth to accomplish the same goal.
@lukesneeringer @dhermes The issue that @nathanielmanistaatgoogle referenced (https://github.com/grpc/grpc/issues/11043) was fixed on June 8. Is this still an issue?
@bjwatson Checking right now
The example still fails:
Listing all instances:
Traceback (most recent call last):
File "bt_unavailable.py", line 30, in <module>
main()
File "bt_unavailable.py", line 26, in main
list_em(client)
File "bt_unavailable.py", line 8, in list_em
instances, failed_locations = client.list_instances()
File ".../google/cloud/bigtable/client.py", line 375, in list_instances
response = self._instance_stub.ListInstances(request_pb)
File ".../grpc/_channel.py", line 507, in __call__
return _end_unary_response_blocking(state, call, False, deadline)
File ".../grpc/_channel.py", line 455, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Endpoint read failed)>
This fails in Python 2.7 with grpcio==1.4.0
@nathanielmanistaatgoogle, looks like the gRPC fix was insufficient for this issue. Do you have any insight into what else might be going wrong?
FYI @lukesneeringer
Hello! :-)
First, a _mea culpa_; I have not done as good of a job at keeping up with issues as I should have. If you are getting this (I admit it) cut and paste, it is likely because your issue sat for too long.
In this case, I have been in the process of making a radical update to the PubSub library (#3637) to add significant performance improvements and a new surface, which we hope to launch soon. As such, I am clearing out issues on the old library. It is my sincere goal to do a better job of being on top of issues in the future.
As the preceding paragraph implies, I am closing this issue. If the revamped library does not solve your issue, however, please feel free to reopen.
Thanks!
@lukesneeringer from my recollection, this wasn't Pub/Sub-specific?
That's correct, I even reproduced it with bigtable 12 days ago.
Yeah, I was firing through everything with an api: pubsub label. Thanks for reopening.
I am removing all the "api: X" labels from this issue since issue automation is coming. The grpc label is the appropriate tracking.
Although really this should just be moved to grpc.
Reproduced the Bigtable issue with google-cloud-bigtable==0.26.0 during:
def write_row(key, column_id):
    """
    Utility method for writing a row to BigTable.

    Note that we don't actually store values - the column ids are where we actually store values,
    so the value is always just an empty string.

    :param key: Key of the row to write
    :param column_id: This is the actual value we want stored.
    :return: None
    """
    row = db.table.row(key)
    row.set_cell(conf.column_family_id, column_id, '')
    row.commit()
The error looks like:
_Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Endpoint read failed)>
at _end_unary_response_blocking (/env/local/lib/python2.7/site-packages/grpc/_channel.py:455)
at __call__ (/env/local/lib/python2.7/site-packages/grpc/_channel.py:507)
at commit (/env/local/lib/python2.7/site-packages/google/cloud/bigtable/row.py:417)
Hi guys,
I'm also facing this issue, but with a Pub/Sub subscription.
We started a subscription for one topic yesterday and everything was working fine.
However, today we saw this exception on the console.
I'm using the following packages:
google-auth==1.1.1
google-cloud-bigquery==0.27.0
google-cloud-core==0.27.1
google-cloud-logging==1.3.0
google-cloud-pubsub==0.28.3
google-cloud-storage==1.4.0
google-gax==0.15.15
google-resumable-media==0.2.3
googleapis-common-protos==1.5.3
grpc-google-iam-v1==0.11.3
proto-google-cloud-logging-v2==0.91.3
and I'm running on a Linux Machine (Ubuntu 16.04.3 LTS)
Exception:
Exception in thread Consumer helper: consume bidirectional stream:
Traceback (most recent call last):
File "/home/user/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/user/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/user/anaconda3/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 248, in _blocking_consume
self._policy.on_exception(exc)
File "/home/user/anaconda3/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/policy/thread.py", line 135, in on_exception
raise exception
File "/home/user/anaconda3/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 234, in _blocking_consume
for response in response_generator:
File "/home/user/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 348, in __next__
return self._next()
File "/home/user/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 342, in _next
raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, OS Error)>
I just got this error while running a job on Google ML Engine.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 198, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 194, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 296, in train
saver=saver)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 793, in train
train_step_kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 546, in train_step
if sess.run(train_step_kwargs['should_log']):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
UnavailableError: {"created":"@1507348685.699643392","description":"EOF","file":"external/grpc/src/core/lib/iomgr/tcp_posix.c","file_line":235,"grpc_status":14}
Honestly not sure how to solve this other than retrying the job.
Edit:
Rerunning does not seem to help; I keep getting the same error.
Hi, I'm getting this error on a Pub/Sub consumer. I managed to get a "not so pretty" workaround,
using a policy like this that replicates the deadline_exceeded handling in google.cloud.pubsub_v1.subscriber.policy.thread.Policy.on_exception.
from google.cloud.pubsub_v1.subscriber.policy.thread import Policy
import grpc


class UnavailableHackPolicy(Policy):
    def on_exception(self, exception):
        """
        There is an issue on the grpc channel that raises an UNAVAILABLE
        exception now and then. Until that issue is fixed, we need to protect
        our consumer thread from breaking.
        https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2683
        """
        unavailable = grpc.StatusCode.UNAVAILABLE
        if getattr(exception, 'code', lambda: None)() in [unavailable]:
            print("¡OrbitalHack! - {}".format(exception))
            return
        return super(UnavailableHackPolicy, self).on_exception(exception)
In the receive-message function I have code like:
subscriber = pubsub.SubscriberClient(policy_class=UnavailableHackPolicy)
subscription_path = subscriber.subscription_path(project, subscription_name)
subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)
The problem is that when the resource is truly UNAVAILABLE we will not be aware of it.
UPDATE: As noted here by @makrusak and here by @rclough, this hack causes high CPU usage, leaving your consumer practically useless (available only intermittently). So basically this trades one problem for another: your consumer does not die, but you will have to restart the worker that executes it often.
I might be getting a similar problem on Spanner when trying to read ranges with an index. I will need to test whether it's my code or not.
I think with all the work that @dhermes did on pubsub this should be resolved. I'm going to go ahead and close this, but if it's still reproducible with the latest version we can re-open.