BERT: TPU Timeout Error on Google Colab

Created on 24 Mar 2019 · 1 Comment · Source: google-research/bert

Hi,

I'm trying to run BERT pretraining on my own corpus on Colab, following the instructions in the README.

https://colab.research.google.com/drive/1A-ettwJ6YYnkWPY9Dg0YLq_iCKqaDVOS

I'm getting a urllib timeout error when run_pretraining.py tries to connect to the TPU instance at the tpu_cluster_resolver line. However, if I run that same line outside of the module, it works fine.

Any ideas? Is there something I'm missing?

`Traceback (most recent call last):
File "/usr/lib/python3.6/urllib/request.py", line 1318, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/usr/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/usr/lib/python3.6/http/client.py", line 936, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/usr/lib/python3.6/socket.py", line 724, in create_connection
raise err
File "/usr/lib/python3.6/socket.py", line 713, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./bert/run_pretraining.py", line 493, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "./bert/run_pretraining.py", line 427, in main
FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/cluster_resolver/tpu_cluster_resolver.py", line 290, in __init__
self._requestComputeMetadata('project/project-id'))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/cluster_resolver/tpu_cluster_resolver.py", line 129, in _requestComputeMetadata
resp = urlopen(req)
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
'_open', req)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 1346, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.6/urllib/request.py", line 1320, in do_open
raise URLError(err)
urllib.error.URLError: `

Most helpful comment

@kayvane I hope this is solved by now, but I'm posting in case future searches land here looking for a fix for the self._requestComputeMetadata('project/project-id')) exception on Colab.

This issue is not BERT-specific. It seems to be a general case in TF when the wrong TPU name is passed to the tpu_cluster_resolver as an argument. As the traceback shows, the resolver then tries to fetch the project ID from the compute metadata server, and that request is what times out.

This should be easy to fix by passing in a string that looks something like grpc://10.70.83.90:8470. On Colab, the exact address can be read from the environment variable ${TPU_NAME}.
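As a minimal sketch of that check (not part of BERT or TensorFlow), the snippet below reads TPU_NAME and confirms it looks like a gRPC address before it is handed to the script. The helper name `colab_tpu_address` is hypothetical, and the sketch assumes the Colab runtime sets TPU_NAME as described above.

```python
import os


def colab_tpu_address(environ=None):
    """Return the TPU gRPC address from TPU_NAME, or raise if it is unusable.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    address = environ.get("TPU_NAME", "")
    # Anything that is not a grpc:// address is rejected here, since a
    # plain name is what leads to the metadata lookup seen in the traceback.
    if not address.startswith("grpc://"):
        raise ValueError(
            "TPU_NAME is %r; expected something like grpc://10.70.83.90:8470"
            % address
        )
    return address
```

The returned string is what would be passed as the tpu_name flag of run_pretraining.py (the value read as FLAGS.tpu_name in the traceback). If the helper raises, the runtime most likely has no TPU attached or the variable is not set.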

