Colabtools: TPU not working with Tensorflow 2.1

Created on 18 Jan 2020 · 2 Comments · Source: googlecolab/colabtools

Bug report for Colab: http://colab.research.google.com/.

  • Describe the current behavior:

When using a Colab TPU session and trying to initialize a TPU Strategy with the following code:

import tensorflow as tf
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()
print("REPLICAS: ", strategy.num_replicas_in_sync)

The following error occurs:

Running on TPU  ['10.109.245.114:8470']
INFO:tensorflow:Initializing the TPU system: 10.109.245.114:8470

INFO:tensorflow:Initializing the TPU system: 10.109.245.114:8470

INFO:tensorflow:Clearing out eager caches

INFO:tensorflow:Clearing out eager caches

---------------------------------------------------------------------------

NotFoundError                             Traceback (most recent call last)

<ipython-input-3-5c79288551ed> in <module>()
      7 if tpu:
      8     tf.config.experimental_connect_to_cluster(tpu)
----> 9     tf.tpu.experimental.initialize_tpu_system(tpu)
     10     strategy = tf.distribute.experimental.TPUStrategy(tpu)
     11 else:

3 frames

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/tpu/tpu_strategy_util.py in initialize_tpu_system(cluster_resolver)
    101     context.context()._clear_caches()  # pylint: disable=protected-access
    102 
--> 103     serialized_topology = output.numpy()
    104 
    105     # TODO(b/134094971): Remove this when lazy tensor copy in multi-device

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py in numpy(self)
    940     """
    941     # TODO(slebedev): Consider avoiding a copy for non-CPU or remote tensors.
--> 942     maybe_arr = self._numpy()  # pylint: disable=protected-access
    943     return maybe_arr.copy() if isinstance(maybe_arr, np.ndarray) else maybe_arr
    944 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py in _numpy(self)
    908       return self._numpy_internal()
    909     except core._NotOkStatusException as e:
--> 910       six.raise_from(core._status_to_exception(e.code, e.message), None)
    911 
    912   @property

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

NotFoundError: '__inference__tpu_init_fn_4' is neither a type of a primitive operation nor a name of a function registered in binary running on n-2221c432-w-0. Make sure the operation or function is registered in the binary running in this process.

  • Describe the expected behavior:

With TensorFlow 1.15, the same code runs without error. The output there is:

Running on TPU  ['10.26.51.18:8470']
INFO:tensorflow:Initializing the TPU system: 10.26.51.18:8470
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Querying Tensorflow master (grpc://10.26.51.18:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 5688683537495184073)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 3749670591192472159)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 12726202377899630824)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 4112510768860420127)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4273195466617788134)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 18003206366557860002)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 11510611825613067855)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 11320511437524126117)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 10244199656502490705)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 8589934592, 14173748399582017948)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 11903147722232858508)
REPLICAS:  8
  • The web browser you are using (Chrome, Firefox, Safari, etc.):
    The issue is not browser-related, but for reference: Firefox 72.0.1 (64-bit) on Ubuntu.

  • Link to self-contained notebook that reproduces this issue
    (click the Share button, then Get Shareable Link):
    Here

There is also an issue about this in the tensorflow repo.

All 2 comments

Colab has TensorFlow 2.1.0 pre-installed. To use it, run the magic %tensorflow_version 2.x before importing tensorflow. When you run this, we not only select 2.x but also do some additional work to configure the Cloud TPU to use 2.x. The error you're seeing results from the TPU running 1.15 while your runtime is running 2.1.0.
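
As an illustration of the ordering described in the comment above, here is a minimal sketch. The magic is shown as a comment because it is IPython syntax that only works in a Colab cell. The helper names `major_version`, `same_major_version`, and `connect_tpu` are hypothetical and not part of Colab or TensorFlow. Comparing major versions is only an assumption used to illustrate the 1.15 vs. 2.1.0 mismatch; the thread does not state the exact compatibility rule. The TPU calls themselves are the ones from the original report.

```python
# Colab cell 1 (must run before tensorflow is imported anywhere in the session):
#   %tensorflow_version 2.x


def major_version(version):
    """Return the major component of a version string such as '2.1.0'."""
    return int(version.split(".")[0])


def same_major_version(runtime_version, tpu_version):
    """Illustrative check (an assumption, not an official rule): True when
    the notebook runtime and the TPU worker share a major TensorFlow version."""
    return major_version(runtime_version) == major_version(tpu_version)


def connect_tpu():
    """Build a TPUStrategy as in the report, or fall back to the default strategy."""
    # Imported inside the function so the magic above has already taken effect.
    import tensorflow as tf

    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    except ValueError:
        # No TPU attached to this session.
        return tf.distribute.get_strategy()

    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    return tf.distribute.experimental.TPUStrategy(tpu)
```

In a later cell, `strategy = connect_tpu()` followed by `print("REPLICAS: ", strategy.num_replicas_in_sync)` would reproduce the check from the report. With the versions named in this thread, `same_major_version("2.1.0", "1.15")` is false, which corresponds to the failing setup.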

I believe I have found a way to fix this issue. See https://github.com/huan/tensorflow-handbook-tpu/issues/1#issuecomment-606189444
