BERT: How to train models on GPU instead of CPU when TPU is not available?

Created on 7 Nov 2018 · 10 Comments · Source: google-research/bert

I can run run_pretraining.py, but it is running on the CPU. How can I make it run on the GPU? Or is it because our GPU does not have enough memory? How do I explicitly assign a device (CPU/GPU) when a TPU is not available?

All 10 comments

From this part of the README:

Note: You might see a message Running train on CPU. This really just means that it's running on something other than a Cloud TPU, which includes a GPU.

Thank you for your answer. I checked the GPU status and saw that the program is not running on the GPU. Is there a way to explicitly set the device?

By default, it will train/eval on the GPU. Maybe something in your environment settings conflicts.
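One way to look for such a conflict is to check what the driver and the CUDA_VISIBLE_DEVICES variable report, independently of TensorFlow. This is only a rough sketch: it assumes the NVIDIA driver's nvidia-smi tool is installed and on PATH, and the helper names list_driver_gpus and describe_gpu_environment are made up for illustration.

```python
import os
import shutil
import subprocess


def list_driver_gpus():
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


def describe_gpu_environment():
    """Collect two settings that can hide a GPU from TensorFlow."""
    return {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "driver_gpus": list_driver_gpus(),
    }


if __name__ == "__main__":
    for key, value in describe_gpu_environment().items():
        print(key, "=", value)
```

If driver_gpus comes back empty, the problem is probably below TensorFlow (driver or PATH). If CUDA_VISIBLE_DEVICES is an empty string, the GPUs are being hidden from CUDA programs.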

I had the same issue and upgrading tensorflow-gpu seemed to fix the problem. Possibly a coincidence though.

This uses a standard TF API so any issues are likely to be environmental. Can you run stuff normally on the GPU with this version of TensorFlow? If your data file has zero examples then it will just hang forever and never seem to run, also.
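To rule out the zero-examples case, you could count the records in the input file before starting a run. The sketch below assumes the file is an uncompressed TFRecord file and walks the record framing directly, without TensorFlow and without verifying the CRC fields. count_tfrecord_examples is a hypothetical helper, not part of BERT.

```python
import struct


def count_tfrecord_examples(path):
    """Count records in an uncompressed TFRecord file without TensorFlow.

    Each record is framed as: uint64 length, uint32 CRC of the length,
    `length` bytes of payload, uint32 CRC of the payload. The CRCs are
    skipped here, so damage is only noticed as a truncated record.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                return count  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            body = f.read(length + 4)  # payload + 4-byte payload CRC
            if len(body) < length + 4:
                raise ValueError("truncated record body")
            count += 1
```

A result of 0 for the file passed as the input file would match the "hangs forever" symptom described above.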

@liweitj47

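Use this command to check whether TensorFlow can see a GPU.

python -c 'import tensorflow as tf; print(tf.test.gpu_device_name())'

If it prints an empty device name and you only see log lines like the one below, TensorFlow found no GPU and you are using the CPU. (The log line itself only reports unused CPU instructions, so it can also appear when a GPU is in use.)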

2019-01-11 13:02:22.543745: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

I had the same issue and fixed it.

Add the following lines to the beginning of your code, before TensorFlow is imported:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
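The same idea can be wrapped in a small function so the GPU choice is made in one place. This is a minimal sketch, not something from the BERT repo: restrict_visible_gpus is a hypothetical name, and it assumes you call it before the first TensorFlow import so that CUDA has not been initialized yet.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA code run by this process.

    Returns the value written to CUDA_VISIBLE_DEVICES. An empty list
    writes an empty string, which hides every GPU.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before the first `import tensorflow` in the program.
restrict_visible_gpus([0])
```

Inside the process, the selected devices are renumbered from 0, so with restrict_visible_gpus([1]) the second physical GPU is seen by the program as device 0.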

I have the same issue and can't seem to fix it.

I had the same issue and upgrading tensorflow-gpu seemed to fix the problem. Possibly a coincidence though.

Thank you, I have tried. You are right!!

I'm running SQuAD in inference mode and I see the log message Running infer on CPU.
Does that mean TF has picked the CPU?
For sure, I don't have any GPU visibility issues (training uses GPU).

W0807 19:11:23.384163 140445616908032 deprecation_wrapper.py:119] From ../bert/run_eval_squad.py:1213: The name tf.parse_example is deprecated. Please use tf.io.parse_example instead.

I0807 19:11:23.388702 140445616908032 estimator.py:1145] Calling model_fn.
I0807 19:11:23.388859 140445616908032 tpu_estimator.py:2965] Running infer on CPU
W0807 19:11:23.389139 140445616908032 deprecation_wrapper.py:119] From ../bert/run_eval_squad.py:667: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

I0807 19:11:23.389205 140445616908032 run_eval_squad.py:667] *** Features ***
I0807 19:11:23.389310 140445616908032 run_eval_squad.py:669]   name = input_ids, shape = (8, 384)
I0807 19:11:23.389409 140445616908032 run_eval_squad.py:669]   name = input_mask, shape = (8, 384)
I0807 19:11:23.389485 140445616908032 run_eval_squad.py:669]   name = segment_ids, shape = (8, 384)
I0807 19:11:23.389556 140445616908032 run_eval_squad.py:669]   name = unique_ids, shape = (8,)

Try installing tensorflow and tensorflow-gpu at version 1.11.0.
The requirements.txt uses

tensorflow >= 1.11.0 # CPU Version of TensorFlow.
tensorflow-gpu >= 1.11.0 # GPU version of TensorFlow.

The latest TensorFlow might not be suitable for your device or for this code.
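Before reinstalling, it may help to see which of the two packages is actually installed and at what version. The sketch below uses only the standard library (importlib.metadata, Python 3.8 or newer); installed_version is a made-up helper name.

```python
from importlib import metadata  # available in Python 3.8+


def installed_version(package):
    """Return the installed version string of a pip package, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The two package names listed in requirements.txt.
for name in ("tensorflow", "tensorflow-gpu"):
    print(name, "->", installed_version(name))
```

A None for tensorflow-gpu would mean that only the CPU package is present in this environment.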
