I created the pre-training tf_record from the sample data ("sample_text.txt") and then tried to run run_pretraining.py to train on that small dataset, but I got this error:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: bert/encoder/layer_10/intermediate/dense/truediv = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_10/intermediate/dense/BiasAdd, ConstantFolding/gradients/bert/encoder/layer_0/intermediate/dense/truediv_grad/RealDiv_recip)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: add_1/_4159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3653_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
My command to create the tf_record data is:
python create_pretraining_data.py --input_file=sample_text.txt --output_file=/tmp/tf.examples.tfrecord --vocab_file=/home/maybe/bert/model/uncased_L-12_H-768_A-12/vocab.txt --do_lower_case=True --max_seq_length=128 --max_predictions_per_seq=20 --masked_lm_prob=0.15 --random_seed=12345 --dupe_factor=5
And the command to pre-train on the small sample data is:
python run_pretraining.py --input_file=/tmp/tf.examples.tfrecord --output_dir=/tmp/pretraining_output --do_train=true --do_eval=true --bert_config_file=/home/maybe/bert/model/uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint=/home/maybe/bert/model/uncased_L-12_H-768_A-12/bert_model.ckpt --train_batch_size=32 --max_seq_length=128 --max_predictions_per_seq=20 --num_train_steps=20 --num_warmup_steps=10 --learning_rate=2e-5
I know this error. I usually run into it when another program is using the same GPU, or when the TensorFlow graph is being initialized, but at the moment nothing else is running and all the resources are free.
Memory use has almost nothing to do with the size of the input file. The example code assumes your GPU has ~12GB of memory. If it has less, you'll need to use a smaller batch size.
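As a rough illustration of why batch size, not file size, drives memory, here is a back-of-the-envelope sketch for the one tensor named in the OOM message, shape [4096, 3072]. It assumes that 4096 is train_batch_size * max_seq_length (32 * 128) and that the values are 4-byte floats. The function names are made up for this example.

```python
# Rough size of the single tensor named in the OOM message.
# Assumptions (not stated in the thread): float32 values at 4 bytes each,
# and 4096 rows = train_batch_size * max_seq_length (32 * 128).

BYTES_PER_FLOAT32 = 4
INTERMEDIATE_SIZE = 3072  # second dimension of the tensor in the error


def activation_bytes(batch_size, seq_length, width=INTERMEDIATE_SIZE):
    """Bytes needed for one [batch_size * seq_length, width] float32 tensor."""
    rows = batch_size * seq_length
    return rows * width * BYTES_PER_FLOAT32


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 ** 2)


if __name__ == "__main__":
    for batch_size in (32, 16, 8):
        size = activation_bytes(batch_size, seq_length=128)
        print(f"batch {batch_size:>2}: {to_mib(size):.0f} MiB per tensor")
```

This covers only one activation. Training keeps many tensors like it alive across the layers for the backward pass, on top of the weights, gradients, and optimizer state, so the real footprint is far larger. The point is only that the size scales linearly with batch size: halving --train_batch_size halves each of these tensors.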
Hi @jacobdevlin-google ,
Yea, I do have more than 12GB of GPU memory. I am using Tesla M60 (8GB) x 4.
Also, I'll try with a smaller batch size.
Surprisingly, it works with a smaller batch size, even though I have a lot of GPU memory in my server.
INFO:tensorflow:Finished evaluation at 2018-11-14-06:07:45
INFO:tensorflow:Saving dict for global step 20: global_step = 20, loss = 0.9312667, masked_lm_accuracy = 0.8223652, masked_lm_loss = 0.9282017, next_sentence_accuracy = 1.0, next_sentence_loss = 0.004190466
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20: /tmp/pretraining_output/model.ckpt-20
INFO:tensorflow:***** Eval results *****
INFO:tensorflow: global_step = 20
INFO:tensorflow: loss = 0.9312667
INFO:tensorflow: masked_lm_accuracy = 0.8223652
INFO:tensorflow: masked_lm_loss = 0.9282017
INFO:tensorflow: next_sentence_accuracy = 1.0
INFO:tensorflow: next_sentence_loss = 0.004190466
(asr) maybe@maybe1:~/bert$
Each 8GB card is a separate device, right? This code doesn't support multi-GPU training, so you would have to modify it or look for a fork that does.
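For context, the usual way to use several cards is synchronous data parallelism: each device holds a full copy of the model, processes its own slice of the batch, and the per-device gradients are averaged before the update. The sketch below shows only that idea in plain Python, on a toy one-parameter model. It is not BERT or TensorFlow code and not how any particular fork does it. All names are hypothetical, and it assumes the batch divides evenly across devices.

```python
# Conceptual sketch of synchronous data parallelism, in plain Python.
# Nothing here is BERT or TensorFlow code: the model is a toy y = w * x fit
# and each "device" is just a slice of the batch.


def gradient(w, xs, ys):
    """Gradient of mean squared error for y = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def split(items, num_devices):
    """Split a batch into equal shards, one per device.

    Assumes len(items) is divisible by num_devices.
    """
    shard = len(items) // num_devices
    return [items[i * shard:(i + 1) * shard] for i in range(num_devices)]


def data_parallel_gradient(w, xs, ys, num_devices):
    """Average the per-shard gradients, as a synchronous multi-GPU step would."""
    x_shards = split(xs, num_devices)
    y_shards = split(ys, num_devices)
    grads = [gradient(w, x_s, y_s) for x_s, y_s in zip(x_shards, y_shards)]
    return sum(grads) / num_devices


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 9)]  # global batch of 8 examples
    ys = [3.0 * x for x in xs]
    full = gradient(0.5, xs, ys)
    sharded = data_parallel_gradient(0.5, xs, ys, num_devices=4)
    print(full, sharded)  # the two gradients agree
```

With equal shards, the averaged gradient matches the full-batch gradient, while each device only has to hold the activations for its own slice. That is why four 8GB cards could in principle handle a batch that does not fit on one.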
Yea, they are separate GPU cards with 8GB each. I'll look for a way to use multiple GPUs for this task, and I'll fork it if I find a solution soon. I'll close this issue for now.
Thanks @jacob
So, did you find a solution for using multiple GPUs?
Same here, also looking for a multi-GPU version.