BERT: Request to add multi-GPU support, and please don't make TPU evil.

Created on 1 Dec 2018 · 7 Comments · Source: google-research/bert

Awesome work!

TPUs are good, but pushing people to use TPUs is "Do the right thing", not "Don't be evil", since they are really not cheap.

It would be a pity if there is no plan to add multi-GPU support, because for now multi-GPU support would definitely benefit many more people.


huangxiangzhou

Most helpful comment

You can just apply https://github.com/uber/horovod with minimal code modifications to enjoy multi-machine, multi-GPU distributed training.

Thanks, but I didn't mean that.

Actually, I have run BERT on multiple GPUs based on the following repos and got pretty good results on multiple tasks:
https://github.com/CyberZHG/keras-bert
https://github.com/Separius/BERT-keras

This only required adding the line "bert_model = multi_gpu_model(bert_model, gpus=GPUS)".
However, I think the code from the original paper should add multi-GPU support so that more people can reproduce the results easily, rather than just pushing everyone to use TPUs.

huangxiangzhou on 2 Dec 2018

👍9
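
For readers unfamiliar with the Keras call quoted above, here is a minimal sketch of how it is typically wrapped. The helper name to_multi_gpu is hypothetical, and the sketch assumes a standalone Keras 2.x release that still ships keras.utils.multi_gpu_model (it was deprecated and later removed from tf.keras). How bert_model is built is left to the Keras BERT port being used.

```python
def to_multi_gpu(model, gpus):
    """Return a data-parallel replica of `model`, or `model` itself for < 2 GPUs.

    Hypothetical helper. `model` is assumed to be a Keras model, for example
    one built by one of the Keras BERT ports linked above.
    """
    if gpus < 2:
        # multi_gpu_model rejects gpus <= 1, so fall back to the plain model.
        return model

    # Imported lazily so the single-GPU path does not need this API at all.
    # Assumption: a Keras 2.x release that still provides multi_gpu_model.
    from keras.utils import multi_gpu_model

    # Each incoming batch is split into `gpus` sub-batches, one per replica,
    # and the replica outputs are concatenated back on the CPU.
    return multi_gpu_model(model, gpus=gpus)
```

Because each input batch is split across the replicas, the batch size passed to fit is the global one. The Keras documentation recommends saving weights through the original (template) model rather than through the wrapped one.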

All 7 comments

You can just apply https://github.com/uber/horovod with minimal code modifications to enjoy multi-machine, multi-GPU distributed training.

yyht on 2 Dec 2018
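
As a rough illustration of what "minimal code modifications" usually means with Horovod, here is a sketch of the standard TensorFlow 1.x pattern. It is not the BERT repository's code: the function names are hypothetical, tf.train.AdamOptimizer stands in for BERT's own optimizer, and scaling the learning rate by the worker count is the heuristic suggested in Horovod's documentation, not something tuned for BERT.

```python
def scaled_hyperparams(base_lr, total_steps, num_workers):
    """Assumed heuristic: larger effective batch -> higher LR, fewer steps."""
    return base_lr * num_workers, max(1, total_steps // num_workers)


def make_horovod_pieces(base_lr, total_steps):
    """Build the Horovod-specific objects to plug into a TF 1.x training loop."""
    # Lazy imports: this part needs TensorFlow 1.x and horovod installed.
    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()  # one process per GPU, started by an MPI-style launcher
    lr, steps = scaled_hyperparams(base_lr, total_steps, hvd.size())

    # Pin this process to a single GPU on its machine.
    session_config = tf.ConfigProto()
    session_config.gpu_options.visible_device_list = str(hvd.local_rank())

    # Wrap the optimizer so gradients are averaged across all workers.
    # Stand-in optimizer; BERT defines its own in optimization.py.
    optimizer = hvd.DistributedOptimizer(tf.train.AdamOptimizer(lr))

    # Broadcast rank 0's initial variables so all workers start identically.
    hooks = [hvd.BroadcastGlobalVariablesHook(0)]
    return optimizer, hooks, session_config, steps
```

The script is then launched with one process per GPU (for example via mpirun), and checkpoints are normally written only by the worker whose hvd.rank() is 0.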

@yyht We do need this.

zheolong on 21 Jan 2019

It will be great if the Google team can add official multi-GPU support to BERT. For now, here is a fork that uses horovod:
https://github.com/lambdal/bert

chuanli11 on 6 Feb 2019

👍2

Maybe you can modify the source code by adding tf.contrib.distribute.MirroredStrategy.

xinsu626 on 20 Jul 2019

👍1
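
A minimal sketch of that suggestion, assuming TensorFlow 1.x (where tf.contrib still exists) and a model_fn that returns a plain tf.estimator.EstimatorSpec rather than a TPUEstimatorSpec. The function names are hypothetical, and this swaps the TPU-specific estimator and run config for the generic tf.estimator ones; it has not been checked against BERT's custom optimizer.

```python
def per_replica_batch_size(global_batch_size, num_gpus):
    """With Estimator + MirroredStrategy, input_fn batches are per replica."""
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by num_gpus")
    return global_batch_size // num_gpus


def build_mirrored_estimator(model_fn, model_dir):
    """Create an Estimator that mirrors the model on all visible GPUs."""
    # Lazy import: requires TensorFlow 1.x.
    import tensorflow as tf

    # With no arguments, the strategy uses every GPU visible to the process.
    strategy = tf.contrib.distribute.MirroredStrategy()

    # Generic RunConfig (not the TPU one) carrying the distribution strategy.
    run_config = tf.estimator.RunConfig(
        model_dir=model_dir,
        train_distribute=strategy,
    )
    return tf.estimator.Estimator(model_fn=model_fn, config=run_config)
```

For example, to keep a global batch of 32 on 4 GPUs, the input_fn would batch with per_replica_batch_size(32, 4), which is 8 examples per replica.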

Do these approaches also apply to _prediction_ tasks, e.g., SQuAD 2? So far I have found that BERT grabs both GPUs but only uses one. Horovod does not solve this; it just replicates the same processing on both GPUs.
Also, a large portion of the compute time is spent in CPU-only tasks. Horovod could help with this if the tasks were partitioned and sent to separate CPUs/GPUs.

mfeblowitz on 20 Aug 2019
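
The partitioning idea in the comment above can be sketched without any distributed-training library: split the prediction examples into one shard per GPU, run an independent prediction process on each shard with only that GPU visible, and merge the results in the original order. All helper names here are hypothetical, and how each shard is fed to the prediction script is left open.

```python
import os


def shard(items, num_shards):
    """Round-robin split: shard i gets items i, i + num_shards, ..."""
    return [items[i::num_shards] for i in range(num_shards)]


def unshard(shards):
    """Inverse of shard(): interleave per-shard results back into order."""
    merged = [None] * sum(len(s) for s in shards)
    for i, part in enumerate(shards):
        merged[i::len(shards)] = part
    return merged


def worker_env(gpu_id):
    """Environment for a worker process that should see only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env
```

Each worker could then be started with subprocess.Popen(cmd, env=worker_env(i)), where cmd is whatever prediction command is run on shard i. Because the workers are separate processes, the CPU-side preprocessing is parallelized along with the GPU work.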

@xinsu626 Hello, could you please tell me how to use tf.contrib.distribute.MirroredStrategy to support multi-GPU training? I have tried passing arguments to train_distribute and eval_distribute in tf.contrib.tpu.RunConfig, but it reports errors.