Bert: Request to add multi-GPU support, and please don't make TPU evil.

Created on 1 Dec 2018  ·  7 Comments  ·  Source: google-research/bert

Awesome work!

TPUs are good, but pushing people to use TPUs is "Do the right thing" rather than "Don't be evil", since they are really not cheap.

It would be a pity if there is no plan to add multi-GPU support, because right now multi-GPU support would definitely benefit many more people.


All 7 comments

You can just apply https://github.com/uber/horovod with minimal code modifications to enjoy multi-machine, multi-GPU distributed training.

> You can just apply https://github.com/uber/horovod with minimal code modifications to enjoy multi-machine, multi-GPU distributed training.

Thanks, but I didn't mean that.

Actually, I have run BERT on multiple GPUs based on these repos and got pretty good results on multiple tasks:
https://github.com/CyberZHG/keras-bert
https://github.com/Separius/BERT-keras

simply by adding the code "bert_model = multi_gpu_model(bert_model, gpus=GPUS)".
However, I think the code from the original paper should add multi-GPU support so that more people can reproduce the results easily, rather than just trying to push everyone to use TPUs.
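
For readers wondering what a wrapper like multi_gpu_model does conceptually: it splits each input batch into sub-batches, runs a copy of the model on each sub-batch, and concatenates the outputs. The sketch below mimics only that idea in plain Python. It makes no Keras calls and touches no GPUs; `split_batch`, `toy_model`, and `data_parallel_predict` are hypothetical names invented for illustration, and the rule that the last slice takes the remainder is an assumption about how uneven batches are handled.

```python
# Conceptual sketch only: mimics how a batch is split across replicas and
# the outputs are concatenated. No real GPUs or Keras calls are involved.


def split_batch(batch, num_replicas):
    """Split a batch into contiguous slices; the last slice takes the remainder."""
    size = len(batch) // num_replicas
    slices = []
    for i in range(num_replicas):
        start = i * size
        end = len(batch) if i == num_replicas - 1 else start + size
        slices.append(batch[start:end])
    return slices


def toy_model(inputs):
    """Hypothetical stand-in for one model replica: doubles each input."""
    return [2 * x for x in inputs]


def data_parallel_predict(model, batch, num_replicas):
    """Run the same model on each slice, then concatenate outputs in order."""
    outputs = []
    for sub_batch in split_batch(batch, num_replicas):
        # On real hardware, each slice would run on its own GPU.
        outputs.extend(model(sub_batch))
    return outputs


batch = [1, 2, 3, 4, 5]
result = data_parallel_predict(toy_model, batch, num_replicas=2)
```

Because every replica shares the same weights, the concatenated result matches what a single device would produce on the full batch; the only gain is that the slices can be processed at the same time.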

@yyht We do need this.

It will be great if the Google team can add official multi-GPU support to BERT. For now, here is a fork that uses horovod:
https://github.com/lambdal/bert
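
As background on the Horovod approach: each worker computes gradients on its own shard of the batch, the gradients are averaged across workers with an allreduce, and every worker applies the same update. The following is a toy simulation of that scheme in plain Python, not Horovod's API and not the code in the fork above. The one-parameter model, the data, and the helper names (`mse_gradient`, `allreduce_mean`, `train_step`) are all assumptions made for illustration.

```python
# Conceptual simulation of synchronous data-parallel SGD, not Horovod's API.


def mse_gradient(w, examples):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in examples) / len(examples)


def allreduce_mean(values):
    """Stand-in for an allreduce that averages one value per worker."""
    return sum(values) / len(values)


def train_step(w, shards, learning_rate):
    """Each worker computes a local gradient; all apply the averaged update."""
    local_grads = [mse_gradient(w, shard) for shard in shards]
    return w - learning_rate * allreduce_mean(local_grads)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # generated by y = 2x
shards = [data[0::2], data[1::2]]  # two equally sized shards, one per worker

w = 0.0
for _ in range(200):
    w = train_step(w, shards, learning_rate=0.01)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the two-worker run follows the same trajectory as single-device training on the whole batch and `w` converges toward 2.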

Maybe you can modify the source code by adding tf.contrib.distribute.MirroredStrategy.

Do these approaches also apply to _prediction_ tasks, e.g., for SQuAD 2? So far I have found that BERT grabs both GPUs but only uses one. Horovod does not solve this - it just replicates the same processing on both GPUs.
Also, a large portion of the compute time is spent in CPU-only tasks. Horovod could help with this, if the tasks were partitioned and sent to separate CPUs/GPUs.
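
One way to read the partitioning idea for prediction: give each worker a disjoint slice of the examples instead of the full set, then merge the per-worker outputs back into the original order. Below is a minimal plain-Python sketch of that bookkeeping. It does not use BERT, TensorFlow, or Horovod; `toy_predict` is a hypothetical stand-in for the model, and the round-robin assignment is just one possible choice.

```python
# Conceptual sketch of sharding prediction work; no real model or GPUs involved.


def shard_indices(num_examples, num_workers, worker_id):
    """Round-robin: worker k takes examples k, k + num_workers, and so on."""
    return list(range(worker_id, num_examples, num_workers))


def toy_predict(example):
    """Hypothetical stand-in for running the model on one example."""
    return len(example)


def sharded_predict(examples, num_workers):
    """Predict each shard separately, then restore the original order."""
    results = [None] * len(examples)
    for worker_id in range(num_workers):  # in practice, one process per GPU
        for idx in shard_indices(len(examples), num_workers, worker_id):
            results[idx] = toy_predict(examples[idx])
    return results


questions = ["who", "where", "when and why", "how"]
predictions = sharded_predict(questions, num_workers=2)
```

Here the workers run one after another in a loop; in a real setup each shard would go to a separate process pinned to its own GPU, and only the final merge would happen in one place.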

@xinsu626 Hello, could you please tell me how to use tf.contrib.distribute.MirroredStrategy to support multi-GPU training? I have tried passing arguments to train_distribute and eval_distribute in tf.contrib.tpu.RunConfig, but it reports errors.
