Maskrcnn-benchmark: Why is training much faster than in other PyTorch implementations?

Created on 13 Dec 2018  ·  1 Comment  ·  Source: facebookresearch/maskrcnn-benchmark

โ“ Questions and Help

Hi @fmassa
I'm interested in why training with this implementation is much faster than with other PyTorch implementations.

Is it because PyTorch 1.0 is faster? Is using torch.distributed.launch faster than using DataParallel? Or is there some other reason?

Thanks!


Most helpful comment

Hi @bowenc0221 ,

It's a number of factors, but the main points are:

  • Keep all the computations (or almost all of them) on the GPU. This might not make a difference for single-GPU training, but it makes a significant difference when using multiple GPUs. As an example, the prior implementation of grid_anchors ran on the CPU because some micro-benchmarks showed that it was faster there than on the GPU. When scaling up to 8 GPUs, however, I'd only obtain a ~4x speedup; once I moved those operations to the GPU, I got an 8x speedup when training RPN-only models.
  • DistributedDataParallel is faster than DataParallel.
  • Avoid loops in hot paths by leveraging tensor operations on expanded dimensions.
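
To illustrate the first and third points, here is a minimal sketch of generating grid anchors with tensor operations on expanded dimensions, on whatever device the base anchors live on. It is not the repository's actual grid_anchors code: the function names, the (x1, y1, x2, y2) box layout, and the stride convention are assumptions made for the example. A loop-based version is included only as a reference to check against.

```python
import torch


def grid_anchors_loop(base_anchors, grid_h, grid_w, stride):
    """Reference version (hypothetical): Python loops, one anchor at a time."""
    anchors = []
    for y in range(grid_h):
        for x in range(grid_w):
            shift = torch.tensor(
                [x * stride, y * stride, x * stride, y * stride],
                dtype=base_anchors.dtype,
                device=base_anchors.device,
            )
            for base in base_anchors:
                anchors.append(base + shift)
    return torch.stack(anchors)


def grid_anchors_vectorized(base_anchors, grid_h, grid_w, stride):
    """Same result without Python loops, computed on base_anchors.device."""
    dtype, device = base_anchors.dtype, base_anchors.device
    xs = torch.arange(grid_w, dtype=dtype, device=device) * stride  # (W,)
    ys = torch.arange(grid_h, dtype=dtype, device=device) * stride  # (H,)
    shift_x = xs.repeat(grid_h)             # (H*W,), x varies fastest
    shift_y = ys.repeat_interleave(grid_w)  # (H*W,), y varies slowest
    shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)  # (H*W, 4)
    # Expand dimensions so broadcasting pairs every shift with every base anchor:
    # (H*W, 1, 4) + (1, A, 4) -> (H*W, A, 4)
    return (shifts[:, None, :] + base_anchors[None, :, :]).reshape(-1, 4)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    base = torch.tensor(
        [[-8.0, -8.0, 8.0, 8.0], [-16.0, -8.0, 16.0, 8.0]], device=device
    )
    anchors = grid_anchors_vectorized(base, grid_h=2, grid_w=3, stride=16)
    print(anchors.shape, anchors.device)
```

Because the vectorized version creates its tensors directly on the device of base_anchors, the result never has to be copied from the CPU to the GPU during training.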

Let me know if you have further questions.
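
For the DistributedDataParallel point, a rough sketch of the pattern may help. DataParallel drives all GPUs from a single process and replicates the model on every forward pass, whereas DistributedDataParallel runs one process per GPU (which is what torch.distributed.launch starts) and all-reduces gradients during the backward pass. The example below is not taken from maskrcnn-benchmark: the toy model, the function names, and the single-process CPU setup with the gloo backend are assumptions chosen so that it runs without a launcher or a GPU.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel


def run_one_step(rank: int, world_size: int, init_file: str) -> float:
    """Run a single DDP training step in this process and return the loss."""
    # "gloo" works on CPU; "nccl" is the usual choice when each process owns a GPU.
    dist.init_process_group(
        backend="gloo",
        init_method="file://" + init_file,
        rank=rank,
        world_size=world_size,
    )
    try:
        torch.manual_seed(0)
        # Toy model on CPU; with GPUs, move the model to this process's device
        # first and pass device_ids=[local_rank].
        model = DistributedDataParallel(nn.Linear(4, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        inputs, targets = torch.randn(8, 4), torch.randn(8, 2)
        loss = nn.functional.mse_loss(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across processes here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


def demo() -> float:
    # Single-process demo (world_size=1) so the sketch runs without a launcher.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    return run_one_step(rank=0, world_size=1, init_file=init_file)


if __name__ == "__main__":
    print("loss after one step:", demo())
```

In a real multi-GPU run, each launched process would call the same code with its own rank, and each would load a different shard of the data.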
