Hi @fmassa
I'm interested in why the training speed of this implementation is much faster than that of other PyTorch implementations.
Is it because PyTorch 1.0 is faster? Or is using torch.distributed.launch faster than using DataParallel? Or is there another reason?
Thanks!
Hi @bowenc0221 ,
It's a number of factors, but the main points are:
- DistributedDataParallel is faster than DataParallel

Let me know if you have further questions.
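For reference, here is a minimal sketch (not the maskrcnn-benchmark code itself) of the DistributedDataParallel pattern: torch.distributed.launch starts one process per GPU and passes `--local_rank`, so each process owns a single device and gradients are synchronized with an all-reduce, instead of one process driving every GPU as DataParallel does. The model, dataset, and script name below are placeholders.

```python
import argparse

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torch.distributed.launch spawns one process per GPU and passes --local_rank
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()

    dist.init_process_group(backend="nccl", init_method="env://")
    torch.cuda.set_device(args.local_rank)

    # Placeholder model wrapped in DDP, pinned to this process's GPU
    model = nn.Linear(10, 1).cuda()
    model = DistributedDataParallel(
        model, device_ids=[args.local_rank], output_device=args.local_rank
    )

    # Placeholder data; DistributedSampler gives each process a disjoint shard
    dataset = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()  # gradients are all-reduced across processes here
        optimizer.step()


if __name__ == "__main__":
    main()
```

Launched with something like `python -m torch.distributed.launch --nproc_per_node=NUM_GPUS train.py`, so there is no single Python process acting as a bottleneck for scatter/gather, which is the main reason this pattern scales better than DataParallel.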