There are currently inconsistent return values from the ddp_train method of the accelerator classes.
For example, DDP2Accelerator.ddp_train returns results, while DDPSpawnAccelerator.ddp_train returns None. Although I am not familiar with distributed training, it seems that both methods should return results (or both should return None) for consistency.
https://github.com/PyTorchLightning/pytorch-lightning/blob/eddf35a379efbd1b8fbedddb0d4b550e8e5ca48d/pytorch_lightning/accelerators/ddp_spawn_accelerator.py#L76
https://github.com/PyTorchLightning/pytorch-lightning/blob/f37444fa3e82011e4a71b5ca8dc897eff9ba0fa3/pytorch_lightning/accelerators/ddp2_accelerator.py#L120
All methods with the same name (i.e. ddp_train()) should return the same type of object (e.g. results).
This inconsistency was found while handling #4232.
I think this is also a place where more type-hint coverage + Pyre would catch issues via static analysis.
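For illustration, here is a hypothetical minimal sketch (the Results class and the method signatures below are stand-ins, not Lightning's real API) of how return-type annotations would let Pyre or mypy flag the inconsistent override:

```python
class Results:
    """Stand-in for Lightning's internal results object."""

class Accelerator:
    def ddp_train(self, process_idx: int, model: object) -> Results:
        ...

class DDP2Accelerator(Accelerator):
    def ddp_train(self, process_idx: int, model: object) -> Results:
        return Results()  # consistent with the base class

class DDPSpawnAccelerator(Accelerator):
    # A checker like Pyre or mypy flags this override:
    # return type `None` is incompatible with the base's `Results`.
    def ddp_train(self, process_idx: int, model: object) -> None:
        return None
```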
I'm not sure this is accurate.
Sometimes they can't really return something, or it doesn't make sense to.
ddp_spawn can't return anything :)
The functions run in a subprocess, and you can't share objects between processes unless you use a queue.
So, if you notice, ddp_spawn's train does not return anything, but I think it uses a queue to pull info out and put it back on the main process.
In the example you mentioned, there is no return at the end of ddp_spawn, BUT there is a transfer-state function... that function does return a few things, but via the correct mechanism, which is a queue (see the sketch below).
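To see why a direct return is impossible, here is a minimal standalone sketch of torch.multiprocessing.spawn plus a queue (not Lightning's actual code; `train_fn` and `best_score` are made up for illustration):

```python
import torch.multiprocessing as mp

def train_fn(process_idx, queue):
    # ... training would happen here, inside the subprocess ...
    best_score = 0.9  # hypothetical result living in the child's memory

    # A `return best_score` here would be discarded: mp.spawn never
    # propagates a child's return value back to the parent process.
    # The only way to hand results back is an explicit channel:
    if process_idx == 0:
        queue.put(best_score)

if __name__ == "__main__":
    smp = mp.get_context("spawn")
    queue = smp.SimpleQueue()
    # spawn() forks `nprocs` subprocesses, each running train_fn,
    # and joins them before returning
    mp.spawn(train_fn, args=(queue,), nprocs=2)
    print(queue.get())  # 0.9, transferred from the rank-0 subprocess
```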
I suggest watching the video on how DDP works, which explains how forking into processes happens (i.e. mp.spawn, hence the name) and why using spawn introduces limitations and makes direct returns impossible.