There are currently inconsistent return values from the ddp_train method of the accelerator classes.
For example, DDP2Accelerator.ddp_train returns results, while DDPSpawnAccelerator.ddp_train returns None. Although I am not familiar with distributed training, it seems that both methods should return results (or both should return None) for consistency.
https://github.com/PyTorchLightning/pytorch-lightning/blob/eddf35a379efbd1b8fbedddb0d4b550e8e5ca48d/pytorch_lightning/accelerators/ddp_spawn_accelerator.py#L76
https://github.com/PyTorchLightning/pytorch-lightning/blob/f37444fa3e82011e4a71b5ca8dc897eff9ba0fa3/pytorch_lightning/accelerators/ddp2_accelerator.py#L120
All methods with the same name (i.e. ddp_train()) should return the same type of object (e.g. results).
This inconsistency was found while handling #4232.
I think this is also a place where more type-hint coverage + Pyre would catch issues via static analysis.
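For illustration, here is a hypothetical minimal sketch (the Results class and the method signatures below are stand-ins, not Lightning's real API) of how return-type annotations would let Pyre or mypy flag the inconsistent override:

```python
class Results:
    """Stand-in for Lightning's internal results object."""

class Accelerator:
    def ddp_train(self, process_idx: int, model: object) -> Results:
        ...

class DDP2Accelerator(Accelerator):
    def ddp_train(self, process_idx: int, model: object) -> Results:
        return Results()  # consistent with the base class

class DDPSpawnAccelerator(Accelerator):
    # A checker like Pyre or mypy flags this override:
    # return type `None` is incompatible with the base's `Results`.
    def ddp_train(self, process_idx: int, model: object) -> None:
        return None
```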
I'm not sure this is accurate.
Sometimes they can't really return something, or it doesn't make sense to.
ddp_spawn can't return anything :)
The functions run in a subprocess, and you can't share objects between processes unless you use a queue.
So, if you notice, ddp_spawn's train does not return anything, but I think it uses a queue to pull info out and put it back on the main process.
In the example you mentioned, there is no return at the end of ddp_spawn, BUT there is a transfer-state function... that function does return a few things, but via the correct mechanism, which is a queue (see the sketch below).
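To see why a direct return is impossible, here is a minimal standalone sketch of torch.multiprocessing.spawn plus a queue (not Lightning's actual code; `train_fn` and `best_score` are made up for illustration):

```python
import torch.multiprocessing as mp

def train_fn(process_idx, queue):
    # ... training would happen here, inside the subprocess ...
    best_score = 0.9  # hypothetical result living in the child's memory

    # A `return best_score` here would be discarded: mp.spawn never
    # propagates a child's return value back to the parent process.
    # The only way to hand results back is an explicit channel:
    if process_idx == 0:
        queue.put(best_score)

if __name__ == "__main__":
    smp = mp.get_context("spawn")
    queue = smp.SimpleQueue()
    # spawn() forks `nprocs` subprocesses, each running train_fn,
    # and joins them before returning
    mp.spawn(train_fn, args=(queue,), nprocs=2)
    print(queue.get())  # 0.9, transferred from the rank-0 subprocess
```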
I suggest watching the video on how DDP works, which explains how forking into processes happens (i.e. mp.spawn, hence the name) and why using spawn introduces limitations and makes direct returns impossible.