Some users wish to launch their training program as a module with python -m some.module.py
We should evalute whether this is possible for ddp and support this option when possible.
We need to strip the -m argument and append it to the command with which we launch the child processes.
This feature was orginally reported as a bug: #3600
torchelastic would be a good option here: https://pytorch.org/elastic/0.2.0/quickstart.html
Hey @awaelchli, I would like to take this
that's cool with me! give it a shot!
Hey @awaelchli , sorry for not being clear about this earlier, will this require gpus for testing!
will this require gpus for testing!
the standard ddp with gpus yes. But if you don't have gpus, you can probably implement the feature for the ddp_cpu backend first (this kind of simulates ddp on the cpu).
To add to the discussion: I'm able to use DDP backend when running my training program as a module, eg: python -m seg_lapa.train, so long as I use absolute imports. See #4243.