This is discussed in #1756; I'm opening a separate issue here for visibility.
In the training loop, for DP/DDP/DDP2, we do not move the data to devices ourselves but instead rely on the default scatter to transfer it. As a result, transfer_batch_to_device is never called.
Ideally, transfer_batch_to_device should work in all settings. If it is not possible to override this behavior, there should at least be a runtime warning and/or a note in the docs.
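For context, this is the hook I would expect to be called in all modes. A minimal sketch of how it is typically overridden (the dict layout and the "graph" key below are made-up examples, not from any real project):

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):

    def transfer_batch_to_device(self, batch, device):
        # Hypothetical custom batch: a dict holding a non-standard object
        # under "graph" that Lightning cannot move automatically.
        if isinstance(batch, dict) and "graph" in batch:
            batch["graph"] = batch["graph"].to(device)
            return batch
        # Defer to Lightning's default moving logic for plain tensors/collections.
        return super().transfer_batch_to_device(batch, device)
```

In single-device training this hook is respected, but in DP/DDP/DDP2 the default scatter bypasses it entirely.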
Ummm... yeah, good point. I'm not sure we can add a hook here. Maybe @awaelchli can look into this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@edenafek @awaelchli did we add a hook for this now?
No, it is not there yet. This would require a custom scatter/gather in LightningDataParallel/LightningDistributedDataParallel that the user defines. I am not sure what the recommended way to do that in Lightning is. Should the user subclass these classes and instantiate them in the configure_ddp hook, as in the sketch below?
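To make the idea concrete, a rough sketch of what I mean (the import path and the configure_ddp signature reflect the current overrides module and may change; the scatter override is where the user-defined transfer logic would go, this is not an official API):

```python
import pytorch_lightning as pl
from pytorch_lightning.overrides.data_parallel import LightningDistributedDataParallel


class CustomLightningDDP(LightningDistributedDataParallel):

    def scatter(self, inputs, kwargs, device_ids):
        # The user-defined per-device transfer logic would go here,
        # instead of relying on torch's default scatter of plain tensors.
        return super().scatter(inputs, kwargs, device_ids)


class MyModel(pl.LightningModule):

    def configure_ddp(self, model, device_ids):
        # Wrap the model in the custom subclass so the overridden scatter
        # runs on every forward pass in DDP mode.
        return CustomLightningDDP(
            model, device_ids=device_ids, find_unused_parameters=True
        )
```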
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
Not sure if the earlier label removal counts towards a new "activity" by the stale bot, so commenting here to indicate that this is not stale and still needs to be addressed.