Pytorch-lightning: `transfer_batch_to_device` doesn't work under DP/DDP/DDP2

Created on 24 Jun 2020 · 6 Comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

This is discussed under #1756 and I'm opening a separate issue here for visibility.

In the training loop, for DP/DDP/DDP2, we do not move the data to the devices ourselves; instead, the default scatter transfers the data. As a result, `transfer_batch_to_device` is never called.

https://github.com/PyTorchLightning/pytorch-lightning/blob/16a7326e5259a3cdd20a508c34a0f84806d88f8e/pytorch_lightning/trainer/training_loop.py#L736-L737
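To make the two code paths concrete, here is a toy model in plain Python, with no PyTorch required. Every name in it (`CustomBatch`, `ToyModule`, `single_device_step`, `default_scatter`) is a hypothetical stand-in and not Lightning's actual code. It assumes, as a simplification, that a generic scatter hands objects it does not recognize to every device unchanged.

```python
# Toy model of the two code paths. All names are illustrative stand-ins,
# not Lightning's or PyTorch's real classes.

class CustomBatch:
    """A user-defined batch type that a generic scatter does not know how to move."""

    def __init__(self, samples):
        self.samples = samples
        self.device = "cpu"


class ToyModule:
    def __init__(self):
        self.hook_calls = 0

    def transfer_batch_to_device(self, batch, device):
        # User-defined hook: it knows how to move a CustomBatch.
        self.hook_calls += 1
        batch.device = device
        return batch


def single_device_step(module, batch, device):
    # Single-device path: the trainer moves the batch itself, through the hook.
    return module.transfer_batch_to_device(batch, device)


def default_scatter(batch, devices):
    # DP/DDP-like path (simplified assumption): unknown objects are passed
    # to every device as-is, and the module's hook is never consulted.
    return [batch for _ in devices]


module = ToyModule()
moved = single_device_step(module, CustomBatch([1, 2, 3]), "cuda:0")
scattered = default_scatter(CustomBatch([1, 2, 3]), ["cuda:0", "cuda:1"])
```

In this sketch the hook runs exactly once, on the single-device path, while the scattered copies still report `cpu`, which is the symptom described above.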

Expected behavior

Ideally, we want `transfer_batch_to_device` to work in all settings. If this behavior cannot be overridden at all, there should at least be a run-time warning and/or a note in the docs.

enhancement help wanted

All 6 comments

Ummm... yeah, good point. I'm not sure we can add a hook here. Maybe @awaelchli can look into this.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@edenafek @awaelchli did we add a hook for this now?

No, it is not there yet. This would require a custom scatter/gather in `LightningDataParallel`/`LightningDistributedDataParallel` that the user defines. I am not sure what the recommended way to do this is in Lightning. Should the user subclass these classes and initialize them in the `configure_ddp` hook?
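One way to picture the subclassing idea is the plain-Python sketch below. It is only an illustration under stated assumptions: `ToyDataParallel`, `HookedDataParallel`, `CustomBatch`, `ToyModule`, and `split` are hypothetical names, the wrapper is not the real `LightningDataParallel`, and the real scatter signature may differ.

```python
# Sketch of a wrapper whose scatter delegates device placement to the
# module's hook. All classes here are hypothetical stand-ins.

class CustomBatch:
    def __init__(self, samples):
        self.samples = samples
        self.device = "cpu"


class ToyModule:
    def transfer_batch_to_device(self, batch, device):
        # User-defined hook that knows how to move a CustomBatch.
        batch.device = device
        return batch


def split(samples, n):
    # Split a list into at most n contiguous chunks of near-equal size.
    size = max(1, -(-len(samples) // n))  # ceiling division
    return [samples[i:i + size] for i in range(0, len(samples), size)]


class ToyDataParallel:
    """Stand-in for a DataParallel-style wrapper."""

    def __init__(self, module, device_ids):
        self.module = module
        self.device_ids = device_ids

    def scatter(self, batch):
        # Default behaviour in this toy: pass the batch through unchanged.
        return [batch for _ in self.device_ids]


class HookedDataParallel(ToyDataParallel):
    def scatter(self, batch):
        # Custom scatter: chunk the batch, then let the user's hook
        # place each chunk on its target device.
        chunks = split(batch.samples, len(self.device_ids))
        return [
            self.module.transfer_batch_to_device(CustomBatch(chunk), device)
            for chunk, device in zip(chunks, self.device_ids)
        ]


wrapper = HookedDataParallel(ToyModule(), ["cuda:0", "cuda:1"])
parts = wrapper.scatter(CustomBatch([1, 2, 3, 4]))
```

The open question from the comment remains: where such a subclass would be instantiated (for example in `configure_ddp`) is not settled here.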

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

Not sure if the earlier label removal counts towards a new "activity" by the stale bot, so commenting here to indicate that this is not stale and still needs to be addressed.
