This is discussed in #1756; I'm opening a separate issue here for visibility.
In the training loop, for DP/DDP/DDP2, we do not move the data to devices ourselves but instead rely on the default scatter to transfer it. As a result, transfer_batch_to_device is never called.
Ideally, transfer_batch_to_device should work in all settings. If it is not possible to override this behavior, there should at least be a runtime warning and/or a note in the docs.
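For context, this is the hook I would expect to be called in all modes. A minimal sketch of how it is typically overridden (the dict layout and the "graph" key below are made-up examples, not from any real project):

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):

    def transfer_batch_to_device(self, batch, device):
        # Hypothetical custom batch: a dict holding a non-standard object
        # under "graph" that Lightning cannot move automatically.
        if isinstance(batch, dict) and "graph" in batch:
            batch["graph"] = batch["graph"].to(device)
            return batch
        # Defer to Lightning's default moving logic for plain tensors/collections.
        return super().transfer_batch_to_device(batch, device)
```

In single-device training this hook is respected, but in DP/DDP/DDP2 the default scatter bypasses it entirely.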
Ummm... yeah, good point. I'm not sure we can add a hook here. Maybe @awaelchli can look into this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@edenafek @awaelchli did we add a hook for this now?
No, it is not there yet. This would require a custom scatter/gather in LightningDataParallel/LightningDistributedDataParallel that the user defines. I am not sure what the recommended way to do that in Lightning is. Should the user subclass these classes and instantiate them in the configure_ddp hook, as in the sketch below?
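To make the idea concrete, a rough sketch of what I mean (the import path and the configure_ddp signature reflect the current overrides module and may change; the scatter override is where the user-defined transfer logic would go, this is not an official API):

```python
import pytorch_lightning as pl
from pytorch_lightning.overrides.data_parallel import LightningDistributedDataParallel


class CustomLightningDDP(LightningDistributedDataParallel):

    def scatter(self, inputs, kwargs, device_ids):
        # The user-defined per-device transfer logic would go here,
        # instead of relying on torch's default scatter of plain tensors.
        return super().scatter(inputs, kwargs, device_ids)


class MyModel(pl.LightningModule):

    def configure_ddp(self, model, device_ids):
        # Wrap the model in the custom subclass so the overridden scatter
        # runs on every forward pass in DDP mode.
        return CustomLightningDDP(
            model, device_ids=device_ids, find_unused_parameters=True
        )
```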
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
Not sure if the earlier label removal counts towards a new "activity" by the stale bot, so commenting here to indicate that this is not stale and still needs to be addressed.