Pytorch-lightning: multi-gpu ddp calls validation and testing loops too many times

Created on 16 Mar 2020 · 6 comments · Source: PyTorchLightning/pytorch-lightning

When using ddp with multiple GPUs, the validation and test loops each run over the entire validation/test dataset on every GPU, so results are duplicated across processes.

Expected behavior is that the dataset is divided appropriately across the GPUs, as illustrated below.
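For context, here is a rough illustration of how a DistributedSampler is expected to partition a dataset across ranks (the tiny dataset and the world size of 2 are just for demonstration):

```python
# Illustration only: how DistributedSampler splits indices across ranks.
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(8))
for rank in range(2):  # pretend world_size == 2
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False)
    print(rank, list(sampler))
# rank 0 -> [0, 2, 4, 6], rank 1 -> [1, 3, 5, 7]
```

What I observe instead is that every GPU iterates over all of the samples.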

I am using current master (cloned Mar 14), Ubuntu 19.10, CUDA 10.1, Python 3.7.5, PyTorch 1.4, in a venv environment.

The problem appears to be in auto_add_sampler() in data_loading.py. It does not create a DistributedSampler for validation or test datasets.
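As a workaround until this is fixed, the sampler can be attached manually in the LightningModule. This is only a sketch for my setup: `self.val_dataset`, `self.batch_size`, and `num_workers` are placeholders, and it assumes Lightning's ddp backend has already initialized `torch.distributed` so the sampler can pick up the rank and world size.

```python
# Workaround sketch: explicitly wrap the validation set in a DistributedSampler.
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def val_dataloader(self):
    # With no explicit sampler, each ddp process currently iterates over the
    # whole validation set; the sampler restricts it to this rank's shard.
    sampler = DistributedSampler(self.val_dataset, shuffle=False)
    return DataLoader(self.val_dataset,
                      batch_size=self.batch_size,
                      sampler=sampler,
                      num_workers=4)
```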

bug / fix help wanted

All 6 comments

Pulled the latest master about an hour ago; the behavior is no longer present. Closing.

Sorry - this issue still exists in some configurations. My proposed fix is not the total picture. Still investigating - will provide a reproducible example.

Testing underway. Will make PR tomorrow.

Don't want to clutter up the PR queue if no one is interested in this. Let me know...

That sounds like a good contribution to me... mind sending a PR?
Any suggestions, @PyTorchLightning/core-contributors?
On a technical note, when you refer to some state of master, please use the commit hash, as there can be multiple commits each day...

Will do on both the PR and the commit hash reference.
