PyTorch Lightning: DataLoader starving the GPU

Created on 23 Mar 2020 · 3 comments · Source: PyTorchLightning/pytorch-lightning

Hey,

Thank you for this amazing library!

I'm using pytorch_lightning to train a segmentation model on 3D images. Augmentation on these images is quite slow, mostly because I do full-volume elastic transforms, which take ~2 s per image on a single CPU.

I'm running a large U-Net with precision=16 and amp_level='O2', using PyTorch 1.4 and pytorch_lightning 0.7.1.

I noticed that these augmentations lead to under-utilization of the GPU, which seems to spend time waiting for data: not surprising if my augmentation is slow. But in fact, the time it takes to run an epoch is (much) larger than the time to run an epoch without augmentation plus the time to do the augmentation.

To measure this, I changed my dataset so that it 'preloads' all the augmentations before the epoch starts, i.e. it loads into RAM the tensors ready to be fed to the network, and then simply returns them from the __getitem__ method of my dataset. In this setting, I'm roughly 50% faster than the original version, where the augmentation is done inside __getitem__! A minimal sketch of the two setups I'm comparing is below.
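For concreteness, here is a minimal sketch of the comparison, assuming a hypothetical `slow_augment` stand-in for the elastic transform (not my actual code):

```python
import torch
from torch.utils.data import Dataset


def slow_augment(volume: torch.Tensor) -> torch.Tensor:
    # Stand-in for the ~2 s full-volume elastic transform described above;
    # here it just returns a copy so the sketch runs.
    return volume.clone()


class AugmentOnTheFlyDataset(Dataset):
    """Original setup: the slow augmentation runs inside __getitem__."""

    def __init__(self, volumes):
        self.volumes = volumes

    def __len__(self):
        return len(self.volumes)

    def __getitem__(self, idx):
        return slow_augment(self.volumes[idx])


class PreloadedDataset(Dataset):
    """Preloaded setup: all augmented tensors are computed before the epoch starts."""

    def __init__(self, volumes):
        self.samples = [slow_augment(v) for v in volumes]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```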

Is this expected behaviour? Do the dataloaders pre-compute __getitem__ so that the GPU is fed as fast as possible? More generally, what do you advise for keeping the GPU fed when augmentation is slow?

_Additional question:_ I couldn't make the pytorch_lightning profiler work. I tried profiler=True in the Trainer declaration, and also setting it to BaseProfiler or AdvancedProfiler, but nothing seems to happen. Is this the right way to use it?
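For reference, this is roughly how I'm wiring it up; the exact import path and output may depend on the installed version, and the summary is typically only printed once `trainer.fit(...)` finishes:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.profiler import AdvancedProfiler

# Built-in simple profiler: times the standard Trainer hooks and prints
# a summary when training finishes.
trainer = Trainer(profiler=True)

# Or pass a profiler instance for cProfile-level detail per hook.
trainer = Trainer(profiler=AdvancedProfiler())
```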

Best,
Maxime

Labels: bug / fix, help wanted, information needed, won't fix

All 3 comments

Please provide a minimal working example. You need to show at least how you instantiate the DataLoader, as well as how you process your images. This makes it possible to pinpoint the exact issue.

In general, the DataLoader tries to prefetch several batches, so that in the ideal case, when you have finished processing a batch of data, a new batch is already available for transfer to the GPU's memory. If your data-processing logic is slower than feeding a batch to your network, you will throttle your GPU, since it has to wait for more data. This is often called _starving_.

So, when prefetching all data in one go, you naturally no longer starve your GPU (at least not as much as before). The solution is either to speed up your augmentation code or simply to use more workers (i.e. worker processes) to prefetch your data. See https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader for more details.

Hint: use num_workers > 0; when using a GPU, pin_memory=True may also be worth enabling. See the sketch below.
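A minimal sketch of those settings, using a dummy TensorDataset in place of the real 3D segmentation dataset (batch size and worker count are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real 3D segmentation dataset.
train_dataset = TensorDataset(
    torch.randn(16, 1, 32, 64, 64),
    torch.randint(0, 2, (16, 1, 32, 64, 64)),
)

# Several worker processes run __getitem__ in parallel and prefetch batches,
# so the slow augmentation overlaps with GPU compute instead of stalling it.
train_loader = DataLoader(
    train_dataset,
    batch_size=2,
    shuffle=True,
    num_workers=8,     # tune to the number of available CPU cores
    pin_memory=True,   # page-locked host memory speeds up host-to-GPU copies
)
```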

This is most probably not a Lightning issue.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

try on master
