Pytorch-lightning: How to load data every epoch

Created on 17 Sep 2019 · 11 Comments · Source: PyTorchLightning/pytorch-lightning

Hi,
Because of my task, I must load new training data every epoch. But in this package, data can only be loaded once, at the beginning of training. How can I load data every epoch?
Thanks.

Labels: enhancement, help wanted


All 11 comments

could you explain more? do you have pseudocode?

Thanks for the help.
[image: overview_flat — diagram of Lightning's training loop]
As you can see in your picture, data setup is called before the training loop. But in my task, like few-shot learning, the data must be reloaded between the "for epoch" and "for batch" loops in this picture, once per new epoch. What can I do about this?
Apologies for my poor English.
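
In other words, the loop structure being asked for is roughly this (pseudocode only; `build_train_dataloader` and `training_step` here are placeholder names, not Lightning APIs):

    # Rough pseudocode of the desired control flow -- not Lightning's
    # actual internals:
    for epoch in range(max_epochs):
        # Rebuild the data here, e.g. resample few-shot episodes.
        train_loader = build_train_dataloader(epoch)  # hypothetical helper
        for batch in train_loader:
            training_step(batch)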

I made some adjustments. Can you help me check them?
In decorators.py, I deleted `setattr(self, attr_name, value)` on line 25, like this:
[screenshot 1]

And in trainer.py, I call the data again in the `_train` function at line 803, like this:
[screenshot 2]

Am I doing this right? Or does it have a bad influence elsewhere?
Thank you for the help.
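
For context, the decorator being discussed is a lazy-property-style cache; a simplified sketch of the pattern (not the exact source) shows why deleting that `setattr` call disables the caching:

    import functools

    def data_loader(fn):
        """Simplified sketch of the caching decorator pattern."""
        attr_name = '_lazy_' + fn.__name__

        @functools.wraps(fn)
        def _cached(self):
            if not hasattr(self, attr_name):
                # This is the setattr removed above: without it, nothing
                # is cached and fn runs again on every call.
                setattr(self, attr_name, fn(self))
            return getattr(self, attr_name)

        return _cached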

What happens if you just don't use the @dataloader decorator? That should prevent the DataLoader from being cached, and you can just recompute it every time the class method is called (which should be once per epoch). Not sure what other effects that would have, though.
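
Concretely, that suggestion looks something like this (a sketch against the 0.x-era API; `sample_new_episodes` is a hypothetical method, hook names varied across early versions, and the rest of the module is omitted):

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    class FewShotModel(pl.LightningModule):
        # No @pl.data_loader decorator here, so nothing is cached and the
        # trainer re-executes this method each time it asks for the loader.
        def train_dataloader(self):
            dataset = self.sample_new_episodes()  # hypothetical resampling hook
            return DataLoader(dataset, batch_size=32)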

@neggert that's the way to do it. I'll add this to the docs.

@sadbb actually just submitted a PR to enable this.
Once we merge it into master, just remove the decorators from the dataloaders. A warning, though: your loader will be called every epoch, so if any of the loaders are slow to init, your script will be very slow.
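
One way to keep that per-epoch call cheap is to do the expensive work once and only resample per epoch (a sketch; `load_base_dataset` and `resample_episodes` are made-up helpers):

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    class FewShotModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # Expensive one-time work (downloads, preprocessing) happens once...
            self.base_dataset = load_base_dataset()  # hypothetical

        def train_dataloader(self):
            # ...so the call made every epoch only does cheap resampling.
            episodes = resample_episodes(self.base_dataset)  # hypothetical
            return DataLoader(episodes, batch_size=32)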

I think it is pretty standard to create a dataloader at the beginning of each epoch. I think it should be the default.

It is already the default; this PR is to support the non-default case.

One callout: when doing validation, `val_dataloader` gets called every step. This causes performance problems if you haven't memoized the dataloader.
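
Until that's handled in the framework, you can memoize it yourself; a minimal sketch (`self.val_dataset` and `_val_loader` are illustrative names):

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    class MyModel(pl.LightningModule):
        def val_dataloader(self):
            # Build the loader once and reuse it, so the repeated calls
            # during validation don't re-create it every step.
            if getattr(self, '_val_loader', None) is None:
                self._val_loader = DataLoader(self.val_dataset, batch_size=32)
            return self._val_loader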

The problem is here, in `__evaluation_forward`:

        # dataloader_idx is only appended when more than one dataloader
        # is configured, so validation_step's signature changes with it.
        if test and len(self.get_test_dataloaders()) > 1:
            args.append(dataloader_idx)

        elif not test and len(self.get_val_dataloaders()) > 1:
            args.append(dataloader_idx)

Honestly, I never really liked passing different args to validation_step depending on the number of dataloaders anyway. Maybe we should think about changing the design slightly here.
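
For reference, that design means user code ends up with two different `validation_step` signatures depending on configuration (illustrative):

    # With a single val dataloader, no index is appended:
    def validation_step(self, batch, batch_idx):
        ...

    # With multiple val dataloaders, an extra index is passed in:
    def validation_step(self, batch, batch_idx, dataloader_idx):
        ...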

