Pytorch-lightning: How to load data every epoch

Created on 17 Sep 2019 · 11 Comments · Source: PyTorchLightning/pytorch-lightning

Hi,
Because of my task, I must load new training data every epoch. But in this package, data can only be loaded once, at the beginning of training. How can I load data every epoch?
Thanks.

Labels: enhancement, help wanted


All 11 comments

could you explain more? do you have pseudocode?

Thanks for the help.
[image: overview_flat — diagram of Lightning's training loop]
As you can see in your picture, data setup is called before the training loop. But in my task, like few-shot learning, the data must be reloaded between the "for epoch" and "for batch" loops in this picture, once per new epoch. What can I do about this?
Apologies for my poor English.
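
In other words, the loop structure being asked for is roughly this (pseudocode only; `build_train_dataloader` and `training_step` here are placeholder names, not Lightning APIs):

    # Rough pseudocode of the desired control flow -- not Lightning's
    # actual internals:
    for epoch in range(max_epochs):
        # Rebuild the data here, e.g. resample few-shot episodes.
        train_loader = build_train_dataloader(epoch)  # hypothetical helper
        for batch in train_loader:
            training_step(batch)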

I made some adjustments. Can you help me check them?
In decorators.py, I deleted `setattr(self, attr_name, value)` on line 25, like this:
[screenshot 1]

And in trainer.py, I call the data again in the `_train` function at line 803, like this:
[screenshot 2]

Am I doing this right? Or does it have a bad influence elsewhere?
Thank you for the help.
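
For context, the decorator being discussed is a lazy-property-style cache; a simplified sketch of the pattern (not the exact source) shows why deleting that `setattr` call disables the caching:

    import functools

    def data_loader(fn):
        """Simplified sketch of the caching decorator pattern."""
        attr_name = '_lazy_' + fn.__name__

        @functools.wraps(fn)
        def _cached(self):
            if not hasattr(self, attr_name):
                # This is the setattr removed above: without it, nothing
                # is cached and fn runs again on every call.
                setattr(self, attr_name, fn(self))
            return getattr(self, attr_name)

        return _cached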

What happens if you just don't use the @dataloader decorator? That should prevent the DataLoader from being cached, and you can just recompute it every time the class method is called (which should be once per epoch). Not sure what other effects that would have, though.
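
Concretely, that suggestion looks something like this (a sketch against the 0.x-era API; `sample_new_episodes` is a hypothetical method, hook names varied across early versions, and the rest of the module is omitted):

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    class FewShotModel(pl.LightningModule):
        # No @pl.data_loader decorator here, so nothing is cached and the
        # trainer re-executes this method each time it asks for the loader.
        def train_dataloader(self):
            dataset = self.sample_new_episodes()  # hypothetical resampling hook
            return DataLoader(dataset, batch_size=32)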

@neggert that's the way to do it. I'll add this to the docs.

@sadbb actually just submitted a PR to enable this.
Once we merge it into master, just remove the decorators from the dataloaders. A warning, though: your loader will be called every epoch, so if any of the loaders are slow to init, your script will be very slow.
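
One way to keep that per-epoch call cheap is to do the expensive work once and only resample per epoch (a sketch; `load_base_dataset` and `resample_episodes` are made-up helpers):

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    class FewShotModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # Expensive one-time work (downloads, preprocessing) happens once...
            self.base_dataset = load_base_dataset()  # hypothetical

        def train_dataloader(self):
            # ...so the call made every epoch only does cheap resampling.
            episodes = resample_episodes(self.base_dataset)  # hypothetical
            return DataLoader(episodes, batch_size=32)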

I think it is pretty standard to create a dataloader at the beginning of each epoch. I think it should be the default.

It is already the default; this PR is to support the non-default case.

One callout: when doing validation, `val_dataloader` gets called every step. This causes performance problems if you haven't memoized the dataloader.
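
Until that's handled in the framework, you can memoize it yourself; a minimal sketch (`self.val_dataset` and `_val_loader` are illustrative names):

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    class MyModel(pl.LightningModule):
        def val_dataloader(self):
            # Build the loader once and reuse it, so the repeated calls
            # during validation don't re-create it every step.
            if getattr(self, '_val_loader', None) is None:
                self._val_loader = DataLoader(self.val_dataset, batch_size=32)
            return self._val_loader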

The problem is here, in `__evaluation_forward`:

        # dataloader_idx is only appended when more than one dataloader
        # is configured, so validation_step's signature changes with it.
        if test and len(self.get_test_dataloaders()) > 1:
            args.append(dataloader_idx)

        elif not test and len(self.get_val_dataloaders()) > 1:
            args.append(dataloader_idx)

Honestly, I never really liked passing different args to validation_step depending on the number of dataloaders anyway. Maybe we should think about changing the design slightly here.
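
For reference, that design means user code ends up with two different `validation_step` signatures depending on configuration (illustrative):

    # With a single val dataloader, no index is appended:
    def validation_step(self, batch, batch_idx):
        ...

    # With multiple val dataloaders, an extra index is passed in:
    def validation_step(self, batch, batch_idx, dataloader_idx):
        ...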

