Pytorch-lightning: [DataModule] PyTorch datasets as DataModules out of the box

Created on 29 Jul 2020 · 9 comments · Source: PyTorchLightning/pytorch-lightning

🚀 Feature

PyTorch already has datasets (MNIST, CIFAR, etc.). It would be very convenient to provide those datasets out of the box as DataModules.

Motivation

To reduce boilerplate. If I can avoid reimplementing or copy-pasting the same code yet again, I would rather use an existing implementation. PyTorch Lightning as a whole was built with this in mind, so this feels like a natural extension.

Pitch

The ability to write something along the lines of:

import pytorch_lightning as pl
import pytorch_lightning.datasets as pld  # proposed module, not an existing API

# ... model definition and trainer instantiation omitted ...
trainer.fit(model, pld.MNIST())

Alternatives

Alternatively, it could be implemented in PyTorch Lightning Bolts instead of here.

Additional context

None

Labels: Important, discussion, enhancement, help wanted

Source

InCogNiTo124

All 9 comments

Hi! Thanks for your contribution, great first issue!

github-actions[bot] on 29 Jul 2020

I think it is a good suggestion, @PyTorchLightning/core-contributors

Borda on 29 Jul 2020

Yes, I think this is what we have already started in Bolts!

Want to add the missing torchvision datasets to it?

williamFalcon on 29 Jul 2020

I'd like to, but I'm not sure I have the time if this is urgent. I could probably work through it slowly over 2-3 weeks, if that's not an issue :)

InCogNiTo124 on 29 Jul 2020

No problem. Maybe create a GH issue for each dataset and do one at a time?

williamFalcon on 29 Jul 2020

(GH issues in Bolts)
FYI @nateraw

williamFalcon on 29 Jul 2020


Makes sense. I'll open a separate issue per dataset in Bolts.

Also, what do you think about leaving this issue open until everything is implemented?

InCogNiTo124 on 29 Jul 2020

Not sure what the value is in having duplicate tickets, but we can leave this open for now until the new issues are opened in Bolts. Makes sense?

edenlightning on 29 Jul 2020

@InCogNiTo124 let's move the discussion to the Bolts repo for now. We're building out all sorts of support for different datasets there.

The datasets you mentioned aren't from torch, to my understanding. They're from torchvision, which isn't included as a requirement here. If we want to support torchvision or sklearn datasets directly in Lightning, we can do that in a future PR.
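Because torchvision is not a Lightning requirement, direct support would presumably have to treat it as an optional dependency. A minimal sketch of one common pattern, using only the standard library; `require_optional` and `TORCHVISION_AVAILABLE` are hypothetical names for illustration, not Lightning's actual utilities:

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(name: str, install_hint: str = "") -> ModuleType:
    """Import a top-level optional dependency or fail with a clear message.

    Hypothetical helper: lets a library offer torchvision-backed features
    without listing torchvision as a hard requirement.
    """
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(name) is None:
        raise ModuleNotFoundError(
            f"`{name}` is required for this feature but is not installed. {install_hint}".strip()
        )
    return importlib.import_module(name)


# Checked once at import time so callers can branch on it cheaply.
TORCHVISION_AVAILABLE = importlib.util.find_spec("torchvision") is not None
```

A torchvision-backed DataModule could then call `require_optional("torchvision", "Install it with pip install torchvision.")` in its constructor, so users who never touch those datasets never need the package installed.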

Thanks for the feedback on the new LightningDataModule! Looking forward to hearing your thoughts on the Bolts datamodules we've built out 😄