Pytorch-lightning: [DataModule] PyTorch datasets as DataModules out of the box

Created on 29 Jul 2020 · 9 Comments · Source: PyTorchLightning/pytorch-lightning

🚀 Feature


PyTorch already has datasets (MNIST, CIFAR, etc.). It would be very convenient to provide those datasets out of the box as DataModules.

Motivation


To reduce boilerplate. If I can avoid reimplementing or copy-pasting the same code again, I'd rather use an existing solution. PyTorch Lightning as a whole was built with this goal in mind, so this feels like a natural extension.

Pitch


The ability to write something along the lines of:

import pytorch_lightning as pl
import pytorch_lightning.datasets as pld

# implementation of model and trainer instantiation
trainer.fit(model, pld.MNIST())

Alternatives


Alternatively, this could be implemented in PyTorch Lightning Bolts instead of here.

Additional context


None


All 9 comments

Hi! Thanks for your contribution, great first issue!

I think it is a good suggestion, @PyTorchLightning/core-contributors

Yes, I think this is what we've already started in Bolts!

Want to add the missing torchvision datasets to it?

I'd like to, but I'm not sure I have the time if this is urgent. I could probably do it gradually over 2-3 weeks, if that's not an issue :)

No problem. Maybe create a GH issue for each dataset and do one at a time?

(GH issues in Bolts)
FYI @nateraw

Makes sense. I'll open a separate issue per dataset in Bolts.

Also, what do you think about leaving this issue open until everything is implemented?

Not sure what the value is in having duplicate tickets, but we can leave this open for now until new issues are opened in Bolts. Make sense?

@InCogNiTo124 let's move the discussion to the Bolts repo for now. We're building out all sorts of support for different datasets there.

The datasets you mentioned aren't from torch, to my understanding. They're from torchvision, which isn't a requirement here. If we want to support torchvision or sklearn datasets directly in Lightning, we can do that in a future PR.

Thanks for the feedback on the new LightningDataModule. Looking forward to hearing your thoughts on the Bolts datamodules we've built out 😄
