PyTorch already has datasets (MNIST, CIFAR, etc.). It would be very convenient to provide those datasets out of the box as DataModules.
To reduce boilerplate. If I can avoid reimplementing or copy-pasting the same code again, I'd rather use an existing solution. PyTorch Lightning as a whole was built with this in mind, so this seems like a natural fit.
To be able to write something along the lines of:
```python
import pytorch_lightning as pl
import pytorch_lightning.datasets as pld  # proposed module, does not exist yet

# ... define `model` and instantiate `trainer` ...
trainer.fit(model, pld.MNIST())
```
Alternatively, this could be implemented in PyTorch Lightning Bolts instead of here.
Hi! Thanks for your contribution, great first issue!
I think it is a good suggestion, @PyTorchLightning/core-contributors
Yes, I think this is what we've already started in Bolts!
Want to add the missing torchvision datasets to it?
I'd like to, but I'm not sure I have the time if this is urgent. I could probably do it gradually over 2-3 weeks, if that's not an issue :)
No problem. Maybe create a GH issue for each dataset and do them one at a time?
(GH issues in Bolts)
fyi @nateraw
Makes sense. I'll open a separate issue per dataset in Bolts.
Also, what do you think about leaving this issue open until everything is implemented?
Not sure what the value is in having duplicate tickets, but we can leave this open for now until the new issues are opened in Bolts. Does that make sense?
@InCogNiTo124 let's move the discussion to the Bolts repo for now. We're building out all sorts of support for different datasets there.
The datasets you mentioned aren't from torch, to my understanding. They're from torchvision, which isn't included as a requirement here. If we want to support torchvision or sklearn datasets directly in Lightning, we can do that in a future PR.
Thanks for the feedback on the new LightningDataModule! Looking forward to hearing your thoughts on the Bolts datamodules we've built out 😄