Pytorch-lightning: Rename train_dataloader or update docs for trainer.fit()?

Created on 25 Aug 2020 · 6Comments · Source: PyTorchLightning/pytorch-lightning

🚀 Feature

Either rename train_dataloder arg to something else, like train_data_source.
or
Update docs to make it clear that is trainer.fit() and other modules (e.g. trainer.lr_find()) where a LightningDataModule can be passed instead of DataLoader.

Motivation

The new datamodule allows users to decouple data and models.
After #2755, trainer.fit(model, dataloader) and trainer.fit(model, datamodule) both work. Basically there's a line that checks whether the 2nd arg is datamodule instance.

It's nice that we can do positional args without the need for named arguments. But, I think it also causes confusion since, train_dataloader doesn't have to be DataLoader and can be a LightningDataModule instead.

Could be just me, but I got confused when I found trainer.lr_find() was be fine with having a LightningDataModule as its 2nd arg. I had to look into the source code to see that the 2nd arg is just passed and handled with trainer.fit(). So I think the current state is not that friendly to beginners.

Pitch

With #2755 the first arg of trainer.fit() has conceptually become something different from train_dataloader. It can be either a DataLoader or a LightningDataModule.

Therefore I think it may be a good idea to rename the first arg of train_dataloader to something else like train_data_source or even introduce a higher-level concept.

However, regarding backwards compatibility, I'm not sure how appropriate such a change is. How can one rename an important arg with breaking backwards compatibility?

Alternatives

Just update the docs for now. I.e. add a note where a LightningDataModule can be passed instead of DataLoader.
The type hints are incorrect (haven't been updated) in some places so fix those as well.

Additional context

discussion enhancement help wanted won't fix

Source

AtomScott

All 6 comments

I like this, just the case is compatibility with past API, rename would break code of people who are setting parameters explicitly train_dataloder=some_dataloader_or_datamodule...
cc: @williamFalcon

Borda on 25 Aug 2020

maybe:

train_data:Iterable?

williamFalcon on 25 Aug 2020

👍1

If its a data_module then it could a val_dataloader so I'm not for having "train" in the arg name. Also, data_module isn't an iterable right?

As for backward compatibility, how about just keeping the keyword arg for train_dataloder? Just not have it as the second arg.

AtomScott on 27 Aug 2020

As for backward compatibility, how about just keeping the keyword arg for train_dataloder? Just not have it as the second arg.

well for comparability it shall be quite simple we can just rename the first parameter and then add train_dataloder as last and add some deprecation warning...

Borda on 27 Aug 2020

👍1

Yeah, that's what I meant.

AtomScott on 27 Aug 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!