Pytorch-lightning: Rename train_dataloader or update docs for trainer.fit()?

Created on 25 Aug 2020  Â·  6Comments  Â·  Source: PyTorchLightning/pytorch-lightning

🚀 Feature

  1. Either rename train_dataloder arg to something else, like train_data_source.
    or
  2. Update docs to make it clear that is trainer.fit() and other modules (e.g. trainer.lr_find()) where a LightningDataModule can be passed instead of DataLoader.

Motivation

The new datamodule allows users to decouple data and models.
After #2755, trainer.fit(model, dataloader) and trainer.fit(model, datamodule) both work. Basically there's a line that checks whether the 2nd arg is datamodule instance.

It's nice that we can do positional args without the need for named arguments. But, I think it also causes confusion since, train_dataloader doesn't have to be DataLoader and can be a LightningDataModule instead.

Could be just me, but I got confused when I found trainer.lr_find() was be fine with having a LightningDataModule as its 2nd arg. I had to look into the source code to see that the 2nd arg is just passed and handled with trainer.fit(). So I think the current state is not that friendly to beginners.

Pitch

With #2755 the first arg of trainer.fit() has conceptually become something different from train_dataloader. It can be either a DataLoader or a LightningDataModule.

Therefore I think it may be a good idea to rename the first arg of train_dataloader to something else like train_data_source or even introduce a higher-level concept.

However, regarding backwards compatibility, I'm not sure how appropriate such a change is. How can one rename an important arg with breaking backwards compatibility?

Alternatives

  • Just update the docs for now. I.e. add a note where a LightningDataModule can be passed instead of DataLoader.
  • The type hints are incorrect (haven't been updated) in some places so fix those as well.

Additional context

discussion enhancement help wanted won't fix

All 6 comments

I like this, just the case is compatibility with past API, rename would break code of people who are setting parameters explicitly train_dataloder=some_dataloader_or_datamodule...
cc: @williamFalcon

maybe:

train_data:Iterable?

If its a data_module then it could a val_dataloader so I'm not for having "train" in the arg name. Also, data_module isn't an iterable right?

As for backward compatibility, how about just keeping the keyword arg for train_dataloder? Just not have it as the second arg.

As for backward compatibility, how about just keeping the keyword arg for train_dataloder? Just not have it as the second arg.

well for comparability it shall be quite simple we can just rename the first parameter and then add train_dataloder as last and add some deprecation warning...

Yeah, that's what I meant.

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

as754770178 picture as754770178  Â·  3Comments

monney picture monney  Â·  3Comments

srush picture srush  Â·  3Comments

iakremnev picture iakremnev  Â·  3Comments

Vichoko picture Vichoko  Â·  3Comments