Pytorch-lightning: Support IterableDatasets for validation and test, not just train set [blocked by #953]

Created on 26 Feb 2020 · 12 comments · Source: PyTorchLightning/pytorch-lightning

🚀 Feature

Currently Lightning supports IterableDatasets only in the training set (see code). This makes them second-class citizens compared to map-style datasets, and supporting them everywhere looks like low-hanging fruit.

Motivation

This enables having larger test sets that may not fit into a machine's memory (they could be very large in production settings, or of modest size on a student's cheap laptop). Moreover, datasets are usually generated together (e.g. train, val, and test can come from the same process). That process will likely produce every split with the same signature, so you may end up with IterableDatasets even when their size doesn't strictly require it.

Pitch


Bringing over the checks we already do for training should only require changing a few lines of code, unless I'm missing something.
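For context, the train-side special-casing amounts to roughly the following (a paraphrase of the idea, not the exact Lightning source; the helper name is hypothetical):

```python
from torch.utils.data import DataLoader, IterableDataset

# Hypothetical paraphrase of the train-only check the issue refers to:
# Lightning special-cases dataloaders whose dataset is an IterableDataset.
def uses_iterable_dataset(dataloader: DataLoader) -> bool:
    return isinstance(getattr(dataloader, "dataset", None), IterableDataset)
```

The pitch is simply to apply the same branch to the val and test dataloaders.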

Additional context


Are there any gotchas that make this harder than it looks?

enhancement, help wanted

All 12 comments

Hey, thanks for your contribution! Great first issue!

@Darktex this looks straightforward! I can't think of any gotchas right now. The only thing would be if you don't have the length of a dataset up front, but I think we're refactoring to clear that up right now.

want to do a PR?

@ethanwharris @jeffling thoughts?

fyi @srush @luiscape

It seems there's an opportunity to clean things up a bit here. Really the only check we need is to see whether len(dataloader) raises an error. If it does, check whether the number of steps to run is set elsewhere and throw a warning if not (i.e. if it isn't set, the loop will just run forever). That way you could get rid of the check for IterableDataset and the dependence on DataLoader.dataset, solving several issues.
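A minimal sketch of that check (the helper name and warning text are illustrative, not Lightning's actual API):

```python
import warnings

# Sketch: rely on len() raising instead of inspecting DataLoader.dataset.
def num_batches_or_inf(dataloader, max_steps=None):
    try:
        return len(dataloader)
    except TypeError:
        # A DataLoader over an IterableDataset without __len__ lands here.
        if max_steps is None:
            warnings.warn(
                "Dataloader has no length and no step limit is set; "
                "the loop will run until the iterator is exhausted."
            )
        return max_steps if max_steps is not None else float("inf")
```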

maybe step 1 is to refactor the code to minimize the len(dataloader) calls? we likely only need them to (see the sketch after this list):

  • figure out when to do validation checks (percent into epoch)
  • set the tqdm bar length
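For the first point, a hypothetical helper like the one below would make the fallback explicit (val_check_interval mirrors Lightning's option of the same name; the function itself is illustrative):

```python
def compute_val_check_batch(dataloader, val_check_interval=0.25):
    """Return the batch index at which to run validation, or None when
    the dataloader has no length (e.g. wraps an IterableDataset), in
    which case an absolute step count must be used instead."""
    try:
        num_batches = len(dataloader)
    except TypeError:
        return None
    return max(1, int(num_batches * val_check_interval))
```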

Agreed. Then it would be easier to see where the IterableDataset stuff will fall over, and just do something different when len is not available.

Ok, #953 is blocking this issue at the moment.

@ethanwharris @Darktex I think 0.7.1 fixed this problem. Mind checking now?

@williamFalcon Not quite, it still tries to call len on the val / test dataloaders - will PR in a bit

is the easier thing to try catch for the len exception and set to inf if caught?

then when the epoch ends, set the length when we know it?

> is the easier thing to try catch for the len exception and set to inf if caught?
>
> then when the epoch ends, set the length when we know it?

Yeah, that's the plan - we currently have the is_infinite_dataloader method, which tries to call len and catches the exception; I just need to get the tqdm stuff to not do total=float('inf'), as that raises an error.

Not sure about setting the length once we know it - maybe in a separate PR?
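Roughly what that tqdm fix could look like (a sketch, not Lightning's actual progress-bar code; it assumes total=None makes tqdm fall back to a plain running counter):

```python
from tqdm import tqdm

def make_progress_bar(dataloader):
    # When the dataloader has no length, pass total=None so tqdm shows
    # a running count rather than erroring on total=float("inf").
    try:
        total = len(dataloader)
    except TypeError:
        total = None
    return tqdm(dataloader, total=total)
```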
