Pytorch-lightning: Support IterableDatasets for validation and test, not just train set [blocked by #953]

Created on 26 Feb 2020 · 12 comments · Source: PyTorchLightning/pytorch-lightning

🚀 Feature

Currently Lightning supports IterableDatasets only in the training set (see code). This makes them second-class citizens compared to map-style datasets, and supporting them everywhere looks like low-hanging fruit.

Motivation

This enables having larger test sets that may not fit into a machine's memory (they could be very large in production settings, or of modest size on a student's cheap laptop). Moreover, datasets are usually generated together (e.g. train, val, and test can come from the same process). That process will likely produce every split with the same signature, so you may end up with IterableDatasets even when their size doesn't strictly require it.

Pitch


Bringing over the checks we already do for training should only require changing a few lines of code, unless I'm missing something.
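For context, the train-side special-casing amounts to roughly the following (a paraphrase of the idea, not the exact Lightning source; the helper name is hypothetical):

```python
from torch.utils.data import DataLoader, IterableDataset

# Hypothetical paraphrase of the train-only check the issue refers to:
# Lightning special-cases dataloaders whose dataset is an IterableDataset.
def uses_iterable_dataset(dataloader: DataLoader) -> bool:
    return isinstance(getattr(dataloader, "dataset", None), IterableDataset)
```

The pitch is simply to apply the same branch to the val and test dataloaders.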

Additional context


Are there any gotchas that make this harder than it looks?

enhancement, help wanted

All 12 comments

Hey, thanks for your contribution! Great first issue!

@Darktex this looks straightforward! I can't think of any gotchas right now. The only thing would be if you don't have the length of a dataset up front, but I think we're refactoring to clear that up right now.

want to do a PR?

@ethanwharris @jeffling thoughts?

fyi @srush @luiscape

It seems there's an opportunity to clean things up a bit here. Really the only check we need is to see whether len(dataloader) raises an error. If it does, check whether the number of steps to run is set elsewhere and throw a warning if not (i.e. if it isn't set, the loop will just run forever). That way you could get rid of the check for IterableDataset and the dependence on DataLoader.dataset, solving several issues.
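A minimal sketch of that check (the helper name and warning text are illustrative, not Lightning's actual API):

```python
import warnings

# Sketch: rely on len() raising instead of inspecting DataLoader.dataset.
def num_batches_or_inf(dataloader, max_steps=None):
    try:
        return len(dataloader)
    except TypeError:
        # A DataLoader over an IterableDataset without __len__ lands here.
        if max_steps is None:
            warnings.warn(
                "Dataloader has no length and no step limit is set; "
                "the loop will run until the iterator is exhausted."
            )
        return max_steps if max_steps is not None else float("inf")
```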

maybe step 1 is to refactor the code to minimize the len(dataloader) calls? we likely only need them to (see the sketch after this list):

  • figure out when to do validation checks (percent into epoch)
  • set the tqdm bar length
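For the first point, a hypothetical helper like the one below would make the fallback explicit (val_check_interval mirrors Lightning's option of the same name; the function itself is illustrative):

```python
def compute_val_check_batch(dataloader, val_check_interval=0.25):
    """Return the batch index at which to run validation, or None when
    the dataloader has no length (e.g. wraps an IterableDataset), in
    which case an absolute step count must be used instead."""
    try:
        num_batches = len(dataloader)
    except TypeError:
        return None
    return max(1, int(num_batches * val_check_interval))
```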

Agreed. Then it would be easier to see where the IterableDataset stuff will fall over, and just do something different when len is not available.

Ok, #953 is blocking this issue at the moment.

@ethanwharris @Darktex I think 0.7.1 fixed this problem. Mind checking now?

@williamFalcon Not quite, it still tries to call len on the val / test dataloaders - will PR in a bit

is the easier thing to try catch for the len exception and set to inf if caught?

then when the epoch ends, set the length when we know it?

> is the easier thing to try catch for the len exception and set to inf if caught?
>
> then when the epoch ends, set the length when we know it?

Yeah, that's the plan - we currently have the is_infinite_dataloader method, which tries to call len and catches the exception; I just need to get the tqdm stuff to not do total=float('inf'), as that raises an error.

Not sure about setting the length once we know it - maybe in a separate PR?
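Roughly what that tqdm fix could look like (a sketch, not Lightning's actual progress-bar code; it assumes total=None makes tqdm fall back to a plain running counter):

```python
from tqdm import tqdm

def make_progress_bar(dataloader):
    # When the dataloader has no length, pass total=None so tqdm shows
    # a running count rather than erroring on total=float("inf").
    try:
        total = len(dataloader)
    except TypeError:
        total = None
    return tqdm(dataloader, total=total)
```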
