Let's minimize `len(dataset)` calls and make them as late in training as we can (i.e., ideally right before any training loop). This opens up the path to supporting iterable datasets more cleanly.
Getting the length prematurely touches the dataset at the wrong time, often causing double loads.
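As a rough sketch of what I mean (not actual Lightning code; `num_training_batches` and `train` are hypothetical names), the length query would live in one place and only run right before the loop, with iterable-style datasets handled explicitly:

```python
from torch.utils.data import DataLoader, IterableDataset

def num_training_batches(loader: DataLoader) -> float:
    # IterableDataset has no meaningful __len__, so the batch count
    # is unknown up front; signal that with infinity instead of crashing
    if isinstance(loader.dataset, IterableDataset):
        return float("inf")
    return len(loader)  # the only place we ever query the length

def train(loader: DataLoader) -> None:
    total = num_training_batches(loader)  # queried right before the loop
    for i, batch in enumerate(loader):
        if i >= total:
            break
        ...  # training step
```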
This is a blocker for #948.
@williamFalcon I'm happy to take a look at this if needed, just let me know :)
Perfect!
In this function, `auto_add_sampler()` is always called.
Inside it, even though the comment says otherwise, what it actually does is create a new PyTorch `DataLoader`. I think this logic is flawed.
What I suggest is that, in the default setting, we only call `len(loader)` when we need to determine the size, as sketched below.
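Something along these lines (a minimal sketch of the suggestion, assuming a hypothetical helper name `maybe_loader_size`): keep the user's `DataLoader` untouched and just probe its length, falling back gracefully when it has none.

```python
from typing import Optional
from torch.utils.data import DataLoader

def maybe_loader_size(loader: DataLoader) -> Optional[int]:
    # Probe len(loader) without wrapping it in a new DataLoader
    try:
        return len(loader)
    except TypeError:
        # iterable-style loaders don't define a usable __len__
        return None
```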
OK, have a look at #955 - it should fix a few things and make it easy to add support for iterable datasets everywhere.