@JorDikk and I recently found out that the Trainer figures out the total number of batches per epoch through the Sampler's __len__ and not the Dataset's __len__.
While in most cases the length of the sampler corresponds to the total number of indices in the dataset (train and val),
we were using a hierarchical dataset, where each individual dataset was a collection of smaller datasets.
Our sampler, accordingly, was also a collection of smaller samplers. This created a problem because our base
sampler's __len__ returned the number of smaller datasets rather than the number of data indices.
The fix was very easy, but it would help to mention this behaviour somewhere in the docs to avoid confusion.
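For illustration, here is a minimal sketch of the pitfall (the class name `HierarchicalSampler` and the toy datasets are hypothetical, not taken from the original report): a sampler that wraps one sub-sampler per sub-dataset, where a naive __len__ would make the Trainer think an epoch has only as many samples as there are sub-datasets.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, RandomSampler, Sampler, TensorDataset


class HierarchicalSampler(Sampler):
    """Hypothetical sampler wrapping one RandomSampler per sub-dataset of a ConcatDataset."""

    def __init__(self, concat_dataset: ConcatDataset):
        self.sub_samplers = [RandomSampler(ds) for ds in concat_dataset.datasets]
        # Offsets to translate each sub-sampler's local indices into global indices.
        self.offsets = [0] + list(concat_dataset.cumulative_sizes[:-1])

    def __iter__(self):
        for offset, sampler in zip(self.offsets, self.sub_samplers):
            for idx in sampler:
                yield offset + idx

    def __len__(self):
        # Buggy version: returning the number of sub-samplers makes the Trainer
        # believe an epoch contains only len(self.sub_samplers) samples.
        # return len(self.sub_samplers)

        # Fix: report the total number of indices across all sub-samplers,
        # since this is the value used to compute batches per epoch.
        return sum(len(s) for s in self.sub_samplers)


# Usage sketch: three toy sub-datasets with 400 samples in total.
datasets = [TensorDataset(torch.randn(n, 4)) for n in (100, 250, 50)]
concat = ConcatDataset(datasets)
loader = DataLoader(concat, batch_size=32, sampler=HierarchicalSampler(concat))
print(len(loader))  # ceil(400 / 32) = 13 batches per epoch, as expected
```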
Hi! Thanks for your contribution, great first issue!
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!