RAM fills up after a few iterations, which makes it a problem to handle large datasets. I am not sure, but I think the entire dataset is being loaded into memory.
I faced the same OOM issue. It's caused by the ThreadPoolExecutor in the DataLoader class defined in fastai/dataloader.py. I was able to run the code after replacing the thread pool with a single execution thread. Here's the diff:
```
def __iter__(self):
```
After a bit of analysis: the OOM error occurs because ThreadPoolExecutor creates all batches up front, stores them in memory, and only then returns a generator to fetch the batches from. This behavior changed between Python 3.5 and 3.6.
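To make the eager behavior described above concrete, here is a minimal sketch that does not use fastai. The names `source`, `double`, `chunked`, and `bounded_map` are hypothetical helpers invented for this illustration. It shows that `Executor.map` pulls the whole input before any result is read, and one possible way to bound that by feeding the pool small chunks.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

consumed = []  # records every item pulled from the source iterator

def source(n):
    """Hypothetical batch source: yields 0..n-1 and logs each item it hands out."""
    for i in range(n):
        consumed.append(i)
        yield i

def double(x):
    """Stand-in for loading one batch."""
    return x * 2

# Eager behavior: map() submits a task for every input item right away.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = pool.map(double, source(100))
    eager_count = len(consumed)  # whole input already consumed, no result read yet
    total = sum(results)

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bounded_map(fn, iterable, num_workers=2, chunk_size=8):
    """Like Executor.map, but at most `chunk_size` inputs are submitted at a time."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for chunk in chunked(iterable, chunk_size):
            # map() is still eager, but only over this small chunk.
            yield from pool.map(fn, chunk)
```

With `bounded_map`, the number of batches held in memory is limited by `chunk_size` rather than by the size of the dataset, at the cost of a short stall at each chunk boundary.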


Linking a related forum thread. The pool executor change doesn't seem to solve the issue for me. 30 GB of RAM is used up in no time by Kaggle's Google Landmark dataset.
EDIT: Maybe a slightly different issue, since this one was reported to occur only after a few iterations.
Best off using the forum for this.
With ThreadPoolExecutor, setting num_workers=0 will execute the same code you are adding there: the dataloader already has an if condition for that case, so we don't have to modify the loader at all.
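As a rough illustration of the kind of branch described above, here is a minimal sketch of a loader whose `__iter__` skips the thread pool when `num_workers` is 0. `SketchLoader`, `batches`, and `load_fn` are hypothetical names for this example and are not fastai's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

class SketchLoader:
    """Hypothetical stand-in for a batch loader; not fastai's DataLoader."""

    def __init__(self, batches, load_fn, num_workers=0):
        self.batches = batches          # iterable of batch descriptors, e.g. index lists
        self.load_fn = load_fn          # turns one descriptor into a loaded batch
        self.num_workers = num_workers

    def __iter__(self):
        if self.num_workers == 0:
            # Sequential path: each batch is loaded only when it is requested.
            for b in self.batches:
                yield self.load_fn(b)
        else:
            with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
                # Threaded path: map() submits every batch up front.
                yield from pool.map(self.load_fn, self.batches)
```

Under these assumptions, passing `num_workers=0` selects the single-threaded path from the caller's side, with no change to the loader code itself.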
I also tried setting max_workers to 1, but that didn't work. Understanding where the memory is being lost will require more analysis.