Detectron2: How to speed up the data loader?

Created on 10 Nov 2019 · 4 Comments · Source: facebookresearch/detectron2

โ“ Questions and Help

I followed the tutorials to use a custom dataset (DeepFashion2), but I found it very slow when running trainer = DefaultTrainer(cfg). It seems to spend a long time loading data, maybe 20-30 minutes. I changed cfg.DATALOADER.NUM_WORKERS from 1 to 4 to 12, but it didn't help (my CPU has 12 cores); the results were even worse. So how can I speed up the data loader?
I wrote a script that loads the data with multiple threads, and it only takes 4-5 minutes. Is this a PyTorch problem?
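A minimal sketch of the multi-threaded loading the question mentions, assuming one JSON annotation file per image with a hypothetical schema (keys "file_name" and "objects"; adapt them to your data). As far as I know, detectron2 calls the function registered with DatasetCatalog in the main process while the trainer builds its data loader, so cfg.DATALOADER.NUM_WORKERS does not parallelize this step; it only controls the workers that later read and augment images. The function names below are illustrative, not part of detectron2.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def _load_one(json_path):
    """Parse one per-image annotation file (hypothetical schema)."""
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    # Keep only the image path, never decoded pixels, so records stay small.
    return {"file_name": ann["file_name"], "annotations": ann.get("objects", [])}


def load_annotations_parallel(ann_dir, max_workers=8):
    """Read every *.json file in ann_dir concurrently and return a list of records."""
    paths = sorted(
        os.path.join(ann_dir, name)
        for name in os.listdir(ann_dir)
        if name.endswith(".json")
    )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the sorted paths.
        return list(pool.map(_load_one, paths))
```

It could then be registered with something like DatasetCatalog.register("deepfashion2_train", lambda: load_annotations_parallel("/path/to/annos")). Threads mostly help when file I/O dominates; json.load is CPU-bound and holds the GIL, so swapping in concurrent.futures.ProcessPoolExecutor (which works here because _load_one is a top-level function) may be faster when the files are large.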

All 4 comments

I am experiencing exactly the same issue. It seems like the data loader tries to load all the data into memory before training starts. When I was training on an 8-GPU machine with a training set of 50,000 samples, it consumed more than 400GB of memory loading the dataset before the training process started.
With a training set of 110,000 samples, I have to reduce the number of workers to avoid out-of-memory errors.
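One plausible contributor to the memory growth described above (a hedged guess, not profiled for this case): each training process keeps its own list of dataset dicts, and each forked worker gradually copies the memory pages holding those many small Python objects, because merely reading them updates reference counts. A common mitigation packs all records into one bytes buffer plus an offsets array. As far as I know, newer detectron2 versions apply a similar idea in DatasetFromList through its serialize option; the class below is an illustrative stand-in, not that implementation.

```python
import pickle
from array import array


class SerializedRecords:
    """Pack a list of records into one bytes buffer plus an offsets array.

    Many small Python objects become two large ones, so forked data-loader
    workers touch far fewer reference counts (and therefore memory pages).
    """

    def __init__(self, records):
        chunks = [pickle.dumps(r, protocol=pickle.HIGHEST_PROTOCOL) for r in records]
        # offsets[i]..offsets[i + 1] delimits the pickled bytes of record i.
        self._offsets = array("q", [0])
        for chunk in chunks:
            self._offsets.append(self._offsets[-1] + len(chunk))
        self._buffer = b"".join(chunks)

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start, end = self._offsets[idx], self._offsets[idx + 1]
        # Each access unpickles a fresh copy, so callers may mutate it freely.
        return pickle.loads(self._buffer[start:end])
```

The trade-off is a small unpickling cost on every access in exchange for memory that stays shared across workers.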

Maybe you are storing image arrays in your custom dataset. In my case, I only store the image file name in the dataset, not the image itself; it loads slowly but uses little memory. I think the reason it runs slowly is that there are too many small annotation files.
I wrote a script to convert my dataset into a single COCO-format annotation file. Now it runs much faster than before.
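A sketch of the conversion described above, merging per-image JSON files into one COCO-style instances file. The per-image schema read here (file_name, width, height, and objects carrying category_id and an [x1, y1, x2, y2] bbox) is a hypothetical assumption; only the output layout (images, annotations, categories, with boxes stored as [x, y, width, height]) follows the COCO format.

```python
import json
import os


def merge_to_coco(ann_dir, out_path, categories):
    """Merge per-image JSON files into a single COCO-style instances file.

    categories is a list such as [{"id": 1, "name": "..."}].
    The per-image keys read below are hypothetical; adapt them to your data.
    Returns (number_of_images, number_of_annotations).
    """
    images, annotations = [], []
    ann_id = 1
    names = sorted(n for n in os.listdir(ann_dir) if n.endswith(".json"))
    for image_id, name in enumerate(names, start=1):
        with open(os.path.join(ann_dir, name), "r", encoding="utf-8") as f:
            src = json.load(f)
        images.append({
            "id": image_id,
            "file_name": src["file_name"],
            "width": src["width"],
            "height": src["height"],
        })
        for obj in src.get("objects", []):
            x1, y1, x2, y2 = obj["bbox"]
            w, h = x2 - x1, y2 - y1
            annotations.append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": obj["category_id"],
                "bbox": [x1, y1, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"images": images, "annotations": annotations,
                   "categories": categories}, f)
    return len(images), len(annotations)
```

The merged file can then be registered with detectron2's register_coco_instances(name, metadata, json_file, image_root) from detectron2.data.datasets, so the dataset is read with one JSON parse instead of thousands of small file opens.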

@invisprints I did not put images into my JSON files. You are probably right that I should merge all the JSON files into one and see how that goes. Thank you very much.

The original code that loads the COCO dataset is fast enough for training the builtin models.

If you find the data loader slow for your dataset, the reason could be your dataset, your custom data loader, or your machine. Without any details about what you did or what you observed, we cannot give a useful response. So closing.
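In the spirit of the request for details, a minimal timing sketch that separates the three stages where the slowdown could hide: building the dataset dicts, constructing the loader, and pulling the first batches. The helper names are hypothetical, and both callables are supplied by the caller.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_stages(load_dicts, make_loader, num_batches=50):
    """Time dict loading, loader construction and batch iteration separately.

    load_dicts and make_loader are zero-argument callables supplied by the
    caller (hypothetical hooks); make_loader must return an iterable.
    Returns (timings_in_seconds, number_of_batches_consumed).
    """
    results = {}
    with timed("load_dicts", results):
        load_dicts()
    with timed("build_loader", results):
        loader = make_loader()
    consumed = 0
    with timed("iterate", results):
        for _ in loader:
            consumed += 1
            if consumed >= num_batches:  # bounds infinite training loaders
                break
    return results, consumed
```

With detectron2 these might be lambda: DatasetCatalog.get("my_train") and lambda: build_detection_train_loader(cfg). Building the train loader loads the dicts again, so a build_loader time close to load_dicts most likely points at the registration function rather than the workers, while a slow iterate stage points instead at image decoding, augmentation, or disk speed.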
