Ignite: Ways or Features to improve data loading performance

Created on 26 Feb 2019 · 13 Comments · Source: pytorch/ignite

I was wondering if there is any planned feature to improve data loading performance. The standard PyTorch Dataset and DataLoader implementations rely on random I/O, which can be a bottleneck for image classification with small inputs, even with an M.2 SSD. It's possible to convert the dataset into a binary data file such as LMDB, but I was wondering if there is a standard or perhaps better way in PyTorch, ideally supported out of the box by a high-level training framework like Ignite.

pkdogcom

Most helpful comment

It seems that from 3.4.2 onward OpenCV defaults to libjpeg-turbo instead of libjpeg. So with newer versions of OpenCV it might be easier to rely on OpenCV (with most image libraries enabled at compile time) as an efficient general-purpose image loading library.

pkdogcom on 28 Feb 2019

👍2

All 13 comments

Or is manually moving the data into RAM the best option, if it fits into memory?

pkdogcom on 26 Feb 2019

@pkdogcom we haven't planned any features for that. If you have more details, we can discuss what could be introduced into Ignite.

vfdev-5 on 26 Feb 2019

@pkdogcom actually, NVIDIA/DALI might be interesting to test, to see if it improves dataflow performance.

vfdev-5 on 26 Feb 2019

@vfdev-5 Thanks for the information. In a second test with more data processing threads, I was able to cut the data pre-processing time considerably, which means that with an M.2 SSD the CPU, rather than just the I/O, can be the bottleneck.
I did some quick research on DALI, and it seems to support LMDB/TFRecord-style data loading but not PyTorch-style random access to images. Although I believe a PyTorch training pipeline could still benefit from DALI's data augmentation support, integrating DALI with a PyTorch pipeline, and especially with Ignite, seems to take a non-trivial amount of work.
I'll keep an eye out for other solutions and will let you know if I find anything useful.

pkdogcom on 27 Feb 2019
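
One way to check whether the data pipeline or the model step dominates, as in the measurements discussed above, is to time the wait for each batch separately from the step itself. The following is only a minimal sketch: `profile_dataflow` and `step_fn` are hypothetical names, and `batches` can be any iterable (for example a DataLoader).

```python
import time


def profile_dataflow(batches, step_fn, max_batches=50):
    """Return (mean seconds waiting for a batch, mean seconds spent in step_fn)."""
    data_time = 0.0
    step_time = 0.0
    n = 0
    it = iter(batches)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent blocked on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # hypothetical callable: forward/backward/optimizer step
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        n += 1
    if n == 0:
        return 0.0, 0.0
    return data_time / n, step_time / n
```

If the first number is close to zero, the loading cost is hidden behind the model step. When the step runs on a GPU, `step_fn` should synchronize the device before returning; otherwise the asynchronous CUDA work gets counted as data time on the next iteration.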

It turns out that with a fast disk (such as an M.2 SSD), the bottleneck of the data flow will most likely be decoding images on the CPU if you use the default PyTorch pil_loader. A quick and effective fix is to use a faster image decoder in place of pil_loader. I've tried jpeg4py, which is a wrapper around libjpeg-turbo, and in my experiments I was able to reduce the data pre-processing overhead from 0.6s per batch, on top of the 0.2s model forward/backward time, to 0s, i.e. completely hidden behind the model processing time.

FYI, here is the code of my image loader:

import imghdr  # deprecated since Python 3.11 and removed in 3.13
import logging

import jpeg4py as jpeg
from PIL import Image

def fast_img_loader(path):
    with open(path.encode('utf-8'), 'rb') as f:
        # Use (wrapper of) libjpeg-turbo for faster JPEG decode
        try:
            if imghdr.what(f) == 'jpeg':  # Test image format by file prefix
                img = jpeg.JPEG(f).decode()  # Decode as 'RGB' by default
                return Image.fromarray(img)
        except Exception as e:
            logging.warning('Failed to decode image {} as jpeg: {}'.format(path, e))

        # Fall back to PIL image loader for non-JPEG images or in case of exception
        f.seek(0)  # Rewind in case the failed JPEG decode consumed part of the file
        img = Image.open(f)
        return img.convert('RGB')

Of course, you will need to install libjpeg-turbo and jpeg4py (with pip).

pkdogcom on 28 Feb 2019

👍2
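
A custom loader like the one above can be plugged into torchvision through the `loader` argument of `ImageFolder`. The sketch below is an illustration only: `make_imagefolder_loader`, the crop size, and the default batch size and worker count are assumptions, not values from this thread.

```python
def make_imagefolder_loader(root, img_loader, batch_size=64, num_workers=8):
    """Build a DataLoader over an ImageFolder that decodes images with img_loader."""
    # Imported here so that the sketch only needs torch/torchvision when it is called.
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    dataset = datasets.ImageFolder(
        root,
        transform=transforms.Compose([
            transforms.RandomResizedCrop(224),  # assumed augmentation, adjust as needed
            transforms.ToTensor(),
        ]),
        loader=img_loader,  # replaces the default PIL-based loader
    )
    # Decoding runs in the worker processes, so more workers means more parallel decoding.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
```

Because the loader returns a PIL image, the usual torchvision transforms keep working unchanged downstream.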

@pkdogcom yes, it's true that torchvision backed by Pillow is not the fastest option for data loading/processing. Have you tried Pillow-SIMD or OpenCV? OpenCV should use libjpeg-turbo internally, I think.

vfdev-5 on 28 Feb 2019

I've been using Pillow-SIMD and OpenCV, and libjpeg-turbo still gives a big performance boost. I will need to check whether OpenCV can be compiled with libjpeg-turbo.

pkdogcom on 28 Feb 2019

👍2

@pkdogcom I'll close this issue. Feel free to reopen it if we can improve this from the Ignite side.

vfdev-5 on 8 Mar 2019

Sure. What about implementing an OpenCV/libjpeg-turbo image loader (maybe converting to PIL in the loader for compatibility with downstream processing) and making users more aware of this issue?

pkdogcom on 10 Mar 2019
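
A rough sketch of what such an OpenCV-based loader could look like, converting to PIL so that downstream transforms keep working. `opencv_img_loader` is a hypothetical name, and whether the decode is actually faster depends on how the installed OpenCV was built.

```python
def opencv_img_loader(path):
    """Decode an image with OpenCV and return it as an RGB PIL image."""
    # Imported here so that the sketch only needs OpenCV and Pillow when it is called.
    import cv2
    from PIL import Image

    bgr = cv2.imread(path, cv2.IMREAD_COLOR)  # returns None instead of raising on failure
    if bgr is None:
        # Fall back to Pillow for files that OpenCV cannot read.
        with open(path, "rb") as f:
            return Image.open(f).convert("RGB")
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # OpenCV decodes to BGR channel order
    return Image.fromarray(rgb)
```

The BGR to RGB conversion matters: without it, the red and blue channels would be swapped relative to what a PIL-based pipeline expects.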

@pkdogcom IMO the dataflow stuff (data reading, augmentations, batching, etc.) is out of scope for Ignite, as there are plenty of libraries that handle parts of it. Soon I plan to provide a contrib handler for some basic time profiling: batch creation, time spent in handlers, and time spent in the processing function. In the notes for this future handler we can mention these accelerated ways of reading images in case of a dataflow bottleneck. Another thing that may improve the dataflow could be some sort of in-memory caching of loaded data.

vfdev-5 on 10 Mar 2019
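
The in-memory caching idea mentioned above could be sketched as a thin wrapper around any indexable dataset. `CachedDataset` is a hypothetical name, and the sketch assumes the decoded samples fit into RAM.

```python
class CachedDataset:
    """Wrap an indexable dataset and keep already loaded samples in RAM."""

    def __init__(self, dataset, transform=None):
        self.dataset = dataset      # should return deterministic (non-augmented) samples
        self.transform = transform  # optional augmentation applied after the cache
        self._cache = {}

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        if index not in self._cache:
            # First access: pay the disk read and decode cost once.
            self._cache[index] = self.dataset[index]
        sample = self._cache[index]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample
```

Random augmentations belong in `transform`, after the cache, so that they still vary between epochs. Note that with a DataLoader using `num_workers > 0`, each worker process holds its own copy of the cache, so memory use grows with the number of workers.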

Agreed. I think having some of these best practices either implemented or mentioned in the contrib module should be enough.

pkdogcom on 10 Mar 2019

FWIW you can build Pillow-SIMD against libjpeg-turbo and this greatly improves its performance without having to abandon torchvision: https://docs.fast.ai/performance.html#installation