Vision: [Feature Request] LMDB Dataset for ImageNet

Created on 19 May 2019 · 9 Comments · Source: pytorch/vision

Is it possible to support LMDB for ImageNet, as the LSUN dataset does (https://github.com/pytorch/vision/blob/master/torchvision/datasets/lsun.py#L22)? One benefit is that we would not need to store 1.28M small images on disk; instead, the whole of ImageNet could be saved in a single file (or a few files).

Labels: awaiting response, datasets, needs discussion

All 9 comments

I think that if you have already built an LMDB out of the individual ImageNet files, then writing a custom dataset for it should be straightforward, right?

If we already have an LMDB file, yes. Would it be possible to integrate building the LMDB into the initialization function of an ImageNet-LMDB dataset class?
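For anyone following along, here is a minimal sketch of what the two comments above describe: a one-off build step plus a map-style dataset that reads from the resulting file. It assumes the third-party `lmdb` package (py-lmdb). The names `build_lmdb` and `LMDBImageDataset`, the integer-string keys, the `__len__` key, and the pickled `(bytes, label)` payload are all illustrative choices, not anything torchvision provides.

```python
import pickle


def build_lmdb(samples, lmdb_path, map_size=1 << 40):
    """Write (image_path, label) pairs into a single LMDB file.

    Hypothetical helper: the key layout and payload format are assumptions.
    `map_size` is the maximum database size in bytes (1 TiB here).
    """
    import lmdb  # third-party (pip install lmdb), imported where it is used

    env = lmdb.open(lmdb_path, map_size=map_size, subdir=False)
    count = 0
    with env.begin(write=True) as txn:  # commits when the block exits
        for image_path, label in samples:
            with open(image_path, "rb") as f:
                raw = f.read()  # keep the encoded bytes, not decoded pixels
            txn.put(str(count).encode("ascii"), pickle.dumps((raw, label)))
            count += 1
        txn.put(b"__len__", str(count).encode("ascii"))
    env.sync()
    env.close()


class LMDBImageDataset:
    """Map-style dataset: it defines __len__ and __getitem__, which is the
    interface torch.utils.data.DataLoader expects from such datasets."""

    def __init__(self, lmdb_path, transform=None):
        self.lmdb_path = lmdb_path
        self.transform = transform  # e.g. a callable that decodes the bytes
        self._env = None  # opened on first access, not in __init__
        env = self._open_env()
        with env.begin() as txn:
            self._length = int(txn.get(b"__len__").decode("ascii"))
        env.close()

    def _open_env(self):
        import lmdb

        return lmdb.open(self.lmdb_path, subdir=False, readonly=True,
                         lock=False, readahead=False, meminit=False)

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if self._env is None:
            self._env = self._open_env()
        with self._env.begin() as txn:
            payload = txn.get(str(index).encode("ascii"))
        if payload is None:
            raise IndexError(index)
        raw, label = pickle.loads(payload)
        sample = raw if self.transform is None else self.transform(raw)
        return sample, label
```

The environment is opened lazily so that the dataset object holds no open handle when it is handed to worker processes. Whether to run `build_lmdb` from inside `__init__` when the file is missing is a design choice; keeping it separate avoids an expensive, surprising side effect on construction.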

We could.

But LMDB also has some downsides, like https://github.com/pytorch/vision/issues/619.
I'm not sure if we would like to encourage its use, at least not as of now.

@fmassa Thanks for your reply. Do you have any recommendations for a database format for ImageNet?

Doesn't the current ImageNet format (unzipped images) work for you?

I'm using a system that cannot handle many small files (such as 1M PNG images). Therefore, I cannot use the raw ImageNet images and have to store the whole dataset in one or a few large files.
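If a database is not a hard requirement, one alternative that meets the "one or a few large files" constraint is to read straight from an uncompressed tar archive. This is a different technique from LMDB: a rough, standard-library-only sketch that indexes each member's byte offset once and then seeks into the archive. The names `build_tar_index` and `TarImageDataset` are made up for illustration, and it assumes the archive is not compressed and contains regular (non-sparse) files.

```python
import tarfile


def build_tar_index(tar_path):
    """Map each member name to (offset of its data, size in bytes)."""
    index = {}
    with tarfile.open(tar_path, "r:") as archive:  # "r:" means no compression
        for member in archive:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


class TarImageDataset:
    """Map-style dataset returning (member name, encoded bytes) pairs."""

    def __init__(self, tar_path, index):
        self.tar_path = tar_path
        self.index = index
        self.names = sorted(index)  # fixed order so integer indices are stable
        self._file = None  # opened on first access

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        if self._file is None:
            self._file = open(self.tar_path, "rb")
        name = self.names[i]
        offset, size = self.index[name]
        self._file.seek(offset)
        return name, self._file.read(size)
```

The index can be built once and saved (for example with `pickle`) so that later runs skip the full scan of the archive. Decoding the bytes and deriving labels from the member names are left out here.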

Something that kind of helped me with ImageNet data loading was using TensorFlow Datasets' from_numpy method, along with the TFDS prepackaged ImageNet dataset in TFRecords format. It wasn't ideal, and there was room for further optimization for PyTorch ingestion, but it did speed up data loading a ton on a machine with only an HDD available.

I came across the same problem: too many small files.
I need to find a way to speed up data loading. I tried LMDB, but the lsun.py LMDB example is not working for me.
@RicCu could you share more tips on combining TFDS with the dataloader?

Hi,
I haven't worked much more on getting TFDS to work well with dataloaders, but you might wanna take a look at @vahidk's tfrecords reader. It has nice interop with PyTorch's dataloaders without depending on TF.
