Vision: Use ImageNet dataset for classification reference?

Created on 16 Jul 2019 · 4 Comments · Source: pytorch/vision

Right now we use the ImageFolder dataset in the classification reference script

https://github.com/pytorch/vision/blob/8837e0efbe16dc07ccdd4f1d06460643c7b41c50/references/classification/train.py#L117-L124

although we already have an ImageNet dataset implementation. Is this intended?
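For readers who want to see the two options side by side, here is a minimal sketch. It assumes the data lives under `data_path/train` and `data_path/val`; `split_dirs`, `build_train_dataset` and the `use_imagenet_class` switch are hypothetical names for illustration and are not part of the reference script.

```python
import os


def split_dirs(data_path):
    """Return the train/ and val/ subdirectories of data_path."""
    return os.path.join(data_path, "train"), os.path.join(data_path, "val")


def build_train_dataset(data_path, transform=None, use_imagenet_class=False):
    """Build the training dataset with either ImageFolder or ImageNet.

    `use_imagenet_class` is a hypothetical switch for this sketch only.
    """
    # Imported lazily so the path helper above works without torchvision.
    import torchvision

    if use_imagenet_class:
        # ImageNet takes the dataset root plus a split name.
        return torchvision.datasets.ImageNet(
            data_path, split="train", transform=transform
        )
    # ImageFolder takes the directory that holds one subfolder per class.
    traindir, _ = split_dirs(data_path)
    return torchvision.datasets.ImageFolder(traindir, transform=transform)
```

Either branch yields a dataset that can be passed to a `DataLoader`; the difference discussed in this issue is what each class expects to find (or write) on disk.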

Labels: awaiting response, reference scripts, needs discussion, classification

All 4 comments

When I originally wrote this reference script, the ImageNet dataset class did not exist yet, and I never got around to changing it.

I think that many users would like to use the reference code to train on their own datasets, in which case switching to ImageNet would just make it less practical for them.

Also, one slight annoyance with the ImageNet dataset class: if users already have the ImageNet data downloaded somewhere on a read-only filesystem, but without the meta file, they can't use that copy with the ImageNet dataset, because they won't be able to write the meta file to the location where the data lives (this is the case with the data we have here, for example).

For this reason, I'd rather leave everything as it currently is.

Thoughts?

I was under the impression that the reference scripts are for "internal" validation of the models only. If that is not the case I agree to leave it as it is.

> I think that many users would like to use the reference code to train on their own datasets, in which case switching to ImageNet would just make it less practical for them.

This only applies to users who also use an ImageFolder-style dataset that is split into train and val folders, right?
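To make the layout in question concrete, here is a small stdlib-only sketch. As far as I know, ImageFolder treats each subfolder as a class and assigns indices in sorted order of the folder names; `find_classes` below mimics that idea and `make_toy_layout` is a hypothetical helper that only creates empty demo folders.

```python
import os
import tempfile


def find_classes(split_dir):
    """Map each class subfolder name to an integer index, in sorted order."""
    classes = sorted(
        entry.name for entry in os.scandir(split_dir) if entry.is_dir()
    )
    return {name: idx for idx, name in enumerate(classes)}


def make_toy_layout(root, classes=("cat", "dog")):
    """Create root/{train,val}/<class>/ folders for demonstration."""
    for split in ("train", "val"):
        for name in classes:
            os.makedirs(os.path.join(root, split, name), exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_toy_layout(root)
        # Both splits share the same class-to-index mapping.
        print(find_classes(os.path.join(root, "train")))
        print(find_classes(os.path.join(root, "val")))
```

Any dataset arranged this way can be used with the reference script as it stands, which is the practicality argument made above.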

> Also, one slight annoyance with the ImageNet dataset class: if users already have the ImageNet data downloaded somewhere on a read-only filesystem, but without the meta file, they can't use that copy with the ImageNet dataset, because they won't be able to write the meta file to the location where the data lives (this is the case with the data we have here, for example).

Is this a common use case? If so, it might be smart to make the meta file optional for the ImageNet dataset.
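One possible way around the read-only problem, sketched under assumptions: write any generated meta file to a separate writable directory when the data root cannot be written to. `choose_meta_dir` and `fallback_dir` are hypothetical names; this is not how torchvision's ImageNet class behaves, only an illustration of the idea.

```python
import os
import tempfile


def choose_meta_dir(data_root, fallback_dir):
    """Return data_root if it is a writable directory, else fallback_dir."""
    if os.path.isdir(data_root) and os.access(data_root, os.W_OK):
        return data_root
    # The data root is missing or read-only: use a writable location instead.
    os.makedirs(fallback_dir, exist_ok=True)
    return fallback_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "cache")
        # A freshly created temporary directory is writable, so it is chosen.
        print(choose_meta_dir(tmp, cache) == tmp)
```

The images would still be read from the original location; only the derived meta file would live elsewhere.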

> I was under the impression that the reference scripts are for "internal" validation of the models only. If that is not the case, I agree to leave it as it is.

The reference scripts are a starting point for anyone wanting to reproduce the models in torchvision, and they can serve as a basis for training their own models. So they are not only for internal validation; they are reference training and evaluation scripts.

> This only applies to users who also use an ImageFolder-style dataset that is split into train and val folders, right?

Yes, but the code is simple enough that they know what to change, and where, if they need something else. This would still be the case with the ImageNet dataset, but because of the read-only filesystem issue I mentioned, switching would be more of an annoyance than a benefit (at least to me and other people on the same system).

> Is this a common use case? If so, it might be smart to make the meta file optional for the ImageNet dataset.

I think that, given the size of ImageNet, one generally downloads it beforehand rather than from within the Python interpreter (but I might be wrong). So having a pre-downloaded ImageNet somewhere (potentially in a read-only location) seems fairly common to me.

> I think that, given the size of ImageNet, one generally downloads it beforehand rather than from within the Python interpreter (but I might be wrong). So having a pre-downloaded ImageNet somewhere (potentially in a read-only location) seems fairly common to me.

In that case, I will try to work around that by not relying on the meta file. I might re-open this issue for further discussion afterwards. Anyway, thanks for the clarifications.
