Vision: Invalid hash error on ImageNet dataset

Created on 25 Apr 2019 · 5 Comments · Source: pytorch/vision

In my environment, the md5 value of meta.bin used by the ImageNet dataset differs from the value defined in imagenet.py.

meta.bin is generated by torch.save in the code. I found that Python 2 and Python 3 generate different files.

The md5sum hashes are as follows.

  • (defined) 7e0d3cf156177e4fc47011cdd30ce706
  • (Python 2.7.16, Ubuntu) a36fd93cf3900286d99e24ad0a73ce04
  • (Python 3.7.3, Ubuntu) ca981e8aac175178e80e7949d90ee85c
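To illustrate how the same content can end up with different md5 values, here is a minimal sketch using only the standard library. torch.save relies on Python's pickle for serialization, and pickled bytes depend on more than the data itself. The sample dict below is hypothetical and is not the real content of meta.bin, and differing pickle protocols are only one possible source of divergence; the thread does not confirm the actual cause.

```python
import hashlib
import pickle

# Hypothetical sample data; NOT the real contents of meta.bin.
meta = {"n01440764": ("tench", "Tinca tinca")}


def md5_of_pickle(obj, protocol):
    """Return the md5 hex digest of obj serialized with the given pickle protocol."""
    return hashlib.md5(pickle.dumps(obj, protocol=protocol)).hexdigest()


# Same object, two pickle protocols: the serialized bytes differ.
digest_p2 = md5_of_pickle(meta, 2)
digest_p3 = md5_of_pickle(meta, 3)

# The deserialized content is nevertheless identical.
same_content = pickle.loads(pickle.dumps(meta, 2)) == pickle.loads(pickle.dumps(meta, 3))

print("hashes differ:", digest_p2 != digest_p3)
print("content equal:", same_content)
```

The point of the sketch is that a byte-level hash of a locally generated file checks the serialization, not the information it holds, which is why a fixed expected md5 is fragile here.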

https://github.com/pytorch/vision/blob/9a481d0bec2700763a799ff148fe2e083b575441/torchvision/datasets/imagenet.py#L23-L26

https://github.com/pytorch/vision/blob/9a481d0bec2700763a799ff148fe2e083b575441/torchvision/datasets/imagenet.py#L117-L118

Labels: bug, needs discussion

All 5 comments

cc @pmeier

This might be due to the fact that the information in meta is a Python dict, which doesn't guarantee any order in which the keys are stored.
So I think we should not just have different hashes per Python version, because it could also be system-dependent.

Maybe the simplest is to remove the md5 checking from meta.bin?

"So I think we should not have just different hashes per python version, because it could also be system-dependent."

Agreed.

"This might be due to the fact that the information in meta is a python dict, which doesn't guarantee any order of how the keys are stored."

I will check whether we still get different hashes if we use an OrderedDict. If that is still the case:

"[...] remove the md5 checking from meta.bin"

Using OrderedDicts does not fix this. I'll send a fix soon.

@fmassa Is the logic in check_integrity correct?

https://github.com/pytorch/vision/blob/6716fc514c9524abed4f8ca00e4424553990e315/torchvision/datasets/utils.py#L20-L24

Shouldn't we check whether fpath exists regardless of the md5 check? Otherwise this function does no checking at all if md5=None.

@pmeier yes, it seems that the logic in check_integrity is flawed. Can you send a PR fixing the order of the checks?
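The reordering discussed above could look roughly like the sketch below. It is not the actual torchvision implementation or the eventual PR: the helper file_md5 and its chunk_size parameter are assumptions made for illustration, and only the name and arguments of check_integrity are taken from the thread.

```python
import hashlib
import os


def file_md5(fpath, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in chunks (hypothetical helper)."""
    md5 = hashlib.md5()
    with open(fpath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def check_integrity(fpath, md5=None):
    """Sketch of the reordered checks: existence first, then the optional md5."""
    # Always verify that the file exists, even when no md5 is given.
    if not os.path.isfile(fpath):
        return False
    # Without an expected hash, existence is the only check performed.
    if md5 is None:
        return True
    return file_md5(fpath) == md5
```

With this ordering, calling check_integrity on a missing path returns False even when md5 is None, so dropping the md5 for meta.bin would still leave an existence check in place.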
