The Caffe version reports a top-5 error of about 7%, but the PyTorch documentation says it is about 9%. Just to confirm, is the accuracy of the PyTorch version about 2% lower?
This is correct. From what I know, the Caffe version of VGG-D is not trained from scratch, but initialized from VGG-C and then fine-tuned. We trained the PyTorch version from scratch.
@soumith Just to confirm, it is also trained with RGB images and not BGR images, which is the opposite of Caffe's version of VGG, right?
That is correct: RGB scaled to 0 to 1, instead of BGR in 0 to 255.
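For concreteness, a minimal sketch of that input convention using torchvision; `example.jpg` is a placeholder path, and the NumPy line only illustrates what a Caffe-style pipeline would feed instead:

```python
import numpy as np
from PIL import Image
from torchvision import transforms

# PIL loads images in RGB channel order, and ToTensor() scales
# uint8 [0, 255] pixels to float [0, 1] -- the convention the
# PyTorch VGG weights expect.
img = Image.open("example.jpg").convert("RGB")  # placeholder path
tensor = transforms.ToTensor()(img)  # CHW float32 in [0, 1]

# A Caffe-style pipeline would instead feed BGR values in [0, 255]:
bgr = np.array(img)[:, :, ::-1].astype(np.float32)
```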
No -- the difference is in how the models are evaluated, not in their accuracy. We report all accuracy numbers for a single crop at the size of the receptive field (in this case 224x224). This keeps comparisons between models fair. (InceptionV3 is evaluated on a 299x299 image, since it has a larger receptive field.)
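A minimal sketch of that single-crop protocol, assuming the usual resize-to-256-then-center-crop-224 convention and the standard torchvision ImageNet normalization (both assumptions, not stated in this thread):

```python
import torch
from PIL import Image
from torchvision import models, transforms

model = models.vgg16(pretrained=True).eval()

# Single-crop evaluation: one 224x224 center crop per image,
# matching the network's receptive field.
single_crop = transforms.Compose([
    transforms.Resize(256),      # common convention (assumption)
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard
                         std=[0.229, 0.224, 0.225]),  # ImageNet stats
])

img = Image.open("example.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    logits = model(single_crop(img).unsqueeze(0))
top5 = logits.topk(5).indices  # predicted top-5 classes
```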
Caffe (and the VGG paper) uses multi-scale evaluation to get to the 7.5% number, which is more computationally expensive. If you evaluate the PyTorch pre-trained models at multiple scales, you'll get better results too.
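And a hedged sketch of multi-scale evaluation in the same spirit: average the softmax outputs over 224x224 center crops taken from several rescaled copies of the image. The scale set below is illustrative, not the exact protocol from the VGG paper:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

model = models.vgg16(pretrained=True).eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def multi_scale_predict(img, scales=(256, 288, 320)):
    """Average class probabilities over center crops from several
    rescaled copies of the image (illustrative scale set)."""
    probs = []
    for s in scales:
        tf = transforms.Compose([
            transforms.Resize(s),        # shortest side -> s
            transforms.CenterCrop(224),  # fixed receptive-field crop
            transforms.ToTensor(),
            normalize,
        ])
        with torch.no_grad():
            probs.append(F.softmax(model(tf(img).unsqueeze(0)), dim=1))
    return torch.stack(probs).mean(dim=0)  # (1, 1000) averaged probs
```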
Initialization from VGG-C doesn't matter -- you just need a good learning rate schedule.