The Caffe version reports a top-5 error of about 7%, but the PyTorch documentation says it is about 9%. Just to confirm, is the accuracy of the PyTorch version about 2% lower?
This is correct. From what I know, the Caffe version of VGG-D is not trained from scratch, but initialized from VGG-C and then fine-tuned. We trained the PyTorch version from scratch.
@soumith Just to confirm, it is also trained with RGB images and not BGR images, which is the opposite of Caffe's version of VGG, right?
That is correct: RGB scaled to 0 to 1, instead of BGR in 0 to 255.
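For concreteness, a minimal sketch of that input convention using torchvision; `example.jpg` is a placeholder path, and the NumPy line only illustrates what a Caffe-style pipeline would feed instead:

```python
import numpy as np
from PIL import Image
from torchvision import transforms

# PIL loads images in RGB channel order, and ToTensor() scales
# uint8 [0, 255] pixels to float [0, 1] -- the convention the
# PyTorch VGG weights expect.
img = Image.open("example.jpg").convert("RGB")  # placeholder path
tensor = transforms.ToTensor()(img)  # CHW float32 in [0, 1]

# A Caffe-style pipeline would instead feed BGR values in [0, 255]:
bgr = np.array(img)[:, :, ::-1].astype(np.float32)
```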
No -- the difference is in how the models are evaluated, not in their accuracy. We report all accuracy numbers for a single crop at the size of the receptive field (in this case 224x224). This keeps comparisons between models fair. (InceptionV3 is evaluated on a 299x299 image, since it has a larger receptive field.)
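A minimal sketch of that single-crop protocol, assuming the usual resize-to-256-then-center-crop-224 convention and the standard torchvision ImageNet normalization (both assumptions, not stated in this thread):

```python
import torch
from PIL import Image
from torchvision import models, transforms

model = models.vgg16(pretrained=True).eval()

# Single-crop evaluation: one 224x224 center crop per image,
# matching the network's receptive field.
single_crop = transforms.Compose([
    transforms.Resize(256),      # common convention (assumption)
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard
                         std=[0.229, 0.224, 0.225]),  # ImageNet stats
])

img = Image.open("example.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    logits = model(single_crop(img).unsqueeze(0))
top5 = logits.topk(5).indices  # predicted top-5 classes
```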
Caffe (and the VGG paper) uses multi-scale evaluation to get to the 7.5% number, which is more computationally expensive. If you evaluate the PyTorch pre-trained models at multiple scales, you'll get better results too.
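And a hedged sketch of multi-scale evaluation in the same spirit: average the softmax outputs over 224x224 center crops taken from several rescaled copies of the image. The scale set below is illustrative, not the exact protocol from the VGG paper:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

model = models.vgg16(pretrained=True).eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def multi_scale_predict(img, scales=(256, 288, 320)):
    """Average class probabilities over center crops from several
    rescaled copies of the image (illustrative scale set)."""
    probs = []
    for s in scales:
        tf = transforms.Compose([
            transforms.Resize(s),        # shortest side -> s
            transforms.CenterCrop(224),  # fixed receptive-field crop
            transforms.ToTensor(),
            normalize,
        ])
        with torch.no_grad():
            probs.append(F.softmax(model(tf(img).unsqueeze(0)), dim=1))
    return torch.stack(probs).mean(dim=0)  # (1, 1000) averaged probs
```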
Initialization from VGG-C doesn't matter -- you just need a good learning rate schedule.