For Inception_v3, the model provided by torchvision has a top-1 accuracy of 77.45%. However, accuracy using the training scripts (in the classification folder) barely reaches 77% top-1 (±0.1% across different seeds).
Is the pre-trained model provided on the website trained from scratch using the torchvision training script? If not, any idea what changes need to be made to the model and/or hyper-parameters to reach that accuracy?
No, inception_v3 uses the weights from Google's pre-trained model.
A few possible sources of the difference:
1) Train for longer. Maybe 120 epochs instead of 90 (if that's what you're doing).
2) We might be scaling inputs incorrectly. See https://github.com/pytorch/vision/issues/926. Although I'd expect this to matter more when evaluating a pre-trained model than when training from scratch. (A sketch of the rescaling in question follows this list.)
3) Using batch norm statistics computed across the entire training set (instead of the exponentially weighted running averages) often gives an extra 0.1-0.2 percentage points in my experience with ResNet models.
See https://discuss.pytorch.org/t/population-statistics-for-batchnorm-instead-of-running-average/20559, particularly the last post, for some code (I haven't tried the code, but it looks reasonable to me). A sketch of the same idea is also included after this list.
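
For point 2, this is roughly what the rescaling amounts to: inputs normalized with the usual ImageNet mean/std get mapped back to the [-1, 1] range that Google's original model expects. The helper name and layout here are just for illustration, not the exact torchvision implementation:

```python
import torch

# Standard ImageNet channel statistics (illustrative; torchvision's
# transform_input uses equivalent per-channel constants).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def rescale_for_inception(x):
    """Map a batch normalized with ImageNet mean/std to the [-1, 1] range."""
    x = x * IMAGENET_STD + IMAGENET_MEAN  # undo normalization -> roughly [0, 1]
    return x * 2.0 - 1.0                  # re-center to [-1, 1]
```

If training and evaluation don't agree on which convention is used, accuracy can drop noticeably.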
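
For point 3, an untested sketch along the lines of that thread (the helper name is mine): reset the BN buffers, switch to cumulative averaging, and make one forward pass over the training data in train mode so the buffers end up holding population statistics.

```python
import torch

@torch.no_grad()
def recompute_bn_population_stats(model, train_loader, device):
    """Replace BN running averages with statistics over the whole training set."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # momentum=None => cumulative (true) average
    model.train()              # BN only updates its buffers in train mode
    for images, _ in train_loader:
        model(images.to(device))
    model.eval()
```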
As @colesbury mentioned, inception v3 weights have been ported from Google's pre-trained model.
IIRC, they also use auxiliary classifiers to improve training of the model (see the sketch below for the usual way to wire that into the loss).
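
For what it's worth, a minimal sketch of training with the auxiliary classifier enabled; the 0.4 weight on the auxiliary loss is a commonly used value, not necessarily Google's exact recipe:

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.inception_v3(aux_logits=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

def train_step(images, targets):
    """images: a batch of 299x299 crops; targets: class indices."""
    model.train()
    logits, aux_logits = model(images)  # the aux head is only active in train mode
    # Add the auxiliary loss with a small weight (0.4 here).
    loss = criterion(logits, targets) + 0.4 * criterion(aux_logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```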
I haven't tried to reproduce the training of their models myself, but the hints from @colesbury would definitely bring you closer.
@a-maci I think it would be valuable for the community to have the knowledge/code to train Inception v3 from scratch. So if you figure out more, please post.