Vision: Accuracy regression on MobileNetV2

Created on 25 Jul 2019 · 9 comments · Source: pytorch/vision

Reported by @andravin in https://github.com/pytorch/vision/pull/818#issuecomment-509337263

With PyTorch 1.1 and torchvision 0.3, we are able to reach 71.878 top1 accuracy on ImageNet for MobileNetV2.
The training command is the following:

```bash
cd references/classification

python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py \
    --model mobilenet_v2 --epochs 300 --lr 0.045 --wd 0.00004 \
    --lr-step-size 1 --lr-gamma 0.98
```

with best accuracy at epoch 285.
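For reference, a minimal sketch of how the released weights can be checked against the reported top-1 (the ImageNet path is a placeholder, and this is not the reference evaluation script itself):

```python
import torch
import torchvision
from torchvision import transforms

# Standard ImageNet eval preprocessing used by the torchvision models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
val_set = torchvision.datasets.ImageFolder("/path/to/imagenet/val", preprocess)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=256, num_workers=8)

model = torchvision.models.mobilenet_v2(pretrained=True).cuda().eval()

correct = total = 0
with torch.no_grad():
    for images, targets in val_loader:
        preds = model(images.cuda()).argmax(dim=1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f"top1: {100.0 * correct / total:.3f}")
```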

@andravin tried running the same code with a more recent version of PyTorch and torchvision and got 71.536 (@andravin, do you maybe have the specific versions?), which is too large a drop to be just random variation.

Investigate (and fix) the cause of this.

I have looked into a few related changes (in torchvision), but didn't find anything particularly suspicious.

Note: it takes ~35h to train the model on an 8-GPU machine.

Labels: bug, help wanted, models, reference scripts, classification


All 9 comments

Here are the software versions used: https://github.com/pytorch/vision/pull/818#issuecomment-508428115

```python
>>> torch.cuda.nccl.version()
2406
>>> torch.version.cuda
'10.1.168'
>>> torch.backends.cudnn.version()
7601
>>> torch.__version__
'1.2.0a0+ffa15d2'
```
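(Aside: PyTorch ships a helper that gathers all of these, plus OS/GPU/driver details, in one report; the same output is available from a shell via `python -m torch.utils.collect_env`.)

```python
from torch.utils.collect_env import get_pretty_env_info

# Prints PyTorch, CUDA, cuDNN, NCCL, OS, and GPU details in one block.
print(get_pretty_env_info())
```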

I would think it enough to try to reproduce the error at HEAD. If it works now, then either the error was fixed, or I did something wrong.

@andravin @fmassa I dug into this a little by running some training jobs from scratch on 8 V100s, using different combinations of pytorch and torchvision versions.

The results were as follows:

| pytorch | torchvision | Best top1 acc | Epoch |
|---------|-------------|--------|-------|
| master | master | 71.806 | 292 |
| master | 0.3 | 71.638 | 279 |
| 1.1 | master | 71.764 | 300 |
| 1.1 | 0.3 | 71.676 | 289 |
| 1.1 | 0.3 | 71.674 | 278 |
| 1.1 | 0.3 | 71.692 | 284 |
| 1.1 | 0.3 | 71.512 | 281 |
| 1.1 | 0.3 | 71.828 | 300 |
| 1.1 | 0.3 | 71.584 | 295 |
| 1.1 | 0.3 | 71.874 | 298 |

There are a few points to note here. First, the run I did with pytorch master and torchvision master was able to attain 71.806 top1 accuracy.

Next, I tried running a lot of pytorch 1.1 and torchvision 0.3 runs for 300 epochs each. Most of the time, these were not able to attain numbers close to the advertised 71.878, but some of the runs came close at 71.828 and 71.874. This suggests that there is a lot of variance during training that is probably due to different random initializations and non-determinism.
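For anyone trying to narrow this down further, here is a sketch of the usual knobs for reducing run-to-run variance; this is standard PyTorch practice rather than anything the reference script necessarily does, and some GPU ops remain non-deterministic regardless:

```python
import random
import numpy as np
import torch

def seed_everything(seed=0):
    # Fix the Python, NumPy, and PyTorch RNGs so that weight init and
    # data shuffling repeat across runs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic cuDNN kernels and disable autotuning; this
    # trades some speed for repeatability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```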

Finally, I took a look through the PyTorch commit history from 1.1 to master while waiting for the jobs to finish. No commits related to the ops run in MobileNetV2 jumped out to me as suspicious, but it's possible that I missed some more subtle changes.

Here were the nccl/cuda/cudnn versions I used:

```python
>>> torch.cuda.nccl.version()
2402
>>> torch.version.cuda
'10.0.130'
>>> torch.backends.cudnn.version()
7501
```

Closing based on @zou3519's conclusion: the gap seems to come down to variance (±0.2%) rather than any other factor. He also verified that master actually converges given a good initialization.

Thanks a lot for the investigation @zou3519 !

It might be a good idea to document the expected accuracy. @zou3519's 7 experiments on pytorch 1.1 and torchvision 0.3 have a mean and standard deviation of 71.691 +/- 0.127.
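(These numbers come straight from the seven 1.1/0.3 rows in the table above; a quick check:)

```python
import statistics

# The seven pytorch 1.1 / torchvision 0.3 runs from @zou3519's table.
accs = [71.676, 71.674, 71.692, 71.512, 71.828, 71.584, 71.874]
print(f"{statistics.mean(accs):.3f} +/- {statistics.stdev(accs):.3f}")
# -> 71.691 +/- 0.127
```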

@andravin this is a very good point. Unfortunately we only have point estimates instead of distributions for these numbers. That is the case for most papers to date as well, but there is some work proposing different ways of reporting metrics for evaluating families of models, e.g., https://arxiv.org/abs/1905.13214

@fmassa yeah, I think it is good that you currently report the ImageNet accuracy for the pretrained weights here: https://pytorch.org/docs/stable/torchvision/models.html

I was making the point that the user does not know what accuracy to expect if they train the model from scratch.

But apparently there is no documentation about how any of the models were trained. So the user really has no way to reproduce your results.

My advice would be to have a separate page for each model that documents the hyperparameters used for training (i.e., the exact train.py command line used; hopefully that program was used for all the models!). Additionally, it would be great to know the mean accuracy and variance.

I would think that pytorch developers also need this information for regression testing.
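To illustrate (hypothetical code, not an existing test in the repo): once a mean and standard deviation are documented, a regression test could simply flag runs that land outside the expected band.

```python
# Hypothetical regression check using the numbers reported in this thread.
EXPECTED_TOP1 = 71.691  # documented mean for mobilenet_v2
STD_TOP1 = 0.127        # documented std over repeated runs

def check_accuracy(measured_top1, n_sigma=3.0):
    tolerance = n_sigma * STD_TOP1
    if abs(measured_top1 - EXPECTED_TOP1) > tolerance:
        raise AssertionError(
            f"top1 {measured_top1:.3f} outside "
            f"{EXPECTED_TOP1:.3f} +/- {tolerance:.3f}"
        )

# The "regression" that started this issue is only ~1.2 sigma below the
# mean, so it passes:
check_accuracy(71.536)
```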

@andravin

> My advice would be to have a separate page for each model that documents the hyperparameters used for training (i.e., the exact train.py command line used; hopefully that program was used for all the models!). Additionally, it would be great to know the mean accuracy and variance.

Totally, you know what, I'll be putting up a README now with the hyperparameters that I used to train the models that we have in the modelzoo. Thanks!

It would also be good to know the training time and hardware spec (e.g., 8x V100).
