Faster-rcnn.pytorch: Runtime Error when resuming trained model

Created on 5 Feb 2018 · 9 Comments · Source: jwyang/faster-rcnn.pytorch

Hello, I have trained a model. When I try to resume training from it on a bigger dataset, I encounter this problem:

loading checkpoint ./trained_models/vgg16/pascal_voc/faster_rcnn_1_1_41.pth
loaded checkpoint ./trained_models/vgg16/pascal_voc/faster_rcnn_1_1_41.pth
/home/shin/faster-rcnn.pytorch/lib/model/rpn/rpn.py:68: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
/home/shin/faster-rcnn.pytorch/lib/model/faster_rcnn/faster_rcnn.py:98: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cls_prob = F.softmax(cls_score)
Traceback (most recent call last):
  File "trainval_net.py", line 335, in <module>
    optimizer.step()
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/sgd.py", line 94, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:271

The training parameters are the same. In fact, the issue also occurs when I train a model for 1 epoch and then resume from it.
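
Side note on the two UserWarnings in the log: they are deprecation warnings and are separate from the RuntimeError. A minimal sketch of the change the warning asks for is below. The tensor shape and the choice of dim=1 are assumptions for illustration (a 2-D score tensor with classes on the second axis), not something confirmed in this thread.

```python
import torch
import torch.nn.functional as F

# Hypothetical class scores: 4 proposals x 21 classes (shape is an assumption).
cls_score = torch.randn(4, 21)

# Passing dim explicitly avoids the deprecation warning. dim=1 normalizes
# over the class axis, so every row of cls_prob sums to 1.
cls_prob = F.softmax(cls_score, dim=1)
row_sums = cls_prob.sum(dim=1)
print(row_sums)
```

The right dim depends on the layout of each tensor, so it should be checked separately for the call in rpn.py and the call in faster_rcnn.py.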

shinshiner

👍5

All 9 comments

I have made sure the version is valid, and I can use this model in demo.py.

shinshiner on 5 Feb 2018

It seems that the number of categories has changed in your bigger dataset. In that case, the sizes would not match. One simple solution is to partially load the pre-trained model, layer by layer.
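
A minimal sketch of what such partial loading could look like, assuming the only differences are tensors whose shapes changed (for example the classification head). The helper name `load_matching_weights` and the toy `make_net` model are hypothetical stand-ins, not code from faster-rcnn.pytorch.

```python
import torch
import torch.nn as nn


def load_matching_weights(model, checkpoint_state):
    """Copy only checkpoint tensors whose name and shape match the model.

    Returns the sorted names of checkpoint entries that were skipped.
    """
    model_state = model.state_dict()
    matched = {
        name: tensor
        for name, tensor in checkpoint_state.items()
        if name in model_state and tensor.shape == model_state[name].shape
    }
    skipped = sorted(set(checkpoint_state) - set(matched))
    model_state.update(matched)
    model.load_state_dict(model_state)
    return skipped


def make_net(num_classes):
    # Toy stand-in: shared "backbone" layer plus a class-dependent head.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, num_classes))


old_net = make_net(21)  # e.g. trained with 21 classes
new_net = make_net(5)   # e.g. new dataset with 5 classes
skipped = load_matching_weights(new_net, old_net.state_dict())
print(skipped)  # names of the head tensors that could not be copied
```

The skipped layers keep their fresh initialization and have to be trained on the new dataset. Note that this only covers model weights; optimizer state is a separate matter.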

jwyang on 5 Feb 2018

👎1

I did not add or delete any categories in my bigger dataset. In fact, I found that if I comment out these two lines, everything works.

shinshiner on 6 Feb 2018

👍1

@shinshiner great!

jwyang on 6 Feb 2018

👎1

@shinshiner
Hi,
Which two lines did you comment out?
The link above is just one line.
Thank you!

wjx2 on 30 Jun 2018

@wjx2 Lines 286 and 287.

shinshiner on 1 Jul 2018

Does anyone know the reason? I also encountered this problem when resuming with batchsize=1 from a model trained with batchsize=64. If I keep batchsize=64, it works fine.
@jwyang Can you reopen the issue? Commenting out the two lines is not a complete fix, since the optimizer state cannot be resumed.
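
One way to narrow this down: the traceback fails inside SGD's momentum update (`buf.mul_(momentum).add_(...)`), which suggests that a restored momentum buffer has a different shape from the gradient of the parameter it was attached to. The sketch below checks for that after the optimizer state is loaded. It is a diagnostic under assumptions, not a fix and not the repository's code: the helper name `mismatched_momentum_buffers` and the toy layers are hypothetical, and it relies on `load_state_dict` matching state to parameters by position without comparing shapes, which is how the PyTorch versions I know behave.

```python
import torch
import torch.nn as nn


def mismatched_momentum_buffers(optimizer):
    """List (group_idx, param_idx, param_shape, buffer_shape) for each mismatch."""
    problems = []
    for g, group in enumerate(optimizer.param_groups):
        for i, p in enumerate(group["params"]):
            buf = optimizer.state.get(p, {}).get("momentum_buffer")
            if buf is not None and buf.shape != p.shape:
                problems.append((g, i, tuple(p.shape), tuple(buf.shape)))
    return problems


# Toy reproduction: save optimizer state for one layer shape ...
saved_layer = nn.Linear(4, 3)
saved_opt = torch.optim.SGD(saved_layer.parameters(), lr=0.01, momentum=0.9)
saved_layer(torch.randn(2, 4)).sum().backward()
saved_opt.step()  # the first step creates the momentum buffers

# ... and restore it into an optimizer over differently shaped parameters.
resumed_layer = nn.Linear(4, 5)
resumed_opt = torch.optim.SGD(resumed_layer.parameters(), lr=0.01, momentum=0.9)
resumed_opt.load_state_dict(saved_opt.state_dict())  # no shape check happens here

problems = mismatched_momentum_buffers(resumed_opt)
print(problems)
```

If the list is non-empty right after resuming, the saved optimizer state does not line up with the current parameter list (different shapes, order, or set of trainable parameters), and the next `optimizer.step()` would be expected to fail in the same place as the traceback above.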

Liu0329 on 30 Jul 2018

@Liu0329 @shinshiner Hi guys, did you fix this problem? I also encountered it when trying to use the pretrained model faster_rcnn_1_7_10021.pth on my own dataset. I tried commenting out these two lines:
# if args.mGPUs:
#     fasterRCNN = nn.DataParallel(fasterRCNN)
but it did not work. What should I do? Thank you!

xwjBupt on 7 Sep 2018

👍1

Have you solved it?
I also ran into this problem,
and commenting out these two lines doesn't work.