Faster-rcnn.pytorch: RuntimeError when resuming from a pretrained model

Created on 30 Jun 2018 · 14 Comments · Source: jwyang/faster-rcnn.pytorch

I want to fine-tune a model, but when I resume from a pretrained model, I get the error below:
Called with args:
Namespace(batch_size=1, checkepoch=20, checkpoint=3557, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda='--cuda', dataset='pascal_voc', disp_interval=100, large_scale=False, lr=0.0005, lr_decay_gamma=0.1, lr_decay_step=5, mGPUs=False, max_epochs=26, net='vgg16', num_workers=0, optimizer='sgd', resume=True, save_dir='/home/smartdsp/new_home/faster-rcnn.pytorch/models', session=1, start_epoch=1, use_tfboard=False)
Using config:
{'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [8, 16, 32],
'CROP_RESIZE_WITH_MAX_POOL': False,
'CUDA': False,
'DATA_DIR': '/home/smartdsp/new_home/faster-rcnn.pytorch/data',
'DEDUP_BOXES': 0.0625,
'EPS': 1e-14,
'EXP_DIR': 'vgg16',
'FEAT_STRIDE': [16],
'GPU_ID': 0,
'MATLAB': 'matlab',
'MAX_NUM_GT_BOXES': 20,
'MOBILENET': {'DEPTH_MULTIPLIER': 1.0,
'FIXED_LAYERS': 5,
'REGU_DEPTH': False,
'WEIGHT_DECAY': 4e-05},
'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
'POOLING_MODE': 'align',
'POOLING_SIZE': 7,
'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
'RNG_SEED': 3,
'ROOT_DIR': '/home/smartdsp/new_home/faster-rcnn.pytorch',
'TEST': {'BBOX_REG': True,
'HAS_RPN': True,
'MAX_SIZE': 1000,
'MODE': 'nms',
'NMS': 0.3,
'PROPOSAL_METHOD': 'gt',
'RPN_MIN_SIZE': 16,
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'RPN_TOP_N': 5000,
'SCALES': [600],
'SVM': False},
'TRAIN': {'ASPECT_GROUPING': False,
'BATCH_SIZE': 256,
'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_NORMALIZE_TARGETS': True,
'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
'BBOX_REG': True,
'BBOX_THRESH': 0.5,
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'BIAS_DECAY': False,
'BN_TRAIN': False,
'DISPLAY': 10,
'DOUBLE_BIAS': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'GAMMA': 0.1,
'HAS_RPN': True,
'IMS_PER_BATCH': 1,
'LEARNING_RATE': 0.01,
'MAX_SIZE': 1000,
'MOMENTUM': 0.9,
'PROPOSAL_METHOD': 'gt',
'RPN_BATCHSIZE': 256,
'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': 8,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 2000,
'RPN_PRE_NMS_TOP_N': 12000,
'SCALES': [600],
'SNAPSHOT_ITERS': 5000,
'SNAPSHOT_KEPT': 3,
'SNAPSHOT_PREFIX': 'res101_faster_rcnn',
'STEPSIZE': [30000],
'SUMMARY_INTERVAL': 180,
'TRIM_HEIGHT': 600,
'TRIM_WIDTH': 600,
'TRUNCATED': False,
'USE_ALL_GT': True,
'USE_FLIPPED': True,
'USE_GT': False,
'WEIGHT_DECAY': 0.0005},
'USE_GPU_NMS': True}
Loaded dataset voc_2007_trainval for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/smartdsp/new_home/faster-rcnn.pytorch/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
before filtering, there are 2372 images...
after filtering, there are 2372 images...
2372 roidb entries
Loading pretrained weights from data/pretrained_model/vgg16_caffe.pth
loading checkpoint /home/smartdsp/new_home/faster-rcnn.pytorch/models/vgg16/pascal_voc/vgg16_baseline/faster_rcnn_1_20_3557.pth
loaded checkpoint /home/smartdsp/new_home/faster-rcnn.pytorch/models/vgg16/pascal_voc/vgg16_baseline/faster_rcnn_1_20_3557.pth
lib/model/rpn/rpn.py:68: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
lib/model/faster_rcnn/faster_rcnn.py:98: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
cls_prob = F.softmax(cls_score)
/home/smartdsp/new_home/faster-rcnn.pytorch/trainval_net_finetune.py:330: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
loss_temp += loss.data[0]
Traceback (most recent call last):

File "", line 1, in
runfile('/home/smartdsp/new_home/faster-rcnn.pytorch/trainval_net_finetune.py', wdir='/home/smartdsp/new_home/faster-rcnn.pytorch')

File "/home/smartdsp/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "/home/smartdsp/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
builtins.execfile(filename, *where)

File "/home/smartdsp/new_home/faster-rcnn.pytorch/trainval_net_finetune.py", line 337, in
optimizer.step()

File "/home/smartdsp/anaconda2/lib/python2.7/site-packages/torch/optim/sgd.py", line 101, in step
buf.mul_(momentum).add_(1 - dampening, d_p)

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'other'

wjx2

All 14 comments

@wjx2 See the error in the last line. It is caused by a mismatch between CPU data and GPU data. Use CUDA when you run the code.

jwyang on 30 Jun 2018

👎9 👍1
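
Editor's sketch, not part of jwyang's reply: one way to see where such a CPU/GPU mismatch sits is to list the devices of the model parameters and of the tensors held in the optimizer state. The helper name device_report and the toy nn.Linear model are assumptions made for illustration; they do not come from the repository.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def device_report(model, optimizer):
    """Return the devices used by the parameters and by the optimizer state tensors."""
    param_devices = {str(p.device) for p in model.parameters()}
    state_devices = {
        str(value.device)
        for state in optimizer.state.values()
        for value in state.values()
        if isinstance(value, torch.Tensor)
    }
    return param_devices, state_devices


# Toy stand-in for the detector (hypothetical, for illustration only).
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step creates the momentum buffers that SGD keeps in its state.
model(torch.randn(3, 4)).sum().backward()
optimizer.step()

param_devices, state_devices = device_report(model, optimizer)
print("parameters on:", param_devices)
print("optimizer state on:", state_devices)
```

If the two sets differ after resuming (for example parameters on a CUDA device and optimizer state on the CPU), optimizer.step() is likely to fail with a type mismatch like the one in the traceback above.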

@wjx2 Hi, I ran into the same problem. Have you solved it yet?

babyjie57 on 4 Jul 2018

@babyjie57 Yeah, I changed my torch version from 0.4.0 to 0.3.0, and the problem is solved.

wjx2 on 4 Jul 2018

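@wjx2 @babyjie57 This error is due to changes in the new PyTorch 0.4.

You can move the optimizer state to the GPU manually after loading it:

import torch
import torch.optim as optim

model.load_state_dict(checkpoint['model'])
model.cuda()
# lr is only a placeholder here; load_state_dict below restores the saved hyperparameters.
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer.load_state_dict(checkpoint['optimizer'])
# Move every tensor in the optimizer state (e.g. momentum buffers) to the GPU.
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()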

Worulz on 19 Jul 2018

👍9 😄1
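
Editor's sketch, not part of Worulz's comment: the same idea as a self-contained script that also runs on a machine without a GPU. The toy nn.Linear model, the in-memory buffer used in place of a .pth file, and the helper name optimizer_state_to are assumptions for illustration. The sketch moves the model to the target device before restoring the optimizer, then moves any remaining state tensors explicitly.

```python
import io

import torch
import torch.nn as nn
import torch.optim as optim


def optimizer_state_to(optimizer, device):
    """Move every tensor stored in the optimizer state to `device`."""
    for state in optimizer.state.values():
        for key, value in state.items():
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)


# Toy stand-in for the detector; one training step creates momentum buffers.
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
model(torch.randn(3, 4)).sum().backward()
optimizer.step()

# Save to an in-memory buffer instead of a .pth file to keep the sketch self-contained.
buffer = io.BytesIO()
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, buffer)
buffer.seek(0)

# Resume: load on the CPU, move the model first, then restore the optimizer.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load(buffer, map_location="cpu")
resumed = nn.Linear(4, 2)
resumed.load_state_dict(checkpoint["model"])
resumed.to(device)
resumed_optimizer = optim.SGD(resumed.parameters(), lr=0.01, momentum=0.9)
resumed_optimizer.load_state_dict(checkpoint["optimizer"])
optimizer_state_to(resumed_optimizer, device)

# A further step should now run without a CPU/GPU type mismatch.
resumed_optimizer.zero_grad()
resumed(torch.randn(3, 4, device=device)).sum().backward()
resumed_optimizer.step()
```

Using .to(device) rather than .cuda() is a design choice of this sketch so that the same code path works on CPU-only machines.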

Additionally, for others who may encounter this problem with the Adam optimizer, use this:

        # Load the saved optimizer state only to read back its hyperparameters.
        optimizer.load_state_dict(checkpoint['optimizer'])

        lr = optimizer.param_groups[0]['lr']
        weight_decay = optimizer.param_groups[0]['weight_decay']
        double_bias = True
        bias_decay = True

        # Rebuild one parameter group per trainable parameter.
        params = []
        for key, value in dict(fasterRCNN.named_parameters()).items():
            if value.requires_grad:
                if 'bias' in key:
                    params += [{'params': [value], 'lr': lr * (double_bias + 1),
                                'weight_decay': bias_decay and weight_decay or 0}]
                else:
                    params += [{'params': [value], 'lr': lr, 'weight_decay': weight_decay}]

        # The new Adam optimizer starts with empty moment estimates.
        optimizer = torch.optim.Adam(params)

Using this, you ensure that you load the same weight decay and learning rate as in the saved model. It's kind of crude, but I'm sure you'll be able to fit it in nicely.

Insert it at these lines:

https://github.com/jwyang/faster-rcnn.pytorch/blob/28db6d0b313220d200b739f4e22410fbe35529f4/trainval_net.py#L286-L287

Worulz on 19 Jul 2018

👍5
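
Editor's sketch, not part of Worulz's comment: the parameter-group rebuild can be tried in isolation on a toy model. The function name build_param_groups, the nn.Linear stand-in, and the values in saved_group are made up for illustration; in the real script the learning rate and weight decay would come from the loaded checkpoint.

```python
import torch
import torch.nn as nn


def build_param_groups(model, lr, weight_decay, double_bias=True, bias_decay=True):
    """Build one parameter group per trainable parameter, treating biases separately."""
    groups = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "bias" in name:
            groups.append({
                "params": [param],
                "lr": lr * 2 if double_bias else lr,
                "weight_decay": weight_decay if bias_decay else 0,
            })
        else:
            groups.append({"params": [param], "lr": lr, "weight_decay": weight_decay})
    return groups


# Hypothetical stand-in for checkpoint['optimizer']['param_groups'][0].
saved_group = {"lr": 0.001, "weight_decay": 0.0005}

model = nn.Linear(4, 2)  # toy model with one weight and one bias
groups = build_param_groups(model, saved_group["lr"], saved_group["weight_decay"])
optimizer = torch.optim.Adam(groups)

for group in optimizer.param_groups:
    print(group["lr"], group["weight_decay"])
```

Reading the hyperparameters straight from the checkpoint dictionary avoids loading the optimizer state at all, at the cost of discarding Adam's saved moment estimates, as the original snippet also does.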

torch 0.4.0
I put these two lines before if args.resume:, and it works well.

ChengpengChen on 24 Sep 2018

👍4

torch 0.4.0
I put these two lines before if args.resume:, and it works well.
What are the "two lines" you mentioned above?

ssli23 on 7 Oct 2018

@ssli23 I believe he's talking about this. https://github.com/jwyang/faster-rcnn.pytorch/blob/309319998760927be35adb90747087da1da75e1f/trainval_net.py#L289-L290

then moving it here. https://github.com/jwyang/faster-rcnn.pytorch/blob/309319998760927be35adb90747087da1da75e1f/trainval_net.py#L271

Worulz on 8 Oct 2018

👍2

Using PyTorch 0.3 solves this problem.

wjx2 on 17 Nov 2018

Also solved. https://github.com/choasup/pytorch-fasterRCNN/issues/1#issuecomment-447836926

choasup on 17 Dec 2018

@Worulz Hi, I tried your SGD snippet from 19 Jul 2018 above, but it didn't work for me. What does 'model' refer to?

fangInFBI on 29 Dec 2018

@Worulz Hi, I tried your SGD snippet from 19 Jul 2018 above, but it didn't work for me. I'm using Python 3 and PyTorch 0.4.0.

summerZXH on 20 Feb 2019

Regarding @Worulz's SGD snippet from 19 Jul 2018 above: with torch 1.0.1 and CUDA 10.0, I just added one line, fasterRCNN.cuda(), after fasterRCNN.load_state_dict(checkpoint['model']), and it works well.

zcunyi on 22 Mar 2019

@ssli23 I believe he's talking about this.

https://github.com/jwyang/faster-rcnn.pytorch/blob/309319998760927be35adb90747087da1da75e1f/trainval_net.py#L289-L290

then moving it here.

https://github.com/jwyang/faster-rcnn.pytorch/blob/309319998760927be35adb90747087da1da75e1f/trainval_net.py#L271

That works, thank you.