Faster-rcnn.pytorch: resume the training

Created on 30 Sep 2018 · 7 comments · Source: jwyang/faster-rcnn.pytorch

When I resumed training with trainval_net.py using these args:
args.resume=True
args.checksession=1
args.checkepoch=24
args.checkpoint=8958
I got the following error:
Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'other'

My PyTorch version is 0.4.0 and my Python version is 2.7.
I have also tried PyTorch 0.3.0 and 0.3.1, but the error persists.
Does anyone know how to solve it?

vickersmith
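
For reference, the resume args in the question only select which saved file is loaded. The sketch below assumes the training script saves checkpoints as `faster_rcnn_<checksession>_<checkepoch>_<checkpoint>.pth` inside a `<save_dir>/<net>/<dataset>` directory; the helper `checkpoint_path` and the `models`/`pascal_voc` values are hypothetical and only illustrate which file the args above would point to.

```python
import os


def checkpoint_path(save_dir, net, dataset, checksession, checkepoch, checkpoint):
    """Hypothetical helper: build the checkpoint filename the resume args select.

    ASSUMPTION: checkpoints are stored as
    <save_dir>/<net>/<dataset>/faster_rcnn_<session>_<epoch>_<step>.pth
    """
    return os.path.join(
        save_dir, net, dataset,
        "faster_rcnn_{}_{}_{}.pth".format(checksession, checkepoch, checkpoint),
    )


# Values from the question (net='vgg16', session 1, epoch 24, step 8958);
# "models" and "pascal_voc" are assumed placeholders.
path = checkpoint_path("models", "vgg16", "pascal_voc", 1, 24, 8958)
print(path)  # on POSIX: models/vgg16/pascal_voc/faster_rcnn_1_24_8958.pth
```

If this file does not exist, resuming fails before the tensor-type error in the question can even occur.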

Most helpful comment

As suggested by @ssli23 and many others, I have moved these two lines above.

jwyang on 8 Oct 2018
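
The comment does not say which two lines were moved. A common cause of this exact error when resuming is that the optimizer state is restored while the model's parameters are still on the CPU: `Optimizer.load_state_dict` casts each saved state tensor (for SGD with momentum, the momentum buffers) to the device of its parameter at load time, so if `.cuda()` is called afterwards the parameters move to the GPU while the buffers stay on the CPU, and the next `optimizer.step()` mixes the two. The sketch below assumes that is what happened here. It uses a toy `nn.Linear` model and the hypothetical helpers `make_checkpoint` and `build_and_resume` in place of the repository's code, and shows an order that avoids the mismatch: move the model to the device, build the optimizer, then load both state dicts.

```python
import io

import torch
import torch.nn as nn


def make_checkpoint():
    """Train one SGD step on a toy model and serialize model + optimizer state."""
    model = nn.Linear(4, 2)  # hypothetical stand-in for the Faster R-CNN model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()  # the first step creates a momentum buffer per parameter
    buf = io.BytesIO()
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, buf)
    return buf.getvalue()


def build_and_resume(checkpoint_bytes, device):
    """Rebuild the model and optimizer, then restore both from a checkpoint.

    The model is moved to `device` BEFORE the optimizer is built and its
    state is loaded, so load_state_dict() casts the momentum buffers to the
    same device as the parameters.
    """
    model = nn.Linear(4, 2)
    model.to(device)  # 1) move the parameters first
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    checkpoint = torch.load(io.BytesIO(checkpoint_bytes), map_location="cpu")
    model.load_state_dict(checkpoint["model"])          # copies into params on `device`
    optimizer.load_state_dict(checkpoint["optimizer"])  # 2) then restore optimizer state
    return model, optimizer


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model, optimizer = build_and_resume(make_checkpoint(), device)
    for p in model.parameters():
        # Every restored buffer now lives on the same device as its parameter.
        assert optimizer.state[p]["momentum_buffer"].device == p.device
    # A further training step runs without a CPU/GPU type mismatch.
    model(torch.randn(8, 4, device=device)).sum().backward()
    optimizer.step()
```

Reversing the order, that is loading the optimizer state while the model is still on the CPU and only then calling `.cuda()`, would leave the buffers on the CPU, which is consistent with the torch.FloatTensor vs torch.cuda.FloatTensor error reported in the question.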

All 7 comments

I think the model was trained on a GPU, but you are not using the GPU to continue training. Have you tried the --cuda option?

cancam on 30 Sep 2018

I have set args.cuda=True, but it doesn't work. Here are my options:

args.cuda=True
args.gpu_id=1
args.max_epochs=100
args.net='vgg16'

args.resume=True
args.checksession=1
args.checkepoch=24
args.checkpoint=8958
@cancam

vickersmith on 30 Sep 2018

How about trying the mGPUs option? @vickersmith

cancam on 30 Sep 2018

I tried, but it still doesn't work. Also, PyTorch 0.4.0 doesn't support multi-GPU training with the mGPUs option: I tried it, and other issues in this repository also report that the code cannot run on multiple GPUs. @cancam

vickersmith on 30 Sep 2018

@vickersmith
Did you try adding this to your training script?

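import torch._utils  # also binds the name `torch`

# Checkpoints saved by PyTorch >= 0.4 are unpickled through
# torch._utils._rebuild_tensor_v2, which older releases (e.g. 0.3.x) lack.
# If it is missing, define a compatible shim on top of _rebuild_tensor.
try:
    torch._utils._rebuild_tensor_v2
except AttributeError:
    def _rebuild_tensor_v2(storage, storage_offset, size, stride, requires_grad, backward_hooks):
        # Rebuild the tensor with the old API, then restore the 0.4-era attributes.
        tensor = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor
    torch._utils._rebuild_tensor_v2 = _rebuild_tensor_v2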