Faster-rcnn.pytorch: strange runtime error: dimension specified as 0 but tensor has no dimensions

Created on 3 Jul 2018 · 5 Comments · Source: jwyang/faster-rcnn.pytorch

I have 4 GPUs on my machine. I am running training with
--dataset pascal_voc --net res101 --bs 8 --nw 4 --lr 4e-3 --lr_decay_step 8 --cuda --mGPUs
but I get this error:

Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:24: UserWarning: 
    There is an imbalance between your GPUs. You may want to exclude GPU 0 which
    has less than 75% of the memory or cores of GPU 1. You can do so by setting
    the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
    environment variable.
  warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
/home/user/prj/pytorch-faster-rcnn/lib/model/rpn/rpn.py:68: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
/home/user/prj/pytorch-faster-rcnn/lib/model/faster_rcnn/faster_rcnn.py:98: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cls_prob = F.softmax(cls_score)
Traceback (most recent call last):
  File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/user/.pycharm_helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/user/prj/pytorch-faster-rcnn/trainval_net.py", line 323, in <module>
    rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
    return self.gather(outputs, self.output_device)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    return gather_map(outputs)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
  File "/home/user/anaconda2/envs/tensorflow/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

twangnh

Most helpful comment

@wtl-zju thank you, it works. I'm using python3 with pytorch 0.4 in a virtualenv.

There is a slight error in @wtl-zju's comment, though.

To clarify, add these lines just before returning the values in lib/model/faster_rcnn/faster_rcnn.py:

if self.training:
    rpn_loss_cls = torch.unsqueeze(rpn_loss_cls, 0)
    rpn_loss_bbox = torch.unsqueeze(rpn_loss_bbox, 0)
    RCNN_loss_cls = torch.unsqueeze(RCNN_loss_cls, 0)
    RCNN_loss_bbox = torch.unsqueeze(RCNN_loss_bbox, 0)

The unsqueeze calls are placed inside the self.training check because the losses are only computed during training, not when testing / predicting. Outside training, the loss variables are simply set to 0, which can be seen a few lines above this code.

Worulz on 4 Jul 2018

👍37
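The training-only guard described above can be illustrated with a small stand-alone sketch. This is not the repository's code: `ToyDetector`, its single linear layer, and the input data are hypothetical stand-ins, and only the loss handling mirrors the snippet in the comment. It assumes PyTorch 0.4 or later, where a reduced loss is a 0-dim tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDetector(nn.Module):
    """Hypothetical stand-in for the detector; only the loss handling matters."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x, target):
        scores = self.fc(x)
        loss_cls = 0  # default outside training, like the "set to 0" note above
        if self.training:
            loss_cls = F.cross_entropy(scores, target)  # 0-dim tensor
            loss_cls = torch.unsqueeze(loss_cls, 0)     # now has shape (1,)
        return scores, loss_cls


model = ToyDetector()
x = torch.randn(3, 4)
target = torch.tensor([0, 1, 1])

model.train()
_, train_loss = model(x, target)  # 1-element tensor, has a dimension 0

model.eval()
_, eval_loss = model(x, target)   # plain Python 0, never unsqueezed
```

In training mode the returned loss has a leading dimension, while in eval mode the plain 0 is returned untouched, which is why the unsqueeze calls sit inside the guard.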

All 5 comments

Are you using PyTorch 0.4? Mine also crashed when using PyTorch 0.4 with multiple GPUs. See https://github.com/pytorch/pytorch/issues/5552

wtl-zju on 3 Jul 2018

From the post you linked, it seems this has been fixed and merged. However, I still get the error with the newest PyTorch.

twangnh on 4 Jul 2018

Same problem here, using PyTorch 0.4. I'm not sure how to fix this yet.

Worulz on 4 Jul 2018

I just fixed this problem by unsqueezing RCNN_loss_cls, RCNN_loss_bbox, rpn_loss_cls and rpn_loss_bbox in lib/model/faster_rcnn/faster_rcnn.py. Basically, the scalar tensors in PyTorch 0.4 cause the error, so you need to add one more dimension: rpn_loss_cls = torch.unsqueeze(rpn_loss_cls, 0) ... BTW, I compiled PyTorch 0.4 from source, but I think it should also work if you install it from conda.

wtl-zju on 4 Jul 2018

🎉9
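The scalar-tensor explanation above can be reproduced in isolation. The following is a minimal sketch under stated assumptions, not code from the repository or from PyTorch's gather implementation: `as_gatherable` is a hypothetical helper, the loss values are made up, and `torch.cat` stands in for combining per-GPU outputs along dimension 0.

```python
import torch


def as_gatherable(loss):
    """Hypothetical helper: give a 0-dim loss a leading dimension of size 1."""
    return loss.unsqueeze(0) if loss.dim() == 0 else loss


# Stand-ins for the loss returned by each GPU replica (values are made up).
per_replica_losses = [torch.tensor(0.5), torch.tensor(1.5)]

# A 0-dim tensor has no dimension 0, so there is nothing to gather along.
num_dims_before = per_replica_losses[0].dim()

# After unsqueezing, each loss has shape (1,) and can be joined along dim 0.
fixed = [as_gatherable(loss) for loss in per_replica_losses]
gathered = torch.cat(fixed, dim=0)  # one entry per replica

# The combined losses can then be reduced back to a single value.
mean_loss = gathered.mean()
```

Each replica contributes one element to `gathered`, so a reduction such as `.mean()` is still needed before calling backward on the combined loss.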

