Maskrcnn-benchmark: RuntimeError: The size of tensor a (81) must match the size of tensor b (4) at non-singleton dimension 0

Created on 3 Jan 2019  ·  9 Comments  ·  Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

Hi,
I am trying to use maskrcnn-benchmark to train on my own dataset (VOC style). I changed MODEL.ROI_BOX_HEAD.NUM_CLASSES to 4 and renamed the layers in roi_box_predictors.
The config file is e2e_faster_rcnn_R_50_FPN_1x.yaml.
But when I try to train with the pre-trained model, it gives an error during training:
```
2019-01-03 15:10:33,414 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/WorkSpace/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 77, in do_train
    optimizer.step()
  File "/usr/local/miniconda2/envs/pytorchenv/lib/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: The size of tensor a (81) must match the size of tensor b (4) at non-singleton dimension 0
```

My English is not good, please forgive me.
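Note that the traceback ends in the optimizer step, not in the model's forward pass: the checkpoint's SGD momentum buffer was saved for the 81-class head, and after the head is resized to 4 classes it no longer matches the new gradients. A minimal standalone reproduction of that mechanism (this is not the repo's code, just an illustration):

```python
import torch

# Optimizer state saved for an 81-class head (one step taken so a momentum
# buffer exists in the state dict).
old_param = torch.nn.Parameter(torch.zeros(81))
old_opt = torch.optim.SGD([old_param], lr=0.01, momentum=0.9)
old_param.sum().backward()
old_opt.step()
state = old_opt.state_dict()  # momentum_buffer has shape (81,)

# Head resized to 4 classes, but the old optimizer state is restored.
new_param = torch.nn.Parameter(torch.zeros(4))
new_opt = torch.optim.SGD([new_param], lr=0.01, momentum=0.9)
new_opt.load_state_dict(state)  # no shape check happens here

new_param.sum().backward()
try:
    new_opt.step()  # buf.mul_(momentum).add_(...) mixes (81,) with (4,)
except RuntimeError as e:
    print(e)  # the same "size of tensor a (81) must match ..." error
```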


All 9 comments

Hi,

Did you fix your problem?

Hi Guoxiong and fmassa,
I got the same problem (a tensor shape mismatch) after changing NUM_CLASSES to 6. Would you mind helping me with this?

```
  File "/usr/local/miniconda2/envs/pytorchenv/lib/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: The size of tensor a (81) must match the size of tensor b (6) at non-singleton dimension 0
```

This error occurs because the pre-trained optimizer and scheduler are loaded along with the model. I fixed it by commenting out the following lines in utils/checkpoint.py:

```python
def load(self, f=None):
    if self.has_checkpoint():
        # override argument with existing checkpoint
        f = self.get_checkpoint_file()
    if not f:
        # no checkpoint could be found
        self.logger.info("No checkpoint found. Initializing model from scratch")
        return {}
    self.logger.info("Loading checkpoint from {}".format(f))
    checkpoint = self._load_file(f)
    self._load_model(checkpoint)
    # if "optimizer" in checkpoint and self.optimizer:
    #     self.logger.info("Loading optimizer from {}".format(f))
    #     self.optimizer.load_state_dict(checkpoint.pop("optimizer"))
    # if "scheduler" in checkpoint and self.scheduler:
    #     self.logger.info("Loading scheduler from {}".format(f))
    #     self.scheduler.load_state_dict(checkpoint.pop("scheduler"))

    # return any further checkpoint data
    return checkpoint
```

Thank you so much for your help! That's really helpful!
Instead of touching the code, I removed the 'optimizer', 'scheduler', and 'iteration' entries from the pre-trained weights.

Cool, glad to understand what the issue (and the solution) were.

I met the same error!
`copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([2]).`

You need to do some model surgery in the pre-trained weights of your model if you want to change the number of classes while using a pre-trained detection model.
See https://github.com/facebookresearch/maskrcnn-benchmark/pull/324 for example

Hi,
I also have same issue. Could you please tell how to fix it?



I don't know where your mistake is, but the fix I posted above might help you.


