Maskrcnn-benchmark: RuntimeError: The size of tensor a (81) must match the size of tensor b (4) at non-singleton dimension 0

Created on 3 Jan 2019  ·  9 Comments  ·  Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

Hi,
I am trying to use maskrcnn-benchmark to train on my own dataset (VOC style). I changed MODEL.ROI_BOX_HEAD.NUM_CLASSES to 4 and renamed the layers in roi_box_predictors.
The config file is e2e_faster_rcnn_R_50_FPN_1x.yaml.
But when I try to train with the pre-trained model, it gives an error during training:
```
2019-01-03 15:10:33,414 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/WorkSpace/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 77, in do_train
    optimizer.step()
  File "/usr/local/miniconda2/envs/pytorchenv/lib/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: The size of tensor a (81) must match the size of tensor b (4) at non-singleton dimension 0
```

My English is not good, please forgive me.
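Note that the traceback ends in the optimizer step, not in the model's forward pass: the checkpoint's SGD momentum buffer was saved for the 81-class head, and after the head is resized to 4 classes it no longer matches the new gradients. A minimal standalone reproduction of that mechanism (this is not the repo's code, just an illustration):

```python
import torch

# Optimizer state saved for an 81-class head (one step taken so a momentum
# buffer exists in the state dict).
old_param = torch.nn.Parameter(torch.zeros(81))
old_opt = torch.optim.SGD([old_param], lr=0.01, momentum=0.9)
old_param.sum().backward()
old_opt.step()
state = old_opt.state_dict()  # momentum_buffer has shape (81,)

# Head resized to 4 classes, but the old optimizer state is restored.
new_param = torch.nn.Parameter(torch.zeros(4))
new_opt = torch.optim.SGD([new_param], lr=0.01, momentum=0.9)
new_opt.load_state_dict(state)  # no shape check happens here

new_param.sum().backward()
try:
    new_opt.step()  # buf.mul_(momentum).add_(...) mixes (81,) with (4,)
except RuntimeError as e:
    print(e)  # the same "size of tensor a (81) must match ..." error
```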


All 9 comments

Hi,

Did you fix your problem?

Hi Guoxiong and fmassa,
I got the same problem (a tensor shape mismatch) after changing NUM_CLASSES to 6. Would you mind helping me with this?

```
  File "/usr/local/miniconda2/envs/pytorchenv/lib/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: The size of tensor a (81) must match the size of tensor b (6) at non-singleton dimension 0
```

This error occurs because the pre-trained optimizer and scheduler are loaded along with the model. I fixed it by commenting out the following lines in utils/checkpoint.py:

```python
def load(self, f=None):
    if self.has_checkpoint():
        # override argument with existing checkpoint
        f = self.get_checkpoint_file()
    if not f:
        # no checkpoint could be found
        self.logger.info("No checkpoint found. Initializing model from scratch")
        return {}
    self.logger.info("Loading checkpoint from {}".format(f))
    checkpoint = self._load_file(f)
    self._load_model(checkpoint)
    # if "optimizer" in checkpoint and self.optimizer:
    #     self.logger.info("Loading optimizer from {}".format(f))
    #     self.optimizer.load_state_dict(checkpoint.pop("optimizer"))
    # if "scheduler" in checkpoint and self.scheduler:
    #     self.logger.info("Loading scheduler from {}".format(f))
    #     self.scheduler.load_state_dict(checkpoint.pop("scheduler"))

    # return any further checkpoint data
    return checkpoint
```

Thank you so much for your help! That's really helpful!
Instead of touching the code, I removed the 'optimizer', 'scheduler', and 'iteration' entries from the pre-trained weights.

Cool, glad to understand what the issue (and the solution) were.

I met the same error!
`copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([2]).`

You need to do some model surgery in the pre-trained weights of your model if you want to change the number of classes while using a pre-trained detection model.
See https://github.com/facebookresearch/maskrcnn-benchmark/pull/324 for example

Hi,
I also have same issue. Could you please tell how to fix it?



I don't know where your mistake is, but the fix I posted above might help you.


