I've found two potential bugs when running tools/test.py with test-time augmentation (TTA).
I am using Python 3.7.7, Pytorch 1.4.0, CUDA 10.0, CUDNN 7.6.3 and MMDet V2.0 (commit 7ed8d51).
I trained a Mask R-CNN on my custom dataset, and I'm sure that training and validating with EvalHooks (without TTA) work well.
I'd like to test my model with TTA, so I simply set `flip=True` for `MultiScaleFlipAug` in the config file and ran tools/test.py. The error traceback is as follows:
```
Traceback (most recent call last):
  File "tools/test.py", line 170, in <module>
    main()
  File "tools/test.py", line 148, in main
    outputs = single_gpu_test(model, data_loader, args.show)
  File "/home/ly/mmdetection/mmdet/apis/test.py", line 19, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 156, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 141, in forward_test
    return self.aug_test(imgs, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/two_stage.py", line 205, in aug_test
    x, proposal_list, img_metas, rescale=rescale)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/standard_roi_head.py", line 268, in aug_test
    self.test_cfg)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/test_mixins.py", line 100, in aug_test_bboxes
    aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)
  File "/home/ly/mmdetection/mmdet/core/post_processing/merge_augs.py", line 65, in merge_aug_bboxes
    bboxes = torch.stack(recovered_bboxes).mean(dim=0)
RuntimeError: stack expects a non-empty TensorList
```
I also noticed that the error above is raised because the input `feats` of `aug_test_bboxes` is empty. I debugged it by adding `print(list(x))` before and after the following line:
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/models/detectors/two_stage.py#L203
Before this line, the extracted features print successfully, but the value of `list(x)` becomes `[]` after it.
Simply replacing
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/models/detectors/base.py#L49
with `return [self.extract_feat(img) for img in imgs]` solves the problem, but I'm still confused about why this happens.
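For illustration, here is a minimal sketch of the generator-exhaustion behaviour at play (the `extract_feats_*` functions below are hypothetical stand-ins, not the actual MMDetection code):

```python
def extract_feats_gen(imgs):
    # Generator version: yields one "feature" per image, lazily.
    for img in imgs:
        yield img * 2  # stand-in for self.extract_feat(img)

x = extract_feats_gen([1, 2, 3])
first_pass = list(x)   # consumes the generator -> [2, 4, 6]
second_pass = list(x)  # the generator is exhausted -> []

def extract_feats_list(imgs):
    # List version: can be iterated any number of times.
    return [img * 2 for img in imgs]
```

A generator can only be iterated once, so any second consumer sees an empty sequence, which matches the empty `feats` observed above.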
After that, another error was raised (my custom dataset has 15 classes):
```
Traceback (most recent call last):
  File "tools/test.py", line 170, in <module>
    main()
  File "tools/test.py", line 148, in main
    outputs = single_gpu_test(model, data_loader, args.show)
  File "/home/ly/mmdetection/mmdet/apis/test.py", line 19, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 155, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 140, in forward_test
    return self.aug_test(imgs, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/two_stage.py", line 205, in aug_test
    x, proposal_list, img_metas, rescale=rescale)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/standard_roi_head.py", line 268, in aug_test
    self.test_cfg)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/test_mixins.py", line 100, in aug_test_bboxes
    aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)
  File "/home/ly/mmdetection/mmdet/core/post_processing/merge_augs.py", line 63, in merge_aug_bboxes
    bboxes = bbox_mapping_back(bboxes, img_shape, scale_factor, flip)
  File "/home/ly/mmdetection/mmdet/core/bbox/transforms.py", line 146, in bbox_mapping_back
    new_bboxes = new_bboxes / new_bboxes.new_tensor(scale_factor)
RuntimeError: The size of tensor a (60) must match the size of tensor b (4) at non-singleton dimension 1
```
Replacing
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/core/bbox/transforms.py#L145
with `new_bboxes = new_bboxes / new_bboxes.new_tensor(scale_factor).repeat(1, int(new_bboxes.shape[1] / len(scale_factor)))` also solves the problem.
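To see why the `.repeat()` fixes the shape mismatch, here is a pure-Python sketch (the numbers are made up; the real code divides a torch tensor of shape `(N, 4 * num_classes)` by a 4-element `scale_factor`):

```python
num_classes = 15
scale_factor = [2.0, 2.0, 2.0, 2.0]    # (w, h, w, h) scaling factors
row = [100.0] * (4 * num_classes)      # one class-aware bbox row: 60 values

# Elementwise division fails as-is: 60 values vs. 4 divisors
# (the "tensor a (60) ... tensor b (4)" error above).
# Tiling scale_factor to the row width, like the .repeat() in the fix,
# aligns one (w, h, w, h) group with each class's 4 coordinates:
tiled = scale_factor * (len(row) // len(scale_factor))  # 60 divisors
rescaled = [v / s for v, s in zip(row, tiled)]
```

With 15 classes the boxes are stored class-aware (15 × 4 = 60 columns), so the per-image `scale_factor` has to be repeated once per class before broadcasting works.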
Wasn't this solved in the latest version? I ran into the same issue when I turned on flip or multi-scale augmentation.
I tried @c1aris's solution of replacing it with `return [self.extract_feat(img) for img in imgs]`, but somehow performance dropped significantly.
I encountered this problem too, and it seems to be caused by the following piece of code in models/detectors/two_stage.py:
```python
def aug_test(self, imgs, img_metas, rescale=False):
    """Test with augmentations.

    If rescale is False, then returned bboxes and masks will fit the scale
    of imgs[0].
    """
    # recompute feats to save memory
    x = self.extract_feats(imgs)
    proposal_list = self.aug_test_rpn(x, img_metas)
    return self.roi_head.aug_test(
        x, proposal_list, img_metas, rescale=rescale)
```
`x` is a generator that is passed to `self.aug_test_rpn`; by the time it is passed to `self.roi_head.aug_test`, it has already been exhausted. I simply fixed it with the following:
```python
def aug_test(self, imgs, img_metas, rescale=False):
    """Test with augmentations.

    If rescale is False, then returned bboxes and masks will fit the scale
    of imgs[0].
    """
    # recompute feats to save memory
    x = self.extract_feats(imgs)
    y = self.extract_feats(imgs)
    proposal_list = self.aug_test_rpn(x, img_metas)
    return self.roi_head.aug_test(
        y, proposal_list, img_metas, rescale=rescale)
```
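As an aside, calling feature extraction twice doubles the backbone cost. Materializing the generator once, or splitting it with `itertools.tee`, avoids the recomputation; a sketch with a hypothetical stand-in for the real extraction:

```python
import itertools

def extract_feats(imgs):
    # Stand-in generator for the real per-image feature extraction.
    for img in imgs:
        yield img * 2

# Option 1: materialize once, reuse freely.
feats = list(extract_feats([1, 2, 3]))

# Option 2: tee the generator into two independent iterators,
# each of which can be consumed once.
x, y = itertools.tee(extract_feats([1, 2, 3]))
```

Note that `tee` buffers items internally, so for large feature maps materializing a list is just as memory-hungry; the two-call version in the fix trades compute for memory instead.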
@bamps53 Thanks for reporting! Have you figured out why the performance drops?
@c1aris I haven't figured it out yet, but it's due to the flip augmentation, not the resize. It seems only the image or the label was flipped. Still investigating.
@bamps53 Okay, thanks a lot!
@c1aris By the way, did you find out why this code originally yields a single image feature?
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/models/detectors/base.py#L48-L49
I replaced it with `return [self.extract_feat(img) for img in imgs]` as you recommended and it seems to work, but I couldn't understand the original intention.
@bamps53 I have no idea either, maybe you can refer to @zsy0016 's explanation.