I've found two potential bugs when running tools/test.py with test-time augmentation (TTA).
I am using Python 3.7.7, Pytorch 1.4.0, CUDA 10.0, CUDNN 7.6.3 and MMDet V2.0 (commit 7ed8d51).
I trained a Mask R-CNN on my custom dataset, and I'm sure that training and validating with EvalHooks (without TTA) work well.
I'd like to test my model with TTA, so I simply set `flip=True` for `MultiScaleFlipAug` in the config file and ran tools/test.py. The error traceback is as follows:
```
Traceback (most recent call last):
  File "tools/test.py", line 170, in <module>
    main()
  File "tools/test.py", line 148, in main
    outputs = single_gpu_test(model, data_loader, args.show)
  File "/home/ly/mmdetection/mmdet/apis/test.py", line 19, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 156, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 141, in forward_test
    return self.aug_test(imgs, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/two_stage.py", line 205, in aug_test
    x, proposal_list, img_metas, rescale=rescale)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/standard_roi_head.py", line 268, in aug_test
    self.test_cfg)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/test_mixins.py", line 100, in aug_test_bboxes
    aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)
  File "/home/ly/mmdetection/mmdet/core/post_processing/merge_augs.py", line 65, in merge_aug_bboxes
    bboxes = torch.stack(recovered_bboxes).mean(dim=0)
RuntimeError: stack expects a non-empty TensorList
```
I also noticed that the error above is raised because the input `feats` of `aug_test_bboxes` is empty. I debugged it by adding `print(list(x))` before and after the following line:
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/models/detectors/two_stage.py#L203
Before this line, the extracted features print successfully, but the value of `list(x)` becomes `[]` after it.
Simply replacing
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/models/detectors/base.py#L49
with `return [self.extract_feat(img) for img in imgs]` solves the problem, but I'm still confused about why this happens.
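For illustration, here is a minimal sketch of the generator-exhaustion behaviour at play (the `extract_feats_*` functions below are hypothetical stand-ins, not the actual MMDetection code):

```python
def extract_feats_gen(imgs):
    # Generator version: yields one "feature" per image, lazily.
    for img in imgs:
        yield img * 2  # stand-in for self.extract_feat(img)

x = extract_feats_gen([1, 2, 3])
first_pass = list(x)   # consumes the generator -> [2, 4, 6]
second_pass = list(x)  # the generator is exhausted -> []

def extract_feats_list(imgs):
    # List version: can be iterated any number of times.
    return [img * 2 for img in imgs]
```

A generator can only be iterated once, so any second consumer sees an empty sequence, which matches the empty `feats` observed above.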
After that, another error was raised (my custom dataset has 15 classes):
```
Traceback (most recent call last):
  File "tools/test.py", line 170, in <module>
    main()
  File "tools/test.py", line 148, in main
    outputs = single_gpu_test(model, data_loader, args.show)
  File "/home/ly/mmdetection/mmdet/apis/test.py", line 19, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ly/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ly/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 155, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/base.py", line 140, in forward_test
    return self.aug_test(imgs, img_metas, **kwargs)
  File "/home/ly/mmdetection/mmdet/models/detectors/two_stage.py", line 205, in aug_test
    x, proposal_list, img_metas, rescale=rescale)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/standard_roi_head.py", line 268, in aug_test
    self.test_cfg)
  File "/home/ly/mmdetection/mmdet/models/roi_heads/test_mixins.py", line 100, in aug_test_bboxes
    aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)
  File "/home/ly/mmdetection/mmdet/core/post_processing/merge_augs.py", line 63, in merge_aug_bboxes
    bboxes = bbox_mapping_back(bboxes, img_shape, scale_factor, flip)
  File "/home/ly/mmdetection/mmdet/core/bbox/transforms.py", line 146, in bbox_mapping_back
    new_bboxes = new_bboxes / new_bboxes.new_tensor(scale_factor)
RuntimeError: The size of tensor a (60) must match the size of tensor b (4) at non-singleton dimension 1
```
Replacing
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/core/bbox/transforms.py#L145
with `new_bboxes = new_bboxes / new_bboxes.new_tensor(scale_factor).repeat(1, int(new_bboxes.shape[1] / len(scale_factor)))` also solves the problem.
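To see why the `.repeat()` fixes the shape mismatch, here is a pure-Python sketch (the numbers are made up; the real code divides a torch tensor of shape `(N, 4 * num_classes)` by a 4-element `scale_factor`):

```python
num_classes = 15
scale_factor = [2.0, 2.0, 2.0, 2.0]    # (w, h, w, h) scaling factors
row = [100.0] * (4 * num_classes)      # one class-aware bbox row: 60 values

# Elementwise division fails as-is: 60 values vs. 4 divisors
# (the "tensor a (60) ... tensor b (4)" error above).
# Tiling scale_factor to the row width, like the .repeat() in the fix,
# aligns one (w, h, w, h) group with each class's 4 coordinates:
tiled = scale_factor * (len(row) // len(scale_factor))  # 60 divisors
rescaled = [v / s for v, s in zip(row, tiled)]
```

With 15 classes the boxes are stored class-aware (15 × 4 = 60 columns), so the per-image `scale_factor` has to be repeated once per class before broadcasting works.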
Wasn't this solved in the latest version? I ran into the same issue when I turned on flip or multi-scale augmentation.
I tried @c1aris's solution of replacing it with `return [self.extract_feat(img) for img in imgs]`, but somehow performance dropped significantly.
I encountered this problem too, and it seems to be caused by the following piece of code in models/detectors/two_stage.py:
```python
def aug_test(self, imgs, img_metas, rescale=False):
    """Test with augmentations.

    If rescale is False, then returned bboxes and masks will fit the scale
    of imgs[0].
    """
    # recompute feats to save memory
    x = self.extract_feats(imgs)
    proposal_list = self.aug_test_rpn(x, img_metas)
    return self.roi_head.aug_test(
        x, proposal_list, img_metas, rescale=rescale)
```
`x` is a generator that is passed to `self.aug_test_rpn`; by the time it is passed to `self.roi_head.aug_test`, it has already been exhausted. I simply fixed it with the following:
```python
def aug_test(self, imgs, img_metas, rescale=False):
    """Test with augmentations.

    If rescale is False, then returned bboxes and masks will fit the scale
    of imgs[0].
    """
    # recompute feats to save memory
    x = self.extract_feats(imgs)
    y = self.extract_feats(imgs)
    proposal_list = self.aug_test_rpn(x, img_metas)
    return self.roi_head.aug_test(
        y, proposal_list, img_metas, rescale=rescale)
```
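As an aside, calling feature extraction twice doubles the backbone cost. Materializing the generator once, or splitting it with `itertools.tee`, avoids the recomputation; a sketch with a hypothetical stand-in for the real extraction:

```python
import itertools

def extract_feats(imgs):
    # Stand-in generator for the real per-image feature extraction.
    for img in imgs:
        yield img * 2

# Option 1: materialize once, reuse freely.
feats = list(extract_feats([1, 2, 3]))

# Option 2: tee the generator into two independent iterators,
# each of which can be consumed once.
x, y = itertools.tee(extract_feats([1, 2, 3]))
```

Note that `tee` buffers items internally, so for large feature maps materializing a list is just as memory-hungry; the two-call version in the fix trades compute for memory instead.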
@bamps53 Thanks for reporting! Have you figured out why the performance drops?
@c1aris I haven't figured it out yet, but it's due to the flip augmentation, not the resize. It seems only the image or the label was flipped. Still investigating.
@bamps53 Okay, thanks a lot!
@c1aris By the way, did you find out why this code originally yields a single image feature?
https://github.com/open-mmlab/mmdetection/blob/2082430bfca2fe677e674e9ae0dfaf9707210269/mmdet/models/detectors/base.py#L48-L49
I replaced it with `return [self.extract_feat(img) for img in imgs]` as you recommended and it seems to work, but I couldn't understand the original intention.
@bamps53 I have no idea either, maybe you can refer to @zsy0016 's explanation.