Detectron2: Can't export cascade_mask_rcnn model to caffe2

Created on 5 Mar 2020 · 8 comments · Source: facebookresearch/detectron2

When I load Misc/cascade_mask_rcnn_R_50_FPN_1x.yaml and run caffe2_converter.py, the following error occurs:

/work/detectron2_repo/detectron2/export/c10.py:29: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) in [4, 5], tensor.size()
/work/detectron2_repo/detectron2/export/c10.py:92: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return len(self.indices)
/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py:270: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
num_pred = len(self.proposals)
Traceback (most recent call last):
File "/work/detectron2_repo/detectron2/export/caffe2_export.py", line 60, in export_onnx_model
operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK,
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/__init__.py", line 148, in export
strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 66, in export
dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 416, in _export
fixed_batch_size=fixed_batch_size)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 279, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 236, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 277, in _get_trace_graph
outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 360, in forward
self._force_outplace,
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 347, in wrapper
outs.append(self.inner(*trace_inputs))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in __call__
result = self._slow_forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/work/detectron2_repo/detectron2/export/caffe2_modeling.py", line 267, in forward
detector_results, _ = self._wrapped_model.roi_heads(images, features, proposals)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in __call__
result = self._slow_forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/cascade_rcnn.py", line 95, in forward
pred_instances = self._forward_box(features, proposals)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/cascade_rcnn.py", line 116, in _forward_box
head_outputs[-1].predict_boxes(), image_sizes
File "/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py", line 304, in predict_boxes
return self._predict_boxes().split(self.num_preds_per_image, dim=0)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py", line 274, in _predict_boxes
self.pred_proposal_deltas.view(num_pred * K, B),
RuntimeError: shape '[0, 5]' is invalid for input of size 4000
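
One plausible reading of this final error, under assumptions not stated in the thread: (1) the exported proposals are Caffe2-style boxes with an extra batch-index column, so they have 5 columns (the assertion at c10.py line 29 above accepts 4 or 5); (2) the cascade box head predicts class-agnostic deltas of shape (1000, 4), which would explain the 4000 elements; and (3) `_predict_boxes` computes `B` from the proposal width and `K` by integer division of the delta width by `B`. The helper below is hypothetical and only reproduces that shape arithmetic in plain Python; it is not the detectron2 code.

```python
def predict_boxes_view_shape(num_proposals, delta_dim, proposal_dim):
    """Hypothetical helper: mirrors the shape arithmetic assumed for
    _predict_boxes in fast_rcnn.py (not the real detectron2 implementation)."""
    num_pred = num_proposals                # len(self.proposals)
    B = proposal_dim                        # assumed: proposals.tensor.shape[1]
    K = delta_dim // B                      # assumed: deltas.shape[1] // B
    requested = (num_pred * K, B)           # shape passed to .view(...)
    available = num_proposals * delta_dim   # elements actually in the deltas tensor
    return requested, available


# Class-agnostic deltas (4 per box) vs. boxes carrying a batch-index column (5).
requested, available = predict_boxes_view_shape(1000, 4, 5)
print(requested, available)  # (0, 5) 4000

# With plain 4-column boxes the same arithmetic is consistent.
requested, available = predict_boxes_view_shape(1000, 4, 4)
print(requested, available)  # (1000, 4) 4000
```

A 4000-element tensor can be viewed as (1000, 4) but never as (0, 5), so under these assumptions the class-agnostic cascade head and the 5-column exported boxes do not fit this code path.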

Label: enhancement

Issue author: unyaaaa

Most helpful comment

While there is no official Cascade R-CNN export support, you're welcome to use this patch, which I wrote for personal use. Note, though, that I'm not planning to support it in any form, so you will probably need to do some work yourself as well.

ArutyunovG on 29 May 2020


All 8 comments

Converting a Cascade R-CNN to caffe2 is not yet supported.

ppwwyyxx on 5 Mar 2020

Hi @ppwwyyxx, is there any plan to support converting Cascade R-CNN to Caffe2?

congjianting on 7 May 2020

@unyaaaa, have you found a solution for converting a Cascade R-CNN model to a Caffe2 pb?

congjianting on 8 May 2020


@ArutyunovG, thanks for your great work. I will try this patch.

congjianting on 1 Jun 2020

Hi @ArutyunovG, thanks for sharing your solution. You are amazing.

I spent a while reading through your branch, and I can see that the overall idea is to override a few functions/operations in FastRCNNOutputLayer, CascadeROIHeads, etc.

I tried your solution, and it seems to work for some cases.

In most cases, however, I get nan and inf values in the predicted box deltas, especially at the third cascade stage.

This never happens with the PyTorch model, which seems very strange.

I wonder whether you have ever come across this issue.

It would be great if you could share your thoughts.

Thanks,
Ruoding

ruodingt on 27 Aug 2020

Hi @ruodingt,

Since I wrote this patch for a particular use case and spent only two days on it, it is untested, and issues are to be expected.

> I tried your solution and it seems working okay for some cases.
>
> In the most of cases I got nan and inf from the predicted box delta, especially at the 3rd cascade stage.
>
> I wonder whether you have ever come across this issue.

No, I didn't come across such behaviour.

Given how the deltas are calculated, you may want to check whether the predicted/anchor boxes have reasonable values. For example, if we obtain a predicted box with zero height, log(0/h_a) evaluates to negative infinity.
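
To make this concrete, here is a plain-Python sketch of the standard R-CNN box encoding, written in the spirit of detectron2's `Box2BoxTransform.get_deltas` but not copied from it. Two assumptions: the weights `(10, 10, 5, 5)` are taken as the first-stage defaults, and `ieee_log` is a hypothetical helper, needed because Python's `math.log(0)` raises an error whereas tensor `log` returns `-inf`.

```python
import math


def ieee_log(x):
    """Hypothetical helper: log() that returns -inf for 0, like torch.log / numpy.log."""
    return math.log(x) if x > 0 else float("-inf")


def get_deltas(src, target, weights=(10.0, 10.0, 5.0, 5.0)):
    """Encode `target` relative to `src`; both boxes are (x1, y1, x2, y2)."""
    wx, wy, ww, wh = weights  # assumed stage weights
    src_w, src_h = src[2] - src[0], src[3] - src[1]
    tgt_w, tgt_h = target[2] - target[0], target[3] - target[1]
    src_cx, src_cy = src[0] + 0.5 * src_w, src[1] + 0.5 * src_h
    tgt_cx, tgt_cy = target[0] + 0.5 * tgt_w, target[1] + 0.5 * tgt_h

    dx = wx * (tgt_cx - src_cx) / src_w
    dy = wy * (tgt_cy - src_cy) / src_h
    dw = ww * ieee_log(tgt_w / src_w)  # log(w / w_a)
    dh = wh * ieee_log(tgt_h / src_h)  # log(h / h_a): -inf when the target height is 0
    return dx, dy, dw, dh


anchor = (10.0, 10.0, 50.0, 30.0)
print(get_deltas(anchor, (12.0, 10.0, 52.0, 30.0)))  # (0.5, 0.0, 0.0, 0.0)
print(get_deltas(anchor, (12.0, 20.0, 52.0, 20.0)))  # (0.5, 0.0, 0.0, -inf)
```

Once a single `-inf` appears, later arithmetic can turn it into nan (for example, `-inf - (-inf)` is nan), which is why checking intermediate boxes stage by stage is a reasonable way to find where values first go bad.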

This is, of course, just general advice: start by looking at the intermediate stages and find when (or if) the boxes get corrupted. As I wrote before, this patch is not something I'm going to support.

Best,
Grigory

ArutyunovG on 27 Aug 2020

Thank you so much, @ArutyunovG; general advice is exactly what I was looking for.

After some debugging, I found that the nan actually comes from Caffe2ROIPooler, but I have no idea why this would happen (a box with zero area?).

https://github.com/facebookresearch/detectron2/blob/f887420be01c2b28a50172a31641237f4d6503aa/detectron2/export/c10.py#L264

More specifically, the nan appears after the torch.ops._caffe2.RoIAlign operation:

https://github.com/facebookresearch/detectron2/blob/f887420be01c2b28a50172a31641237f4d6503aa/detectron2/export/c10.py#L325

https://github.com/facebookresearch/detectron2/blob/f887420be01c2b28a50172a31641237f4d6503aa/detectron2/export/c10.py#L328
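
One way to test the zero-area hypothesis is to inspect the boxes fed to the pooler before RoIAlign runs. The sketch below uses a hypothetical plain-Python helper, `find_bad_boxes` (not part of detectron2). It assumes each row is either `(x1, y1, x2, y2)` or Caffe2-style `(batch_idx, x1, y1, x2, y2)`, and it flags rows with non-finite coordinates or non-positive width/height. With real tensors, the same checks could be written with `torch.isfinite` and column comparisons.

```python
import math


def find_bad_boxes(boxes):
    """Hypothetical debugging helper: return (index, reason) for suspicious boxes.

    Each box is (x1, y1, x2, y2) or (batch_idx, x1, y1, x2, y2); only the
    last four values are treated as coordinates.
    """
    bad = []
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = box[-4:]
        if not all(math.isfinite(v) for v in (x1, y1, x2, y2)):
            bad.append((i, "non-finite coordinate"))
        elif x2 - x1 <= 0 or y2 - y1 <= 0:
            bad.append((i, "zero or negative area"))
    return bad


rois = [
    (0.0, 10.0, 10.0, 50.0, 30.0),         # well-formed box
    (0.0, 12.0, 20.0, 52.0, 20.0),         # zero height
    (0.0, 5.0, float("nan"), 40.0, 60.0),  # NaN coordinate
]
print(find_bad_boxes(rois))
# [(1, 'zero or negative area'), (2, 'non-finite coordinate')]
```

If no rows are flagged right before the nan appears, degenerate input boxes are probably not the cause, and the RoIAlign inputs (features, scale, sampling settings) would be the next thing to compare against the PyTorch path.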

Hi @ppwwyyxx, could you give me some general guidance on how to avoid getting nan from torch.ops._caffe2.RoIAlign?

Thanks,
Ruoding

ruodingt on 27 Aug 2020
