Detectron2: Can't export cascade_mask_rcnn model to caffe2

Created on 5 Mar 2020 · 8 Comments · Source: facebookresearch/detectron2

When loading Misc/cascade_mask_rcnn_R_50_FPN_1x.yaml and running caffe2_converter.py, the following error occurs.

/work/detectron2_repo/detectron2/export/c10.py:29: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) in [4, 5], tensor.size()
/work/detectron2_repo/detectron2/export/c10.py:92: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return len(self.indices)
/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py:270: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
num_pred = len(self.proposals)
Traceback (most recent call last):
File "/work/detectron2_repo/detectron2/export/caffe2_export.py", line 60, in export_onnx_model
operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK,
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/__init__.py", line 148, in export
strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 66, in export
dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 416, in _export
fixed_batch_size=fixed_batch_size)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 279, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 236, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 277, in _get_trace_graph
outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 360, in forward
self._force_outplace,
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 347, in wrapper
outs.append(self.inner(*trace_inputs))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in __call__
result = self._slow_forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/work/detectron2_repo/detectron2/export/caffe2_modeling.py", line 267, in forward
detector_results, _ = self._wrapped_model.roi_heads(images, features, proposals)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in __call__
result = self._slow_forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/cascade_rcnn.py", line 95, in forward
pred_instances = self._forward_box(features, proposals)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/cascade_rcnn.py", line 116, in _forward_box
head_outputs[-1].predict_boxes(), image_sizes
File "/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py", line 304, in predict_boxes
return self._predict_boxes().split(self.num_preds_per_image, dim=0)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py", line 274, in _predict_boxes
self.pred_proposal_deltas.view(num_pred * K, B),
RuntimeError: shape '[0, 5]' is invalid for input of size 4000
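
The final RuntimeError is an element-count mismatch. The failing call is `view(num_pred * K, B)`, and the message shows it was evaluated as `[0, 5]`, so `num_pred * K` was 0 and `B` was 5, while the tensor itself holds 4000 elements. A minimal pure-Python sketch of that consistency rule follows; `check_view` is a hypothetical helper written only for illustration and is not part of PyTorch or detectron2.

```python
def check_view(numel, shape):
    """Return True if `numel` elements can be reinterpreted as `shape`.

    A view never adds or drops elements, so the product of the
    target dimensions must equal the element count.
    """
    total = 1
    for dim in shape:
        total *= dim
    return total == numel


# Values taken from the error message above.
numel = 4000
target_shape = (0, 5)  # (num_pred * K, B) as evaluated during tracing

print(check_view(numel, target_shape))  # False: 0 * 5 != 4000
print(check_view(numel, (1000, 4)))     # True: 1000 * 4 == 4000
```

The `(1000, 4)` shape is only an example of a consistent factorization; the log does not say which shape the model expected.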

enhancement

All 8 comments

Converting a Cascade R-CNN to caffe2 is not yet supported.

@ppwwyyxx Hi, is there any plan to support exporting Cascade R-CNN to caffe2?

@unyaaaa have you found a solution for "converting cascade rcnn to caffe2 pb"?

While there is no official Cascade R-CNN export support, you're welcome to use this patch, which I wrote for personal use. Note, though, that I'm not planning to support it in any form, so you will probably need to do some work yourself as well.

@ArutyunovG thanks for your great work, I will try this patch.

Hi @ArutyunovG Thanks for sharing your solution. You are amazing.

I spent a while reading through your branch, and I can see the overall idea is to override a few functions/operations under FastRCNNOutputLayer, CascadeROIHeads, etc.

I tried your solution and it seems to work okay in some cases.

In most cases, though, I got nan and inf in the predicted box deltas, especially at the 3rd cascade stage.

This never happened in the torch model. It feels so weird.

I wonder whether you have ever come across this issue.

It would be great if you can provide some of your opinions.

Thanks,
Ruoding

Hi @ruodingt

I wrote this patch for one particular use case and spent only two days on it, so it is untested and issues are to be expected.

> I tried your solution and it seems to work okay in some cases.
>
> In most cases, though, I got nan and inf in the predicted box deltas, especially at the 3rd cascade stage.
>
> I wonder whether you have ever come across this issue.

No, I didn't come across such behaviour.

Given how deltas are calculated, you may want to check whether the predicted/anchor boxes have reasonable values. For example, if we obtain a predicted box with zero height, log(0/h_a) evaluates to negative infinity.

This is of course just general advice: start by looking at the intermediate stages and find when/if the boxes get corrupted. As I wrote before, this patch is not something I'm going to support.

Best,
Grigory
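
The zero-height case mentioned above can be made concrete with a small sketch. It uses the common unweighted R-CNN box parameterization, where dh = log(h / h_a); detectron2's own transform also applies per-coordinate weights, which are left out here. `encode_deltas` is a hypothetical helper for illustration, not a detectron2 function. Python's `math.log` raises on 0, whereas tensor libraries return -inf, so the sketch rejects degenerate boxes explicitly.

```python
import math


def encode_deltas(box, anchor):
    """Encode `box` relative to `anchor`; both are (x1, y1, x2, y2).

    dx = (cx - cx_a) / w_a,  dy = (cy - cy_a) / h_a,
    dw = log(w / w_a),       dh = log(h / h_a).
    """
    w, h = box[2] - box[0], box[3] - box[1]
    w_a, h_a = anchor[2] - anchor[0], anchor[3] - anchor[1]
    # A zero width or height would feed 0 (or a division by 0) into the
    # formulas above, which is where inf/nan would come from in tensor code.
    if min(w, h, w_a, h_a) <= 0:
        raise ValueError(f"degenerate box: w={w}, h={h}, w_a={w_a}, h_a={h_a}")
    cx, cy = box[0] + 0.5 * w, box[1] + 0.5 * h
    cx_a, cy_a = anchor[0] + 0.5 * w_a, anchor[1] + 0.5 * h_a
    return (
        (cx - cx_a) / w_a,
        (cy - cy_a) / h_a,
        math.log(w / w_a),
        math.log(h / h_a),
    )


anchor = (0.0, 0.0, 10.0, 10.0)
print(encode_deltas((0.0, 0.0, 10.0, 10.0), anchor))  # identical boxes give all-zero deltas

try:
    encode_deltas((0.0, 5.0, 10.0, 5.0), anchor)  # zero-height box
except ValueError as err:
    print("rejected:", err)
```

Raising early is a design choice for debugging: it points at the first corrupted box instead of letting inf or nan propagate into later cascade stages.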

Thank you so much @ArutyunovG, general advice is what I am looking for.

After some debugging I found that the nan actually comes from Caffe2ROIPooler, but I have no idea why this happens. (A box with zero area?)

https://github.com/facebookresearch/detectron2/blob/f887420be01c2b28a50172a31641237f4d6503aa/detectron2/export/c10.py#L264

More specifically, nan comes after the operation torch.ops._caffe2.RoIAlign

https://github.com/facebookresearch/detectron2/blob/f887420be01c2b28a50172a31641237f4d6503aa/detectron2/export/c10.py#L325

https://github.com/facebookresearch/detectron2/blob/f887420be01c2b28a50172a31641237f4d6503aa/detectron2/export/c10.py#L328

Hi @ppwwyyxx, could you give me some general guidance on how to avoid getting nan from torch.ops._caffe2.RoIAlign?

Thanks,
Ruoding
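
One way to test the zero-area hypothesis raised above is to scan the boxes just before they reach the pooler and report any that are degenerate or already non-finite. The sketch below is plain Python and assumes boxes have been converted to a list of (x1, y1, x2, y2) tuples; that layout and the helper name `find_suspect_rois` are assumptions made for illustration. Whether a zero-area box actually makes RoIAlign return nan is not confirmed anywhere in this thread.

```python
import math


def find_suspect_rois(rois):
    """Return (index, reason) pairs for boxes that look unsafe to pool.

    `rois` is a list of (x1, y1, x2, y2) tuples (assumed layout).
    """
    suspects = []
    for i, (x1, y1, x2, y2) in enumerate(rois):
        if not all(math.isfinite(v) for v in (x1, y1, x2, y2)):
            suspects.append((i, "non-finite coordinate"))
        elif x2 <= x1 or y2 <= y1:
            suspects.append((i, "zero or negative area"))
    return suspects


rois = [
    (0.0, 0.0, 10.0, 10.0),         # well-formed box
    (5.0, 5.0, 5.0, 9.0),           # zero width
    (1.0, 2.0, float("nan"), 4.0),  # already corrupted upstream
]
print(find_suspect_rois(rois))
# [(1, 'zero or negative area'), (2, 'non-finite coordinate')]
```

Running the same check on the input of each cascade stage would show whether the boxes are already bad before pooling or whether the values only turn non-finite afterwards.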
