When loading Misc/cascade_mask_rcnn_R_50_FPN_1x.yaml and running caffe2_converter.py, the following error occurs.
/work/detectron2_repo/detectron2/export/c10.py:29: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) in [4, 5], tensor.size()
/work/detectron2_repo/detectron2/export/c10.py:92: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return len(self.indices)
/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py:270: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
num_pred = len(self.proposals)
Traceback (most recent call last):
File "/work/detectron2_repo/detectron2/export/caffe2_export.py", line 60, in export_onnx_model
operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK,
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/__init__.py", line 148, in export
strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 66, in export
dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 416, in _export
fixed_batch_size=fixed_batch_size)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 279, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 236, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 277, in _get_trace_graph
outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 360, in forward
self._force_outplace,
File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 347, in wrapper
outs.append(self.inner(*trace_inputs))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in __call__
result = self._slow_forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/work/detectron2_repo/detectron2/export/caffe2_modeling.py", line 267, in forward
detector_results, _ = self._wrapped_model.roi_heads(images, features, proposals)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in __call__
result = self._slow_forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/cascade_rcnn.py", line 95, in forward
pred_instances = self._forward_box(features, proposals)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/cascade_rcnn.py", line 116, in _forward_box
head_outputs[-1].predict_boxes(), image_sizes
File "/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py", line 304, in predict_boxes
return self._predict_boxes().split(self.num_preds_per_image, dim=0)
File "/work/detectron2_repo/detectron2/modeling/roi_heads/fast_rcnn.py", line 274, in _predict_boxes
self.pred_proposal_deltas.view(num_pred * K, B),
RuntimeError: shape '[0, 5]' is invalid for input of size 4000
Converting a Cascade R-CNN to caffe2 is not yet supported.
@ppwwyyxx Hi, is there any plan to support converting Cascade R-CNN to caffe2?
@unyaaaa have you found a solution for "converting cascade rcnn to caffe2 pb"?
While there is no official Cascade R-CNN export support, you're welcome to use this patch, which I wrote for personal use. Note, though, that I'm not planning to support it in any form, so you will probably need to do some work yourself as well.
@ArutyunovG thanks for your great work, I will try this patch.
Hi @ArutyunovG Thanks for sharing your solution. You are amazing.
I spent a while reading through your branch, and I can see the overall idea is to override a few functions/operations under FastRCNNOutputLayer, CascadeROIHeads etc.
I tried your solution and it seems to work okay in some cases.
In most cases, though, I got nan and inf from the predicted box deltas, especially at the 3rd cascade stage.
This never happened in the torch model, which feels really odd.
I wonder whether you have ever come across this issue.
It would be great if you could share your thoughts.
Thanks,
Ruoding
Hi @ruodingt
Since I wrote this patch for one particular use case and spent only two days on it, it is untested, and issues are to be expected.
> I tried your solution and it seems working okay for some cases.
> In the most of cases I got nan and inf from the predicted box delta, especially at the 3rd cascade stage.
> I wonder whether you have ever come across this issue.
No, I didn't come across such behaviour.
Given how deltas are calculated, you will probably want to check whether the predicted/anchor boxes have reasonable values. For example, if we obtain a predicted box with zero height, log(0/h_a) will result in an infinite value.
This is of course just general advice: start looking at the intermediate stages and find when/if the boxes get corrupted. As I wrote before, this patch is not something I'm going to support.
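To illustrate the point about zero-height boxes, here is a minimal sketch in plain Python. It assumes the standard R-CNN style parameterization (dx, dy, dw, dh) with boxes given as (x1, y1, x2, y2); it is not the code from the patch or from detectron2, and the names `box_deltas` and `_log_ratio` are made up for this example.

```python
import math


def _log_ratio(size, anchor_size):
    """log(size / anchor_size), returning -inf/nan like tensor libraries do
    instead of raising as math.log would. Assumes anchor_size is non-zero."""
    ratio = size / anchor_size
    if ratio > 0:
        return math.log(ratio)
    return float("-inf") if ratio == 0 else float("nan")


def box_deltas(box, anchor):
    """Return (dx, dy, dw, dh) for two boxes given as (x1, y1, x2, y2)."""
    w, h = box[2] - box[0], box[3] - box[1]
    wa, ha = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cx, cy = box[0] + 0.5 * w, box[1] + 0.5 * h
    cxa, cya = anchor[0] + 0.5 * wa, anchor[1] + 0.5 * ha
    return ((cx - cxa) / wa, (cy - cya) / ha, _log_ratio(w, wa), _log_ratio(h, ha))


anchor = (0.0, 0.0, 10.0, 10.0)
healthy = box_deltas((1.0, 1.0, 9.0, 9.0), anchor)
collapsed = box_deltas((1.0, 5.0, 9.0, 5.0), anchor)  # zero-height box

print(all(math.isfinite(d) for d in healthy))  # every delta is finite
print(collapsed[3])                            # dh = log(0 / h_a)
```

The same finiteness check can be applied to the boxes and deltas dumped after each cascade stage to see where the first bad value appears.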
Best,
Grigory
Thank you so much @ArutyunovG, general advice is what I am looking for.
After some debugging, I found that the nan actually comes from Caffe2ROIPooler, but I have no idea why this would happen (a box with zero area?).
More specifically, nan comes after the operation torch.ops._caffe2.RoIAlign
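One way to test the zero-area hypothesis is to inspect the RoIs right before they reach the pooling op. The sketch below is plain Python and rests on assumptions: each RoI is taken to be (batch_index, x1, y1, x2, y2), and `suspicious_rois` is a hypothetical helper, not part of detectron2 or Caffe2. Whether degenerate boxes are the actual cause of the nan is not confirmed in this thread.

```python
import math


def suspicious_rois(rois, min_size=0.0):
    """Return indices of RoIs with non-finite coordinates or with
    width/height <= min_size.

    Each RoI is assumed to be (batch_index, x1, y1, x2, y2).
    """
    bad = []
    for i, (_, x1, y1, x2, y2) in enumerate(rois):
        if not all(math.isfinite(c) for c in (x1, y1, x2, y2)):
            bad.append(i)  # nan/inf already present before pooling
        elif (x2 - x1) <= min_size or (y2 - y1) <= min_size:
            bad.append(i)  # zero (or negative) width or height
    return bad


rois = [
    (0, 10.0, 10.0, 50.0, 40.0),        # ordinary box
    (0, 20.0, 30.0, 20.0, 60.0),        # zero width
    (0, 5.0, float("nan"), 25.0, 35.0), # corrupted coordinate
]
print(suspicious_rois(rois))
```

If the list comes back empty for the inputs of the failing stage, the boxes themselves are probably fine and the feature maps fed to the op are the next thing to check.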
Hi @ppwwyyxx, may I have some general guidance from you on how to avoid getting nan from torch.ops._caffe2.RoIAlign ?
Thanks,
Ruoding