Vision: [JIT] Not supported for maskrcnn_resnet50_fpn

Created on 6 Jun 2019 · 59 comments · Source: pytorch/vision

I am trying to accelerate the maskrcnn_resnet50_fpn pretrained model using JIT tracing provided by pytorch. It appears that some operations present in this model are not supported by pytorch JIT.

Are these models supposed to have JIT support officially? If not, would you be able to provide advice for a workaround?

To replicate, running:

import torch
import torchvision
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
traced_net = torch.jit.trace(model, torch.rand(1, 3,800, 800))

produces

RuntimeError: log2_vml_cpu not implemented for 'Long'

Thank you.

enhancement models object detection

Most helpful comment

@cted18 Yes, I'll be working on adding OrderedDict support so that fcn_resnet101 can be supported. I think that, together with the op support added in https://github.com/pytorch/vision/pull/1267, it shouldn't be too hard to support in script.

All 59 comments

this actually looks like a bug in scale = 2 ** torch.tensor(approx_scale).log2().round().item() in torchvision/ops/poolers.py.

If approx_scale here is an exact integer, the tensor will be a LongTensor, which is unexpected.

That should be changed to torch.tensor(approx_scale, dtype=torch.float32)
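
For illustration, a minimal snippet (not part of torchvision) showing why the original expression fails on the PyTorch version used here, and how forcing the dtype avoids it:

import torch

approx_scale = 1  # a Python int -> torch.tensor() creates a LongTensor
t = torch.tensor(approx_scale)
print(t.dtype)  # torch.int64
# t.log2() raises: RuntimeError: log2_vml_cpu not implemented for 'Long'

# Forcing a float dtype sidesteps the problem:
scale = 2 ** torch.tensor(approx_scale, dtype=torch.float32).log2().round().item()
print(scale)  # 1.0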

@rbrigden as mentioned in the release notes, the detection models do not yet support JIT, in particular because we use custom ops which are not registered with the TorchScript ops.

We plan to add full JIT support for the detection models in follow-up releases.

And @soumith good catch about the location of the error.
But this looks like a problem with tracing, because in https://github.com/pytorch/vision/blob/aa32c9376c46eb284f2b091f3eb98aec4fd64b03/torchvision/ops/poolers.py#L100
we force approx_scale to be a float, so the JIT should take that into account.
Still, a workaround could be to explicitly force a dtype in torch.tensor, as you mentioned.

@fmassa Dear fmassa, when will the detection models support JIT? Thank you.

@lzp0916 A first PyTorch PR that would enable us to start making the model TorchScript friendly has just been sent to PyTorch https://github.com/pytorch/pytorch/pull/22582

But I'd say it will still take a few months to get the detection models to support TorchScript.

cc @fbbradheintz

@soumith, @fmassa I changed the code to torch.tensor(approx_scale, dtype=torch.float32) in torchvision/ops/poolers.py as soumith suggested.
That fixed the error, but another one came up. I think it's because TorchScript does not support Mask R-CNN's output format.
Here is the log:
RuntimeError: Only tensors or tuples of tensors can be output from traced functions (getNestedOutputTrace at /opt/conda/conda-bld/pytorch_1556653099582/work/torch/csrc/jit/tracer.cpp:200)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f7bb5b1adc5 in /home/lxs/anaconda3/envs/torchscript/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: torch::jit::tracer::getNestedOutputTrace(std::shared_ptr<torch::jit::tracer::TracingState> const&, c10::IValue const&) + 0x23e (0x7f7bb39d5cee in /home/lxs/anaconda3/envs/torchscript/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #2: torch::jit::tracer::exit(std::vector<c10::IValue, std::allocator<c10::IValue> > const&) + 0x2f (0x7f7bb39d5dbf in /home/lxs/anaconda3/envs/torchscript/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: <unknown function> + 0x447ab3 (0x7f7be4e3eab3 in /home/lxs/anaconda3/envs/torchscript/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x45a8b4 (0x7f7be4e518b4 in /home/lxs/anaconda3/envs/torchscript/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x12ce4a (0x7f7be4b23e4a in /home/lxs/anaconda3/envs/torchscript/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #20: __libc_start_main + 0xe7 (0x7f7bf41f0b97 in /lib/x86_64-linux-gnu/libc.so.6)
It seems too hard for me to work around. torchvision.models.detection is such great work, it makes my code a lot easier. I hope this problem can be fixed soon :)

@XushengLee adding support for TorchScript for all models in torchvision is in the plans, but it will still take a few months before we are there.

@XushengLee you can fix the second error if you change how the outputs of inference are returned: rather than putting them into a dictionary, just return the tensors directly.
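
One way to read this suggestion is to unpack the detection dict into plain tensors before they leave the traced forward. A rough sketch (the wrapper class below is hypothetical, and tracing can still fail on other unsupported ops, as the later comments show):

import torch

class TupleOutputWrapper(torch.nn.Module):
    # Hypothetical wrapper: return the detection fields as a tuple of
    # tensors so the traced output contains only tensors.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, images):
        det = self.model(images)[0]  # detections for the first image
        return det['boxes'], det['labels'], det['scores'], det['masks']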

@remzr7 Thank you for your help. I tried that, and it solved the output problem.
But there is another error, and the log is not as clear as before.
I think it relates to the input format: the Mask R-CNN in torchvision.models.detection takes a list of channel-first image tensors, at least during evaluation, not the typical 4-D tensor.

# this snippet is from engine.py of torchvision.models.detection
for images, targets in metric_logger.log_every(data_loader, print_freq, header):
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

Oh yes, I think you can also disable the GeneralizedRCNNTransform that the underlying GeneralizedRCNN class applies, and instead perform the transformations (i.e. resize/to_tensor) before you call model.forward().
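
A rough sketch of doing those steps by hand (the mean/std and min_size values below are the torchvision defaults for the detection models; this is only an illustration, not the exact GeneralizedRCNNTransform implementation):

import torch
import torchvision.transforms.functional as TF

def preprocess(img, min_size=800,
               mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    # img: float tensor of shape [3, H, W] with values in [0, 1]
    img = TF.normalize(img, mean=mean, std=std)
    scale = float(min_size) / min(img.shape[-2:])
    new_size = [int(round(s * scale)) for s in img.shape[-2:]]
    img = torch.nn.functional.interpolate(
        img[None], size=new_size, mode='bilinear', align_corners=False)[0]
    return img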


@remzr7 It doesn't seem that simple. I took the transforms in GeneralizedRCNN outside and changed the output of GeneralizedRCNN to tuples instead of a dict.

Now, it seems that I would have to change the outputs of all modules recursively, i.e.,

  1. IntermediateLayerGetter(..) returns an OrderedDict
  2. FeaturePyramidNetwork(..) returns an OrderedDict
  3. BackboneWithFPN(..) returns an OrderedDict
    and so on..

I changed the outputs of all of them to tuples of tensors, except for IntermediateLayerGetter(..).
I have not been able to get around IntermediateLayerGetter(..) by changing the OrderedDict structure it uses, because TorchScript at this point cannot handle OrderedDict outputs.

@soumith @fmassa since OrderedDict outputs are being used everywhere in detection, maybe it would be easier to add torchscript support for returning OrderedDicts? Is there a quick workaround to solve this problem?

@cted18 yes, OrderedDict support in torchscript is something that should be added.

And we are starting to work on adding support for maskrcnn_resnet50_fpn to work on torchscript / traceable, a first PR in this series has been sent in https://github.com/pytorch/vision/pull/1267

cc @eellison for OrderedDict support in torchscript

@cted18 Yes, I'll be working on adding OrderedDict support so that fcn_resnet101 can be supported. I think that, together with the op support added in https://github.com/pytorch/vision/pull/1267, it shouldn't be too hard to support in script.

@fmassa Dear fmassa, I am using torch.jit.trace and encounter the following error:
"RuntimeError: Tried to trace <__torch__.torchvision.ops.misc.FrozenBatchNorm2d object at 0000029EB0B365E0> but it is not part of the active trace. Modules that are called during a trace must be registered as submodules of the thing being traced."
How can I solve this problem?
OS: Windows
pytorch: 1.3.0.dev20190920
torchvision: 0.5.0.dev20190924
model: fasterrcnn_resnet50_fpn

@lzp0916 this error will be solved when https://github.com/pytorch/vision/pull/1329 is merged

The issue is critical for putting the model into production system. Thanks for working on this.

2 ** torch.tensor(approx_scale).log2().round()

Can someone explain why, if approx_scale < 1, it doesn't get rounded to an integer here? Is this some hack or normal behavior?

@creotiv it's an approximation that avoids us having to manually specify the downscaling for layer n.
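
For example, a quick illustration (not part of torchvision) of what that line computes: it just snaps the approximate scale to the nearest power of two.

import torch

for approx_scale in (0.26, 0.125, 0.03):
    scale = 2 ** torch.tensor(approx_scale, dtype=torch.float32).log2().round().item()
    print(approx_scale, '->', scale)
# 0.26  -> 0.25
# 0.125 -> 0.125
# 0.03  -> 0.03125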

@fmassa No, I understand that. I mean, why does round() not round 0.123, for example, to zero (only after the log function)?
I don't see anything like that in the docs https://pytorch.org/docs/stable/torch.html?highlight=round#torch.round, and it looks like a bug.

And also, torch.log2(2**torch.tensor(0.123, dtype=torch.float64)).round() returns 0.

@creotiv FYI this is unrelated to the issue (which is that maskrcnn_resnet50_fpn is not yet scriptable), but I don't understand your point.

Can you open a new issue describing with an example what you think is the problem?

RuntimeError: Only tensors or tuples of tensors can be output from traced functions

@XushengLee how did you get rid of the error "RuntimeError: Only tensors or tuples of tensors can be output from traced functions"? I am currently having the same issue when trying to trace the Mask R-CNN model from torchvision with the following script:

import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
test_data = torch.rand(1, 3, 480, 640)
traced_model = torch.jit.trace(model, test_data)

@gemmit support for tracing / scripting maskrcnn is coming soon, check https://github.com/pytorch/vision/pull/1407 and https://github.com/pytorch/vision/pull/1461

@fmassa okay, thanks for the info. Will check the links

@gemmit ~tracing should already be supported for maskrcnn~. Using torch.jit.script will be supported in the coming weeks

@lara-hdr I've just tried tracing maskrcnn, and I got an error

import torch, torchvision
m = torchvision.models.detection.maskrcnn_resnet50_fpn()
m.eval()

traced_model = torch.jit.trace(m, [[torch.rand(3, 300, 300)]])

I get the following error

RuntimeError: Only tensors or tuples of tensors can be output from traced functions (getOutput at /Users/distiller/project/conda/conda-bld/pytorch_1572429967983/work/torch/csrc/jit/tracer.cpp:211)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x112b608b7 in libc10.dylib)
frame #1: torch::jit::tracer::TracingState::getOutput(c10::IValue const&) + 1593 (0x11b1d8549 in libtorch.dylib)
frame #2: torch::jit::tracer::trace(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >, std::__1::function<std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> > (std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)> const&, std::__1::function<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > (torch::autograd::Variable const&)>, bool, torch::jit::script::Module*) + 1792 (0x11b1d90b0 in libtorch.dylib)
frame #3: torch::jit::tracer::createGraphByTracing(pybind11::function const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >, pybind11::function const&, bool, torch::jit::script::Module*) + 361 (0x1121829b9 in libtorch_python.dylib)
frame #4: void pybind11::cpp_function::initialize<torch::jit::script::initJitScriptBindings(_object*)::$_16, void, torch::jit::script::Module&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, pybind11::function, pybind11::tuple, pybind11::function, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(torch::jit::script::initJitScriptBindings(_object*)::$_16&&, void (*)(torch::jit::script::Module&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, pybind11::function, pybind11::tuple, pybind11::function, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 319 (0x1121bd20f in libtorch_python.dylib)
frame #5: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3324 (0x111c9f3fc in libtorch_python.dylib)
<omitting python frames>
frame #61: start + 1 (0x7fff6fa6d3d5 in libdyld.dylib)
frame #62: 0x0 + 2 (0x2 in ???)

I just now realized that ONNX export does not call into torch.jit.trace, but torch.jit.get_trace_graph. Hum, this is unfortunate :-/

@XushengLee adding support for TorchScript for all models in torchvision is in the plans, but it will still take a few months before we are there.

Any progress on supporting the detection models with JIT? Thanks

@stereomatchingkiss Yes, it's almost ready, just need to fix some unrelated ONNX issues and it will be merged this week

@stereomatchingkiss Yes, it's almost ready, just need to fix some unrelated ONNX issues and it will be merged this week

Thanks, glad to hear that. Will we be able to convert the model to ONNX format after this is merged?

@stereomatchingkiss ONNX and JIT support for Mask R-CNN in torchvision has been merged into master, and is available if you compile from source.

I still cannot trace the Maskrcnn model from the latest branch.

I get this error out of the box:

scale = 2 ** float(torch.tensor(approx_scale).log2().round())
RuntimeError: log2_vml_cpu not implemented for 'Long'

Then I made the changes suggested by @soumith:

this actually looks like a bug in scale = 2 ** torch.tensor(approx_scale).log2().round().item() in torchvision/ops/poolers.py.

If approx_scale here is an exact integer, the tensor will be a LongTensor, which is unexpected.

That should be changed to torch.tensor(approx_scale, dtype=torch.float32)

Now I have this:

File "/../python3.6/site-packages/torchvision-0.5.0a0+5b1716a-py3.6-linux-x86_64.egg/torchvision/ops/poolers.py", line 164, in setup_scales self.map_levels = initLevelMapper(int(lvl_min), int(lvl_max)) OverflowError: cannot convert float infinity to integer

@cted18 can you print torchvision.__version__? I suspect you are in an old version

Sure.

torchvision.__version__ '0.5.0a0+5b1716a'

I just built it from the master.

@cted18 can you share a script that reproduces the error you have?

I am trying to accelerate the maskrcnn_resnet50_fpn pretrained model using JIT tracing provided by pytorch. It appears that some operations present in this model are not supported by pytorch JIT.

Are these models supposed to have JIT support officially? If not, would you be able to provide advice for a workaround?

To replicate, running:

import torch
import torchvision
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
traced_net = torch.jit.trace(model, torch.rand(1, 3,800, 800))

produces

RuntimeError: log2_vml_cpu not implemented for 'Long'

Thank you.

Yes. It is the exact same script as from @rbrigden

Ubuntu 16.04
python 3.6.7
torch.__version__ '1.3.0a0+de394b6'
torchvision.__version__ '0.5.0a0+cec7ea7'

@cted18 this should be fixed when https://github.com/pytorch/vision/pull/1639 gets merged

Still cannot convert fasterrcnn_resnet50_fpn

Version(print(torchvision.__version__)) :

0.5.0.dev20191206

Codes:

import torch
import torchvision

print(torchvision.__version__)

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(True)
model.eval()
example = torch.rand(1, 3, 300, 400)
traced_script_module = torch.jit.trace(model, example)

Error messages:

RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
'incorrect results).', category=RuntimeWarning)
Traceback (most recent call last):
File "pytorch_conversion.py", line 14, in
traced_script_module = torch.jit.trace(model, example)
File "C:\Users\yyyy\Anaconda3\envs\pytorch_preview\lib\site-packages\torch\jit__init__.py", line 877, in trace
check_tolerance, _force_outplace, _module_class)
File "C:\Users\yyyy\Anaconda3\envs\pytorch_preview\lib\site-packages\torch\jit__init__.py", line 1029, in trace_module
module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
RuntimeError: Only tensors or tuples of tensors can be output from traced functions (getOutput at ..\torch\csrc\jit\tracer.cpp:212)
(no backtrace available)

OS : windows 10 64bits
installed by anaconda :

conda create --name pytorch_n python=3.7
conda activate pytorch_n
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch-nightly -c defaults -c conda-forge

Models I need:

keypointrcnn_resnet50_fpn, fasterrcnn_resnet50_fpn

@stereomatchingkiss use torch.jit.script instead of torch.jit.trace, and it should work.

model = torch.jit.script(model)
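
For example, a minimal sketch of scripting the model and saving it so it can later be loaded from C++ with torch::jit::load (the file name is just a placeholder):

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
scripted = torch.jit.script(model)  # script instead of trace
scripted.save("fasterrcnn_resnet50_fpn.pt")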

@stereomatchingkiss use torch.jit.script instead of torch.jit.trace, and it should work.

model = torch.jit.script(model)

Thanks, this works, but I fail to load the fasterrcnn_resnet50_fpn model with the C++ API.
OS : ubuntu18.0.4.3 LTS 64bits
libtorch : nightly(2019/12/07)

main.cpp

#include <torch/script.h>

#include <iostream>
#include <memory>

int main(int argc, const char* argv[])
{
    if(argc != 2){
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }


    torch::jit::script::Module module;
    try {
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }

    std::cout << "ok\n";
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.5)

project(pytorch_test LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(Torch REQUIRED)

add_executable(pytorch_test main.cpp)
target_link_libraries(pytorch_test "${TORCH_LIBRARIES}")
set_property(TARGET pytorch_test PROPERTY CXX_STANDARD 14)

Error message:

terminate called after throwing an instance of 'torch::jit::script::ErrorReport'
  what():  
Unknown builtin op: torchvision::_new_empty_tensor_op.
Could not find any similar ops to torchvision::_new_empty_tensor_op. This op may not exist or may not be currently supported in TorchScript.
:
  File "C:\Users\yyyy\Anaconda3\envs\pytorch_preview\lib\site-packages\torchvision\ops\new_empty_tensor.py", line 16
        output (Tensor)
    """
    return torch.ops.torchvision._new_empty_tensor_op(x, shape)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Serialized   File "code/__torch__/torchvision/ops/new_empty_tensor.py", line 4
def _new_empty_tensor(x: Tensor,
    shape: List[int]) -> Tensor:
  _0 = ops.torchvision._new_empty_tensor_op(x, shape)
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _0
'_new_empty_tensor' is being compiled since it was called from 'interpolate'
Serialized   File "code/__torch__/torchvision/ops/misc.py", line 25
    align_corners: Optional[bool]=None) -> Tensor:
  _1 = __torch__.torchvision.ops.misc._output_size
  _2 = __torch__.torchvision.ops.new_empty_tensor._new_empty_tensor
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  _3 = uninitialized(Tensor)
  if torch.gt(torch.numel(input), 0):
'interpolate' is being compiled since it was called from 'GeneralizedRCNNTransform.resize'
Serialized   File "code/__torch__/torchvision/models/detection/transform.py", line 79
    target: Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]:
    _18 = __torch__.torchvision.models.detection.transform.resize_boxes
    _19 = __torch__.torchvision.ops.misc.interpolate
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _20 = __torch__.torchvision.models.detection.transform.resize_keypoints
    _21 = uninitialized(Tuple[Tensor, Optional[Dict[str, Tensor]]])
'GeneralizedRCNNTransform.resize' is being compiled since it was called from 'GeneralizedRCNNTransform.forward'
  File "C:\Users\yyyy\Anaconda3\envs\pytorch_preview\lib\site-packages\torchvision\models\detection\transform.py", line 47
                                 "of shape [C, H, W], got {}".format(image.shape))
            image = self.normalize(image)
            image, target_index = self.resize(image, target_index)
                                  ~~~~~~~~~~~ <--- HERE
            images[i] = image
            if targets is not None and target_index is not None:
Serialized   File "code/__torch__/torchvision/models/detection/transform.py", line 29
        pass
      image0 = (self).normalize(image, )
      _2 = (self).resize(image0, target_index, )
                                 ~~~~~~~~~~~~ <--- HERE
      image1, target_index0, = _2
      _3 = torch._set_item(images0, i, image1)

Aborted (core dumped)

Edit: I downloaded the C++ package (CPU only) about one hour ago.

@stereomatchingkiss use torch.jit.script instead of torch.jit.trace, and it should work.

model = torch.jit.script(model)

I found a solution in issue #1407, but I have another question: how could I know which op I need to register? Or should I not worry about this, because in the future these ops won't need to be registered by end users? Thanks

static auto registry =
        torch::RegisterOperators()
                .op("torchvision::nms", &nms)
                .op("torchvision::roi_align(Tensor input, Tensor rois, float spatial_scale, int pooled_height, int pooled_width, int sampling_ratio) -> Tensor",
                    &roi_align)
                .op("torchvision::roi_pool", &roi_pool)
                .op("torchvision::_new_empty_tensor_op", &new_empty_tensor)
                .op("torchvision::ps_roi_align", &ps_roi_align)
                .op("torchvision::ps_roi_pool", &ps_roi_pool);

@stereomatchingkiss

how could I know which op I need to register?

that's a good question. I don't yet have a good answer for that, I'll discuss with @eellison to see if we can find a good solution to it

@stereomatchingkiss

how could I know which op I need to register?

that's a good question. I don't yet have a good answer for that, I'll discuss with @eellison to see if we can find a good solution to it

When I copied the code, another question came up: where can I find the following headers?

#include "torchvision/PSROIAlign.h"
#include "torchvision/PSROIPool.h"
#include "torchvision/ROIAlign.h"
#include "torchvision/ROIPool.h"
#include "torchvision/empty_tensor_op.h"
#include "torchvision/nms.h"

Are they generated when I compile from source?

@stereomatchingkiss

how could I know which op I need to register?

that's a good question. I don't yet have a good answer for that, I'll discuss with @eellison to see if we can find a good solution to it

I checked issue #1407 again; it looks like I need to change the makefile and compile it myself in order to generate those files.
Any news on using the models with the C++ API?

@stereomatchingkiss

Any news on using the models with the C++ API?

We will be improving the experience of using the torchvision models with the C++ API over time. We have just enabled support for Mask R-CNN models to be torchscripted, and will be refining the C++ export going forward.

@fmassa
I can script parts of Mask R-CNN and load them in C++ using this:

model = models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()
backbone_script = torch.jit.script(model.backbone)

but when I add a wrapper around the attributes (e.g. the backbone) and load it in C++, it cannot find the torchvision operators.
Why might this happen?

class BackboneWrapper(torch.nn.Module):
    def __init__(self, model):
        super(BackboneWrapper, self).__init__()
        self.transform = model.transform
        self.backbone = model.backbone

    def forward(self, images, targets=None):
        # type: (List[Tensor], Optional[List[Dict[str, Tensor]]]) -> Dict[str, Dict[str, Tensor]]
        images, _ = self.transform(images, targets)
        features = self.backbone(images.tensors)
        return {'features': features}

Error:

Unknown builtin op: torchvision::_new_empty_tensor_op.
Could not find any similar ops to torchvision::_new_empty_tensor_op. This op may not exist or may not be currently supported in TorchScript.
: torchvision-0.4.2-py3.6-linux-x86_64.egg/torchvision/ops/new_empty_tensor.py", line 16
        output (Tensor)
    """
    return torch.ops.torchvision._new_empty_tensor_op(x, shape)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Serialized   File "code/__torch__/torchvision/ops/new_empty_tensor.py", line 4

@cted18 I believe the solution you are looking for can be found in https://github.com/pytorch/vision/issues/1002#issuecomment-562915463 and https://github.com/pytorch/vision/pull/1407#issuecomment-563048240

If you are still facing issues, can you open a new issue with a full reproducible example of the problem?

@fmassa
I used the comments from #1407, but the problem still exists.
Opened a new issue #1730
Thanks

@cted18 I believe the solution you are looking for can be found in #1002 (comment) and #1407 (comment)

If you are still facing issues, can you open a new issue with a full reproducible example of the problem?

@fmassa How can I get a TorchScript version of torchvision.models.detection.maskrcnn_resnet50_fpn?

torch.jit.script and torch.jit.trace are not working with this model.

With torch.jit.script

model = torch.load(modelname+"-best.pth")
model=model.cuda()
model.eval()
print(img)
with torch.no_grad():
    print(model(img))
    traced_cell = torch.jit.script(model, (img))
torch.jit.save(traced_cell, modelname+"-torchscript.pth")

loaded_trace = torch.jit.load(modelname+"-torchscript.pth")
loaded_trace.eval()
with torch.no_grad():
    print(loaded_trace(img))

TensorMask(torch.argmax(loaded_trace(img),1)).show()

Output:

TensorImage([[[[0.8961, 0.9132, 0.8789,  ..., 0.2453, 0.1939, 0.2282],
          [0.8276, 0.9132, 0.8618,  ..., 0.2282, 0.1939, 0.2282],
          [0.8961, 0.9132, 0.8789,  ..., 0.2282, 0.2282, 0.2453],
          ...,
          [0.8961, 0.8618, 0.9132,  ..., 0.4508, 0.4166, 0.3994],
          [0.9303, 0.9132, 0.9474,  ..., 0.4166, 0.4166, 0.4508],
          [0.9646, 0.8789, 0.9303,  ..., 0.3994, 0.3994, 0.3994]],

         [[1.0455, 1.0630, 1.0280,  ..., 0.3803, 0.3277, 0.3627],
          [0.9755, 1.0630, 1.0105,  ..., 0.3627, 0.3277, 0.3627],
          [1.0455, 1.0630, 1.0280,  ..., 0.3627, 0.3627, 0.3803],
          ...,
          [1.0455, 1.0105, 1.0630,  ..., 0.5903, 0.5553, 0.5378],
          [1.0805, 1.0630, 1.0980,  ..., 0.5553, 0.5553, 0.5903],
          [1.1155, 1.0280, 1.0805,  ..., 0.5378, 0.5378, 0.5378]],

         [[1.2631, 1.2805, 1.2457,  ..., 0.6008, 0.5485, 0.5834],
          [1.1934, 1.2805, 1.2282,  ..., 0.5834, 0.5485, 0.5834],
          [1.2631, 1.2805, 1.2457,  ..., 0.5834, 0.5834, 0.6008],
          ...,
          [1.2631, 1.2282, 1.2805,  ..., 0.8099, 0.7751, 0.7576],
          [1.2980, 1.2805, 1.3154,  ..., 0.7751, 0.7751, 0.8099],
          [1.3328, 1.2457, 1.2980,  ..., 0.7576, 0.7576, 0.7576]]]],
       device='cuda:0')
[{'boxes': tensor([[412.5222, 492.3208, 619.7662, 620.9233]], device='cuda:0'), 'labels': tensor([1], device='cuda:0'), 'scores': tensor([0.1527], device='cuda:0'), 'masks': tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]]], device='cuda:0')}]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-7216a0dac5a0> in <module>
     12 loaded_trace.eval()
     13 with torch.no_grad():
---> 14     print(loaded_trace(img))
     15 
     16 TensorMask(torch.argmax(loaded_trace(img),1)).show()

~/anaconda3/envs/pro1/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    556             result = self._slow_forward(*input, **kwargs)
    557         else:
--> 558             result = self.forward(*input, **kwargs)
    559         for hook in self._forward_hooks.values():
    560             hook_result = hook(self, input, result)

RuntimeError: forward() Expected a value of type 'List[Tensor]' for argument 'images' but instead found type 'TensorImage'.
Position: 1
Value: TensorImage([[[[0.8961, 0.9132, 0.8789,  ..., 0.2453, 0.1939, 0.2282],
          [0.8276, 0.9132, 0.8618,  ..., 0.2282, 0.1939, 0.2282],
          [0.8961, 0.9132, 0.8789,  ..., 0.2282, 0.2282, 0.2453],
          ...,
          [0.8961, 0.8618, 0.9132,  ..., 0.4508, 0.4166, 0.3994],
          [0.9303, 0.9132, 0.9474,  ..., 0.4166, 0.4166, 0.4508],
          [0.9646, 0.8789, 0.9303,  ..., 0.3994, 0.3994, 0.3994]],

         [[1.0455, 1.0630, 1.0280,  ..., 0.3803, 0.3277, 0.3627],
          [0.9755, 1.0630, 1.0105,  ..., 0.3627, 0.3277, 0.3627],
          [1.0455, 1.0630, 1.0280,  ..., 0.3627, 0.3627, 0.3803],
          ...,
          [1.0455, 1.0105, 1.0630,  ..., 0.5903, 0.5553, 0.5378],
          [1.0805, 1.0630, 1.0980,  ..., 0.5553, 0.5553, 0.5903],
          [1.1155, 1.0280, 1.0805,  ..., 0.5378, 0.5378, 0.5378]],

         [[1.2631, 1.2805, 1.2457,  ..., 0.6008, 0.5485, 0.5834],
          [1.1934, 1.2805, 1.2282,  ..., 0.5834, 0.5485, 0.5834],
          [1.2631, 1.2805, 1.2457,  ..., 0.5834, 0.5834, 0.6008],
          ...,
          [1.2631, 1.2282, 1.2805,  ..., 0.8099, 0.7751, 0.7576],
          [1.2980, 1.2805, 1.3154,  ..., 0.7751, 0.7751, 0.8099],
          [1.3328, 1.2457, 1.2980,  ..., 0.7576, 0.7576, 0.7576]]]],
       device='cuda:0')
Declaration: forward(__torch__.torchvision.models.detection.mask_rcnn.___torch_mangle_1723.MaskRCNN self, Tensor[] images, Dict(str, Tensor)[]? targets=None) -> ((Dict(str, Tensor), Dict(str, Tensor)[]))
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

With torch.jit.trace

modelname="maskrcnn"
model = torch.load(modelname+"-best.pth")
model=model.cuda()
model.eval()
print(img)
with torch.no_grad():
    print(model(img))
    traced_cell = torch.jit.trace(model, (img))
torch.jit.save(traced_cell, modelname+"-torchscript.pth")

loaded_trace = torch.jit.load(modelname+"-torchscript.pth")
loaded_trace.eval()
with torch.no_grad():
    print(loaded_trace(img))

TensorMask(torch.argmax(loaded_trace(img),1)).show()

Output

TensorImage([[[[0.8961, 0.9132, 0.8789,  ..., 0.2453, 0.1939, 0.2282],
          [0.8276, 0.9132, 0.8618,  ..., 0.2282, 0.1939, 0.2282],
          [0.8961, 0.9132, 0.8789,  ..., 0.2282, 0.2282, 0.2453],
          ...,
          [0.8961, 0.8618, 0.9132,  ..., 0.4508, 0.4166, 0.3994],
          [0.9303, 0.9132, 0.9474,  ..., 0.4166, 0.4166, 0.4508],
          [0.9646, 0.8789, 0.9303,  ..., 0.3994, 0.3994, 0.3994]],

         [[1.0455, 1.0630, 1.0280,  ..., 0.3803, 0.3277, 0.3627],
          [0.9755, 1.0630, 1.0105,  ..., 0.3627, 0.3277, 0.3627],
          [1.0455, 1.0630, 1.0280,  ..., 0.3627, 0.3627, 0.3803],
          ...,
          [1.0455, 1.0105, 1.0630,  ..., 0.5903, 0.5553, 0.5378],
          [1.0805, 1.0630, 1.0980,  ..., 0.5553, 0.5553, 0.5903],
          [1.1155, 1.0280, 1.0805,  ..., 0.5378, 0.5378, 0.5378]],

         [[1.2631, 1.2805, 1.2457,  ..., 0.6008, 0.5485, 0.5834],
          [1.1934, 1.2805, 1.2282,  ..., 0.5834, 0.5485, 0.5834],
          [1.2631, 1.2805, 1.2457,  ..., 0.5834, 0.5834, 0.6008],
          ...,
          [1.2631, 1.2282, 1.2805,  ..., 0.8099, 0.7751, 0.7576],
          [1.2980, 1.2805, 1.3154,  ..., 0.7751, 0.7751, 0.8099],
          [1.3328, 1.2457, 1.2980,  ..., 0.7576, 0.7576, 0.7576]]]],
       device='cuda:0')
[{'boxes': tensor([[412.5222, 492.3208, 619.7662, 620.9233]], device='cuda:0'), 'labels': tensor([1], device='cuda:0'), 'scores': tensor([0.1527], device='cuda:0'), 'masks': tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]]], device='cuda:0')}]
/opt/conda/conda-bld/pytorch_1587452831668/work/torch/csrc/utils/python_arg_parser.cpp:760: UserWarning: This overload of nonzero is deprecated:
    nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
    nonzero(Tensor input, *, bool as_tuple)
/home/david/anaconda3/envs/proy/lib/python3.7/site-packages/torch/tensor.py:467: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  'incorrect results).', category=RuntimeWarning)
/home/david/anaconda3/envs/proy/lib/python3.7/site-packages/fastai2/torch_core.py:272: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
/opt/conda/conda-bld/pytorch_1587452831668/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
/home/david/anaconda3/envs/proy/lib/python3.7/site-packages/torchvision/models/detection/rpn.py:164: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(image_size[1] / g[1], dtype=torch.int64, device=device)] for g in grid_sizes]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-44b7a9360e87> in <module>
      6 with torch.no_grad():
      7     print(model(img))
----> 8     traced_cell = torch.jit.trace(model, (img))
      9 torch.jit.save(traced_cell, modelname+"-torchscript.pth")
     10 

~/anaconda3/envs/proy/lib/python3.7/site-packages/torch/jit/__init__.py in trace(func, example_inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit)
    881         return trace_module(func, {'forward': example_inputs}, None,
    882                             check_trace, wrap_check_inputs(check_inputs),
--> 883                             check_tolerance, strict, _force_outplace, _module_class)
    884 
    885     if (hasattr(func, '__self__') and isinstance(func.__self__, torch.nn.Module) and

~/anaconda3/envs/proy/lib/python3.7/site-packages/torch/jit/__init__.py in trace_module(mod, inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit)
   1035             func = mod if method_name == "forward" else getattr(mod, method_name)
   1036             example_inputs = make_tuple(example_inputs)
-> 1037             module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, strict, _force_outplace)
   1038             check_trace_method = module._c._get_method(method_name)
   1039 

~/anaconda3/envs/proy/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    554                 input = result
    555         if torch._C._get_tracing_state():
--> 556             result = self._slow_forward(*input, **kwargs)
    557         else:
    558             result = self.forward(*input, **kwargs)

~/anaconda3/envs/proy/lib/python3.7/site-packages/torch/nn/modules/module.py in _slow_forward(self, *input, **kwargs)
    540                 recording_scopes = False
    541         try:
--> 542             result = self.forward(*input, **kwargs)
    543         finally:
    544             if recording_scopes:

~/anaconda3/envs/proy/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     68         if isinstance(features, torch.Tensor):
     69             features = OrderedDict([('0', features)])
---> 70         proposals, proposal_losses = self.rpn(images, features, targets)
     71         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
     72         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

~/anaconda3/envs/proy/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    554                 input = result
    555         if torch._C._get_tracing_state():
--> 556             result = self._slow_forward(*input, **kwargs)
    557         else:
    558             result = self.forward(*input, **kwargs)

~/anaconda3/envs/proy/lib/python3.7/site-packages/torch/nn/modules/module.py in _slow_forward(self, *input, **kwargs)
    540                 recording_scopes = False
    541         try:
--> 542             result = self.forward(*input, **kwargs)
    543         finally:
    544             if recording_scopes:

~/anaconda3/envs/proy/lib/python3.7/site-packages/torchvision/models/detection/rpn.py in forward(self, images, features, targets)
    486         proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
    487         proposals = proposals.view(num_images, -1, 4)
--> 488         boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
    489 
    490         losses = {}

~/anaconda3/envs/proy/lib/python3.7/site-packages/torchvision/models/detection/rpn.py in filter_proposals(self, proposals, objectness, image_shapes, num_anchors_per_level)
    392 
    393         # select top_n boxes independently per level before applying nms
--> 394         top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
    395 
    396         image_range = torch.arange(num_images, device=device)

~/anaconda3/envs/proy/lib/python3.7/site-packages/torchvision/models/detection/rpn.py in _get_top_n_idx(self, objectness, num_anchors_per_level)
    372                 pre_nms_top_n = min(self.pre_nms_top_n(), num_anchors)
    373             _, top_n_idx = ob.topk(pre_nms_top_n, dim=1)
--> 374             r.append(top_n_idx + offset)
    375             offset += num_anchors
    376         return torch.cat(r, dim=1)

RuntimeError: expected device cuda:0 but got device cpu

@WaterKnight1998's issue is also tracked here with a potential solution.

@WaterKnight1998 to complement @ptrblck comment, it seems that your input is a TensorImage (which is not something that we provide in torchvision I believe)
If you pass instead a list of 3d tensors, it should work.

@WaterKnight1998 to complement @ptrblck comment, it seems that your input is a TensorImage (which is not something that we provide in torchvision I believe)
If you pass instead a list of 3d tensors, it should work.

TensorImage is just a normal Tensor from fastai that adds a show function.

The problem we are finding is that after tracing, the output gets changed!

You can find the concrete output here

@WaterKnight1998 I would recommend converting the TensorImage into a Tensor before feeding the image, and making it be a list of tensors of 3 dimensions.

@WaterKnight1998 I would recommend converting the TensorImage into a Tensor before feeding the image, and making it be a list of tensors of 3 dimensions.

I tried using a list of 3D tensors and I am getting a strange empty dict.

({}, [{'scores': tensor([0.0570], grad_fn=<IndexBackward>), 'labels': tensor([1]), 'boxes': tensor([[165.8691, 434.1203, 527.4108, 714.6182]], grad_fn=<StackBackward>), 'masks': tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],

@WaterKnight1998 your output seems ok to me, MaskRCNN detected only a single object, with low confidence.

I would make sure that the inputs are fed in the right format (the images should be in the range 0-1).

your output seems ok to me

@fmassa Mask R-CNN without scripting just outputs the second element of the tuple. Is it normal that after tracing it, it returns a tuple with the first element being an empty dict?

@WaterKnight1998 yes, it is.
We raise a warning in https://github.com/pytorch/vision/blob/11a39aaab5b55a3c116c2e8d8001bad94a96f99d/torchvision/models/detection/generalized_rcnn.py#L108
explaining the differences. It's a limitation of torchscript that we can't have different return types depending on the self.training, so we always return both the losses and the detections, although only one of them will be activated.
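
In other words, a small sketch of handling the scripted model's two-element output (in eval mode the losses dict comes back empty and the detections are in the second element):

import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()
scripted = torch.jit.script(model)

with torch.no_grad():
    losses, detections = scripted([torch.rand(3, 300, 300)])

print(losses)  # {} in eval mode; only populated during training
print(detections[0].keys())  # boxes, labels, scores, masks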

It's a limitation of torchscript that we can't have different return types depending on the self.training, so we always return both the losses and the detections, although only one of them will be activated.

@fmassa Thank you very much for your explanation. It gave me the intuition that I needed!

Hello @fmassa,

Is there any updates on this issue?

torch==1.7.1
torchaudio==0.7.2
torchvision==0.8.2

Traceback (most recent call last):
  File "D:/Projects/tester/main.py", line 62, in <module>
    torch_out = script_module(x)
  File "D:\Projects\tester\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: forward() Expected a value of type 'List[Tensor]' for argument 'images' but instead found type 'Tensor'.
Position: 1
Value: tensor([[[[-1.3924, -0.3426,  0.1565,  ..., -1.0010, -0.1127,  0.2637],
          [ 0.1392, -1.3978,  0.4600,  ..., -1.7351, -1.3514, -0.4097],
          [ 1.1242, -0.2859,  0.0956,  ..., -0.9409,  0.6421, -0.0713],
          ...,
          [ 0.4488,  0.1756,  1.9472,  ...,  1.3395,  0.0882,  0.2821],
          [ 1.2623,  0.0925, -2.4398,  ..., -0.9513, -2.2078,  1.7615],
          [-0.0645, -0.4522,  1.2193,  ..., -0.3644,  0.0360, -0.1954]],

         [[ 1.1202, -1.4459, -1.7245,  ..., -1.2972, -0.0717,  0.4818],
          [ 0.8732, -0.1661, -0.1113,  ...,  1.9476, -0.4579,  1.1956],
          [-2.1614,  0.3758, -0.7581,  ..., -1.0231, -0.8411, -0.1101],
          ...,
          [ 0.5501,  0.3279, -0.8761,  ..., -0.8433, -0.2146, -1.6229],
          [ 0.6187, -1.9583, -3.2449,  ...,  1.4666, -0.0826,  1.5495],
          [-1.4143,  0.3092, -0.3439,  ...,  0.8020, -0.5509,  0.0355]],

         [[ 0.7972,  0.5274, -1.5208,  ..., -0.6306,  0.5713, -1.0178],
          [ 0.4690,  0.6849,  0.0668,  ..., -0.5453, -1.1445,  0.2774],
          [-0.0832,  1.3775, -0.8812,  ..., -2.3852,  0.5324,  1.5018],
          ...,
          [ 0.6334,  0.4894,  0.3861,  ...,  0.9698,  1.0560, -0.8113],
          [-0.8962,  1.7035, -0.8178,  ..., -0.1556,  1.7010, -0.4338],
          [ 0.0149, -0.4869, -1.8882,  ..., -1.3715,  0.9658, -0.3530]]]])
Declaration: forward(__torch__.torchvision.models.detection.faster_rcnn.FasterRCNN self, Tensor[] images, Dict(str, Tensor)[]? targets=None) -> ((Dict(str, Tensor), Dict(str, Tensor)[]))
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

@bulatnv TorchScript should be supported for Mask R-CNN models, but they only support the List[Tensor] interface, not a single Tensor.

So instead of doing

model(torch.rand(1, 3, 300, 300))

do instead

model([torch.rand(3, 300, 300)])
