How to improve performance using TensorRT?
I believe @LeviViana has tried using TensorRT with maskrcnn-benchmark, and could potentially give some insights.
Currently it is somewhat tricky to do this, mainly because you can't trace the full detector (the pipeline contains operations that aren't jit-compatible). However, you can trace the backbone, which is the most time-consuming step of the detection pipeline.
So, the steps are:
1. Trace the backbone with torch.jit.
2. Export the traced backbone to ONNX.
3. Serialize the ONNX model in TensorRT's format.

TensorRT has a Python API, which depends on pycuda. There are a lot of tutorials out there explaining how you can take a model in ONNX format and serialize it in TensorRT's format; rough sketches of both steps follow.
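A minimal sketch of the tracing and export step, assuming a standard maskrcnn-benchmark install; the config path, input size, and opset version are assumptions to adapt to your setup:

```python
import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.modeling.detector import build_detection_model

# Config path and input size are assumptions; adjust to your setup.
cfg.merge_from_file("configs/e2e_mask_rcnn_R_50_FPN_1x.yaml")
cfg.freeze()

model = build_detection_model(cfg)
model.eval().cuda()

# The backbone takes a plain image tensor, so it is traceable even
# though the full GeneralizedRCNN pipeline is not.
dummy = torch.randn(1, 3, 800, 1088, device="cuda")
with torch.no_grad():
    traced_backbone = torch.jit.trace(model.backbone, dummy)

# Export the backbone to ONNX so TensorRT can parse it.
torch.onnx.export(model.backbone, dummy, "backbone.onnx", opset_version=10)
```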
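And a sketch of the serialization step using the TensorRT 5/6 Python API; file names are placeholders (pycuda only becomes necessary later, at inference time):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    """Parse an ONNX file and build a TensorRT engine from it."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)
    builder.max_workspace_size = 1 << 30  # 1 GiB of build workspace
    builder.max_batch_size = 1
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parsing failed")
    return builder.build_cuda_engine(network)

engine = build_engine("backbone.onnx")
with open("backbone.trt", "wb") as f:
    f.write(engine.serialize())
```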
I can't help that much for the moment, mainly because I'm not using the Python API (I'm calling TensorRT directly from C++). Sorry, but I've got a lot of work at the moment; I don't think I'll have time to post snippets and benchmarks in the next few weeks.
Has anyone tried serving a maskrcnn-benchmark model using NVIDIA's tensorrt-inference-server? Some recent changes may have simplified serving torch models, but I'm not sure it plays nice with maskrcnn. Interested to hear from anyone who has tried.
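For anyone experimenting with this, a minimal sketch of exporting a traced model into the repository layout that tensorrt-inference-server's pytorch_libtorch backend expects; the model name is illustrative, and a torchvision ResNet stands in for the maskrcnn-benchmark backbone traced above:

```python
import os
import torch
import torchvision

# tensorrt-inference-server expects a repository layout like:
#   model_repository/
#     maskrcnn_backbone/     <- model name (illustrative)
#       config.pbtxt         <- declares platform: "pytorch_libtorch"
#       1/                   <- version directory
#         model.pt           <- serialized TorchScript module
backbone = torchvision.models.resnet50(pretrained=True).eval()
traced = torch.jit.trace(backbone, torch.randn(1, 3, 224, 224))

os.makedirs("model_repository/maskrcnn_backbone/1", exist_ok=True)
traced.save("model_repository/maskrcnn_backbone/1/model.pt")
```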
@dcyoung Does this library support the ops that maskrcnn needs?
Any update on supporting TensorRT?
Hi @mengjiexu,
thanks for sharing. I have a question: did you try dynamic input shapes with TensorRT?
@zimenglan-sysu-512 Dynamic input shapes are not supported in TensorRT 5.1, but they are supported in TensorRT 6.0.
Even so, I still hit some errors when converting...
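For anyone hitting the same wall, a sketch of how dynamic shapes are declared in the TensorRT 6 Python API via optimization profiles. This assumes the ONNX model was exported with dynamic spatial axes; the tensor name "input" and the shape ranges are placeholders that must match your exported graph:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
# Dynamic shapes require an explicit-batch network in TensorRT 6.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("backbone.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30

# An optimization profile tells the builder the min/opt/max input
# shapes the engine must handle.
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  min=(1, 3, 480, 640),
                  opt=(1, 3, 800, 1088),
                  max=(1, 3, 1088, 1920))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)
```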