TensorRT: Reuse serialized engines on different platforms or TensorRT versions

Created on 8 Aug 2019 · 1 comment · Source: NVIDIA/TensorRT

Hi experts,

I found the following note in several places:

Note: Serialized engines are not portable across platforms or TensorRT versions.

for example, here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#platform-matrix
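Taken literally, the note means an engine should only be reused when the environment it was built in matches the one it is loaded in. Below is a minimal sketch of that conservative check, assuming you store a small, hypothetical metadata record next to each serialized engine at build time (`BuildTarget` and `is_engine_reusable` are illustrative names, not TensorRT APIs). In practice the version string could come from `tensorrt.__version__` and the architecture from the GPU's CUDA compute capability; the values in the usage lines are examples only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BuildTarget:
    """Hypothetical record of the environment an engine was built for."""
    trt_version: str                     # e.g. tensorrt.__version__ at build time
    compute_capability: Tuple[int, int]  # GPU architecture, e.g. (7, 5)
    os_name: str                         # e.g. "Linux"


def is_engine_reusable(built_for: BuildTarget, current: BuildTarget) -> bool:
    """Treat a serialized engine as reusable only on an exact match.

    Any difference in TensorRT version, GPU architecture, or OS means
    the engine should be rebuilt rather than deserialized.
    """
    return built_for == current


if __name__ == "__main__":
    built = BuildTarget("6.0.1", (7, 5), "Linux")
    print(is_engine_reusable(built, BuildTarget("6.0.1", (7, 5), "Linux")))  # True
    print(is_engine_reusable(built, BuildTarget("5.1.5", (7, 5), "Linux")))  # False
```

Exact equality is deliberately strict: since there is no portability guarantee, a false "rebuild" is cheaper than loading an engine whose kernels do not match the GPU.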

I also found this explanation:

TensorRT includes import methods to help you express your trained deep learning model for TensorRT to optimize and run. It is an optimization tool that applies graph optimization and layer fusion and finds the fastest implementation of that model leveraging a diverse collection of highly optimized kernels, and a runtime that you can use to execute this network in an inference context.

My questions are:

1. Can I serialize an engine and reuse it on different machines (both with the same platform and with a different platform)? Will it work?
2. If it works, how much loss should I expect?

Any comments will be appreciated.
Thanks!


ljayx

Most helpful comment

For Q1, there is no guarantee that a serialized engine will work on a different platform, because the optimized graph uses kernels specific to the GPU. If you run on different machines with the same GPU architecture and OS, it should work. If you are trying to support a larger collection of GPUs, one suggestion is TRT-IS (TensorRT Inference Server), which can serve multiple GPUs as backends using multiple saved serialized engines.
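The multi-engine suggestion amounts to keeping one serialized engine per GPU target and picking the matching one at load time. A minimal sketch of that lookup, assuming a hypothetical file naming scheme (`engines/model_sm75.engine` and so on) keyed by CUDA compute capability; this is not TRT-IS's own model-repository layout, and `ENGINE_FILES` and `select_engine` are illustrative names.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: one engine file per GPU architecture,
# keyed by compute capability (major, minor).
ENGINE_FILES: Dict[Tuple[int, int], Path] = {
    (6, 1): Path("engines/model_sm61.engine"),
    (7, 0): Path("engines/model_sm70.engine"),
    (7, 5): Path("engines/model_sm75.engine"),
}


def select_engine(
    compute_capability: Iterable[int],
    engines: Dict[Tuple[int, int], Path] = ENGINE_FILES,
) -> Optional[Path]:
    """Return the engine built for exactly this architecture, or None.

    There is no fallback to a "nearby" architecture: an engine built for
    a different GPU is not guaranteed to run, so None means "build a new
    engine on this machine".
    """
    return engines.get(tuple(compute_capability))
```

For example, `select_engine((7, 5))` returns `Path("engines/model_sm75.engine")`, while `select_engine((8, 0))` returns `None`, signalling that an engine must be built for that GPU first.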