Hi experts,
I found the following note in several places:
Note: Serialized engines are not portable across platforms or TensorRT versions.
for example, here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#platform-matrix
I also found this explanation:
TensorRT includes import methods to help you express your trained deep learning model for TensorRT to optimize and run. It is an optimization tool that applies graph optimization and layer fusion and finds the fastest implementation of that model leveraging a diverse collection of highly optimized kernels, and a runtime that you can use to execute this network in an inference context.
My questions are:
Any comments will be appreciated.
Thanks!
For Q1, there is no guarantee that a serialized engine will work on a different platform, because the optimized graph uses kernels specific to the GPU it was built on. If you are running on different machines with the same GPU architecture and OS, it should work. If you are trying to support a larger collection of GPUs, one suggestion is to use TRT-IS (TensorRT Inference Server) with multiple GPUs as backends, each with its own saved serialized engine.
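To make the "one serialized engine per target" idea concrete, here is a minimal sketch of how a deployment could pick the right engine file at load time. It assumes a hypothetical naming scheme that encodes the OS platform, the GPU compute capability (as a stand-in for "GPU architecture") and the TensorRT version in the file name; `engine_filename`, `select_engine` and the file names are illustrative only and not part of TensorRT or TRT-IS. In a real setup the key would be read from the runtime (for example `tensorrt.__version__` and the CUDA device properties) rather than passed in by hand.

```python
from typing import Optional, Set, Tuple

# Hypothetical key: (OS platform, GPU compute capability (major, minor), TensorRT version).
EngineKey = Tuple[str, Tuple[int, int], str]


def engine_filename(model: str, key: EngineKey) -> str:
    """Build a hypothetical file name that encodes everything the engine depends on."""
    platform, (major, minor), trt_version = key
    return f"{model}_{platform}_sm{major}{minor}_trt{trt_version}.engine"


def select_engine(model: str, key: EngineKey, available: Set[str]) -> Optional[str]:
    """Return the engine built for exactly this platform/GPU/TensorRT combination, or None.

    There is deliberately no fallback to a "close" match: since serialized engines
    are not portable, a mismatch means the engine should be rebuilt on the target.
    """
    name = engine_filename(model, key)
    return name if name in available else None


# Usage with made-up file names: one engine per (platform, GPU arch, TensorRT version).
available = {
    "resnet50_linux_sm70_trt7.0.0.engine",
    "resnet50_linux_sm75_trt7.0.0.engine",
}

print(select_engine("resnet50", ("linux", (7, 5), "7.0.0"), available))
# prints resnet50_linux_sm75_trt7.0.0.engine
print(select_engine("resnet50", ("linux", (7, 5), "6.0.1"), available))
# prints None (different TensorRT version, so the engine must be rebuilt)
```

Requiring an exact match keeps the lookup conservative, which mirrors the note that engines are not portable across platforms or TensorRT versions.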