TensorRT: How to run large-batch inference

Created on 28 Jun 2019 · 6 Comments · Source: NVIDIA/TensorRT

Hello, I have a YOLOv3 TensorRT model and a lot of test images. How can I feed multiple images to the model at the same time, rather than a single image? What should I do?

question

float123

Most helpful comment

If you are using the C++ API:

First, make sure the TensorRT engine was built with IBuilder::setMaxBatchSize(maxBatchSize), where your inference batch size is no larger than maxBatchSize.
When doing inference, call IExecutionContext::enqueue(...) to run the batch on the GPU. Make sure the bindings are allocated for the right batch size, and that the batch size argument matches the bindings you allocated.
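
To make the "bindings allocated for the right batch size" point concrete, here is a minimal sketch of the arithmetic in plain Python, with no TensorRT calls. The input shape (3x416x416), the float32 element size, and the helper names `binding_nbytes` and `check_batch` are assumptions for illustration, not values from this thread or from the TensorRT API.

```python
from math import prod


def check_batch(batch_size, max_batch_size):
    """Reject batch sizes an implicit-batch engine would not accept (valid: 1..max_batch_size)."""
    if not 1 <= batch_size <= max_batch_size:
        raise ValueError(f"batch_size {batch_size} is outside 1..{max_batch_size}")
    return batch_size


def binding_nbytes(batch_size, dims, itemsize=4):
    """Bytes for one binding: batch_size * volume(per-image dims) * bytes per element."""
    return batch_size * prod(dims) * itemsize


# Hypothetical values: an engine built with maxBatchSize = 8 and a float32 CHW input.
MAX_BATCH_SIZE = 8
INPUT_DIMS = (3, 416, 416)

batch_size = check_batch(4, MAX_BATCH_SIZE)
input_nbytes = binding_nbytes(batch_size, INPUT_DIMS)
print(f"allocate {input_nbytes} bytes for the input binding at batch size {batch_size}")
```

The same calculation applies to every output binding. Allocating once for maxBatchSize and reusing the buffers for smaller batches is a common choice, since the batch size passed at execution time only has to stay within that limit.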

For detailed usage of TensorRT, please see
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html

You can start with studying the mnist sample:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#README-sampleMNIST

litaotju on 30 Jun 2019

👍3

All 6 comments

@litaotju
Hi, I'm using the Python API and testing the sample onnx_resnet50.py, which is located in xx/samples/python/introductory_parser_samples. Everything works well, but I also want to feed multiple images when testing the TensorRT model, and I want to try different batch sizes. What should I do? (I keep confusing maximum batch size, maximum workspace size, and batch_size.) Thank you so much!

sanmudaxia on 9 Jul 2019

Could you please read the documentation? All of these concepts are covered either in the docs or in the code comments of the TensorRT include (.h) files.

litaotju on 9 Jul 2019

Hi @sanmudaxia,

max_batch_size is the maximum batch size that your TensorRT engine will accept; you can execute a batch of any size from 1 up to max_batch_size. For an implicit batch network, the TensorRT engine will also be optimized for max_batch_size. For an explicit batch network, you can create several optimization profiles to optimize for various batch sizes.

Please see the docs for these terms: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/Core/Builder.html

max_batch_size – int The maximum batch size which can be used at execution time, 
and also the batch size for which the ICudaEngine will be optimized.

max_workspace_size – int The maximum GPU temporary memory which the
ICudaEngine can use at execution time.

And for batch_size that is specified when calling execute*() on the execution context: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/Core/ExecutionContext.html#tensorrt.IExecutionContext.execute

batch_size – The batch size. This is at most the value supplied when the ICudaEngine was built.
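
As a rough sketch of how these limits play out when there are more images than max_batch_size, the list of images can be cut into chunks of at most max_batch_size, with each chunk run as one batch. The helper `iter_batches`, the file names, and the value 4 are hypothetical; the place where an execute*() call would go is only marked with a comment.

```python
def iter_batches(items, max_batch_size):
    """Yield consecutive chunks of items, each holding at most max_batch_size elements."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


# Hypothetical inputs: 10 image paths and an engine built with max_batch_size = 4.
image_paths = [f"img_{i:03d}.jpg" for i in range(10)]
MAX_BATCH_SIZE = 4

for batch in iter_batches(image_paths, MAX_BATCH_SIZE):
    # Here the images in `batch` would be preprocessed and copied into the input
    # buffer, and the execution context would be called with batch_size=len(batch).
    print(len(batch), batch)
```

The last chunk may be smaller than the others, which is fine because any batch size from 1 up to max_batch_size is accepted.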

rmccorm4 on 2 Nov 2019

@rmccorm4 I'd like to ask: what if different images produce different numbers of output boxes? With a multi-image batch, how do I know which part of the output belongs to which image?
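
One way to picture this, as a sketch under stated assumptions rather than a description of any particular YOLOv3 engine: with an implicit batch, each output buffer holds one equal-sized slice per image, stored back to back, so image i starts at offset i * per_image_size. The differing box counts then come from filtering each slice separately. The box layout [x, y, w, h, score], the sizes, and both helper names below are hypothetical.

```python
def split_per_image(flat_output, batch_size, per_image_size):
    """Cut a flat batched output into one equal-sized slice per image."""
    if len(flat_output) < batch_size * per_image_size:
        raise ValueError("output buffer is smaller than batch_size * per_image_size")
    return [
        flat_output[i * per_image_size:(i + 1) * per_image_size]
        for i in range(batch_size)
    ]


def keep_confident(image_slice, values_per_box=5, score_index=4, threshold=0.5):
    """Group one image's slice into boxes and keep those scoring at or above threshold."""
    boxes = [
        image_slice[j:j + values_per_box]
        for j in range(0, len(image_slice), values_per_box)
    ]
    return [box for box in boxes if box[score_index] >= threshold]


# Hypothetical output: batch of 2 images, 2 candidate boxes each, 5 values per box.
flat = [
    10, 10, 5, 5, 0.9,   20, 20, 5, 5, 0.1,   # image 0
    30, 30, 5, 5, 0.8,   40, 40, 5, 5, 0.7,   # image 1
]
per_image = [keep_confident(s) for s in split_per_image(flat, batch_size=2, per_image_size=10)]
for index, boxes in enumerate(per_image):
    print(f"image {index}: {len(boxes)} box(es) kept")
```

Because the slices are processed independently, each image ends up with its own list of boxes, and the lists can have different lengths.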

jinfagang on 3 Jan 2020

May I check: for max_workspace_size, what does (1<<20) mean? What do the "1", the "20", and the "<<" mean? How do I decide how much is required for my network?
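
On the notation itself: `<<` is the bitwise left-shift operator, so `1 << 20` is the number 1 shifted left by 20 bits, which equals 2 to the power 20. Since max_workspace_size is a byte count, that is 1 MiB. A short sketch follows; the helper `mib` is hypothetical and only there for readability.

```python
# Shifting 1 left by n bits multiplies it by 2 n times, so 1 << n == 2 ** n.
KIB = 1 << 10   # 2**10 bytes
MIB = 1 << 20   # 2**20 bytes
GIB = 1 << 30   # 2**30 bytes


def mib(count):
    """Hypothetical helper: convert a number of MiB into bytes."""
    return count << 20


print(MIB)        # byte count for 1 MiB
print(mib(256))   # byte count for 256 MiB
```

How much a given network needs is not something this arithmetic answers. As the quoted docs say, the value is the maximum temporary GPU memory the engine may use at execution time, so a reasonable approach (an assumption, not guidance from this thread) is to start with as much as the GPU can spare and lower it if memory gets tight.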