TensorRT: Details about INT8

Created on 18 Jun 2019 · 7 Comments · Source: NVIDIA/TensorRT

Hi, how can we get details about int8 implementation or the quantization scheme?

good-first-issue question

XHPlus

👍5

Most helpful comment

@XHPlus @yanhn

The INT8 quantization scheme used by TensorRT is fairly straightforward. It is linear and symmetric over a clipped range of input values, i.e., the dynamic range (-max to +max).
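
To make "linear and symmetric over a clipped range" concrete, here is a minimal pure-Python sketch. It is an illustration only, not TensorRT's implementation: the function names are hypothetical, and the choice of 127 as the largest integer level and round-to-nearest are assumptions of this sketch.

```python
def quantize_symmetric(values, amax, num_bits=8):
    """Map floats to signed integers using one scale and no zero point.

    amax is the dynamic range limit: inputs are clipped to [-amax, +amax].
    Assumption of this sketch: the integer range is [-127, 127] for 8 bits.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for INT8
    scale = amax / qmax                       # float value of one integer step
    quantized = []
    for v in values:
        clipped = max(-amax, min(amax, v))    # saturate outside the dynamic range
        quantized.append(int(round(clipped / scale)))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


inputs = [-10.0, -1.0, 0.0, 0.5, 2.54, 10.0]
codes, scale = quantize_symmetric(inputs, amax=2.54)
restored = dequantize(codes, scale)
```

Values beyond the dynamic range (here -10.0 and 10.0) saturate to the extreme integer levels, which is why the choice of max matters for accuracy.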

The dynamic range itself can either be specified for every tensor in the network or determined using the calibration method. See section 5.1.3 of the developer guide for instructions:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#enable_int8_c

You can extend/implement custom calibrators as well:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#int8_sample

See sampleINT8API for an example of setting per-tensor ranges:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#int8_api_sample

A few blog posts and a GTC talk describing quantization schemes similar to the one used in TensorRT:
https://towardsdatascience.com/low-precision-inference-with-tensorrt-6eb3cda0730b
https://medium.com/tensorflow/high-performance-inference-with-tensorrt-integration-c4d78795fbfe
http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

rajeevsrao on 19 Jun 2019

👍4

All 7 comments

Same question~

yanhn on 18 Jun 2019

I cannot find any optimization-related code in the repo, e.g. layer and tensor fusion, or quantization.

yehenrytian on 18 Jun 2019

@yehenrytian

The graph optimizer is part of the core library, which is not included in this release. The components we have open-sourced are the Caffe and ONNX parsers (for importing models from the respective framework/interchange formats) and the plugins (for extending functionality beyond the set of TensorRT-supported layers via custom ops).

rajeevsrao on 19 Jun 2019

@rajeevsrao
Sir, I want to use ONNX INT8 without the calibration method, and I want to make sure I am getting the dynamic range correctly. I trained a model in PyTorch, exported it to ONNX, and now run it on TensorRT.
First, when I run the model in PyTorch as follows:
    import torch

    model = detection_net()
    model.load_state_dict(torch.load('detection_net_epoch200.pth'))
    ....
    ....
    def get_features_hook(module, input, output):
        # Print the smallest and largest activation produced by the hooked layer.
        print(output.min(), output.max())
    model.layer[1].register_forward_hook(get_features_hook)
    a = model(input_img)
    ...
    .....
is the output of get_features_hook the dynamic range?
Second, the sampleINT8API sample (D:\TensorRT-5.1.5.0\samples) includes resnet50_per_tensor_dynamic_range.txt, which contains:

...
gpu_0/conv1_1:5.43116007373
gpu_0/res_conv1_bn_1:8.69735834748
gpu_0/res_conv1_bn_2:8.69735834748
.....
For an entry such as gpu_0/res_conv1_bn_1:8.69735834748, is 8.69735834748 the dynamic range? Is it max or -max?

I'm a beginner with TensorRT, so maybe these questions are too elementary.
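
For illustration, both questions can be sketched in a few lines of pure Python. This assumes that, in line with the symmetric scheme described in the most helpful comment, each number in the file is a magnitude (max) that stands for the range (-max, +max), and that an observed (min, max) pair is reduced to the larger absolute value. The helper names are hypothetical, and this is not code from sampleINT8API.

```python
def parse_dynamic_ranges(text):
    """Parse 'tensor_name:value' lines into a dict of name -> float."""
    ranges = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue                      # skip blanks and elision lines such as "..."
        name, _, value = line.rpartition(":")
        try:
            ranges[name] = float(value)
        except ValueError:
            continue                      # skip lines whose value is not a number
    return ranges


def symmetric_amax(observed_min, observed_max):
    """Reduce an observed (min, max) pair to one symmetric magnitude."""
    return max(abs(observed_min), abs(observed_max))


def as_min_max(amax):
    """Expand a magnitude into the symmetric range it is assumed to stand for."""
    return (-amax, amax)


sample_text = """
gpu_0/conv1_1:5.43116007373
gpu_0/res_conv1_bn_1:8.69735834748
gpu_0/res_conv1_bn_2:8.69735834748
"""
ranges = parse_dynamic_ranges(sample_text)
bn1_range = as_min_max(ranges["gpu_0/res_conv1_bn_1"])
hook_amax = symmetric_amax(-3.2, 1.5)     # hypothetical min/max printed by a forward hook
```

Note that a min/max observed on a single input is only a rough estimate. A range collected over a representative set of inputs, or one chosen by a calibrator, may differ and can be deliberately smaller than the raw maximum.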

MenGuangwen-CN-0411 on 21 Jun 2019

Thanks for your meticulous reply.
I will study the materials mentioned above.

XHPlus on 23 Jun 2019
