Tensorrt: How to use TensorRT to generate scale factors?

Created on 11 Nov 2019 · 13 Comments · Source: NVIDIA/TensorRT

Hi,

I want to generate the scale factors for VGG-16 for the NVDLA INT8 configuration (ref: https://github.com/nvdla/sw/blob/v1.2.0-OC/LowPrecision.md), but I am not sure how to do this. I tried to follow the documentation but did not understand where to start. If anyone can give the steps, that would be really helpful. I am fairly new to this, so any help will be highly appreciated. Thanks.

Python INT8 question

Okaymaddy

👍1

Most helpful comment

@Okaymaddy, I created a reference for INT8 calibration on Imagenet-like data. Hopefully you can use this as a starting point: https://github.com/rmccorm4/tensorrt-utils/blob/master/classification/imagenet/ImagenetCalibrator.py

rmccorm4 on 14 Nov 2019

❤1 👍1

All 13 comments

Hi @Okaymaddy,

You can generate the calibration cache, which holds the per-tensor scale factors, by doing INT8 calibration with TensorRT. There is an example for MNIST that comes with the full TensorRT release, which you can download from devzone or get from the NGC container. You have to implement the calibrator class for your model, and make sure the data you use for calibration is pre-processed in the same way that your model expects during training/inference.

This is a snippet taken from the /usr/src/tensorrt/samples/python/int8_caffe_mnist/calibrator.py sample.

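import os

import pycuda.autoinit  # Creates a CUDA context on import.
import pycuda.driver as cuda
import tensorrt as trt

# load_mnist_data is a helper defined alongside this class in the sample's calibrator.py.
class MNISTEntropyCalibrator(trt.IInt8EntropyCalibrator2):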
    def __init__(self, training_data, cache_file, batch_size=64):
        # Whenever you specify a custom constructor for a TensorRT class,
        # you MUST call the constructor of the parent explicitly.
        trt.IInt8EntropyCalibrator2.__init__(self)

        self.cache_file = cache_file

        # Every time get_batch is called, the next batch of size batch_size will be copied to the device and returned.
        self.data = load_mnist_data(training_data)
        self.batch_size = batch_size
        self.current_index = 0

        # Allocate enough memory for a whole batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)

    def get_batch_size(self):
        return self.batch_size

    # TensorRT passes along the names of the engine bindings to the get_batch function.
    # You don't necessarily have to use them, but they can be useful to understand the order of
    # the inputs. The bindings list is expected to have the same ordering as 'names'.
    def get_batch(self, names):
        if self.current_index + self.batch_size > self.data.shape[0]:
            return None

        current_batch = int(self.current_index / self.batch_size)
        if current_batch % 10 == 0:
            print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))

        batch = self.data[self.current_index:self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        return [self.device_input]

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
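
The batching behaviour of get_batch above can be checked without a GPU. The following is only a minimal sketch of the index arithmetic, not part of the TensorRT sample; the function name iter_calibration_batches and the numbers are made up for illustration. It shows that a trailing partial batch is never used, so the calibration set should hold at least one full batch.

```python
def iter_calibration_batches(num_samples, batch_size):
    """Yield (start, stop) index pairs the way get_batch above walks the data.

    Mirrors the sample's logic: once a full batch no longer fits, iteration
    stops (get_batch returns None), so a trailing partial batch is skipped.
    """
    current_index = 0
    while current_index + batch_size <= num_samples:
        yield current_index, current_index + batch_size
        current_index += batch_size


# Hypothetical numbers: 1000 calibration images, batch size 64.
batches = list(iter_calibration_batches(1000, 64))
print(len(batches))            # number of full batches used
print(1000 - batches[-1][1])   # images left unused at the end
```

Choosing a calibration set size that is a multiple of the batch size avoids silently dropping images.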

You will create a calibrator class similar to the one above for your VGG-16 model, and then pass an instance of that calibrator to the builder when building the engine, along with setting the fp16/int8 flags. Here's a snippet from /usr/src/tensorrt/samples/python/int8_caffe_mnist/sample.py:

# ...
calibration_cache = "mnist_calibration.cache"
calib = MNISTEntropyCalibrator(test_set, cache_file=calibration_cache)
# ...

# This function builds an engine from a Caffe model.
def build_int8_engine(deploy_file, model_file, calib, batch_size=32):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        # We set the builder batch size to be the same as the calibrator's, as we use the same batches
        # during inference. Note that this is not required in general, and inference batch size is
        # independent of calibration batch size.
        builder.max_batch_size = batch_size
        builder.max_workspace_size = common.GiB(1)
        builder.int8_mode = True
        builder.int8_calibrator = calib
        # Parse Caffe model
        model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=ModelData.DTYPE)
        network.mark_output(model_tensors.find(ModelData.OUTPUT_NAME))
        # Build engine and do int8 calibration.
        return builder.build_cuda_engine(network)

Building the engine runs the calibration and writes the calibration cache, a text file similar to: https://github.com/nvdla/sw/blob/v1.2.0-OC/umd/utils/calibdata/resnet50.txt

Then it looks like you can use this script to convert from the calibration cache text file to a JSON file containing the scale factors: https://github.com/nvdla/sw/blob/v1.2.0-OC/umd/utils/calibdata/calib_txt_to_json.py
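
To get a feel for what such a conversion involves, here is a rough sketch that reads scale factors out of calibration cache text. It assumes the cache is a header line followed by lines of the form "tensor name: 8 hex digits", with the hex digits being a big-endian IEEE 754 float32. The function name parse_calibration_cache and the example values are made up, and the JSON layout that NVDLA actually expects is defined by calib_txt_to_json.py and is not reproduced here.

```python
import json
import struct


def parse_calibration_cache(text):
    """Return {tensor_name: scale} parsed from calibration cache text.

    Assumed layout: one header line, then "<tensor name>: <8 hex digits>",
    where the hex digits encode a big-endian IEEE 754 float32.
    """
    scales = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # Skip the header line.
        # Split on the last colon so tensor names containing ':' still work.
        name, sep, hex_value = line.rpartition(":")
        hex_value = hex_value.strip()
        if not sep or len(hex_value) != 8:
            continue  # Ignore lines that do not match the assumed layout.
        scales[name.strip()] = struct.unpack("!f", bytes.fromhex(hex_value))[0]
    return scales


# Made-up example content, not taken from a real cache file.
example = """TRT-6001-EntropyCalibration2
data: 3c000000
conv1_1: 3f800000
"""

scales = parse_calibration_cache(example)
print(json.dumps(scales, indent=2))
```

For a real model, the script linked above should be preferred, since it produces the exact format the NVDLA tools consume.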

I would recommend using the ONNX parser instead of the Caffe parser that the above sample uses. You can download a vgg16 ONNX model from the ONNX model zoo: https://github.com/onnx/models/tree/master/vision/classification/vgg#model

rmccorm4 on 11 Nov 2019

❤1

This is pretty outdated, but this article may be a helpful reference: https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/

For the calibration cache code, I would refer to the sample code from the TRT6 release, not the TRT3 code in that article.

rmccorm4 on 12 Nov 2019

❤1

rmccorm4 on 14 Nov 2019

❤1 👍1

Hi @rmccorm4, I was running the MNIST sample provided in <TensorRT root directory>/samples/sampleMNIST and I got the following errors:

$./sample_mnist
&&&& RUNNING TensorRT.sample_mnist # ./sample_mnist
[10/18/2019-17:31:22] [I] Building and running a GPU inference engine for MNIST
[10/18/2019-17:31:22] [E] [TRT] CUDA initialization failure with error 38. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
&&&& FAILED TensorRT.sample_mnist # ./sample_mnist

Error 38 occurs because I am not using a GPU. So, is a GPU mandatory to run the code samples in TensorRT? Thanks.

Okaymaddy on 20 Nov 2019

👍1

Hi @Okaymaddy,

Yes, an NVIDIA GPU is mandatory. TensorRT is built on CUDA, which is specifically meant for accelerating computation with NVIDIA GPUs.

rmccorm4 on 21 Nov 2019

👍1

Hi @rmccorm4, I was looking at the sample files and I am not sure: to generate INT8 calibration files in my case, should I be using https://github.com/NVIDIA/TensorRT/tree/release/6.0/samples/opensource/sampleINT8 and adapting it for VGG-16?

I ran the above code and got a file similar to https://github.com/nvdla/sw/blob/v1.2.0-OC/umd/utils/calibdata/resnet50.txt. Should I be editing the sampleINT8 file then instead? Thanks.

Okaymaddy on 23 Nov 2019

Hi @Okaymaddy,

Yes, that file is the calibration cache. You could try to tweak the sampleINT8 code if you're looking for a C++ example. However, if you have an ONNX model for VGG16 handy (https://github.com/onnx/models/tree/master/vision/classification/vgg), you could also use the Python code described here: https://github.com/rmccorm4/tensorrt-utils/tree/master/classification/imagenet#int8-calibration

There's also an example calibration cache produced from calibrating on a random sample of 512 images from the imagenet validation dataset here: https://github.com/rmccorm4/tensorrt-utils/blob/19.10/classification/imagenet/caches/vgg16.cache

If you plan to actually use the INT8 model, I would suggest creating your own calibration cache with your own choice/amount of data. The example cache I linked above is just for testing/convenience.

rmccorm4 on 23 Nov 2019

Hi @rmccorm4, I believe at this moment the NVDLA compiler only supports Caffe models. In that case, is it still okay to use the above code? Also, where can I get the datasets for different Caffe models (ResNet-50, VGG-16, etc.)? Thanks!

Okaymaddy on 27 Nov 2019

👍1

Sorry @Okaymaddy, I don't know much about NVDLA, your other thread is probably better suited for those questions: https://github.com/nvdla/sw/issues/177

As for the dataset, it is generally ImageNet, which can be downloaded here: http://image-net.org/download but you need to make an account and such. It's also rather large; the original images are ~150GB if I recall correctly.

rmccorm4 on 28 Nov 2019

👍1

Hi @rmccorm4 , I was looking at samples/python/int8_caffe_mnist/sample.py and the following segment of code seems to use test and train set for MNIST:

def main():
    _, data_files = common.find_sample_data(description="Runs a Caffe MNIST network in Int8 mode", subfolder="mnist", find_files=["t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte", "train-images-idx3-ubyte", ModelData.DEPLOY_PATH, ModelData.MODEL_PATH])
    [test_set, test_labels, train_set, deploy_file, model_file] = data_files

    # Now we create a calibrator and give it the location of our calibration data.
    # We also allow it to cache calibration data for faster engine building.
    calibration_cache = "mnist_calibration.cache"
    calib = MNISTEntropyCalibrator(test_set, cache_file=calibration_cache)

    # Inference batch size can be different from calibration batch size.
    batch_size = 32
    with build_int8_engine(deploy_file, model_file, calib, batch_size) as engine, engine.create_execution_context() as context:
        # Batch size for inference can be different than batch size used for calibration.
        check_accuracy(context, batch_size, test_set=load_mnist_data(test_set), test_labels=load_mnist_labels(test_labels))

if __name__ == '__main__':
    main()

I was wondering how to replace the files passed in find_files ("t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte", etc.) for other models, in the following snippet taken from above:

data_files = common.find_sample_data(description="...", subfolder="..", find_files=["t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte", "train-images-idx3-ubyte", ModelData.DEPLOY_PATH, ModelData.MODEL_PATH])

I am not sure where I can find the correct train/test files for Resnet-50 or VGG-16.

Okaymaddy on 2 Dec 2019

Hi @Okaymaddy,

If you print out data_files, I'm pretty sure it's just a tuple/list of string filenames. You can replicate this yourself for Imagenet data by downloading the Imagenet dataset, and using a function similar to this to gather all of the filenames:

https://github.com/rmccorm4/tensorrt-utils/blob/49796a0163c9bdea7e2b171fd3655f9c4913b5f9/classification/imagenet/ImagenetCalibrator.py#L33-L64

If you aren't able to figure it out, you can go look at the source for /workspace/tensorrt/samples/python/common.py to better understand what's happening.
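
As a rough idea of what gathering the filenames could look like, here is a small standard-library sketch. It is not the code from the linked repository; the function name gather_image_files, the extension list, and the directory layout in the demo are assumptions for illustration. It walks a directory tree, collects image paths, and optionally draws a fixed-size random sample to use for calibration.

```python
import os
import random
import tempfile

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def gather_image_files(root_dir, max_files=None, seed=0):
    """Recursively collect image paths under root_dir.

    If max_files is given and more images are found, return a reproducible
    random sample of that size (the seed fixes the selection).
    """
    files = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                files.append(os.path.join(dirpath, filename))
    files.sort()  # Stable order so the seeded sample is repeatable.
    if max_files is not None and len(files) > max_files:
        files = random.Random(seed).sample(files, max_files)
    return files


# Demo on a throwaway directory with a made-up layout.
with tempfile.TemporaryDirectory() as root:
    class_dir = os.path.join(root, "n01440764")
    os.makedirs(class_dir)
    for path in (os.path.join(class_dir, "a.JPEG"),
                 os.path.join(class_dir, "b.JPEG"),
                 os.path.join(root, "notes.txt")):
        open(path, "w").close()
    found = gather_image_files(root)
    sampled = gather_image_files(root, max_files=1)

print(len(found), len(sampled))
```

The resulting list of paths plays the role of data_files for a custom calibrator, which would then load and pre-process each image before copying batches to the device.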

rmccorm4 on 10 Dec 2019

👍1

Closing for now. Can reopen if you're still having trouble.

rmccorm4 on 29 Dec 2019

@Okaymaddy, I created a reference for INT8 calibration on Imagenet-like data. Hopefully you can use this as a starting point: https://github.com/rmccorm4/tensorrt-utils/blob/master/classification/imagenet/ImagenetCalibrator.py

Hi Ryan,

Thank you very much for all of your explanations. May I ask why you created the ImagenetCalibrator.py file? It is quite different from the calibrator.py and sample.py that have been provided for the MNIST model.

Moreover, I know you said that ImagenetCalibrator.py is a starting point. However, I do not know how I should continue from there.

Thank you again for your help.