Tensorrt: [Bug] InstanceNormalization results different in TensorRT compared to original model

Created on 9 Dec 2019 · 12 Comments · Source: NVIDIA/TensorRT

The official InstanceNormalization plugin cannot actually be used in onnx2trt:

DEFINE_BUILTIN_OP_IMPORTER(InstanceNormalization)
{
    // Scales and biases must be initializers
    ASSERT(inputs.at(1).is_weights(), ErrorCode::kUNSUPPORTED_NODE);
    ASSERT(inputs.at(2).is_weights(), ErrorCode::kUNSUPPORTED_NODE);
    nvinfer1::ITensor* tensor_ptr = &convertToTensor(inputs.at(0), ctx);
//    ASSERT(!isDynamic(tensor_ptr->getDimensions()) && "InstanceNormalization does not support dynamic inputs!", ErrorCode::kUNSUPPORTED_NODE);
    auto scale_weights = inputs.at(1).weights();
    auto bias_weights = inputs.at(2).weights();
    OnnxAttrs attrs(node);
    float epsilon = attrs.get("epsilon", 1e-5f);
    // TensorRT only supports epsilon values >= 1e-4.
    epsilon = std::max(epsilon, 1e-4f);

    // Populate instanceNormalization plugin properties.
    // using Creator to create plugins
    nvinfer1::plugin::InstanceNormalizationPluginCreator pluginCreator;
    std::vector<nvinfer1::PluginField> mPluginAttributes1 = {
        nvinfer1::PluginField("epsilon", &epsilon, nvinfer1::PluginFieldType::kFLOAT32, sizeof(nvinfer1::PluginFieldType::kFLOAT32)),
        nvinfer1::PluginField("scales", &scale_weights.values, nvinfer1::PluginFieldType::kFLOAT32, scale_weights.count() * sizeof(scale_weights.type)),
        nvinfer1::PluginField("bias", &bias_weights.values, nvinfer1::PluginFieldType::kFLOAT32, bias_weights.count() * sizeof(bias_weights.type)),
    };
    nvinfer1::PluginFieldCollection mFC1;
    mFC1.nbFields = mPluginAttributes1.size();
    mFC1.fields = mPluginAttributes1.data();
    auto plugin = pluginCreator.createPlugin(pluginCreator.getPluginName(), &mFC1);
    RETURN_FIRST_OUTPUT(
        ctx->addPluginV2DynamicExt(plugin,
            {&convertToTensor(inputs.at(0), ctx)}));
}
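
Editor's note on the importer above, separate from the deserialization error: it passes the address of the `values` member (`&scale_weights.values`) as the field data and multiplies the element count by a `sizeof`. As far as I understand the `nvinfer1::PluginField` API, `data` should point at the values themselves and `length` should be the number of entries, not a byte count; please verify this against the TensorRT headers before relying on it. The sketch below uses hypothetical stand-in types (`Field`, `Weights`, `makeField`, `readFloats`), not the real TensorRT classes, to show the intended shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for nvinfer1::PluginField. Assumption: `data` points
// at the values themselves and `length` is a number of entries, not bytes.
struct Field
{
    const char* name;
    const void* data;
    int length;
};

// Hypothetical stand-in for the importer's weights object.
struct Weights
{
    const void* values; // pointer to the first float
    std::size_t count;  // number of floats
};

// Reads a float field the way a plugin creator would typically consume it:
// `length` floats starting at `data`.
std::vector<float> readFloats(const Field& f)
{
    const float* p = static_cast<const float*>(f.data);
    return std::vector<float>(p, p + f.length);
}

// Builds the field from the value pointer (not the address of the pointer
// member) and from the element count (not count * sizeof(...)).
Field makeField(const char* name, const Weights& w)
{
    return Field{name, w.values, static_cast<int>(w.count)};
}
```

With this shape, the reader gets back exactly `count` floats. Passing the address of the pointer or a byte count would make it read unrelated memory, which could plausibly produce wrong inference results.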

I got this error when trying to deserialize the plugin:

[2019-12-09 12:01:04   ERROR] FAILED_ALLOCATION: basic_string::_S_construct null not valid

C++ ONNX Plugins TODO

All 12 comments

Hi @jinfagang,

Can you reproduce this using the ONNX parser in the TensorRT API, or with trtexec --onnx=<model.onnx>? (You may need to build the OSS components first.) That would be easier to work with.

@rmccorm4 thanks for the reply. Can trtexec generate a TensorRT engine from an ONNX model?

Yes, trtexec --onnx=model.onnx as mentioned in my earlier comment. trtexec is also rebuilt along with the OSS components if the binary shipped with the release doesn't work for you.

@jinfagang any update on this?

Seems like other users can use this plugin: https://devtalk.nvidia.com/default/topic/1068998/memory-leak-in-tensorrt-instancenormalization/?offset=2

@rmccorm4 I believe there is a memory issue in this plugin. Also, I tried converting a model with InstanceNormalization, but the TensorRT inference results were totally wrong.

Memory leak issue is being looked into here: https://github.com/NVIDIA/TensorRT/issues/296 and should've been fixed by #315.

But I think the inference issue would be something else. Can you share code + model to reproduce it?

I ran into the same problem. I used TensorRT to parse an ONNX model containing InstanceNormalization, and the result is totally different from the original model's.
I also tried to convert the ONNX model to a .trt file using the trtexec tool:
trtexec --onnx=model.onnx --maxBatch=16 --saveEngine=model.trt
but got the following error:

[01/15/2020-12:53:51] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[01/15/2020-12:53:52] [E] [TRT] FAILED_ALLOCATION: basic_string::_S_construct null not valid
[01/15/2020-12:53:52] [E] Engine serialization failed
[01/15/2020-12:53:52] [E] Saving engine to file failed

Hi @jinfagang @shamoqianting ,

Can either of you share a repro script that shows the output difference between InstanceNormalization in the original framework and in TensorRT? This will help debug and fix the issue.
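
Editor's note: a repro of this kind needs a reference output to compare against. The following is a minimal CPU sketch of instance normalization, assuming a contiguous NCHW float tensor and the usual definition (per-instance, per-channel mean and biased variance over the spatial positions). The function names `instanceNormRef` and `maxAbsDiff` are hypothetical helpers, not part of TensorRT or ONNX. Since the importer in the original post raises epsilon to at least 1e-4, the reference should be run with the same epsilon when comparing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference instance normalization over a contiguous NCHW tensor.
// `spatial` is H*W; `scale` and `bias` hold one value per channel.
std::vector<float> instanceNormRef(const std::vector<float>& x, std::size_t n, std::size_t c,
    std::size_t spatial, const std::vector<float>& scale, const std::vector<float>& bias,
    float epsilon)
{
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < n * c; ++i)
    {
        const float* in = x.data() + i * spatial;
        float* out = y.data() + i * spatial;

        // Mean over the spatial positions of this (instance, channel) slice.
        double mean = 0.0;
        for (std::size_t s = 0; s < spatial; ++s)
            mean += in[s];
        mean /= static_cast<double>(spatial);

        // Biased variance (divide by the number of positions).
        double var = 0.0;
        for (std::size_t s = 0; s < spatial; ++s)
            var += (in[s] - mean) * (in[s] - mean);
        var /= static_cast<double>(spatial);

        const double invStd = 1.0 / std::sqrt(var + epsilon);
        const std::size_t ch = i % c; // channel index of this slice
        for (std::size_t s = 0; s < spatial; ++s)
            out[s] = static_cast<float>(scale[ch] * (in[s] - mean) * invStd + bias[ch]);
    }
    return y;
}

// Largest absolute element-wise difference between two outputs of equal size.
float maxAbsDiff(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}
```

Feeding the same input to the engine and to `instanceNormRef`, then reporting `maxAbsDiff`, would separate small epsilon-related deviations from the "totally wrong" results described above.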

The initial issue ([2019-12-09 12:01:04 ERROR] FAILED_ALLOCATION: basic_string::_S_construct null not valid)
was fixed in https://github.com/NVIDIA/TensorRT/commit/090231a93ca6ed54f527f6851122460f221098fe#diff-f8ebe1c9f6980c94f806c1573ee62284 (no namespace was set in clone()).
I got the same segfault when I ran onnx2trt with my custom plugin, whose clone() method had a structure copy-pasted from the InstanceNormalization plugin.
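
Editor's note: the message "basic_string::_S_construct null not valid" is consistent with a std::string being constructed from a null char pointer, which is what could happen if a cloned plugin never receives its namespace. The sketch below illustrates that failure mode and the shape of the fix. `DemoPlugin` and `namespaceIsUsable` are hypothetical stand-ins, not the real TensorRT plugin interface, and the actual change is the one in the linked commit.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a plugin; only namespace handling is modeled.
// It stores the pointer as given; a real plugin would likely own a copy.
class DemoPlugin
{
public:
    void setPluginNamespace(const char* ns) { mNamespace = ns; }
    const char* getPluginNamespace() const { return mNamespace; }

    // Clone that forgets the namespace: the copy keeps the null default.
    std::unique_ptr<DemoPlugin> cloneWithoutNamespace() const
    {
        return std::unique_ptr<DemoPlugin>(new DemoPlugin());
    }

    // Clone that carries the namespace over to the copy.
    std::unique_ptr<DemoPlugin> clone() const
    {
        std::unique_ptr<DemoPlugin> copy(new DemoPlugin());
        copy->setPluginNamespace(mNamespace);
        return copy;
    }

private:
    const char* mNamespace = nullptr;
};

// Mimics a consumer that turns the namespace into a std::string. Building a
// std::string from a null pointer is invalid, so the null case is reported
// here instead of being attempted.
bool namespaceIsUsable(const DemoPlugin& p, std::string& out)
{
    const char* ns = p.getPluginNamespace();
    if (ns == nullptr)
        return false;
    out = ns;
    return true;
}
```

A custom plugin whose clone() was copied from the older InstanceNormalization plugin would need the equivalent of the `setPluginNamespace` call inside `clone()`.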

Thanks for pointing that out @seovchinnikov, much appreciated.

@shamoqianting ,

> I ran into the same problem. I used TensorRT to parse an ONNX model containing InstanceNormalization, and the result is totally different from the original model's.
> I also tried to convert the ONNX model to a .trt file using the trtexec tool:
> trtexec --onnx=model.onnx --maxBatch=16 --saveEngine=model.trt
> but got the following error:

Assuming you're using TensorRT 7, you'll have to build the OSS ONNX parser and other components from a commit after this fix (https://github.com/NVIDIA/TensorRT/commit/090231a93ca6ed54f527f6851122460f221098fe#diff-f8ebe1c9f6980c94f806c1573ee62284) as mentioned above and then try re-running your command.

For example, I think the 20.01 tag of this repo should contain the fix, if the master branch doesn't work for you.

I would still like to ask which is the proper way to convert ONNX to TensorRT. There is the standalone onnx-tensorrt repo and the copy of onnx-tensorrt under third-party inside TensorRT, and their behavior and code are not quite the same. As a user, I don't know which one NVIDIA prefers. Worse, some models can be converted with onnx2trt but not with trtexec. Could you give me some advice on how they differ? (Please don't say they are the same thing; they actually are not.)
