Tensorrt: serialize failed with my yoloplugin

Created on 29 Oct 2019 · 7Comments · Source: NVIDIA/TensorRT

I implement the yolovPlugin (some code copy from https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps)in TensorRT6.0.1.5, test detect result is right and very fast.

But when I add the cudaEngine serialize code to serialize the engine, it failed(error message:[09/29/2019-13:17:47] [E] [TRT] FAILED_ALLOCATION: basic_string::_S_construct null not valid), and when I decrease the yolo detect head from 3 to 2(e.g remove the yolo head with 2626 gridsize), It could serialize successfull and deserialize successfull, I have checkd the yolo head with 2626 gridsize's param was right, so I don't no why it serialize failed with three yolo head， Very puzzled！

Since TensorRT's engine build and serialize module is not opensource, it is very difficult to debug where is the problem, I've tried a long time, need your help.

My implement github address:https://github.com/dongfangduoshou123/YoloV3-TensorRT

I look forward to you could test it and reply me as soon as possible, Thank you!

bug enhancement

Source

dongfangduoshou123

Most helpful comment

Hi @rmccorm4,

Finally, I find out the problem:If you want to serialize a builded engine with custom plugin, the plugin must be created by the corresponding pluginCreator, other wise the builded engine could run inference well, but when you serialize it, always failed.
contrast:

Wrong way:

Previous:auto* yoloplugin = new Yolo(80, 32, 13, 3); //Wrong! serialize will failed.

Right way:

YoloPluginCreator yolocreator;
int numclass = 7;
    int stride1 = 32;
    int gridsize1 = 13;
    int numanchors = 3;
    std::vector<PluginField> mPluginAttributes1 = {
        PluginField("numclass", &numclass, PluginFieldType::kINT32, 1),
        PluginField("stride", &stride1, PluginFieldType::kINT32, 1),
        PluginField("gridsize", &gridsize1, PluginFieldType::kINT32, 1),
        PluginField("numanchors", &numanchors, PluginFieldType::kINT32, 1)
    };
    PluginFieldCollection mFC1;
    mFC1.nbFields = mPluginAttributes1.size();
    mFC1.fields = mPluginAttributes1.data();
   auto yoloplugin = yolocreator.createPlugin(yolocreator.getPluginName(), &mFC1));

you could see use the plugin's creator to create the plugin, it should many lines of code, so I advise you could tell your TensorRT team to support create custom plugin instance from the c++'s new it,
and solve the problem that when a plugin is created from new it, serialize will always failed,
make plugin creation more flexibility!

you cannot re-open your own issues if a repo collaborator closed them.......
thrank you!

dongfangduoshou123 on 30 Oct 2019

🎉2

All 7 comments

@rmccorm4 I have upload the cmake build files, hope you could clone test it and tell me where the problem in the serialization. I think this may be a bug of the TensorRT's binary Components, hope opensource all module of TensorRT.

dongfangduoshou123 on 29 Oct 2019

Today I attempt to serialize the builded engine(contain custom yolo plugin) with trt's python api, it can success, the code logic is the same as in c++ for serialize, so I think this may a bug of TensorRT for serialize engine with custom added plugin in c++.

https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/seralizeEngineFromPythonAPI.py

And I find that the python api interface is so few, for example the ITensor's setname function is not exposed to python, just can be called in c++, I hope the python api could expose all interface that c++ api has, other wise it is very unfriendly for python users.

dongfangduoshou123 on 30 Oct 2019

Hi @dongfangduoshou123,

I don't have the bandwidth at the moment to look through your C++ implementation for the issue. I don't think there would be an issue in the C++ source code that is solved in the Python bindings, because they end up calling the C++ source code anyways. It may be an issue in your C++ implementation.

I'm sorry I can't be more helpful at the moment, but I'm glad it worked using the Python API 🙂

I'm going to close this for now, but please re-open if you find more details or a simpler/minimal example that reproduces the bug.

rmccorm4 on 30 Oct 2019

Hi @rmccorm4,

Wrong way:

Previous:auto* yoloplugin = new Yolo(80, 32, 13, 3); //Wrong! serialize will failed.

Right way:

YoloPluginCreator yolocreator;
int numclass = 7;
    int stride1 = 32;
    int gridsize1 = 13;
    int numanchors = 3;
    std::vector<PluginField> mPluginAttributes1 = {
        PluginField("numclass", &numclass, PluginFieldType::kINT32, 1),
        PluginField("stride", &stride1, PluginFieldType::kINT32, 1),
        PluginField("gridsize", &gridsize1, PluginFieldType::kINT32, 1),
        PluginField("numanchors", &numanchors, PluginFieldType::kINT32, 1)
    };
    PluginFieldCollection mFC1;
    mFC1.nbFields = mPluginAttributes1.size();
    mFC1.fields = mPluginAttributes1.data();
   auto yoloplugin = yolocreator.createPlugin(yolocreator.getPluginName(), &mFC1));

you cannot re-open your own issues if a repo collaborator closed them.......
thrank you!

dongfangduoshou123 on 30 Oct 2019

🎉2

Awesome, thanks for following up on this with the solution @dongfangduoshou123 !

you cannot re-open your own issues if a repo collaborator closed them.......

Sorry about that, in the future you can open a new issue and reference the closed one, or just tag the closer asking to re-open 🙂

rmccorm4 on 31 Oct 2019

I think this is not the root issue, I change into Creator way still gots error like:

WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
[DEBUG 20:08:06 instanceNormalizationPlugin.cpp:316] enter InstanceNormalization plugin createPlugin()
Building TensorRT engine, FP16 available:0
    Max batch size:     32
    Max workspace size: 1024 MiB
start to build cuda engine...
[2019-12-09 12:10:45 WARNING] TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.6.2
[2019-12-09 12:10:45 WARNING] TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.6.2
start to serialize engine plan...
[2019-12-09 12:10:45   ERROR] FAILED_ALLOCATION: basic_string::_S_construct null not valid
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to create object
[1]    3825 abort (core dumped)  onnx2trt model_instancenorm.onnx -o instancenorm.trt

jinfagang on 9 Dec 2019

I think this is not the root issue, I change into Creator way still gots error like:
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
[DEBUG 20:08:06 instanceNormalizationPlugin.cpp:316] enter InstanceNormalization plugin createPlugin()
Building TensorRT engine, FP16 available:0
Max batch size: 32
Max workspace size: 1024 MiB
start to build cuda engine...
[2019-12-09 12:10:45 WARNING] TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.6.2
[2019-12-09 12:10:45 WARNING] TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.6.2
start to serialize engine plan...
[2019-12-09 12:10:45 ERROR] FAILED_ALLOCATION: basic_string::_S_construct null not valid
terminate called after throwing an instance of 'std::runtime_error'
what(): Failed to create object
[1] 3825 abort (core dumped) onnx2trt model_instancenorm.onnx -o instancenorm.trt