TensorRT: serialization fails with my YOLO plugin

Created on 29 Oct 2019  ·  7 Comments  ·  Source: NVIDIA/TensorRT

I implemented a YOLO plugin (some of the code is copied from https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps) in TensorRT 6.0.1.5. The detection results are correct and inference is very fast.

But when I added code to serialize the cudaEngine, serialization failed with this error message: [09/29/2019-13:17:47] [E] [TRT] FAILED_ALLOCATION: basic_string::_S_construct null not valid. When I reduce the number of YOLO detection heads from 3 to 2 (e.g. by removing the head with the 26*26 grid size), the engine serializes and deserializes successfully. I have checked that the parameters of the 26*26 head are correct, so I don't know why serialization fails with three YOLO heads. I am very puzzled!
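
For readers following along: the serialization step described above typically means calling ICudaEngine::serialize() and writing the returned IHostMemory buffer (its data() and size()) to a file, then reading it back for IRuntime::deserializeCudaEngine(). The sketch below covers only the file-handling half, so it does not depend on TensorRT. The helper names writePlan and readPlan are hypothetical and are not taken from the poster's repository.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: write a serialized engine blob to disk.
// `data` and `size` are assumed to come from IHostMemory::data() and
// IHostMemory::size() after a successful ICudaEngine::serialize() call.
inline void writePlan(const std::string& path, const void* data, std::size_t size)
{
    assert(!path.empty() && "path must not be empty");
    // A null or empty buffer means serialization did not produce a plan.
    if (data == nullptr || size == 0)
        throw std::runtime_error("no serialized data to write to " + path);

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot open " + path + " for writing");

    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    if (!out)
        throw std::runtime_error("failed while writing " + path);
}

// Hypothetical helper: read the plan file back into memory. The returned
// buffer is what one would pass to IRuntime::deserializeCudaEngine().
inline std::vector<char> readPlan(const std::string& path)
{
    assert(!path.empty() && "path must not be empty");
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path + " for reading");

    return std::vector<char>(std::istreambuf_iterator<char>{in},
                             std::istreambuf_iterator<char>{});
}
```

Checking for a null or empty buffer before writing makes a failed serialization show up as a clear error instead of an empty plan file.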

Since TensorRT's engine build and serialization modules are not open source, it is very difficult to find where the problem is. I have been trying for a long time and need your help.

My implementation is on GitHub: https://github.com/dongfangduoshou123/YoloV3-TensorRT

I hope you can test it and reply as soon as possible. Thank you!

bug enhancement

All 7 comments

@rmccorm4 I have uploaded the CMake build files. I hope you can clone the repository, test it, and tell me where the serialization problem is. I think this may be a bug in TensorRT's binary components, and I hope all TensorRT modules will be open-sourced.

Today I tried serializing the built engine (which contains the custom YOLO plugin) with TensorRT's Python API, and it succeeded. The serialization logic is the same as in my C++ code, so I think this may be a TensorRT bug when serializing an engine with a custom plugin from C++.

https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/seralizeEngineFromPythonAPI.py

I also find that the Python API exposes too few interfaces. For example, ITensor's setName function is not exposed to Python and can only be called from C++. I hope the Python API can expose every interface the C++ API has; otherwise it is very unfriendly to Python users.

Hi @dongfangduoshou123,

I don't have the bandwidth at the moment to look through your C++ implementation for the issue. I don't think there would be an issue in the C++ source code that is solved in the Python bindings, because they end up calling the C++ source code anyways. It may be an issue in your C++ implementation.

I'm sorry I can't be more helpful at the moment, but I'm glad it worked using the Python API 🙂

I'm going to close this for now, but please re-open if you find more details or a simpler/minimal example that reproduces the bug.

Hi @rmccorm4,

I finally found the problem: if you want to serialize a built engine that contains a custom plugin, the plugin must be created by its corresponding plugin creator. Otherwise the built engine runs inference fine, but serialization always fails.
Compare:

Wrong way:

    // Previous code. Wrong! Serialization will fail.
    auto* yoloplugin = new Yolo(80, 32, 13, 3);

Right way:

    // Requires <vector> and NvInfer.h (namespace nvinfer1).
    YoloPluginCreator yolocreator;
    int numclass = 7;
    int stride1 = 32;
    int gridsize1 = 13;
    int numanchors = 3;
    // Each PluginField points at one of the ints above, so they must outlive createPlugin().
    std::vector<PluginField> mPluginAttributes1 = {
        PluginField("numclass", &numclass, PluginFieldType::kINT32, 1),
        PluginField("stride", &stride1, PluginFieldType::kINT32, 1),
        PluginField("gridsize", &gridsize1, PluginFieldType::kINT32, 1),
        PluginField("numanchors", &numanchors, PluginFieldType::kINT32, 1)
    };
    PluginFieldCollection mFC1;
    mFC1.nbFields = static_cast<int>(mPluginAttributes1.size());
    mFC1.fields = mPluginAttributes1.data();
    // Create the plugin through its creator instead of calling new directly.
    auto yoloplugin = yolocreator.createPlugin(yolocreator.getPluginName(), &mFC1);

As you can see, creating the plugin through its creator takes many lines of code. I suggest asking the TensorRT team to support creating custom plugin instances directly with C++ new, and to fix the problem that serialization always fails when a plugin is created that way. That would make plugin creation more flexible!

Also, you cannot re-open your own issues if a repo collaborator closed them.
Thank you!

Awesome, thanks for following up on this with the solution @dongfangduoshou123 !

"you cannot re-open your own issues if a repo collaborator closed them......."

Sorry about that, in the future you can open a new issue and reference the closed one, or just tag the closer asking to re-open 🙂

I don't think this is the root cause. After switching to the creator approach, I still get an error like this:

WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
[DEBUG 20:08:06 instanceNormalizationPlugin.cpp:316] enter InstanceNormalization plugin createPlugin()
Building TensorRT engine, FP16 available:0
    Max batch size:     32
    Max workspace size: 1024 MiB
start to build cuda engine...
[2019-12-09 12:10:45 WARNING] TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.6.2
[2019-12-09 12:10:45 WARNING] TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.6.2
start to serialize engine plan...
[2019-12-09 12:10:45   ERROR] FAILED_ALLOCATION: basic_string::_S_construct null not valid
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to create object
[1]    3825 abort (core dumped)  onnx2trt model_instancenorm.onnx -o instancenorm.trt
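
A note on the message itself: "basic_string::_S_construct null not valid" is the text libstdc++ reports when a std::string is constructed from a null const char*. One possible explanation, which nobody in this thread confirms, is that a string returned by the plugin (for example from getPluginType(), getPluginVersion() or getPluginNamespace()) is null at serialization time. The sketch below shows a defensive pattern under that assumption. NamespaceHolder is a hypothetical stand-in, not a real TensorRT class, and orEmpty and allNonNull are illustrative helper names.

```cpp
#include <cassert>
#include <string>

// Map a null pointer to an empty C string so that a later std::string(ptr)
// never receives a null pointer.
inline const char* orEmpty(const char* s) noexcept
{
    return s != nullptr ? s : "";
}

// True only when every string a plugin would report is non-null.
inline bool allNonNull(const char* type, const char* version, const char* ns) noexcept
{
    return type != nullptr && version != nullptr && ns != nullptr;
}

// Hypothetical stand-in for the namespace handling of a custom plugin.
// It does NOT derive from nvinfer1::IPluginV2; only the two method names
// mirror that interface.
class NamespaceHolder
{
public:
    // Store a copy, treating null as the empty namespace.
    void setPluginNamespace(const char* ns) { mNamespace = orEmpty(ns); }

    // Always returns a valid pointer, even if the setter was never called.
    const char* getPluginNamespace() const { return mNamespace.c_str(); }

private:
    std::string mNamespace; // default-constructed to "", never null
};
```

If this hypothesis applies, a check such as assert(allNonNull(plugin->getPluginType(), plugin->getPluginVersion(), plugin->getPluginNamespace())) before serializing would show whether a plugin built with new returns a null string that one built through the creator does not.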

Above issue being tracked here: https://github.com/NVIDIA/TensorRT/issues/260
