TensorRT: allClassNMS_gpu: topk above 5000 causes a segmentation fault.

Created on 11 Nov 2020 · 11 Comments · Source: NVIDIA/TensorRT

Description

When I use trtexec to convert a Caffe model to a TensorRT engine, the run crashes if the 'top_k' param in deploy.prototxt is above 4000:

[11/11/2020-23:01:19] [V] [TRT] Layer(PluginV2): detection_out, Tactic: 0, detection_out reformatted input 0[Float(409600,1,1)], mbox_conf_flatten[Float(512000,1,1)], mbox_priorbox[Float(2,409600,1)] -> detection_out[Float(1,600,7)], keep_count[Float(1,1,1)]
[11/11/2020-23:01:21] [W] [TRT] TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.3.0
[11/11/2020-23:01:21] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[New Thread 0x7fffc2079700 (LWP 146443)]

Program received signal SIGSEGV, Segmentation fault.

I used gdb to find out where the crash occurs:

Thread 6 (Thread 0x7fffc2079700 (LWP 146443)):
#0  0x00007fff00000001 in ?? ()
#1  0x00007fffe8dbd78d in pluginStatus_t allClassNMS_gpu<float, float>(CUstream_st*, int, int, int, int, float, bool, bool, void*, void*, void*, void*, void*, bool) () from /home/TensorRT-7.0.0.11/lib/libnvinfer_plugin.so.7
#2  0x00007fffe8dbb70c in allClassNMS(CUstream_st*, int, int, int, int, float, bool, bool, nvinfer1::DataType, nvinfer1::DataType, void*, void*, void*, void*, void*, bool) () from /home/TensorRT-7.0.0.11/lib/libnvinfer_plugin.so.7
#3  0x00007fffe8dc2bf1 in detectionInference(CUstream_st*, int, int, int, bool, bool, int, int, int, int, int, float, float, nvinfer1::plugin::CodeTypeSSD, nvinfer1::DataType, void const*, void const*, nvinfer1::DataType, void const*, void*, void*, void*, bool, bool) ()
   from /home/TensorRT-7.0.0.11/lib/libnvinfer_plugin.so.7
#4  0x00007fffe8d7650b in nvinfer1::plugin::DetectionOutput::enqueue(int, void const* const*, void**, void*, CUstream_st*) ()
   from /home/TensorRT-7.0.0.11/lib/libnvinfer_plugin.so.7
#5  0x00007fffea669a42 in nvinfer1::rt::cuda::PluginV2Runner::execute(nvinfer1::rt::CommonContext const&, nvinfer1::rt::ExecutionParameters const&) const () from /home/TensorRT-7.0.0.11/lib/libnvinfer.so.7
#6  0x00007fffea3d41b1 in nvinfer1::rt::ExecutionContext::enqueueInternal(CUevent_st**) ()
   from /home/TensorRT-7.0.0.11/lib/libnvinfer.so.7
#7  0x00007fffea3d6ed0 in nvinfer1::rt::ExecutionContext::enqueue(int, void**, CUstream_st*, CUevent_st**) ()
   from /home/TensorRT-7.0.0.11/lib/libnvinfer.so.7

The crash appears to come from allClassNMS_gpu, and it disappears when I set 'topk' below 4000.
I also found that the same error occurs wherever the NMS op is used with a topk greater than 4000 (or 5000?).
Is there a limit on the topk value in the NMS op (in the plugin)?

Environment

TensorRT Version: TensorRT-7.0.0.11
GPU Type: 1080ti
Nvidia Driver Version: 440.36
CUDA Version: 10.2
CUDNN Version: 7.6.5

Plugins triaged

All 11 comments

This is a limitation of the plugin: https://github.com/NVIDIA/TensorRT/tree/master/plugin/batchedNMSPlugin#known-issues

I've updated it upstream so it issues a more obvious error message rather than segfaulting.

I think most images are unlikely to have so many detections though. Is there a reason you need a topK of >4096?

Just to clarify, prior to running the NMS kernel, we sort the bounding boxes by confidence scores, and then only take the top K boxes. The low confidence boxes are discarded. So the topK value does not need to be the same as the number of boxes generated - it just needs to be large enough to capture real detected objects.
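
The sort, truncate, then suppress order described above can be sketched on the CPU as follows. This is only an illustration of the idea, not the plugin's implementation: the `Box` struct, the `iou` helper, and `topKThenNms` are hypothetical names introduced here, and the real plugin works per class on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical box type for illustration; not the plugin's data layout.
struct Box {
    float x1, y1, x2, y2;  // corner coordinates
    float score;           // confidence score
};

// Intersection-over-union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float ix1 = std::max(a.x1, b.x1);
    const float iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2);
    const float iy2 = std::min(a.y2, b.y2);
    const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    const float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = areaA + areaB - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Keep the topK highest-scoring boxes, then run greedy NMS on those only.
std::vector<Box> topKThenNms(std::vector<Box> boxes, std::size_t topK,
                             float iouThreshold) {
    // Step 1: sort by descending confidence.
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });

    // Step 2: discard everything after the first topK boxes.
    if (boxes.size() > topK) {
        boxes.resize(topK);
    }

    // Step 3: greedy suppression; a box survives only if it does not
    // overlap an already kept (higher-scoring) box by more than the threshold.
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > iouThreshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```

Because low-confidence boxes are dropped in step 2 before any overlap test runs, topK only has to cover the boxes that could plausibly be real detections, not every anchor the network emits.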

Sorry, @pranavm-nvidia, but I built libnvinfer_plugin.so from the TensorRT OSS sources, and none of the plugin symbols exist in the resulting library. I want to call createBatchedNMSPlugin, but it is not there. Is it only present in the downloaded libnvinfer_plugin.so?

@ledinhtri97 You can use the plugin registry if you're looking to create plugin instances manually.
First get the registry, then get the plugin creator you need, and finally create the plugin.

@pranavm-nvidia Thank you, but
I ran nm -gDC out/libnvinfer_plugin.so on the library I built from source.
The output is below: none of the exported symbols belong to a plugin. Maybe I made a mistake somewhere?

                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U _Unwind_Resume
                 U std::ctype<char>::_M_widen_init() const
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const
                 U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const
                 U std::exception::what() const
                 U std::ostream::put(char)
                 U std::ostream::flush()
                 U std::ostream& std::ostream::_M_insert<bool>(bool)
                 U std::ostream& std::ostream::_M_insert<double>(double)
                 U std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long)
                 U std::ostream::operator<<(int)
                 U std::runtime_error::runtime_error(char const*)
                 U std::runtime_error::~runtime_error()
                 U std::basic_streambuf<char, std::char_traits<char> >::imbue(std::locale const&)
                 U std::basic_streambuf<char, std::char_traits<char> >::uflow()
                 U std::basic_streambuf<char, std::char_traits<char> >::xsgetn(char*, long)
                 U std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long)
                 U std::invalid_argument::invalid_argument(char const*)
                 U std::invalid_argument::~invalid_argument()
                 U std::locale::locale()
                 U std::locale::~locale()
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned long)
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_erase(unsigned long, unsigned long)
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&)
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::setbuf(char*, long)
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::_M_sync(char*, unsigned long, unsigned long)
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::seekoff(long, std::_Ios_Seekdir, std::_Ios_Openmode)
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::seekpos(std::fpos<__mbstate_t>, std::_Ios_Openmode)
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow(int)
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::pbackfail(int)
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::showmanyc()
                 U std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::underflow()
                 U std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_ostringstream()
                 U std::ios_base::Init::Init()
                 U std::ios_base::Init::~Init()
                 U std::ios_base::ios_base()
                 U std::ios_base::~ios_base()
                 U std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*)
                 U std::basic_ios<char, std::char_traits<char> >::clear(std::_Ios_Iostate)
                 U std::exception::~exception()
                 U std::_Hash_bytes(void const*, unsigned long, unsigned long)
                 U std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
                 U std::__throw_bad_cast()
                 U std::__throw_bad_alloc()
                 U std::__throw_logic_error(char const*)
                 U std::__throw_length_error(char const*)
                 U std::__throw_system_error(int)
                 U std::cerr
                 U std::cout
                 U std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
                 U typeinfo for std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >
                 U typeinfo for std::ostream
                 U typeinfo for std::runtime_error
                 U typeinfo for std::invalid_argument
                 U typeinfo for std::exception
                 U VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >
                 U vtable for __cxxabiv1::__class_type_info
                 U vtable for __cxxabiv1::__si_class_type_info
                 U vtable for __cxxabiv1::__vmi_class_type_info
                 U vtable for std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >
                 U vtable for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >
                 U vtable for std::basic_streambuf<char, std::char_traits<char> >
                 U vtable for std::basic_ios<char, std::char_traits<char> >
                 U operator delete[](void*)
                 U operator delete(void*)
                 U operator new[](unsigned long)
                 U operator new(unsigned long)
                 U __cudaPopCallConfiguration
                 U __cudaPushCallConfiguration
                 U __cudaRegisterFatBinary
                 U __cudaRegisterFatBinaryEnd
                 U __cudaRegisterFunction
                 U __cudaRegisterVar
                 U __cudaUnregisterFatBinary
                 U __cxa_allocate_exception
                 U __cxa_atexit
                 U __cxa_begin_catch
                 U __cxa_end_catch
                 w __cxa_finalize
                 U __cxa_free_exception
                 U __cxa_guard_abort
                 U __cxa_guard_acquire
                 U __cxa_guard_release
                 U __cxa_pure_virtual
                 U __cxa_rethrow
                 U __cxa_throw
                 U __cxa_throw_bad_array_new_length
                 U __fprintf_chk
                 w __gmon_start__
                 U __gxx_personality_v0
                 U __printf_chk
                 w __pthread_key_create
                 U __stack_chk_fail
                 U abort
                 U cublasCreate_v2
                 U cublasDestroy_v2
                 U cublasGemmStridedBatchedEx
                 U cublasGetMathMode
                 U cublasGetPointerMode_v2
                 U cublasLtCreate
                 U cublasLtDestroy
                 U cublasLtMatmul
                 U cublasLtMatmulAlgoCapGetAttribute
                 U cublasLtMatmulAlgoCheck
                 U cublasLtMatmulAlgoConfigGetAttribute
                 U cublasLtMatmulAlgoConfigSetAttribute
                 U cublasLtMatmulAlgoGetIds
                 U cublasLtMatmulAlgoInit
                 U cublasLtMatmulDescCreate
                 U cublasLtMatmulDescDestroy
                 U cublasLtMatmulDescSetAttribute
                 U cublasLtMatmulPreferenceCreate
                 U cublasLtMatmulPreferenceDestroy
                 U cublasLtMatmulPreferenceSetAttribute
                 U cublasLtMatrixLayoutCreate
                 U cublasLtMatrixLayoutDestroy
                 U cublasLtMatrixLayoutSetAttribute
                 U cublasSasum_v2
                 U cublasScopy_v2
                 U cublasSetMathMode
                 U cublasSetPointerMode_v2
                 U cublasSetStream_v2
                 U cublasSgemmStridedBatched
                 U cublasSscal_v2
                 U cudaDeviceReset
                 U cudaDeviceSynchronize
                 U cudaEventCreate
                 U cudaEventCreateWithFlags
                 U cudaEventDestroy
                 U cudaEventElapsedTime
                 U cudaEventRecord
                 U cudaEventSynchronize
                 U cudaFree
                 U cudaFreeHost
                 U cudaFuncGetAttributes
                 U cudaGetDevice
                 U cudaGetDeviceProperties
                 U cudaGetErrorString
                 U cudaGetLastError
                 U cudaHostAlloc
                 U cudaLaunchKernel
                 U cudaMalloc
                 U cudaMallocHost
                 U cudaMemcpy
                 U cudaMemcpyAsync
                 U cudaMemset
                 U cudaMemsetAsync
                 U cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags
                 U cudaPeekAtLastError
                 U cudaStreamCreate
                 U cudaStreamDestroy
                 U cudaStreamSynchronize
                 U cudnnBatchNormalizationForwardTraining
                 U cudnnCreateTensorDescriptor
                 U cudnnDeriveBNTensorDescriptor
                 U cudnnDestroyTensorDescriptor
                 U cudnnSetStream
                 U cudnnSetTensor4dDescriptor
                 U dlclose
                 U dlopen
                 U dlsym
                 U exit
                 U free
                 U getLogger
                 U getPluginRegistry
000000000008f420 T initLibNvInferPlugins
                 U localtime
                 U malloc
                 U memcmp
                 U memcpy
                 U memmove
                 U memset
                 U pow
                 w pthread_mutex_lock
                 w pthread_mutex_unlock
                 U roundf
                 U sqrt
                 U sqrtf
                 U stderr
                 U strcmp
                 U strlen
                 U time

And here is the output for the stock libnvinfer_plugin.so from the downloaded NVIDIA TensorRT tar file:
nm -gDC down/libnvinfer_plugin.so | grep NMS

0000000000087a50 T nvinfer1::plugin::BatchedNMSPlugin::initialize()
0000000000088690 T nvinfer1::plugin::BatchedNMSPlugin::setClipParam(bool)
0000000000088110 T nvinfer1::plugin::BatchedNMSPlugin::configurePlugin(nvinfer1::Dims const*, int, nvinfer1::Dims const*, int, nvinfer1::DataType const*, nvinfer1::DataType const*, bool const*, bool const*, nvinfer1::TensorFormat, int)
0000000000087fe0 T nvinfer1::plugin::BatchedNMSPlugin::setPluginNamespace(char const*)
0000000000087c20 T nvinfer1::plugin::BatchedNMSPlugin::getOutputDimensions(int, nvinfer1::Dims const*, int)
0000000000088010 T nvinfer1::plugin::BatchedNMSPlugin::destroy()
0000000000087b40 T nvinfer1::plugin::BatchedNMSPlugin::enqueue(int, void const* const*, void**, void*, CUstream_st*)
0000000000087a60 T nvinfer1::plugin::BatchedNMSPlugin::terminate()
0000000000088480 T nvinfer1::plugin::BatchedNMSPlugin::BatchedNMSPlugin(nvinfer1::plugin::NMSParameters)
00000000000884d0 T nvinfer1::plugin::BatchedNMSPlugin::BatchedNMSPlugin(void const*, unsigned long)
0000000000088480 T nvinfer1::plugin::BatchedNMSPlugin::BatchedNMSPlugin(nvinfer1::plugin::NMSParameters)
00000000000884d0 T nvinfer1::plugin::BatchedNMSPlugin::BatchedNMSPlugin(void const*, unsigned long)
0000000000089100 W nvinfer1::plugin::BatchedNMSPlugin::~BatchedNMSPlugin()
00000000000891e0 W nvinfer1::plugin::BatchedNMSPlugin::~BatchedNMSPlugin()
00000000000891e0 W nvinfer1::plugin::BatchedNMSPlugin::~BatchedNMSPlugin()
000000000005c480 T nvinfer1::plugin::NMSPluginCreator::createPlugin(char const*, nvinfer1::PluginFieldCollection const*)
000000000005bcf0 T nvinfer1::plugin::NMSPluginCreator::getFieldNames()
000000000005ce50 T nvinfer1::plugin::NMSPluginCreator::deserializePlugin(char const*, void const*, unsigned long)
000000000097a6d0 B nvinfer1::plugin::NMSPluginCreator::mPluginAttributes
000000000097a6f0 B nvinfer1::plugin::NMSPluginCreator::mFC
000000000005ceb0 T nvinfer1::plugin::NMSPluginCreator::NMSPluginCreator()
000000000005ceb0 T nvinfer1::plugin::NMSPluginCreator::NMSPluginCreator()
000000000005d5a0 W nvinfer1::plugin::NMSPluginCreator::~NMSPluginCreator()
000000000005d480 W nvinfer1::plugin::NMSPluginCreator::~NMSPluginCreator()
000000000005d480 W nvinfer1::plugin::NMSPluginCreator::~NMSPluginCreator()
0000000000088740 T nvinfer1::plugin::BatchedNMSPluginCreator::createPlugin(char const*, nvinfer1::PluginFieldCollection const*)
0000000000087b00 T nvinfer1::plugin::BatchedNMSPluginCreator::getFieldNames()
0000000000088630 T nvinfer1::plugin::BatchedNMSPluginCreator::deserializePlugin(char const*, void const*, unsigned long)
000000000097cae0 B nvinfer1::plugin::BatchedNMSPluginCreator::mPluginAttributes
000000000097cb00 B nvinfer1::plugin::BatchedNMSPluginCreator::mFC
0000000000088c70 T nvinfer1::plugin::BatchedNMSPluginCreator::BatchedNMSPluginCreator()
0000000000088c70 T nvinfer1::plugin::BatchedNMSPluginCreator::BatchedNMSPluginCreator()
00000000000674e0 W nvinfer1::plugin::BatchedNMSPluginCreator::~BatchedNMSPluginCreator()
00000000000695c0 W nvinfer1::plugin::BatchedNMSPluginCreator::~BatchedNMSPluginCreator()
00000000000695c0 W nvinfer1::plugin::BatchedNMSPluginCreator::~BatchedNMSPluginCreator()
0000000000087a40 T nvinfer1::plugin::BatchedNMSPlugin::getNbOutputs() const
0000000000087aa0 T nvinfer1::plugin::BatchedNMSPlugin::getPluginType() const
0000000000087a80 T nvinfer1::plugin::BatchedNMSPlugin::supportsFormat(nvinfer1::DataType, nvinfer1::TensorFormat) const
0000000000087ab0 T nvinfer1::plugin::BatchedNMSPlugin::getPluginVersion() const
0000000000087b10 T nvinfer1::plugin::BatchedNMSPlugin::getWorkspaceSize(int) const
0000000000087ad0 T nvinfer1::plugin::BatchedNMSPlugin::getOutputDataType(int, nvinfer1::DataType const*, int) const
0000000000087ac0 T nvinfer1::plugin::BatchedNMSPlugin::getPluginNamespace() const
0000000000087a70 T nvinfer1::plugin::BatchedNMSPlugin::getSerializationSize() const
0000000000087af0 T nvinfer1::plugin::BatchedNMSPlugin::canBroadcastInputAcrossBatch(int) const
0000000000087ae0 T nvinfer1::plugin::BatchedNMSPlugin::isOutputBroadcastAcrossBatch(int, bool const*, int) const
00000000000886a0 T nvinfer1::plugin::BatchedNMSPlugin::clone() const
0000000000087f10 T nvinfer1::plugin::BatchedNMSPlugin::serialize(void*) const
000000000005bc90 T nvinfer1::plugin::NMSPluginCreator::getPluginName() const
000000000005bca0 T nvinfer1::plugin::NMSPluginCreator::getPluginVersion() const
0000000000087aa0 T nvinfer1::plugin::BatchedNMSPluginCreator::getPluginName() const
0000000000087ab0 T nvinfer1::plugin::BatchedNMSPluginCreator::getPluginVersion() const
0000000000066290 T createBatchedNMSPlugin
0000000000065fc0 T createNMSPlugin

@ledinhtri97 That is strange, could you open a new issue for that?

Hi @pranavm-nvidia, I opened a new issue, #894. Please take a look.

Hello @Oldpan, we can extend the plugin to support more boxes. Could you try rebuilding the plugin with the change I mentioned in:
https://github.com/NVIDIA/TensorRT/issues/510#issuecomment-725795935

@pranavm-nvidia Yes 😂 my SSD model needs to output 10000 anchors to select from, so I need to set topk to 10000. Thanks for your explanation of batched NMS. I have now thought of another way to accomplish my task...
@ttyio Thanks ~ I changed the allClassNMS_gpu function to this:

  void (*kernel[])(const int, const int, const int, const int, const float,
                     const bool, const bool, float *, T_SCORE *, int *,
                     T_SCORE *, int *, bool) = {
      P(1), P(2), P(3), P(4), P(5), P(6), P(7), P(8), P(9), P(10),
      P(11), P(12), P(13), P(14), P(15), P(16), P(17)
  };

But when I set topk to 6200 (6000 is OK!), it fails with Aborted (core dumped). It looks like the same problem as before; I'm looking into it.
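
As a rough illustration of why the length of this table bounds topk: suppose each entry P(n) handles up to n × 512 candidates, so the launcher picks entry ceil(topk / 512) − 1. Both the block size of 512 and the original 8-entry table are assumptions, chosen because they are consistent with the 4096 limit mentioned earlier; neither is stated in this thread. The names `kAssumedBlockSize` and `kernelIndexForTopK` are made up for this sketch. A bounds check of this shape turns an out-of-range topk into a reportable error instead of a call through an invalid function pointer.

```cpp
// Assumed threads per block; with 8 table entries this gives the 4096
// limit discussed in the thread. Not confirmed by the thread itself.
constexpr int kAssumedBlockSize = 512;

// Returns the index into a kernel table with tableSize entries,
// or -1 when topK cannot be served by a table of that size.
int kernelIndexForTopK(int topK, int tableSize) {
    if (topK <= 0 || tableSize <= 0) {
        return -1;
    }
    // Number of candidates each thread must handle, rounded up.
    const int tSize = (topK + kAssumedBlockSize - 1) / kAssumedBlockSize;
    if (tSize > tableSize) {
        return -1;  // would index past the end of the table
    }
    return tSize - 1;
}
```

Under these assumptions 6200 maps to index 12, which is inside a 17-entry table, so the abort at 6200 presumably has a different cause that this sketch does not cover.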

@Oldpan do you have the callstack?

@ttyio
In my situation it is not convenient to capture the call stack. I'll try another way to get more information.
