Onnxruntime: ORT 1.5.1 TensorRT EP segfault on unloading shared library

Created on 6 Oct 2020  ·  9 Comments  ·  Source: microsoft/onnxruntime

Describe the bug
Our application builds a shared library, ort_wrapper, using the ORT C API, and calls dlopen() and dlclose() on ort_wrapper to load and unload ORT dynamically. This works fine for EPs such as CUDA and OpenVINO, but it results in a segfault when the TRT EP is used. The stack trace shows:

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0x00007fd78091def6 in (anonymous namespace)::KernelRegistryAndStatus::~KernelRegistryAndStatus() () from /workspace/build/Release/libonnxruntime_providers_tensorrt.so
(gdb) bt
#0  0x00007fd78091def6 in (anonymous namespace)::KernelRegistryAndStatus::~KernelRegistryAndStatus() () from /workspace/build/Release/libonnxruntime_providers_tensorrt.so
#1  0x00007fd7aa72e6c5 in __cxa_finalize (d=0x7fd780d11580) at cxa_finalize.c:83
#2  0x00007fd780919b13 in __do_global_dtors_aux ()
   from /workspace/build/Release/libonnxruntime_providers_tensorrt.so
#3  0x00007ffd744459a0 in ?? ()
#4  0x00007fd7aacf5ccc in _dl_close_worker (map=<optimized out>, 
    force=<optimized out>) at dl-close.c:288
Backtrace stopped: frame did not save the PC

Urgency
High, we are aiming to update ORT to 1.5.1 for the release.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: 1.5.1
  • Python version: 3.6
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  • CUDA/cuDNN version: 11.0.221 / 8.0.4.12
  • GPU model and memory:

To Reproduce
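
The report does not include reproduction steps. As a rough sketch of the load/unload pattern described under "Describe the bug", the following C helper opens a shared library with dlopen() and immediately unloads it with dlclose(). The function name `load_and_unload` and any library path passed to it (for example a `libort_wrapper.so` build of the reporter's wrapper) are assumptions made for illustration; the reporter's actual wrapper code is not shown in this thread.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: loads the shared library at `path`, then unloads it.
 * Returns 0 on success, -1 if dlopen() or dlclose() reports an error. */
int load_and_unload(const char *path) {
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* A real application would look up its wrapper entry points with
     * dlsym() here and run a session with the chosen execution provider
     * before unloading. That part is omitted from this sketch. */

    /* Unloading may run the library's static destructors; the reported
     * backtrace shows the crash inside _dl_close_worker. */
    if (dlclose(handle) != 0) {
        const char *err = dlerror();
        fprintf(stderr, "dlclose failed: %s\n", err ? err : "unknown error");
        return -1;
    }
    return 0;
}
```

Calling `load_and_unload()` with the path to the wrapper library from a small `main()` and linking with `-ldl` (needed on older glibc versions) would exercise the same dlopen()/dlclose() sequence that the backtrace passes through. Whether this alone triggers the crash is not confirmed by the thread, since the provider may need to be initialized first.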

Expected behavior
Program should exit normally

TensorRT 1.5.2 bug

All 9 comments

@ryanunderhill, can you please take a look?

@stevenlix, @pranavsharma FYI

@GuanLuo Is it easy for you to try the Dnnl provider and see whether the same issue happens? I'm trying to narrow it down, and it looks like both providers should have the same issue.

Ah, I verified it crashes with Dnnl also. I have a fix in the works: #5523

@GuanLuo, could you help us validate Ryan's fix on your app?

Sorry for not getting back to this thread. I will try the fix as soon as possible.

@GuanLuo any updates? We're looking to release this soon. Please let us know. Thanks!

@pranavsharma I tested the fix and it did resolve the problem. Thanks all of you for resolving this issue.

Thanks @GuanLuo

Closing this as it's resolved.
