Vision: libc10_cuda.so is not found when running the executable file

Created on 24 Aug 2020 · 14 Comments · Source: pytorch/vision

🐛 Bug

Following the tutorial on running a TorchScript model in C++, I can successfully run a Faster R-CNN model from torchvision. However, when I run the executable, it fails to load libc10_cuda.so. For now I set LD_LIBRARY_PATH as a workaround.

By the way, some header files, such as torchvision/nms.h, define (if I am not wrong) a global function that initializes a static object to register ops. Including nms.h in more than one source file causes a multiple definition error, so I have to include these headers only in main.cpp. What is the best practice for a beginner using torchvision?

To Reproduce

Steps to reproduce the behavior:

1. Download libtorch-cxx11-abi-shared-with-deps-1.6.0.zip.
2. Download vision-0.7.0-release.
3. Build and install libtorch and torchvision.
4. Build the demo code and run the executable.
5. Run readelf -d ./example-app; libc10_cuda.so is missing from the output:
Dynamic section at offset 0xb3708 contains 41 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc10.so]
0x0000000000000001 (NEEDED) Shared library: [libtorchvision.so]
0x0000000000000001 (NEEDED) Shared library: [libgeos.so.3.8.1]
0x0000000000000001 (NEEDED) Shared library: [libopencv_videoio.so.4.1]
0x0000000000000001 (NEEDED) Shared library: [libopencv_imgcodecs.so.4.1]
0x0000000000000001 (NEEDED) Shared library: [libopencv_imgproc.so.4.1]
0x0000000000000001 (NEEDED) Shared library: [libopencv_core.so.4.1]
0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so]
0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so]
0x0000000000000001 (NEEDED) Shared library: [libtorch.so]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000001d (RUNPATH) Library runpath: [/data/20200728/libtorch_example/example-app/libtorch/lib:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu]
0x000000000000000c (INIT) 0x5f7b8
0x000000000000000d (FINI) 0x90ed4
0x0000000000000019 (INIT_ARRAY) 0x2b25c8
0x000000000000001b (INIT_ARRAYSZ) 16 (bytes)
0x000000000000001a (FINI_ARRAY) 0x2b25d8
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x2d0
0x0000000000000005 (STRTAB) 0x1b7a0
0x0000000000000006 (SYMTAB) 0x67b8
0x000000000000000a (STRSZ) 255154 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x2b39d8
0x0000000000000002 (PLTRELSZ) 4128 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x5e798
0x0000000000000007 (RELA) 0x5b960
0x0000000000000008 (RELASZ) 11832 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000000000001e (FLAGS) BIND_NOW
0x000000006ffffffb (FLAGS_1) Flags: NOW PIE
0x000000006ffffffe (VERNEED) 0x5b850
0x000000006fffffff (VERNEEDNUM) 4
0x000000006ffffff0 (VERSYM) 0x59c52
0x000000006ffffff9 (RELACOUNT) 319
0x0000000000000000 (NULL) 0x0
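
Note that readelf -d lists only an executable's direct dependencies, so a library such as libc10_cuda.so can be absent from this list and still be required by one of the listed libraries (a later comment in this thread shows libtorchvision.so depending on it). Below is a minimal Python sketch for pulling the NEEDED entries and the RUNPATH out of such a dump. The function name parse_dynamic_section and the shortened sample text (including its /opt/libtorch path) are illustrative assumptions, not part of any tool.

```python
import re


def parse_dynamic_section(readelf_output):
    """Return (needed, runpath) parsed from the text printed by `readelf -d`."""
    # Each direct dependency appears as: (NEEDED) Shared library: [name]
    needed = re.findall(
        r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]", readelf_output
    )
    # The search path appears as: (RUNPATH) Library runpath: [dir1:dir2]
    match = re.search(
        r"\((?:RUNPATH|RPATH)\)\s+Library (?:runpath|rpath): \[([^\]]*)\]",
        readelf_output,
    )
    runpath = match.group(1).split(":") if match else []
    return needed, runpath


# Shortened, hypothetical sample in the same format as the dump above.
sample = """\
 0x0000000000000001 (NEEDED) Shared library: [libc10.so]
 0x0000000000000001 (NEEDED) Shared library: [libtorchvision.so]
 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so]
 0x000000000000001d (RUNPATH) Library runpath: [/opt/libtorch/lib:/usr/local/cuda/lib64]
"""

needed, runpath = parse_dynamic_section(sample)
print("direct dependency on libc10_cuda.so:", "libc10_cuda.so" in needed)
print("runpath entries:", runpath)
```

Because only direct dependencies are listed, running ldd on the executable or on libtorchvision.so is the more telling check for a library that cannot be resolved at load time.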

#include <algorithm>
#include <chrono>
#include <iostream>
#include <memory>
#include <vector>
#include <opencv2/opencv.hpp>  // cv::VideoCapture, cv::resize, cv::cvtColor
#include <torch/script.h>
#include "torchvision/vision.h"
#include "torchvision/ROIPool.h"
#include "torchvision/nms.h"

using namespace c10;

int main(int argc, const char* argv[])
 {

    if (argc != 2) {
        std::cerr << "usage : example-app <model path>\n";
        return -1;
    }

    torch::Device device(torch::kCUDA, 0);
    torch::jit::script::Module module;
    try {
        module = torch::jit::load(argv[1]);
        module.to(device);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    std::cout << "ok\n";
    cv::VideoCapture cap("/data/test.mp4");
    int frameCnt = 0;
    while(true)
    {
        cv::Mat frame;
        cap >> frame;
        if (frame.empty())
            break;

        auto start = std::chrono::steady_clock::now();
        auto imgSizeMax = std::max(frame.cols, frame.rows);
        auto imgScale = 320.0f / imgSizeMax;
        cv::Mat resized;
        cv::resize(frame, resized, cv::Size(), imgScale, imgScale);  // empty dsize: output size comes from the scale factors
        cv::Mat bgr = cv::Mat::zeros(cv::Size(320, 320), CV_8UC3);
        resized.copyTo(bgr(cv::Rect(0, 0, resized.cols, resized.rows)));
        cv::Mat rgb;
        cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);
        auto tensor_image = torch::from_blob(rgb.data, {rgb.rows, rgb.cols, 3}, at::kByte);
        tensor_image = tensor_image.to(at::kFloat) / 255.0f;
        tensor_image = at::transpose(tensor_image, 0, 1);
        tensor_image = at::transpose(tensor_image, 0, 2);
        auto images = c10::List<at::Tensor>({tensor_image.to(device)});
        std::vector<torch::jit::IValue> inputs;
        inputs.emplace_back(images);

        auto output = module.forward(inputs).toTuple();
        auto dets = output->elements().at(1).toList().get(0).toGenericDict();

        frameCnt += 1;
        auto end = std::chrono::steady_clock::now();
        std::cout << "inference time :" << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << std::endl;
    }
}

Expected behavior

Environment

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.14.4

Python version: 2.7 (64-bit runtime)
Is CUDA available: N/A
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: GeForce GTX 1070
Nvidia driver version: 440.82
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect

Additional context

c++ frontend

OblivionStaff

All 14 comments

@OblivionStaff I cannot reproduce your issue. Here is my output of readelf -d ./example-app:

Dynamic section at offset 0x817d0 contains 39 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libtorchvision.so]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_videoio.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libtorch_cpu.so]
 0x0000000000000001 (NEEDED)             Shared library: [libtorch_cuda.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc10.so]
 0x0000000000000001 (NEEDED)             Shared library: [libtorch.so]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001d (RUNPATH)            Library runpath: [/usr/local/lib:/usr/local/libtorch/lib:/usr/local/cuda/lib64:/opt/conda/lib]
 0x000000000000000c (INIT)               0x3ee38
 0x000000000000000d (FINI)               0x66f84
 0x0000000000000019 (INIT_ARRAY)         0x2810e8
 0x000000000000001b (INIT_ARRAYSZ)       16 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x2810f8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x2d0
 0x0000000000000005 (STRTAB)             0x129f8
 0x0000000000000006 (SYMTAB)             0x4ce8
 0x000000000000000a (STRSZ)              167757 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x281a80
 0x0000000000000002 (PLTRELSZ)           3696 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x3dfc8
 0x0000000000000007 (RELA)               0x3ccd8
 0x0000000000000008 (RELASZ)             4848 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
 0x000000006ffffffe (VERNEED)            0x3cbb8
 0x000000006fffffff (VERNEEDNUM)         4
 0x000000006ffffff0 (VERSYM)             0x3b946
 0x000000006ffffff9 (RELACOUNT)          128
 0x0000000000000000 (NULL)               0x0

The program runs, but since I do not have an example video, only "ok" is printed to stdout. Am I missing something?

vfdev-5 on 27 Aug 2020

@vfdev-5 I reinstalled my torchvision library and the problem is solved. Thank you.

OblivionStaff on 27 Aug 2020

OK, let's close the issue as solved. Feel free to reopen it if you need to reinvestigate this problem.

vfdev-5 on 27 Aug 2020

@vfdev-5
If I build torchvision with cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch -DWITH_CUDA=ON .., the error is reproduced.

If I build torchvision with cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch .., I get a runtime error instead: 'Could not run 'torchvision::nms' with arguments from the 'CUDA' backend'.

OblivionStaff on 27 Aug 2020

@OblivionStaff in my case WITH_CUDA=ON is not used in the CMake configuration and there is no error.
Could you please share the exact details of your CMakeLists.txt, your cmake build command, and the model you are trying to use? Thanks.

vfdev-5 on 27 Aug 2020

Hi, I faced the same problem after following the instructions in the README.

Running ldd /usr/local/lib/libtorchvision.so | grep cuda gives the following:

libcudart.so.10.2 => /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 (0x00007fcca24b0000)
libc10_cuda.so => not found
libtorch_cuda.so => not found
libnvToolsExt.so.1 => /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1 (0x00007fcca22a5000)

I think the problem is in the linking here. Running ldd on the libtorchvision.so in the build directory gives normal results:

libcudart.so.10.2 => /usr/local/cuda/lib64/libcudart.so.10.2 (0x00007f489bf61000)
libc10_cuda.so => /data/zhiq/packages/torch/stable/libtorch/lib/libc10_cuda.so (0x00007f489bd29000)
libtorch_cuda.so => /data/zhiq/packages/torch/stable/libtorch/lib/libtorch_cuda.so (0x00007f484c806000)
libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f484c3f9000)

After copying the libtorchvision.so from the build directory into /usr/local/lib, the problem is resolved.
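
The ldd comparison above can be automated. Below is a minimal Python sketch, assuming ldd is available on PATH. The helper names ldd_output and missing_libraries are hypothetical, and the sample text is copied from the installed-library output above.

```python
import subprocess


def ldd_output(path):
    """Run `ldd` on a binary or shared library and return its text output."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return result.stdout


def missing_libraries(ldd_text):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_text.splitlines():
        # Resolved lines look like "name => /path (0x...)",
        # unresolved ones look like "name => not found".
        name, arrow, target = line.partition("=>")
        if arrow and target.strip() == "not found":
            missing.append(name.strip())
    return missing


sample = """\
libcudart.so.10.2 => /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 (0x00007fcca24b0000)
libc10_cuda.so => not found
libtorch_cuda.so => not found
libnvToolsExt.so.1 => /usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1 (0x00007fcca22a5000)
"""

print(missing_libraries(sample))
```

On a real install, missing_libraries(ldd_output("/usr/local/lib/libtorchvision.so")) would perform the same check. One plausible explanation for the difference between the two copies, not confirmed in this thread, is that CMake by default replaces the build-tree RPATH with an empty install RPATH during make install.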

zhiqwang on 27 Aug 2020

@vfdev-5 My issue is the same as @zhiqwang's. I followed his method and the problem is solved.

OblivionStaff on 28 Aug 2020

Otherwise, if ldd reports missing libraries, you can also set export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/libtorch/lib/.
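
One caveat with the export line above: if LD_LIBRARY_PATH is initially empty, the resulting value starts with a colon, and the dynamic loader treats an empty entry as the current working directory. Below is a minimal Python sketch that builds the value without empty entries, for example when launching the executable from a script. The helper name with_library_dir is hypothetical and the paths are placeholders.

```python
import os


def with_library_dir(lib_dir, env=None):
    """Return a copy of env with lib_dir appended to LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    # Drop empty entries so the value never contains a stray leading colon.
    entries = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in entries:
        entries.append(lib_dir)
    new_env["LD_LIBRARY_PATH"] = ":".join(entries)
    return new_env


env = with_library_dir("/path/to/libtorch/lib", {"LD_LIBRARY_PATH": ""})
print(env["LD_LIBRARY_PATH"])
```

The returned mapping can be passed as the env argument of subprocess.run when starting the executable.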

@OblivionStaff can you please provide details of your setup: CMakeLists.txt, cmake build command, and the model you are trying to use? Thanks!

vfdev-5 on 28 Aug 2020

@vfdev-5
the CMakeLists.txt of torchvision

cmake_minimum_required(VERSION 3.0)
project(torchvision)
set(CMAKE_CXX_STANDARD 14)
set(TORCHVISION_VERSION 0.6.0)

option(WITH_CUDA "Enable CUDA support" OFF)

if(WITH_CUDA)
  enable_language(CUDA)
  add_definitions(-D__CUDA_NO_HALF_OPERATORS__)
  add_definitions(-DWITH_CUDA)
endif()

find_package(Python3 COMPONENTS Development)
find_package(Torch REQUIRED)

file(GLOB HEADERS torchvision/csrc/*.h)
file(GLOB OPERATOR_SOURCES torchvision/csrc/cpu/*.h torchvision/csrc/cpu/*.cpp torchvision/csrc/*.cpp)
if(WITH_CUDA)
  file(GLOB OPERATOR_SOURCES ${OPERATOR_SOURCES} torchvision/csrc/cuda/*.h torchvision/csrc/cuda/*.cu)
endif()
file(GLOB MODELS_HEADERS torchvision/csrc/models/*.h)
file(GLOB MODELS_SOURCES torchvision/csrc/models/*.h torchvision/csrc/models/*.cpp)

add_library(${PROJECT_NAME} SHARED ${MODELS_SOURCES} ${OPERATOR_SOURCES})
target_link_libraries(${PROJECT_NAME} PRIVATE ${TORCH_LIBRARIES} Python3::Python)
set_target_properties(${PROJECT_NAME} PROPERTIES EXPORT_NAME TorchVision)

target_include_directories(${PROJECT_NAME} INTERFACE
  $<BUILD_INTERFACE:${HEADERS}>
  $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>)

include(GNUInstallDirs)
include(CMakePackageConfigHelpers)

set(TORCHVISION_CMAKECONFIG_INSTALL_DIR "share/cmake/TorchVision" CACHE STRING "install path for TorchVisionConfig.cmake")

configure_package_config_file(cmake/TorchVisionConfig.cmake.in
  "${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfig.cmake"
  INSTALL_DESTINATION ${TORCHVISION_CMAKECONFIG_INSTALL_DIR})

write_basic_package_version_file(${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfigVersion.cmake
  VERSION ${TORCHVISION_VERSION}
  COMPATIBILITY AnyNewerVersion)

install(FILES ${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfig.cmake
  ${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfigVersion.cmake
  DESTINATION ${TORCHVISION_CMAKECONFIG_INSTALL_DIR})

install(TARGETS ${PROJECT_NAME}
  EXPORT TorchVisionTargets
  LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
  )

install(EXPORT TorchVisionTargets
  NAMESPACE TorchVision::
  DESTINATION ${TORCHVISION_CMAKECONFIG_INSTALL_DIR})

install(FILES ${HEADERS} DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME})
install(FILES
  torchvision/csrc/cpu/vision_cpu.h
  DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}/cpu)
if(WITH_CUDA)
  install(FILES
    torchvision/csrc/cuda/vision_cuda.h 
    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}/cuda)
endif()
install(FILES ${MODELS_HEADERS} DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}/models)

part of the CMakeLists.txt of my project

project(pipelines)
set(SOURCES
    main.cpp
)
set(CMAKE_CXX_STANDARD 17)
list(APPEND CMAKE_CXX_FLAGS " -pthread ${CMAKE_CXX_FLAGS}")
add_executable(pipelines ${SOURCES})

find_package(PkgConfig REQUIRED)
pkg_check_modules(AVCODEC  REQUIRED IMPORTED_TARGET libavcodec)
pkg_check_modules(AVFORMAT REQUIRED IMPORTED_TARGET libavformat)
pkg_check_modules(AVUTIL   REQUIRED IMPORTED_TARGET libavutil)

find_package(Torch REQUIRED  HINTS /root/libtorch)
if(TORCH_FOUND)
    message("Torch found")
    message(${TORCH_LIBRARIES})
else()
    message(FATAL_ERROR "Can not find Torch")
endif()
find_package(TorchVision REQUIRED)

find_package(PythonLibs REQUIRED )
find_package(OpenCV 4.1.0 REQUIRED)
find_package(GEOS 3.8.1 REQUIRED)

target_include_directories(pipelines
    PRIVATE 
        ${PROJECT_SOURCE_DIR}/include
        ${TensorRT_INCLUDE_DIRS}
        ${OpenCV_INCLUDE_DIRS}
        ${TORCH_INCLUDE_DIR}
        ${PYTHON_INCLUDE_DIRS}
)

find_library(CUVID_LIBRARY nvcuvid HINTS ${CMAKE_SOURCE_DIR}/libs)
find_library(STUBS_LIBRARY cuda HINTS /usr/local/cuda/lib64/stubs/)

target_link_libraries(pipelines
    PRIVATE
        trt::common
        tensorrt_library
        video_decoder::library
        image_decoder
        ${OpenCV_LIBS}
        ${CUDA_LIBRARIES}
        ${CUDA_npp_LIBRARY}
        ${STUBS_LIBRARY}
        ${AVCODEC_LDFLAGS}
        ${AVFORMAT_LDFLAGS}
        ${AVUTIL_LDFLAGS}
        ${CUVID_LIBRARY}
        stdc++fs
        ${TORCH_LIBRARIES}
        ${PYTHON_LIBRARIES}
        GEOS::geos
        TorchVision::TorchVision 
)

OblivionStaff on 28 Aug 2020

the cmake command:

cmake -DCMAKE_PREFIX_PATH=/root/libtorch/  -DWITH_CUDA=ON ..
make -j8
make install

the model is a faster rcnn model from torchvision:

prediction = model([image_tensor])
script_model = torch.jit.script(model)
script_model.save('faster_rcnn.pt')

OblivionStaff on 28 Aug 2020

Hi @OblivionStaff, can the faster_rcnn model run normally in libtorch with your configuration?

About three weeks ago, I ran into some problems with libtorch inference of the faster_rcnn model; maybe I should try your methods :-)

zhiqwang on 28 Aug 2020

@zhiqwang, yes. In fact, I copied the Faster R-CNN code from torchvision and modified it for my project. To export the model, I added some explicit type annotations and tried different versions of PyTorch (PyTorch 1.6.0 may work). It was a long time ago and I cannot recall many details.

OblivionStaff on 28 Aug 2020

@zhiqwang we are working on an end-to-end example of how to use torchvision models in C++ (including Faster R-CNN).

WIP PR in #2577