Onnxruntime: Onnxruntime Error in libcublas.so.10

Created on 24 Sep 2019 · 19 Comments · Source: microsoft/onnxruntime

**Describe the bug**

I am using CUDA 10 and running the code in an AWS conda environment.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

When I run

pip install onnxruntime

and then run the code, it runs without any errors on the CPU.

When I run

pip install onnxruntime-gpu==0.5.0

I get an error:

Traceback (most recent call last):
  File "onxx_inference.py", line 3, in <module>
    import onnxruntime as rt
  File "/home/ubuntu/anaconda3/envs/pytorch_demo/lib/python3.7/site-packages/onnxruntime/__init__.py", line 20, in <module>
    from onnxruntime.capi.session import InferenceSession
  File "/home/ubuntu/anaconda3/envs/pytorch_demo/lib/python3.7/site-packages/onnxruntime/capi/session.py", line 9, in <module>
    from onnxruntime.capi import _pybind_state as C
  File "/home/ubuntu/anaconda3/envs/pytorch_demo/lib/python3.7/site-packages/onnxruntime/capi/_pybind_state.py", line 8, in <module>
    import onnxruntime.capi._ld_preload
  File "/home/ubuntu/anaconda3/envs/pytorch_demo/lib/python3.7/site-packages/onnxruntime/capi/_ld_preload.py", line 12, in <module>
    _libcublas = CDLL("libcublas.so.10", mode=RTLD_GLOBAL)
  File "/home/ubuntu/anaconda3/envs/pytorch_demo/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcublas.so.10: cannot open shared object file: No such file or directory


I guess the error means that CUDA 10 (libcublas.so.10) cannot be found. I do have CUDA 10 installed, but I still cannot run onnxruntime on the GPU.



**Urgency**
URGENT (my ONNX model is ready, but inference on the CPU is slow; I need inference on the GPU)


**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- ONNX Runtime installed from (source or binary): pip install onnxruntime-gpu==0.5.0
- ONNX Runtime version: 0.5.0
- Python version: 3.7
- Visual Studio version (if applicable): NA
- GCC/Compiler version (if compiling from source): NA
- CUDA/cuDNN version: CUDA 10, cuDNN 7.5.1
- GPU model and memory: Tesla K80

**To Reproduce**



1) conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

2) pip install onnxruntime-gpu==0.5.0

3) run model

**Expected behavior**
The script should output the box coordinates.



**Additional context**
Here's the code I run for inference:

import torch
import cv2
import onnxruntime as rt

import craft_utils
import imgproc

sess = rt.InferenceSession("onnx/model1.onnx")
input_name = sess.get_inputs()[0].name

first_output_name = sess.get_outputs()[0].name

print('\n')
print('\n')
print('input_name',input_name)
print('output_name',first_output_name)
print('\n')
print('\n')

img = cv2.imread('./images/image1.jpg')
img_resized, target_ratio, size_heatmap = imgproc.resize_aspect_ratio(img, 1280, interpolation=cv2.INTER_LINEAR, mag_ratio=1.5)
ratio_h = ratio_w = 1 / target_ratio

print(ratio_h, ratio_w)

x = imgproc.normalizeMeanVariance(img_resized)
x = torch.from_numpy(x).permute(2, 0, 1) # [h, w, c] to [c, h, w]
x = x.unsqueeze(0) # [c, h, w] to [b, c, h, w]

y, _ = sess.run(None, {input_name: x.numpy()})

# make score and link map
score_text = y[0, :, :, 0]
score_link = y[0, :, :, 1]

# Post-processing
boxes = craft_utils.getDetBoxes(score_text, score_link, 0.5, 0.4, 0.4)

print(boxes)

boxes = craft_utils.adjustResultCoordinates(boxes, ratio_w, ratio_h)

print(boxes)

ajinkya933
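
In the script above, torch is used only to reorder the image axes and add a batch dimension. As a minimal sketch (not the author's code), the same layout change can be done with NumPy alone. `to_nchw` is a hypothetical helper name, and the float32 cast is an assumption about what the model expects.

```python
import numpy as np

def to_nchw(img_hwc):
    """Convert one [h, w, c] image to a [1, c, h, w] float32 batch."""
    chw = np.transpose(img_hwc, (2, 0, 1))             # [h, w, c] to [c, h, w]
    return np.expand_dims(chw, 0).astype(np.float32)   # [c, h, w] to [b, c, h, w]

if __name__ == "__main__":
    dummy = np.zeros((4, 6, 3), dtype=np.float32)      # stand-in for a normalized image
    print(to_nchw(dummy).shape)
```

The result could then be passed to `sess.run(None, {input_name: to_nchw(x)})` in place of `x.numpy()`.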

Most helpful comment

Please delete "output_0.pb", then use onnx_test_runner to run the model again. I tested it locally and it works fine.

snnn on 31 Oct 2019


All 19 comments

The Linux Python GPU package requires CUDA 10.1 and cuDNN 7.6
https://github.com/microsoft/onnxruntime#system-requirements-pre-requisite-dependencies

faxu on 26 Sep 2019
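
The tracebacks in this thread show the package preloading specific versioned libraries (libcublas.so.10, libcudart.so.10.1), so the installed toolkit release matters. Below is a small sketch for reading the release out of `nvcc --version` output. `cuda_release` and `installed_cuda_release` are hypothetical helpers, and nvcc only reports the toolkit found on PATH, which is not necessarily the one the dynamic loader will find.

```python
import re
import subprocess

def cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def installed_cuda_release():
    """Run nvcc and parse its output; returns None if nvcc is not on PATH."""
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return cuda_release(result.stdout)

if __name__ == "__main__":
    print(installed_cuda_release())
```

Applied to the output quoted in the issue, `cuda_release` gives `(10, 0)`, which is below the 10.1 requirement stated above.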

@faxu I installed CUDA 10.1 and cuDNN 7.6.


$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

But I still get an error:

$ python onxx_inference.py 
Traceback (most recent call last):
  File "onxx_inference.py", line 3, in <module>
    import onnxruntime as rt
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnxruntime/__init__.py", line 20, in <module>
    from onnxruntime.capi.session import InferenceSession
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnxruntime/capi/session.py", line 9, in <module>
    from onnxruntime.capi import _pybind_state as C
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnxruntime/capi/_pybind_state.py", line 8, in <module>
    import onnxruntime.capi._ld_preload
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnxruntime/capi/_ld_preload.py", line 14, in <module>
    _libcudart = CDLL("libcudart.so.10.1", mode=RTLD_GLOBAL)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.10.1: cannot open shared object file: No such file or directory

Here's more info on the GPU:


$ nvidia-smi
Fri Sep 27 11:19:58 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

ajinkya933 on 27 Sep 2019
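
The import fails inside `_ld_preload.py`, which calls `CDLL(...)` on versioned library names. The same check can be run in isolation to see which libraries the dynamic loader cannot open. This is a sketch only: `find_missing_libs` is a hypothetical helper, and the two names are the ones that appear in the tracebacks here; the package may preload others.

```python
import ctypes

def find_missing_libs(names):
    """Return the library names that the dynamic loader fails to open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    # Names copied from the tracebacks in this thread (an incomplete list).
    for lib in find_missing_libs(["libcublas.so.10", "libcudart.so.10.1"]):
        print("cannot load:", lib)
```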

To resolve this, I ran:

export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.1/bin:$PATH

And now I get a new error:

$ python onxx_inference.py 
Bus error (core dumped)

ajinkya933 on 27 Sep 2019
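
Setting LD_LIBRARY_PATH only helps if one of its directories actually contains the requested file. Here is a minimal sketch for checking that, with `locate_on_ld_path` as a hypothetical helper. It inspects LD_LIBRARY_PATH only and ignores the other places the loader searches (for example the ldconfig cache).

```python
import os

def locate_on_ld_path(lib_name, ld_path=None):
    """Return full paths of lib_name found in the LD_LIBRARY_PATH directories."""
    if ld_path is None:
        ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_path.split(os.pathsep):
        candidate = os.path.join(directory, lib_name)
        if directory and os.path.isfile(candidate):
            hits.append(candidate)
    return hits

if __name__ == "__main__":
    # Library name copied from the traceback above.
    print(locate_on_ld_path("libcudart.so.10.1"))
```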

Which model are you using?

HectorSVC on 27 Sep 2019

@HectorSVC I am using a model called craft.onnx:

https://drive.google.com/file/d/1QeB9cKMYd-OtswbT0xHXJSb6UBRKNdqK/view

The inference script that I am using is:

onxx_inference.py

import torch
import cv2
import onnxruntime as rt

import craft_utils
import imgproc

sess = rt.InferenceSession("onnx/craft.onnx")
input_name = sess.get_inputs()[0].name

first_output_name = sess.get_outputs()[0].name


print('\n')
print('\n')
print('input_name',input_name)
print('output_name',first_output_name)
print('\n')
print('\n')


img = cv2.imread('./images/daneil.jpg')
img_resized, target_ratio, size_heatmap = imgproc.resize_aspect_ratio(img, 1280, interpolation=cv2.INTER_LINEAR, mag_ratio=1.5)
ratio_h = ratio_w = 1 / target_ratio

print(ratio_h, ratio_w)


x = imgproc.normalizeMeanVariance(img_resized)
x = torch.from_numpy(x).permute(2, 0, 1)    # [h, w, c] to [c, h, w]
x = x.unsqueeze(0)                # [c, h, w] to [b, c, h, w]

y, _ = sess.run(None, {input_name: x.numpy()})

# make score and link map
score_text = y[0, :, :, 0]
score_link = y[0, :, :, 1]

# Post-processing
boxes = craft_utils.getDetBoxes(score_text, score_link, 0.5, 0.4, 0.4)
#print(boxes)
boxes = craft_utils.adjustResultCoordinates(boxes, ratio_w, ratio_h)
#print(boxes)

The strange thing is that onxx_inference.py works with the CPU package (onnxruntime), but it fails with "Bus error (core dumped)" when I try to run it with onnxruntime-gpu.

You can get craft_utils from this repo: https://github.com/clovaai/CRAFT-pytorch

ajinkya933 on 28 Sep 2019
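
When the same script behaves differently under the CPU and GPU packages, it can help to record which build is actually being imported. A hedged sketch: `describe_runtime` is a hypothetical helper, `onnxruntime.get_device()` reports the device the installed build targets, and the import is wrapped because, as the tracebacks above show, importing the GPU package can itself raise OSError.

```python
def describe_runtime():
    """Report the installed onnxruntime version and device, or the import error."""
    try:
        import onnxruntime as rt
    except (ImportError, OSError) as exc:
        # ImportError: package missing; OSError: a preloaded CUDA library is missing.
        return {"import_error": repr(exc)}
    return {"version": rt.__version__, "device": rt.get_device()}

if __name__ == "__main__":
    print(describe_runtime())
```

This cannot catch a crash such as the bus error reported above, since that terminates the process rather than raising a Python exception.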

I've tried running the model with our onnx_test_runner.exe with GPU support, and there's no problem. I haven't tried the Python code yet.

HectorSVC on 30 Sep 2019

@HectorSVC Please share the steps to reproduce the output of onnx_test_runner.exe on Windows (for both CPU and GPU).

ajinkya933 on 1 Oct 2019

The difference is that I'm using the latest master. You can build it from source; see https://github.com/microsoft/onnxruntime/blob/master/BUILD.md
The build includes a project for onnx_test_runner.exe. That tool is used for end-to-end tests, so you can provide input/output data to make sure the model runs and produces the correct results.
There is another tool, onnxruntime_perf_test.exe, which is used for performance tests. You can specify how many times (or for how long) you want to run and get the performance results.

For both of these tools, you can specify the execution provider (-e cuda).
https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/test/onnx/main.cc
https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/test/perftest

HectorSVC on 1 Oct 2019

@HectorSVC I built it from source and ran one example. Here is the run:

https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/test/testdata/transform/matmul_add_fusion/2Input

./onnx_test_runner /home/ajinkya/Documents/onnxruntime/onnxruntime/test/testdata/transform/matmul_add_fusion/3Input/

2019-10-03 16:16:43.071444851 [E:onnxruntime:Default, runner.cc:173 ParallelRunTests] Running tests in parallel: at most 2 models at any time
2019-10-03 16:16:43.073451118 [E:onnxruntime:Default, runner.cc:192 ParallelRunTests] Running tests finished. Generating report
result: 
    Models: 2
    Total test cases: 2
        Succeeded: 2
        Not implemented: 0
        Failed: 0
    Stats by Operator type:
        Not implemented(0): 
        Failed:
Failed Test Cases:

I noticed that the folder named '2Input' has the following structure:

2Input
     --test_data_0
          --input_0.pb
          --input_1.pb
          --output_0.pb
     --model.onnx

I want to have a similar setup on my side for my ONNX model. Please help me understand how input_0.pb, input_1.pb, and output_0.pb are generated and what I should do to produce them on my end. I tried to read them with Netron (https://lutzroeder.github.io/netron/), but the input/output .pb files do not seem to be supported by Netron.

My aim is to get the bounding boxes generated by my script (https://github.com/microsoft/onnxruntime/issues/1898#issuecomment-536152497). How can this be achieved using onnx_test_runner?

ajinkya933 on 3 Oct 2019

Hi @ajinkya933 ,

You may copy this function to your code:
https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/test/onnx/gen_test_models.py#L26

Then, use it for converting a numpy array to protobuf file.

snnn on 3 Oct 2019
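
The `.pb` files in a test_data folder hold serialized tensors rather than models. As a sketch, assuming each file contains one `TensorProto` (which is what the linked helper writes via `numpy_helper.from_array`), a file can be read back with the onnx package for inspection. `read_tensor` is a hypothetical helper name.

```python
def read_tensor(path):
    """Load one serialized TensorProto file and return (name, numpy array)."""
    import onnx                      # imported lazily; requires the onnx package
    from onnx import numpy_helper

    tensor = onnx.TensorProto()
    with open(path, "rb") as f:
        tensor.ParseFromString(f.read())
    return tensor.name, numpy_helper.to_array(tensor)

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        name, array = read_tensor(sys.argv[1])
        print(name, array.shape, array.dtype)
```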

issue 1898 is still open

ajinkya933 on 7 Oct 2019

@snnn Thank you for providing this link. I used the linked function to convert the numpy arrays into protobuf files as follows:

import torch
import cv2
import onnxruntime as rt
import craft_utils
import imgproc
import onnx
import numpy as np
import os
import argparse
from onnx import numpy_helper
from onnx import helper
from onnx import utils
from onnx import AttributeProto, TensorProto, GraphProto
from scipy.spatial import distance

# function to convert numpy array into protobuf file is taken 
# from https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/test/onnx/gen_test_models.py#L26  

def write_tensor(f, c,input_name=None):
    tensor = numpy_helper.from_array(c)
    if input_name:
       tensor.name = input_name
    body = tensor.SerializeToString()
    f.write(body)


#read the onnx graph
sess = rt.InferenceSession("onnx/craft.onnx")
input_name = sess.get_inputs()[0].name

img = cv2.imread('./images/andrei_f.JPG')
img = cv2.resize(img, (1280,928))
img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)


img_resized, target_ratio, size_heatmap = imgproc.resize_aspect_ratio(img, 1280, interpolation=cv2.INTER_LINEAR, mag_ratio=1.5)
ratio_h = ratio_w = 1 / target_ratio

print(ratio_h, ratio_w)


x = imgproc.normalizeMeanVariance(img_resized)
x = torch.from_numpy(x).permute(2, 0, 1)    # [h, w, c] to [c, h, w]
x = x.unsqueeze(0)                # [c, h, w] to [b, c, h, w]
c=x.numpy()
y, _ = sess.run(None, {input_name: x.numpy()})

print('y',y)
print('x',c)

data_dir='2Input/test_data_0'

with open(os.path.join(data_dir,"input_0.pb"),"wb") as f:
    write_tensor(f, c)


with open(os.path.join(data_dir,"output_0.pb"),"wb") as f:
    write_tensor(f, y)


print('Export successful')

I then arranged input_0.pb and output_0.pb as shown in https://github.com/microsoft/onnxruntime/issues/1898#issuecomment-537904078.

However, I get an error:

2019-10-07 12:13:08.258446147 [E:onnxruntime:Default, runner.cc:342 RunTask] 2Input:/home/ajinkya/Documents/onnxruntime/onnxruntime/test/onnx/TestCase.cc:582 void OnnxTestCase::ConvertTestData(const std::vector<onnx::TensorProto>&, onnxruntime::test::HeapBuffer&, bool, std::unordered_map<std::__cxx11::basic_string<char>, OrtValue*>&) data count mismatch, expect 2, got 1
Stacktrace:

result: 
    Models: 1
    Total test cases: 1
        Succeeded: 0
        Not implemented: 0
        Failed: 1
            Got exception while running: 1
    Stats by Operator type:
        Not implemented(0): 
        Failed:
Failed Test Cases:2Input of unknown version
test 2Input failed, please fix it

ajinkya933 on 7 Oct 2019

Hi @ajinkya933

Does the model have two inputs?

snnn on 7 Oct 2019

No, the model has one input. Please see the model file (https://drive.google.com/file/d/1QeB9cKMYd-OtswbT0xHXJSb6UBRKNdqK/view). The model name is craft.onnx.

'2Input' is just the name of the folder.

ajinkya933 on 7 Oct 2019

@snnn Were you able to figure this out?

ajinkya933 on 10 Oct 2019

@snnn I've granted you access to the model file. Please let me know if you have trouble downloading it.

ajinkya933 on 23 Oct 2019

Where can I find the imgproc module?

snnn on 28 Oct 2019

All the modules are available at: https://github.com/clovaai/CRAFT-pytorch

Location of imgproc: https://github.com/clovaai/CRAFT-pytorch/blob/master/imgproc.py

ajinkya933 on 29 Oct 2019

Please delete "output_0.pb", then use onnx_test_runner to run the model again. I tested it locally and it works fine.

snnn on 31 Oct 2019