Serving: how to run tensorflow/serving:gpu in docker 19.03

Created on 13 Nov 2019 · 11 Comments · Source: tensorflow/serving

Some errors occurred when I tried to run tensorflow/serving:latest-gpu:
Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
OS: Ubuntu 16.04.6 LTS
Docker version: 19.03
CUDA version: 9.0
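
A quick way to confirm the symptom from inside the container is to ask the dynamic loader for the library directly. This is only a minimal sketch: the helper name `can_load` is made up for illustration, and it tests nothing more than whether `dlopen` succeeds for the given name.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library cannot be found or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the driver library named in the error above.
    print("libcuda.so.1 loadable:", can_load("libcuda.so.1"))
```

If this reports False inside the container but True on the host, the driver library is probably not being exposed to the container.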

awaiting tensorflower bug

YelongYin

All 11 comments

Please go through this issue, particularly this comment, as your system is missing a few configurations. Thanks!

gowthamkpr on 13 Nov 2019

Hi, maybe you didn't pass the --gpus flag to docker run. Example:

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
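
The same flag applies when starting the Serving image. Below is a rough sketch that assembles such a command in Python. The model directory `/path/to/my_model` and the model name `my_model` are placeholders, not values from this thread. The helper `serving_gpu_command` is hypothetical. The port, mount target and `MODEL_NAME` variable follow the usual TensorFlow Serving Docker conventions.

```python
import shlex


def serving_gpu_command(model_dir, model_name,
                        image="tensorflow/serving:latest-gpu",
                        rest_port=8501):
    """Build the argv for `docker run` with GPU access enabled."""
    return [
        "docker", "run",
        "--gpus", "all",                      # expose all GPUs (Docker 19.03+)
        "-p", f"{rest_port}:8501",            # REST API port
        "--mount",
        f"type=bind,source={model_dir},target=/models/{model_name}",
        "-e", f"MODEL_NAME={model_name}",
        "-t", image,
    ]


if __name__ == "__main__":
    # Placeholder path and model name; replace with your own.
    cmd = serving_gpu_command("/path/to/my_model", "my_model")
    print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run`, and the printed string can be pasted into a shell.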

Talbot3 on 21 Nov 2019

👍2

@YelongYin,
Can you please let us know whether your issue still persists or has been resolved, so that we can work towards closing this issue? Thanks!

rmothukuru on 26 Nov 2019

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

rmothukuru on 2 Dec 2019

I'm having the same problem. Running tensorflow-gpu locally with CUDA 10 works. However, when TensorFlow Serving is run with Docker, the following messages appear and only CPU resources are used.

2019-12-06 09:59:12.917246: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-06 09:59:12.917287: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
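
The warning names both the missing library and the directories listed in LD_LIBRARY_PATH, so both can be pulled out of a log mechanically. This is a small sketch. The function name `parse_dso_warning` is made up for illustration, and the regular expression only assumes the message wording seen in this thread.

```python
import re

# Matches the "Could not load dynamic library" warning as it appears above.
_DSO_WARNING = re.compile(
    r"Could not load dynamic library '(?P<lib>[^']+)'"
    r".*?LD_LIBRARY_PATH: (?P<path>\S+)"
)


def parse_dso_warning(line):
    """Return (library_name, [search_dirs]) or None if the line does not match."""
    match = _DSO_WARNING.search(line)
    if match is None:
        return None
    return match.group("lib"), match.group("path").split(":")


if __name__ == "__main__":
    sample = (
        "Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: "
        "cannot open shared object file: No such file or directory; "
        "LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64"
    )
    print(parse_dso_warning(sample))
```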

penny4860 on 6 Dec 2019

Is there a list of compatible CUDA and cuDNN versions?

luvwinnie on 12 Dec 2019

Could you reopen the issue? I experienced the same thing when using tensorflow-gpu 2.1.0 and Docker 19.03.4.

Error log from notebook:

2020-01-25 04:32:14.300854: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-25 04:32:14.300950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-25 04:32:14.300959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Why does tensorflow-gpu look for the CUDA libraries in /usr/local/nvidia/lib:/usr/local/nvidia/lib64 rather than /usr/local/cuda? Did you sabotage the build on purpose?
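
To see what those directories actually contain inside a given container, one option is to walk LD_LIBRARY_PATH and check each entry for the library. This is a minimal sketch with a made-up helper name, `find_library`. LD_LIBRARY_PATH is only one of the places the dynamic loader searches, so a miss here does not by itself prove the library is absent from the system.

```python
import os


def find_library(name, search_path=None):
    """Report, per LD_LIBRARY_PATH entry, whether it contains the named file.

    Returns a list of (directory, found) tuples in search order.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    report = []
    # Skip empty entries produced by leading, trailing or doubled colons.
    for directory in filter(None, search_path.split(":")):
        found = os.path.isfile(os.path.join(directory, name))
        report.append((directory, found))
    return report


if __name__ == "__main__":
    for directory, found in find_library("libcuda.so.1"):
        print(f"{directory}: {'found' if found else 'missing'}")
```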

docker run --gpus all --rm nvidia/cuda nvidia-smi
Unable to find image 'nvidia/cuda:latest' locally
latest: Pulling from nvidia/cuda
7ddbc47eeb70: Pull complete 
c1bbdc448b72: Pull complete 
8c3b70e39044: Pull complete 
45d437916d57: Pull complete 
d8f1569ddae6: Pull complete 
85386706b020: Pull complete 
ee9b457b77d0: Pull complete 
be4f3343ecd3: Pull complete 
30b4effda4fd: Pull complete 
Digest: sha256:31e2a1ca7b0e1f678fb1dd0c985b4223273f7c0f3dbde60053b371e2a1aee2cd
Status: Downloaded newer image for nvidia/cuda:latest
Sat Jan 25 04:50:09 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   34C    P8     9W / 185W |    894MiB /  7981MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I can create a symlink in the Docker image, but this stinks, because /usr/local/cuda is the default NVIDIA library installation location and NVIDIA places all of its stuff there.

Why did you mess this up when building tensorflow-gpu?

rickyzhang82 on 25 Jan 2020

👍2

Out of the box, tensorflow-gpu 2.0 installed from PyPI with pip doesn't work in a container based on the NVIDIA Docker image nvidia/cuda:10.0-cudnn7-devel.

Even after I created a symlink from /usr/local/cuda to /usr/local/nvidia, I still failed to launch a 1D CNN in Keras.

I had to pull the image nvcr.io/nvidia/tensorflow:19.12-tf2-py3 from NVIDIA to make it run.

@luvwinnie
Here is the list

rickyzhang82 on 26 Jan 2020

👍2

I ran into the same problem. Is there a list of CUDA and cuDNN versions compatible with TensorFlow Serving?

upwindflys on 5 Mar 2020

Running into the same issue

Zethson on 2 Apr 2020

Does the issue still exist with the latest version? There were some library version mismatches between TensorFlow and TensorFlow Serving.

shadowdragon89 on 16 Oct 2020
