Serving: how to run tensorflow/serving:gpu in docker 19.03

Created on 13 Nov 2019 · 11 Comments · Source: tensorflow/serving

Some errors occurred when I tried to run tensorflow/serving:latest-gpu:
Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
Ubuntu 16.04.6 LTS
Docker version: 19.03
CUDA version: 9.0

awaiting tensorflower bug

All 11 comments

Please go through this issue, particularly this comment, as your system is missing a few configurations. Thanks!

Hi, maybe you didn't pass the --gpus flag to docker run. Example:

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
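A minimal sketch of how the same flag might be applied to the serving image from this issue. The model name my_model and the host path /path/to/my_model are placeholders and do not come from this thread. The port, mount target and MODEL_NAME variable follow the usual TensorFlow Serving Docker conventions. The function only prints the command so it can be inspected before running it.

```shell
#!/bin/sh
# Print a docker command that starts the GPU serving image with --gpus.
# $1 = model name (placeholder), $2 = absolute host path of the model directory (placeholder).
# --gpus needs Docker 19.03 or newer and the NVIDIA container toolkit on the host.
serving_cmd() {
    echo docker run --gpus all --rm -p 8501:8501 \
        --mount "type=bind,source=$2,target=/models/$1" \
        -e "MODEL_NAME=$1" \
        tensorflow/serving:latest-gpu
}

# Dry run: show the command for a hypothetical model.
serving_cmd my_model /path/to/my_model
```

If the printed command looks right, it can be copied into the terminal and run as is.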

@YelongYin,
Can you please let us know if your issue still persists, or is it resolved, so that we can work towards closure of this issue. Thanks!

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

I'm having the same problem. Running tensorflow-gpu locally with CUDA 10 works. However, when TensorFlow Serving is run with Docker, the following messages appear and only CPU resources are used.

2019-12-06 09:59:12.917246: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-06 09:59:12.917287: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)

Is there a list of compatible CUDA and cuDNN versions?

Could you reopen the issue? I experienced the same thing when using tensorflow-gpu-2.1.0 and Docker 19.03.4.

Error log from notebook:

2020-01-25 04:32:14.300854: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-25 04:32:14.300950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-25 04:32:14.300959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
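Logs like these mix different kinds of missing libraries: libcuda.so.1 is the NVIDIA driver library, while the libnvinfer files belong to TensorRT, which the last warning describes as needed only if you want TensorRT. As a rough sketch, a small filter can list which library names a log reports as missing. The function name missing_libs is made up for this example.

```shell
#!/bin/sh
# Read TensorFlow log text on stdin and print the unique library names
# that dso_loader.cc reported as "Could not load dynamic library '...'".
missing_libs() {
    sed -n "s/.*Could not load dynamic library '\([^']*\)'.*/\1/p" | sort -u
}

# Example usage (the log file name is a placeholder):
#   missing_libs < serving.log
```

Piping the container output through this filter gives a short list to compare against what is actually installed in the image.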

Why does tensorflow-gpu look for the CUDA libraries in /usr/local/nvidia/lib:/usr/local/nvidia/lib64 rather than /usr/local/cuda? Are you sabotaging the build on purpose?
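Note that LD_LIBRARY_PATH is only one of the places the dynamic loader searches. It also consults the ldconfig cache and the default system directories, so the path printed in the warning is not the full picture. A minimal sketch for checking which directories of a colon-separated search path actually contain a given library, meant to be run inside the container. The function name find_lib_in_path is hypothetical.

```shell
#!/bin/sh
# For each directory in a colon-separated list, report whether it
# contains the named library file.
# $1 = library file name, $2 = colon-separated directory list
find_lib_in_path() {
    lib="$1"
    path_list="$2"
    old_ifs="$IFS"
    IFS=:
    for dir in $path_list; do
        if [ -e "$dir/$lib" ]; then
            echo "found: $dir/$lib"
        else
            echo "missing: $dir/$lib"
        fi
    done
    IFS="$old_ifs"
}

# Check the directories from the current LD_LIBRARY_PATH (prints nothing if it is unset).
find_lib_in_path libcuda.so.1 "${LD_LIBRARY_PATH:-}"
```

If every entry is reported as missing, `ldconfig -p | grep libcuda` shows whether the loader cache knows the library from another directory.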

docker run --gpus all --rm nvidia/cuda nvidia-smi
Unable to find image 'nvidia/cuda:latest' locally
latest: Pulling from nvidia/cuda
7ddbc47eeb70: Pull complete 
c1bbdc448b72: Pull complete 
8c3b70e39044: Pull complete 
45d437916d57: Pull complete 
d8f1569ddae6: Pull complete 
85386706b020: Pull complete 
ee9b457b77d0: Pull complete 
be4f3343ecd3: Pull complete 
30b4effda4fd: Pull complete 
Digest: sha256:31e2a1ca7b0e1f678fb1dd0c985b4223273f7c0f3dbde60053b371e2a1aee2cd
Status: Downloaded newer image for nvidia/cuda:latest
Sat Jan 25 04:50:09 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   34C    P8     9W / 185W |    894MiB /  7981MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I can create a symlink in the Docker image, but this stinks, because /usr/local/cuda is the default NVIDIA library installation location and NVIDIA places all of its stuff there.

Why did you mess this up when you built tensorflow-gpu?

Out of the box, tensorflow-gpu 2.0 installed from PyPI with pip doesn't work in a container started from the NVIDIA Docker image nvidia/cuda:10.0-cudnn7-devel.

Even after I created a symlink from /usr/local/cuda to /usr/local/nvidia, I still failed to launch a 1D CNN in Keras.

I had to pull the image nvcr.io/nvidia/tensorflow:19.12-tf2-py3 from NVIDIA to make it run.

@luvwinnie
Here is the list

I ran into the same problem. Is there a compatible CUDA/cuDNN version for TensorFlow Serving?

Running into the same issue

Does the issue still exist with the latest version? There were some library version mismatches between TensorFlow and TensorFlow Serving.
