Hi, nvidia-smi fails to find the libnvidia-ml.so library when I run it inside a container:
[root@localhost ~]# nvidia-docker run -it --rm docker.io/nvidia/cuda
root@19b523a18622:/# nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
I found another issue (here) describing the same problem, but it seems the reporter has not solved it yet.
I'm not sure how nvidia-docker works. I mean, doesn't it load the NVIDIA driver data from my host into the Docker container? I don't know how it works. (Is there any material that explains the mechanism?)
Anyhow, I couldn't solve the problem with libnvidia-ml.so, which I believe should be located in the /usr/lib directory.
(+)
I actually succeeded with the following command
nvidia-docker run -it --rm -v /usr/lib:/usr/lib --privileged=true docker.io/nvidia/cuda
which mounts the host's library directory into the container. However, OpenCL still does not work; it gives the following output:
./hello: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by ./hello)
Am I on the right track?
I found that this can be solved with the --privileged=true option, though I am still having trouble with OpenCL.
You don't need --privileged, nor do you need to mount /usr/lib. Maybe there is an issue with your driver installation. Does nvidia-smi work outside a container? And what's the output of ldconfig -p | grep nvidia?
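For anyone following along, the check asked for above can be wrapped in a small helper. This is only a sketch: has_nvml is a made-up name (not part of nvidia-docker), and it assumes the library shows up in the linker cache under a name starting with libnvidia-ml.so, as it normally does.

```shell
# has_nvml is a hypothetical helper, not an nvidia-docker command.
# It reads `ldconfig -p` output on stdin and succeeds (exit status 0)
# if any line mentions the NVML library.
has_nvml() {
    grep -q 'libnvidia-ml\.so'
}

# Usage, on the host or inside the container:
#   ldconfig -p | has_nvml && echo "NVML found" || echo "NVML missing"
```

If the library is listed on the host but nvidia-smi still cannot find it inside the container, the driver installation itself is probably fine and the problem lies in how the files reach the container.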
My outputs are:
[root@localhost ~]# nvidia-smi
Fri Jun 23 11:00:55 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 560     Off  | 0000:01:00.0     N/A |                  N/A |
| 18%   36C   P12    N/A /  N/A |    225MiB /   963MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
and
[root@localhost ~]# ldconfig -p | grep nvidia
libvdpau_nvidia.so (libc6,x86-64, OS ABI: Linux 2.3.99) => /lib64/libvdpau_nvidia.so
libvdpau_nvidia.so (libc6, OS ABI: Linux 2.3.99) => /lib/libvdpau_nvidia.so
libnvidia-tls.so.375.26 (libc6,x86-64, hwcap: 0x8000000000000000, OS ABI: Linux 2.3.99) => /lib64/tls/libnvidia-tls.so.375.26
libnvidia-tls.so.375.26 (libc6,x86-64, OS ABI: Linux 2.2.5) => /lib64/libnvidia-tls.so.375.26
libnvidia-tls.so.375.26 (libc6, OS ABI: Linux 2.2.5) => /lib/libnvidia-tls.so.375.26
libnvidia-ptxjitcompiler.so.375.26 (libc6,x86-64) => /lib64/libnvidia-ptxjitcompiler.so.375.26
libnvidia-ptxjitcompiler.so.375.26 (libc6) => /lib/libnvidia-ptxjitcompiler.so.375.26
libnvidia-opencl.so.1 (libc6,x86-64) => /lib64/libnvidia-opencl.so.1
libnvidia-opencl.so.1 (libc6) => /lib/libnvidia-opencl.so.1
libnvidia-ml.so.1 (libc6,x86-64) => /lib64/libnvidia-ml.so.1
libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
libnvidia-ml.so (libc6,x86-64) => /lib64/libnvidia-ml.so
libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
libnvidia-ifr.so.1 (libc6,x86-64) => /lib64/libnvidia-ifr.so.1
libnvidia-ifr.so.1 (libc6) => /lib/libnvidia-ifr.so.1
libnvidia-ifr.so (libc6,x86-64) => /lib64/libnvidia-ifr.so
libnvidia-ifr.so (libc6) => /lib/libnvidia-ifr.so
libnvidia-gtk3.so.375.26 (libc6,x86-64) => /lib64/libnvidia-gtk3.so.375.26
libnvidia-gtk2.so.375.26 (libc6,x86-64) => /lib64/libnvidia-gtk2.so.375.26
libnvidia-glsi.so.375.26 (libc6,x86-64) => /lib64/libnvidia-glsi.so.375.26
libnvidia-glsi.so.375.26 (libc6) => /lib/libnvidia-glsi.so.375.26
libnvidia-glcore.so.375.26 (libc6,x86-64) => /lib64/libnvidia-glcore.so.375.26
libnvidia-glcore.so.375.26 (libc6) => /lib/libnvidia-glcore.so.375.26
libnvidia-fbc.so.1 (libc6,x86-64) => /lib64/libnvidia-fbc.so.1
libnvidia-fbc.so.1 (libc6) => /lib/libnvidia-fbc.so.1
libnvidia-fbc.so (libc6,x86-64) => /lib64/libnvidia-fbc.so
libnvidia-fbc.so (libc6) => /lib/libnvidia-fbc.so
libnvidia-fatbinaryloader.so.375.26 (libc6,x86-64) => /lib64/libnvidia-fatbinaryloader.so.375.26
libnvidia-fatbinaryloader.so.375.26 (libc6) => /lib/libnvidia-fatbinaryloader.so.375.26
libnvidia-encode.so.1 (libc6,x86-64) => /lib64/libnvidia-encode.so.1
libnvidia-encode.so.1 (libc6) => /lib/libnvidia-encode.so.1
libnvidia-encode.so (libc6,x86-64) => /lib64/libnvidia-encode.so
libnvidia-encode.so (libc6) => /lib/libnvidia-encode.so
libnvidia-eglcore.so.375.26 (libc6,x86-64) => /lib64/libnvidia-eglcore.so.375.26
libnvidia-eglcore.so.375.26 (libc6) => /lib/libnvidia-eglcore.so.375.26
libnvidia-egl-wayland.so.375.26 (libc6,x86-64) => /lib64/libnvidia-egl-wayland.so.375.26
libnvidia-compiler.so.375.26 (libc6,x86-64) => /lib64/libnvidia-compiler.so.375.26
libnvidia-compiler.so.375.26 (libc6) => /lib/libnvidia-compiler.so.375.26
libnvidia-cfg.so.1 (libc6,x86-64) => /lib64/libnvidia-cfg.so.1
libnvidia-cfg.so (libc6,x86-64) => /lib64/libnvidia-cfg.so
libGLX_nvidia.so.0 (libc6,x86-64) => /lib64/libGLX_nvidia.so.0
libGLX_nvidia.so.0 (libc6) => /lib/libGLX_nvidia.so.0
libGLESv2_nvidia.so.2 (libc6,x86-64) => /lib64/libGLESv2_nvidia.so.2
libGLESv2_nvidia.so.2 (libc6) => /lib/libGLESv2_nvidia.so.2
libGLESv1_CM_nvidia.so.1 (libc6,x86-64) => /lib64/libGLESv1_CM_nvidia.so.1
libGLESv1_CM_nvidia.so.1 (libc6) => /lib/libGLESv1_CM_nvidia.so.1
libEGL_nvidia.so.0 (libc6,x86-64) => /lib64/libEGL_nvidia.so.0
libEGL_nvidia.so.0 (libc6) => /lib/libEGL_nvidia.so.0
Is there any clue?
(+) Still, if I don't use the --privileged=true option, it says it can't find the libnvidia-ml.so library.
Also, CUDA is working, but OpenCL is not.
It's probably an SELinux problem indeed, so you should check back on the other GitHub issue.
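A quick way to see whether SELinux is even in play on the host is to ask for its current mode. The sketch below is only an illustration: selinux_mode is a made-up helper name, and it assumes the standard getenforce tool is what your distribution ships.

```shell
# selinux_mode is a hypothetical helper, not an nvidia-docker command.
# It prints Enforcing, Permissive, or Disabled as reported by getenforce,
# or "unavailable" when getenforce is not installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unavailable"
    fi
}

# Usage on the host:
#   selinux_mode
# If it prints Enforcing, recent denials can usually be listed as root with:
#   ausearch -m avc -ts recent
```

If the mode is Enforcing and the audit log shows denials for the container process, that would be consistent with the library only being visible when --privileged is used.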
I solved the SELinux problem and now I can run without the --privileged option.
However, OpenCL is still not working. It just says ".../libOpenCL.so.1: no version information available".
Apart from my case, there are many Google results about this error, and many people solved it just by reinstalling OpenCL, but since I use nvidia-docker, I am not sure I can install a new OpenCL.
Or should I try apt-get nvidia-opencl-icd...?
Actually, I want to build something like a CUDA-aware MPI app, so I want to use a GPU from each Docker container through MPI. (In fact, since CUDA is working, I'm building the app with CUDA for now.)
For OpenCL, you can look at the Dockerfile we used to have:
https://github.com/NVIDIA/nvidia-docker/blob/v1.0.0/ubuntu-14.04/opencl/runtime/Dockerfile
But we don't publish an OpenCL image on DockerHub.
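As a rough sketch of what such an image needs: the OpenCL ICD loader looks for *.icd files under /etc/OpenCL/vendors, and each file names a vendor library to load. The helper below writes such a file for the libnvidia-opencl.so.1 library that appears in the ldconfig output earlier in this thread. It is an assumption on my part that this matches what the linked Dockerfile does, so please compare against the file itself. register_nvidia_icd and its ROOT argument are made-up names for illustration.

```shell
# register_nvidia_icd is a hypothetical helper, not taken from the linked Dockerfile.
# It writes an ICD file that tells the OpenCL ICD loader to load the NVIDIA
# OpenCL library. ROOT is an illustrative parameter that lets you try this in a
# scratch directory; with no argument the file goes to /etc/OpenCL/vendors.
register_nvidia_icd() {
    root="${1:-}"
    mkdir -p "$root/etc/OpenCL/vendors"
    echo "libnvidia-opencl.so.1" > "$root/etc/OpenCL/vendors/nvidia.icd"
}

# Usage inside a container (as root):
#   register_nvidia_icd
# An ICD loader must also be installed in the image; on Ubuntu that is
# typically the ocl-icd-libopencl1 package (an assumption for your base image).
```

The loader installed from the distribution would then be used in place of the libOpenCL.so.1 under /usr/local/cuda-8.0 that produced the "no version information available" message.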
It works! Thank you very much.