Hello,
I would like to call nvidia-smi in a Dockerfile, but the docker build fails. My Dockerfile:
FROM nvidia/cuda:7.5-cudnn5-devel
RUN nvidia-smi
CMD /bin/bash
I am using the build command nvidia-docker build -t gpu ., but an error message is displayed:
/bin/sh: 1: nvidia-smi: not found
When I build another docker image based on nvidia/cuda:7.5-cudnn5-devel and run a container from that image, the command nvidia-smi works. It seems the NVIDIA GPU and its libraries are not available during docker image building.
Could you help me?
> It seems the NVIDIA GPU and its libraries are not available during docker image building.
This is correct: the driver files (libraries and binaries) are mounted from the host (using a Docker volume) when the container is started.
During a docker build, the set of options for the build environment is limited: you can't import devices and you can't change the network settings.
Note that nvidia-docker is a passthrough to docker for docker build; this is documented here.
But this shouldn't be an issue: you don't actually need a GPU in your system in order to compile CUDA code. You can install the nvcc toolchain on any machine and compile your code; then, during deployment, you do need a machine with a GPU and you use nvidia-docker.
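To illustrate that split, here is a minimal sketch of a Dockerfile that compiles at build time (no GPU needed) and only touches the GPU at run time. The source file name saxpy.cu is made up for the example:

```dockerfile
FROM nvidia/cuda:7.5-cudnn5-devel

# Build time: nvcc ships with the devel image, so compilation
# works on any machine, with or without a GPU or driver.
COPY saxpy.cu /tmp/saxpy.cu
RUN nvcc -o /usr/local/bin/saxpy /tmp/saxpy.cu

# Run time: the driver libraries are mounted by nvidia-docker,
# so GPU commands only succeed in a running container.
CMD ["/usr/local/bin/saxpy"]
```

Build with plain docker (or nvidia-docker build, which passes through), then run the result with nvidia-docker run so the driver volume gets mounted.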
@Josca: does that answer your question, can we close this?
Hi,
I've had problems with missing libraries/directories during a build and arrived at this issue. My exact problem is that I ran ldconfig during a build and NVIDIA's libraries didn't make it into the loader cache, because their directories aren't mounted at build time. Steps to reproduce:
Minimal Dockerfile:
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
RUN ldconfig -v | grep nvidia || true
Output of nvidia-docker build .:
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
---> 0e44f0afa846
Step 2 : RUN ldconfig -v | grep nvidia || true
---> Running in 2548bd9799b6
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring
---> 4e3b5f01e1cb
Removing intermediate container 2548bd9799b6
Successfully built 4e3b5f01e1cb
Libraries are found when ldconfig -v | grep nvidia is executed in a running container:
root@c6a487836b23:/# ldconfig -v | grep nvidia
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/usr/local/nvidia/lib:
libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
/usr/local/nvidia/lib64:
libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
libnvidia-compiler.so.375.10 -> libnvidia-compiler.so.375.10
libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
libnvidia-opencl.so.1 -> libnvidia-opencl.so.375.10
libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring
libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
Furthermore, there are a few minor annoyances, such as an "Automatic GPU detection failed. Building for all known architectures" message when using CMake with CUDA.
@flx42, is there a technical limitation that prevents the NVIDIA and CUDA directories from being mounted at build time? IMHO it would be better to have them if possible.
Yes, there are some minor annoyances, but it's better this way. You want the build to be reproducible, and you don't need a GPU or driver files to compile code. You're not supposed to run computations or tests during a docker build, on GPU or CPU.
And yes, there are technical limitations: you can't mount volumes or devices at build time.
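Given that constraint, one possible workaround for the ldconfig case is to refresh the loader cache when the container starts, after the driver volume has been mounted, instead of at build time. A sketch (the entrypoint script path is my own choice, not part of the images):

```dockerfile
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

# Hypothetical entrypoint: rebuild the loader cache at container
# start, when /usr/local/nvidia/* exists, then exec the real command.
RUN printf '#!/bin/sh\nldconfig\nexec "$@"\n' > /entrypoint.sh \
    && chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["/bin/bash"]
```

With this, any command passed to nvidia-docker run sees an up-to-date ld cache that includes the mounted driver libraries.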
OK, thanks for the fast response.
We at Graphistry have found that we need various library paths in the environment when running GPU code (specifically via OpenCL) in nvidia-docker.
We've been learning Graphistry and docker as we go, so we're always open to suggestions, but what we've done is add the library paths to our environment, to compensate for the shell not having them by default.
Our stock GPU container, https://hub.docker.com/r/graphistry/gpu-base/, is built with an environment that gets amended via https://github.com/graphistry/infrastructure/blob/master/container-images/gpu-base/Dockerfile#L10 (this is how I'd do it now, as I've since learned); that's how we find things work in production for us. Suggestions welcome :)
@lsb I don't think I follow, this thread is about the driver libraries at build time, not the CUDA binaries/libraries. The CUDA toolkit is always present at build time, and you don't need a GPU or the NVIDIA driver to compile/build.
By the way, we already set those variables in the environment/ld.cache:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/devel/Dockerfile#L27
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L29-L31
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L36
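For context, the relevant lines in those Dockerfiles look roughly like this (paraphrased from memory; follow the links above for the exact, current contents):

```dockerfile
# Paraphrased sketch of the linked nvidia-docker Dockerfile lines:
# register the driver mount points with the loader and the shell.
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}
```

So the paths are already on LD_LIBRARY_PATH even when the ld cache hasn't been refreshed at build time.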
Ah, thank you