Hello,
I would like to call nvidia-smi in a Dockerfile, but the docker build fails. My Dockerfile:
FROM nvidia/cuda:7.5-cudnn5-devel
RUN nvidia-smi
CMD /bin/bash
I am using the build command nvidia-docker build -t gpu ., but an error message is displayed:
/bin/sh: 1: nvidia-smi: not found
When I build another docker image based on nvidia/cuda:7.5-cudnn5-devel and run a container from that image, the command nvidia-smi works. It seems the NVIDIA GPU and its libraries are not available during docker image building.
Could you help me?
> It seems the NVIDIA GPU and its libraries are not available during docker image building.
This is correct: the driver files (libraries and binaries) are mounted from the host (using a Docker volume) when the container is started.
During a docker build, the set of options for the build environment is limited: you can't import devices and you can't change the network settings.
Note that nvidia-docker is a passthrough to docker for docker build; this is documented here.
But this shouldn't be an issue: you don't actually need a GPU in your system in order to compile CUDA code. You can install the nvcc toolchain on any machine and compile your code; then, during deployment, you do need a machine with a GPU and you use nvidia-docker.
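To illustrate that split, here is a minimal sketch of a Dockerfile that compiles at build time (no GPU needed) and only touches the GPU at run time. The source file name saxpy.cu is made up for the example:

```dockerfile
FROM nvidia/cuda:7.5-cudnn5-devel

# Build time: nvcc ships with the devel image, so compilation
# works on any machine, with or without a GPU or driver.
COPY saxpy.cu /tmp/saxpy.cu
RUN nvcc -o /usr/local/bin/saxpy /tmp/saxpy.cu

# Run time: the driver libraries are mounted by nvidia-docker,
# so GPU commands only succeed in a running container.
CMD ["/usr/local/bin/saxpy"]
```

Build with plain docker (or nvidia-docker build, which passes through), then run the result with nvidia-docker run so the driver volume gets mounted.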
@Josca: does that answer your question, can we close this?
Hi,
I've had problems with missing libraries/directories during a build and arrived at this issue. My exact problem is that I ran ldconfig during a build and NVIDIA's libraries didn't make it into the loader cache, because their directories aren't mounted at build time. Steps to reproduce:
Minimal Dockerfile:
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
RUN ldconfig -v | grep nvidia || true
Output of nvidia-docker build .:
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
---> 0e44f0afa846
Step 2 : RUN ldconfig -v | grep nvidia || true
---> Running in 2548bd9799b6
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring
---> 4e3b5f01e1cb
Removing intermediate container 2548bd9799b6
Successfully built 4e3b5f01e1cb
Libraries are found when ldconfig -v | grep nvidia is executed in a running container:
root@c6a487836b23:/# ldconfig -v | grep nvidia
/sbin/ldconfig.real: Can't stat /usr/local/cuda/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/usr/local/nvidia/lib:
libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
/usr/local/nvidia/lib64:
libnvidia-ptxjitcompiler.so.375.10 -> libnvidia-ptxjitcompiler.so.375.10
libnvidia-eglcore.so.375.10 -> libnvidia-eglcore.so.375.10
libnvidia-compiler.so.375.10 -> libnvidia-compiler.so.375.10
libnvidia-ml.so.1 -> libnvidia-ml.so.375.10
libnvidia-fatbinaryloader.so.375.10 -> libnvidia-fatbinaryloader.so.375.10
libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.375.10
libnvidia-tls.so.375.10 -> libnvidia-tls.so.375.10
libnvidia-fbc.so.1 -> libnvidia-fbc.so.375.10
libnvidia-glcore.so.375.10 -> libnvidia-glcore.so.375.10
libnvidia-opencl.so.1 -> libnvidia-opencl.so.375.10
libEGL_nvidia.so.0 -> libEGL_nvidia.so.375.10
libnvidia-encode.so.1 -> libnvidia-encode.so.375.10
libGLX_nvidia.so.0 -> libGLX_nvidia.so.375.10
libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.375.10
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring
libnvidia-ifr.so.1 -> libnvidia-ifr.so.375.10
libnvidia-glsi.so.375.10 -> libnvidia-glsi.so.375.10
Furthermore, there are a few minor annoyances, such as an "Automatic GPU detection failed. Building for all known architectures" message when using CMake with CUDA.
@flx42, is there a technical limitation that prevents the NVIDIA and CUDA directories from being mounted at build time? IMHO it would be better to have them if possible.
Yes, there are some minor annoyances, but it's better this way. You want the build to be reproducible, and you don't need a GPU or driver files to compile code. You're not supposed to run computations or tests during a docker build, on GPU or CPU.
And yes, there are technical limitations: you can't mount volumes or devices at build time.
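Given that constraint, one possible workaround for the ldconfig case is to refresh the loader cache when the container starts, after the driver volume has been mounted, instead of at build time. A sketch (the entrypoint script path is my own choice, not part of the images):

```dockerfile
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

# Hypothetical entrypoint: rebuild the loader cache at container
# start, when /usr/local/nvidia/* exists, then exec the real command.
RUN printf '#!/bin/sh\nldconfig\nexec "$@"\n' > /entrypoint.sh \
    && chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["/bin/bash"]
```

With this, any command passed to nvidia-docker run sees an up-to-date ld cache that includes the mounted driver libraries.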
OK, thanks for the fast response.
We at Graphistry have found that we need various library paths in the environment when running GPU code (specifically via OpenCL) in nvidia-docker.
We've been learning Graphistry and docker as we go, so we're always open to suggestions, but what we've done is add the library paths to our environment, to compensate for the shell not having them by default.
Our stock GPU container, https://hub.docker.com/r/graphistry/gpu-base/, is built with an environment that gets amended via https://github.com/graphistry/infrastructure/blob/master/container-images/gpu-base/Dockerfile#L10 (this is how I'd do it now, as I've since learned); that's how we find things work in production for us. Suggestions welcome :)
@lsb I don't think I follow, this thread is about the driver libraries at build time, not the CUDA binaries/libraries. The CUDA toolkit is always present at build time, and you don't need a GPU or the NVIDIA driver to compile/build.
By the way, we already set those variables in the environment/ld.cache:
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/devel/Dockerfile#L27
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L29-L31
https://github.com/NVIDIA/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/Dockerfile#L36
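For context, the relevant lines in those Dockerfiles look roughly like this (paraphrased from memory; follow the links above for the exact, current contents):

```dockerfile
# Paraphrased sketch of the linked nvidia-docker Dockerfile lines:
# register the driver mount points with the loader and the shell.
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}
```

So the paths are already on LD_LIBRARY_PATH even when the ld cache hasn't been refreshed at build time.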
Ah, thank you