Nvidia-docker: building images with nvidia-docker

Created on 5 Jan 2018 · 15 comments · Source: NVIDIA/nvidia-docker

1. Building images with nvidia-docker

2. In the past, I was able to just call nvidia-docker build. Version 2 requires the '--runtime' flag, but 'docker build' does not recognize it (although it works just fine as 'docker run --runtime'). I have not seen anything in the documentation about building with nvidia-docker version 2. Please advise.
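To illustrate the asymmetry being described: in nvidia-docker 2 the runtime flag exists only on docker run, not on docker build. A minimal sketch (the image names here are illustrative):

```shell
# Works: --runtime is a `docker run` option in nvidia-docker 2.
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

# Fails: `docker build` has no --runtime flag, so Docker rejects it.
# docker build --runtime=nvidia -t my-gpu-image .
```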

work as intended

Most helpful comment

Actually I think it's really important to have runtime support for docker build.

The reason is testing:
If we want to run unit tests after compiling GPU-related tools, we'll have to get GPU access somehow.

All 15 comments

I've followed the instructions [https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup] to register the runtime, but I still cannot set the default runtime to 'nvidia'. If I stop docker.service and run 'sudo dockerd --default-runtime=nvidia &', the default runtime is set to 'nvidia', but when I try to restart the service, it fails.

Please help!

Do you need GPU support during docker build? If not, you can just use docker build.

With version 1.0, nvidia-docker build was not doing anything special.

Sounds good. I’ll give it a try. I had thought nvidia-docker provided GPU passthrough for building Caffe with GPU support, which I do see in the build logs.

You don't need a GPU machine to build a GPU project. The compiler (nvcc) doesn't need to run GPU code; it only needs to know which GPU architectures you are targeting.
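For instance, CUDA sources can be compiled on a GPU-less machine by telling nvcc which architectures to target (the source file name and the compute capabilities below are illustrative):

```shell
# Compile for Pascal (sm_60) and Volta (sm_70) without any GPU present;
# nvcc only needs the target architecture list, not a physical device.
nvcc -gencode arch=compute_60,code=sm_60 \
     -gencode arch=compute_70,code=sm_70 \
     -o kernel kernel.cu
```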

Actually I think it's really important to have runtime support for docker build.

The reason is testing:
If we want to run unit tests after compiling GPU-related tools, we'll have to get GPU access somehow.

This is really quite important. Many tools require the presence of hardware to be configured correctly.
Please either fix this or provide us with a workaround for building with tools that refuse to compile without libcuda etc.

Set the default runtime to NVIDIA

I don't have access to /etc/docker/daemon.json on the system. I am assuming there is no 'per-user' default for this, since it's a daemon setting. Am I missing something?

I ran into this same issue trying to compile something that uses tensorflow in a xenial-based image. tensorflow was complaining:

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

I was able to get my docker builds to work by setting the default runtime as @RenaudWasTaken suggested. It wasn't obvious how to do this until I googled around, so perhaps these steps will help others:

  1. Edit (or create) /etc/docker/daemon.json with the following content:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
  2. Install the nvidia-container-runtime package. I had followed the instructions here, but it seems nvidia-container-runtime isn't installed by default:
sudo apt-get install nvidia-container-runtime
  3. sudo systemctl restart docker.service
  4. Try your docker build again.
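To confirm that the daemon actually picked up the new default before retrying the build, something like the following can be used (the exact output shape varies by Docker version):

```shell
# List registered runtimes and the current default; after the change,
# the "Default Runtime" line should read "nvidia".
docker info | grep -i runtime
```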

Related Links:
https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup
https://docs.nvidia.com/dgx/nvidia-container-runtime-upgrade/index.html#using-nv-container-runtime

Another solution if your docker build is just doing compilation is to use the stubs in /usr/local/cuda/lib64/stubs/

mark!

@z13974509906 the recommended path is to build CUDA code during docker build time and run CUDA code during docker run time :)

You wouldn't need libcuda.so in that case and can use the stubs at build time.

To build using the stubs, you need to make the stubs path known to the linker. One option is to add the path to the LIBRARY_PATH environment variable (LD_LIBRARY_PATH is for runtime linking, whereas LIBRARY_PATH is used for compile-time linking). Example:

ENV LIBRARY_PATH $LIBRARY_PATH:/usr/local/cuda/lib64/stubs
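A fuller Dockerfile sketch of the stubs approach might look like the following (the base image tag and source file are illustrative; the stub must never be shipped as the runtime libcuda):

```dockerfile
# Illustrative build stage: link against the CUDA driver stubs so code
# that needs libcuda.so can be compiled without a GPU or driver present.
FROM nvidia/cuda:9.0-devel-ubuntu16.04
ENV LIBRARY_PATH $LIBRARY_PATH:/usr/local/cuda/lib64/stubs
COPY app.cu /src/app.cu
# -lcuda resolves against the stub at link time; at `docker run` time the
# nvidia runtime injects the real libcuda.so.1 from the host driver.
RUN nvcc -o /usr/local/bin/app /src/app.cu -lcuda
```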

Setting the default runtime as described above also solved my problem where torch.cuda.is_available() returned False.

@icolwell-as

...

  3. sudo systemctl restart docker.service
  4. Try your docker build again.

Related Links:
https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup
https://docs.nvidia.com/dgx/nvidia-container-runtime-upgrade/index.html#using-nv-container-runtime

You don't need to restart the daemon; sudo killall -s HUP dockerd is usually enough.
Despite how it sounds, it won't kill anything: it sends SIGHUP to dockerd, and the signal handler reloads the config JSON.
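A sketch of that reload flow (assuming dockerd is running and the config lives at /etc/docker/daemon.json; note that Docker only live-reloads a subset of daemon.json keys, so some changes still require a restart):

```shell
# Edit /etc/docker/daemon.json, then ask dockerd to reload it in place.
sudo killall -s HUP dockerd

# Alternatively, target the PID explicitly:
# sudo kill -HUP "$(pidof dockerd)"
```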
