Nvidia-docker: building images with nvidia-docker

Created on 5 Jan 2018 · 15 comments · Source: NVIDIA/nvidia-docker

1. Building images with nvidia-docker

2. In the past, I was able to just call nvidia-docker build. Version 2 requires the '--runtime' flag, but 'docker build' does not recognize it (although it works just fine as 'docker run --runtime'). I have not seen anything in the documentation about building with nvidia-docker version 2. Please advise.
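To illustrate the asymmetry being described: in nvidia-docker 2 the runtime flag exists only on docker run, not on docker build. A minimal sketch (the image names here are illustrative):

```shell
# Works: --runtime is a `docker run` option in nvidia-docker 2.
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

# Fails: `docker build` has no --runtime flag, so Docker rejects it.
# docker build --runtime=nvidia -t my-gpu-image .
```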

work as intended

Most helpful comment

Actually I think it's really important to have runtime support for docker build.

The reason is testing:
If we want to run unit tests after compiling GPU-related tools, we'll have to get GPU access somehow.

All 15 comments

I've followed the instructions [https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup] to register the runtime, but I still cannot set the default runtime to 'nvidia'. If I stop docker.service and run 'sudo dockerd --default-runtime=nvidia &', the default runtime is set to 'nvidia', but when I try to restart the service, it fails.

Please help!

Do you need GPU support during docker build? If not, you can just use docker build.

With version 1.0, nvidia-docker build was not doing anything special.

Sounds good. I’ll give it a try. I had thought nvidia-docker provided GPU passthrough for building Caffe with GPU support, which I do see in the build logs.

You don't need a GPU machine to build a GPU project. The compiler (nvcc) doesn't need to run GPU code; it only needs to know which GPU architectures you are targeting.
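For instance, CUDA sources can be compiled on a GPU-less machine by telling nvcc which architectures to target (the source file name and the compute capabilities below are illustrative):

```shell
# Compile for Pascal (sm_60) and Volta (sm_70) without any GPU present;
# nvcc only needs the target architecture list, not a physical device.
nvcc -gencode arch=compute_60,code=sm_60 \
     -gencode arch=compute_70,code=sm_70 \
     -o kernel kernel.cu
```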

Actually I think it's really important to have runtime support for docker build.

The reason is testing:
If we want to run unit tests after compiling GPU-related tools, we'll have to get GPU access somehow.

This is really quite important. Many tools require the presence of hardware to be configured correctly.
Please either fix this or provide us with a workaround for building with tools that refuse to compile without libcuda etc.

Set the default runtime to NVIDIA

I don't have access to /etc/docker/daemon.json on the system. I am assuming there is no 'per-user' default for this, since it's a daemon setting. Am I missing something?

I ran into this same issue trying to compile something that uses tensorflow in a xenial-based image. tensorflow was complaining:

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

I was able to get my docker builds to work by setting the default runtime as @RenaudWasTaken suggested. It wasn't obvious how to do this until I googled around, so perhaps these steps will help others:

  1. Edit (or create) /etc/docker/daemon.json with the following content:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
  2. Install the nvidia-container-runtime package. I had followed the instructions here, but it seems nvidia-container-runtime isn't installed by default:
sudo apt-get install nvidia-container-runtime
  3. sudo systemctl restart docker.service
  4. Try your docker build again.
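To confirm that the daemon actually picked up the new default before retrying the build, something like the following can be used (the exact output shape varies by Docker version):

```shell
# List registered runtimes and the current default; after the change,
# the "Default Runtime" line should read "nvidia".
docker info | grep -i runtime
```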

Related Links:
https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup
https://docs.nvidia.com/dgx/nvidia-container-runtime-upgrade/index.html#using-nv-container-runtime

Another solution if your docker build is just doing compilation is to use the stubs in /usr/local/cuda/lib64/stubs/

mark!

@z13974509906 the recommended path is to build CUDA code during docker build time and run CUDA code during docker run time :)

You wouldn't need libcuda.so in that case and can use the stubs at build time.

To build using the stubs, you need to make the stubs path known to the linker. One option is to add the path to the LIBRARY_PATH environment variable (LD_LIBRARY_PATH is for runtime linking, whereas LIBRARY_PATH is used for compile-time linking). Example:

ENV LIBRARY_PATH $LIBRARY_PATH:/usr/local/cuda/lib64/stubs
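A fuller Dockerfile sketch of the stubs approach might look like the following (the base image tag and source file are illustrative; the stub must never be shipped as the runtime libcuda):

```dockerfile
# Illustrative build stage: link against the CUDA driver stubs so code
# that needs libcuda.so can be compiled without a GPU or driver present.
FROM nvidia/cuda:9.0-devel-ubuntu16.04
ENV LIBRARY_PATH $LIBRARY_PATH:/usr/local/cuda/lib64/stubs
COPY app.cu /src/app.cu
# -lcuda resolves against the stub at link time; at `docker run` time the
# nvidia runtime injects the real libcuda.so.1 from the host driver.
RUN nvcc -o /usr/local/bin/app /src/app.cu -lcuda
```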

Setting the default runtime as described above also solved my problem where torch.cuda.is_available() returned False.

@icolwell-as

...

  3. sudo systemctl restart docker.service
  4. Try your docker build again.

Related Links:
https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup
https://docs.nvidia.com/dgx/nvidia-container-runtime-upgrade/index.html#using-nv-container-runtime

You don't need to restart the daemon; sudo killall -s HUP dockerd is usually enough.
Despite how it sounds, it won't kill anything: it sends SIGHUP to dockerd, and the signal handler reloads the config JSON.
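A sketch of that reload flow (assuming dockerd is running and the config lives at /etc/docker/daemon.json; note that Docker only live-reloads a subset of daemon.json keys, so some changes still require a restart):

```shell
# Edit /etc/docker/daemon.json, then ask dockerd to reload it in place.
sudo killall -s HUP dockerd

# Alternatively, target the PID explicitly:
# sudo kill -HUP "$(pidof dockerd)"
```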
