I've followed the instructions [https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup] to register the runtime, but I still cannot set the default runtime to 'nvidia'. If I stop docker.service and run 'sudo dockerd --default-runtime=nvidia &', the default runtime is set to 'nvidia', but when I try to restart the service, it fails.
Please help!
Do you need GPU support during docker build? If not, you can just use docker build.
With version 1.0, nvidia-docker build was not doing anything special.
Sounds good. I'll give it a try. Primarily, I thought nvidia-docker provided GPU passthrough support for building Caffe with GPU, which I do see in the build logs.
You don't need to have a GPU machine to build a GPU project. The compiler (nvcc) doesn't need to run GPU code, it only needs to know which GPU families you will target.
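For instance, compiling for specific GPU families only requires telling nvcc which architectures to target; nothing here touches the driver, so it works on a GPU-less build machine. (A sketch: the source file name and the architecture list below are illustrative.)

```
# Compile CUDA source for Volta (sm_70) and Ampere (sm_80) targets.
# nvcc needs the CUDA toolkit installed, but no GPU and no libcuda.so.1.
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=sm_80 \
     -c kernel.cu -o kernel.o
```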
Actually, I think it's really important to have runtime support for docker build.
The reason is for testing:
If we want to run unit tests after compiling GPU-related tools, we'll have to get GPU access somehow.
This is really quite important. Many tools require the presence of hardware to be configured correctly.
Please either fix this, or provide us with a workaround for building with tools that refuse to compile without libcuda etc.
Set the default runtime to NVIDIA
I don't have access to /etc/docker/daemon.json on the system. I am assuming there is no 'per-user' default for this, since it's a daemon setting. Am I missing something?
I ran into this same issue trying to compile something that uses tensorflow in a xenial-based image. tensorflow was complaining:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
I was able to get my docker builds to work by setting the default runtime as @RenaudWasTaken suggested. I didn't really know how to do this until I googled around to figure it out. Perhaps this may help others:

- Edit/create /etc/docker/daemon.json with the below content:

```
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```

- Install the nvidia-container-runtime package. I had followed the instructions here, but it seems nvidia-container-runtime isn't installed by default:

```
sudo apt-get install nvidia-container-runtime
```

- Restart the Docker daemon:

```
sudo systemctl restart docker.service
```

- Try your docker build again.

Related Links:
- https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup
- https://docs.nvidia.com/dgx/nvidia-container-runtime-upgrade/index.html#using-nv-container-runtime
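One caveat: a malformed daemon.json will prevent dockerd from starting at all, so it's worth validating the file before restarting the daemon. A minimal sketch (it writes to ./daemon.json in the current directory; copying the file to /etc/docker/daemon.json with sudo is up to you):

```shell
# Write the daemon.json from the steps above to the current directory,
# then validate it before it goes anywhere near /etc/docker/.
cat > daemon.json <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF

# python3 -m json.tool exits non-zero on invalid JSON.
python3 -m json.tool daemon.json > /dev/null && echo "daemon.json is valid JSON"
```

After copying it into place and restarting Docker, `docker info` should list nvidia as the default runtime.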
Another solution, if your docker build is just doing compilation, is to use the stubs in /usr/local/cuda/lib64/stubs/.
mark!
@z13974509906 the recommended path is to build CUDA code during docker build time and run CUDA code during docker run time :)
You wouldn't need libcuda.so in that case and can use the stubs at build time.
To build using the stubs, you need to make the stubs path known to the linker. One option is to add the path to the LIBRARY_PATH environment variable. (LD_LIBRARY_PATH is for runtime linking, whereas LIBRARY_PATH is used for compile-time linking.) Example:
ENV LIBRARY_PATH $LIBRARY_PATH:/usr/local/cuda/lib64/stubs
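Put together, a build stage might look like the sketch below. (Assumptions: the base image tag and app.cu are illustrative, and the project links against the driver API with -lcuda.)

```
# Sketch: compile CUDA code during docker build using the driver stubs,
# so no GPU and no nvidia runtime are needed at build time.
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

# Make the stubs visible to the compile-time linker (LD_LIBRARY_PATH
# would only affect runtime linking).
ENV LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64/stubs

COPY app.cu /src/app.cu
# -lcuda resolves against the stub libcuda here; the real driver library
# is injected by the nvidia runtime at docker run time.
RUN nvcc -o /src/app /src/app.cu -lcuda
```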
This solved my problem where torch.cuda.is_available() returns False.
@icolwell-as
> sudo systemctl restart docker.service
You don't need to restart the daemon; sudo killall -s HUP dockerd is usually enough.
Despite how it sounds, it won't kill anything.
It sends SIGHUP to dockerd, and the signal handler reloads the config JSON.
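To illustrate the mechanism (a generic sketch, not dockerd's actual code): a process can install a SIGHUP handler that re-reads its config file, so the config changes take effect without a restart. The file name here is hypothetical, and this assumes a Unix-like system:

```python
import json
import os
import signal

CONFIG_PATH = "daemon.json"  # hypothetical config file for this sketch
config = {}

def reload_config(signum, frame):
    # Re-read the JSON config in place, mirroring how dockerd reloads
    # /etc/docker/daemon.json when it receives SIGHUP.
    global config
    with open(CONFIG_PATH) as f:
        config = json.load(f)

signal.signal(signal.SIGHUP, reload_config)

# Simulate a config change followed by a reload signal.
with open(CONFIG_PATH, "w") as f:
    json.dump({"default-runtime": "nvidia"}, f)
os.kill(os.getpid(), signal.SIGHUP)  # like `killall -s HUP`
print(config["default-runtime"])
```

The process keeps running throughout; only the handler executes, which is why "killall -s HUP" doesn't actually kill anything.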