I have the same problem as #174.
It seems to have been resolved in nvidia-docker2, but I still can't get it to work.
I installed nvidia-docker2 with
apt install nvidia-docker2
pulled the image I need
nvidia-docker pull nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
ran it
nvidia-docker run -it <image> /bin/bash
installed Python from apt inside the container
committed the container to a new image
nvidia-docker commit <container> nvtest
saved it to a tar
nvidia-docker save -o nvtest.tar nvtest
removed the nvtest image and the running container
imported the image from nvtest.tar
nvidia-docker import nvtest.tar nvtest
then ran
docker run --runtime=nvidia --rm a81cce78e47c nvidia-smi
and the output is
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.
Changing nvidia-smi to /bin/bash gives the same output.
So I tried another method to export the container.
Export the container:
nvidia-docker export -o nvtest.tgz b93807e16849
Import the image from nvtest.tgz:
cat nvtest.tgz | nvidia-docker import - nvtest2
and now I can use bash in the container!
However, nvcc is no longer in $PATH (it is still in /usr/local/cuda/bin),
and when I run ldconfig, the output is
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.384.130 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.384.130 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.384.130 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.384.130 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.384.130 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.384.130 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.384.130 is empty, not checked.
so /usr/lib/x86_64-linux-gnu/libcuda.so.384.130 is now a 0-byte file.
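As a session-local workaround inside the imported container, the CUDA toolkit's bin directory (its standard location in the nvidia/cuda images) can be put back on PATH by hand; this only patches the current shell, not the image:

```shell
# Prepend the CUDA toolkit's bin directory to PATH for this shell session;
# /usr/local/cuda/bin is where the nvidia/cuda images install nvcc
export PATH=/usr/local/cuda/bin:$PATH
echo "$PATH"
```

Baking the variable back into the image (see the docker import --change fix below in this thread) is the durable version of the same change.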
uname -a
Linux yichiun-ubuntu1604 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Fri Jun 8 15:34:54 2018
Driver Version : 384.130
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : GeForce GTX 1060 6GB
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
......
docker version
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:17:20 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.1-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:15:30 2018
OS/Arch: linux/amd64
Experimental: false
dpkg -l '*nvidia*'
nvidia-container-runtime 2.0.0+docker18.03.1-1 amd64 NVIDIA container runtime
nvidia-container-runtime-hook 1.3.0-1 amd64 NVIDIA container runtime hook
nvidia-docker (no description available)
nvidia-docker2 2.0.3+docker18.03.1-1 all nvidia-docker CLI wrapper
nvidia-container-cli -V
version: 1.0.0
build date: 2018-04-26T22:53+00:00
build revision: 163054a04b21c4455c8cae7e47873d9f2a091f55
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
nvidia-docker save -o nvtest.tar nvtest
nvidia-docker import nvtest.tar nvtest
This is wrong: after a docker save, you need docker load, not docker import.
I'm surprised it works at all, but it's clearly not doing what you expect.
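For reference, the save/load round trip with the image name from this thread would look like the sketch below (it needs a running Docker daemon, so it is shown uninvoked here):

```shell
# docker save writes the image *with* its layers, tags, and ENV metadata
docker save -o nvtest.tar nvtest

# docker load restores it as the same image, so NVIDIA_VISIBLE_DEVICES
# and the rest of the image's environment survive
docker load -i nvtest.tar

# docker import, by contrast, treats its input as a bare filesystem tarball
# and produces a single-layer image with no metadata at all
```

That metadata difference is exactly why the imported image could not find nvidia-smi: without the NVIDIA_* variables, the runtime never mounts the driver utilities into the container.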
For your second problem: docker export drops all of the image's environment variables,
but you need those variables to trigger GPU support:
https://github.com/nvidia/nvidia-container-runtime#environment-variables-oci-spec
A fix would be:
cat nvtest.tgz | docker import --change "ENV NVIDIA_VISIBLE_DEVICES=all" - nvtest2
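Since docker export also drops PATH and the other NVIDIA variables, a fuller sketch would repeat --change once per variable. The values below are the defaults baked into the nvidia/cuda:9.0 images, not something verified against this particular container, so treat them as assumptions:

```shell
# Rebuild the image from the exported filesystem, restoring the environment
# variables that docker export discarded (values assumed from nvidia/cuda:9.0)
cat nvtest.tgz | docker import \
  --change "ENV NVIDIA_VISIBLE_DEVICES=all" \
  --change "ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility" \
  --change "ENV PATH=/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
  --change "ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64" \
  - nvtest2
```

This also fixes the "nvcc is not in $PATH" symptom, since the PATH entry for /usr/local/cuda/bin comes back with the image.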