I ran into an interesting issue using nvidia-docker on a CentOS system.
When I run
nvidia-docker run --rm docker.io/nvidia/cuda:10.0-base-ubuntu16.04 nvidia-smi
I get:
/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: "nvidia-smi": executable file not found in $PATH".
But when I run
nvidia-docker run --rm docker.io/nvidia/cuda:9.0-base-ubuntu16.04 nvidia-smi
everything works fine:
Fri Feb 15 11:29:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   31C    P2    53W / 250W |   1605MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 23%   23C    P8    15W / 250W |   1315MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 23%   23C    P8    15W / 250W |   1315MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 23%   28C    P8    15W / 250W |   1315MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:85:00.0 Off |                  N/A |
| 23%   27C    P8    15W / 250W |   1315MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:86:00.0 Off |                  N/A |
| 23%   25C    P8    15W / 250W |   1315MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 23%   27C    P8    15W / 250W |   1315MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 23%   23C    P8    15W / 250W |   1315MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Why does the cuda:10 image fail when the cuda:9 image works?
Which version of nvidia-docker are you using?
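(If you're not sure, one quick way to check on a yum-based system such as CentOS is to query the installed package; nvidia-docker for v1 and nvidia-docker2 for v2 are the package names NVIDIA's repos use:)

# Shows which nvidia-docker package (v1 or v2) is installed
rpm -qa | grep -i nvidia-docker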
At the moment, the cuda:10-base image is incompatible with the original version of nvidia-docker. You will need to upgrade to nvidia-docker2 in order to run containers based on the cuda:10-base images.
The underlying cause is the removal of LD_LIBRARY_PATH from the base image's environment, since setting it is no longer necessary with nvidia-docker2. With the original version of nvidia-docker, the requirement to set LD_LIBRARY_PATH in the base image caused a lot of problems and confusion, because people would override the variable (either in their own Dockerfile or at runtime) and break things. Setting this variable was deprecated in cuda:9 and removed in cuda:10.
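You can verify this yourself by printing the environment each base image ships with (a quick check; plain docker is enough here, since running env needs no GPU access):

# cuda:9.0-base still sets LD_LIBRARY_PATH (deprecated there):
docker run --rm docker.io/nvidia/cuda:9.0-base-ubuntu16.04 env | grep LD_LIBRARY_PATH

# cuda:10.0-base no longer sets it, so this prints nothing:
docker run --rm docker.io/nvidia/cuda:10.0-base-ubuntu16.04 env | grep LD_LIBRARY_PATH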
From the official wiki page for nvidia-docker:
https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#why-arent-cuda-10-images-working-with-nvidia-docker-v1
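For reference, the upgrade on CentOS 7 looks roughly like this (a sketch following NVIDIA's install instructions at the time; the repo URL and package names may have changed since):

# Remove the original nvidia-docker (1.0)
sudo yum remove -y nvidia-docker

# Add NVIDIA's package repository and install nvidia-docker2
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-docker2

# Reload the Docker daemon so it registers the nvidia runtime
sudo pkill -SIGHUP dockerd

# The cuda:10 image should now work
nvidia-docker run --rm docker.io/nvidia/cuda:10.0-base-ubuntu16.04 nvidia-smi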
@klueska Thanks for the suggestion. After upgrading to nvidia-docker2, everything works.