I'm not very familiar with Docker volumes, but the nvidia-docker volume appears to be good for only ONE use.
```
sudo ./nvidia-docker volume setup
nvidia_driver_352.55

docker volume ls
DRIVER              VOLUME NAME
local               nvidia_driver_352.55
```

The first run works:

```
./nvidia-docker run --rm nvidia/cuda nvidia-smi
+------------------------------------------------------+
| NVIDIA-SMI 352.55     Driver Version: 352.55         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680     Off  | 0000:01:00.0     N/A |                  N/A |
| 34%   54C    P8    N/A /  N/A |    653MiB /  4093MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 580     Off  | 0000:02:00.0     N/A |                  N/A |
| 46%   54C   P12    N/A /  N/A |      7MiB /  3071MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20c          Off  | 0000:03:00.0     Off |                  Off |
| 37%   49C    P0    48W / 225W |     96MiB /  5119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+
```
The second run fails, and the volume is gone:

```
./nvidia-docker run --rm nvidia/cuda nvidia-smi
Error response from daemon: Error looking up volume plugin nvidia-docker: Plugin not found

docker volume ls
DRIVER              VOLUME NAME
```
Tested on Ubuntu 14.04 running Docker 1.9.1 and CentOS 7 running Docker 1.9.0.
It just seems like if `sudo nvidia-docker volume setup` is in the "Initial setup" section, then it shouldn't need to be run every time I create a new container, or am I missing something?
Yes, this is one of our limitations, documented here.
This is due to `--rm` removing the volumes attached to a container (equivalent to `docker rm -v`).
Here is the corresponding Docker issue: https://github.com/docker/docker/issues/17907
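The behaviour is easy to reproduce with any named volume, not just the driver one (a hypothetical sketch; `foo` and `busybox` are placeholder names, and this requires a Docker 1.9-era daemon, since later releases only remove anonymous volumes):

```
# Create a named volume and use it from a --rm container.
docker volume create --name foo
docker run --rm -v foo:/data busybox true

# On Docker 1.9, `foo` has now been deleted along with the container,
# exactly as if `docker rm -v` had been run.
docker volume ls
```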
A workaround would be to change `volume setup` so that it creates a data container referencing the volume.
I'm not thrilled by this solution, though...
I'm actually a little bit of a fan of the data container idea.
/var/lib/docker/volumes/foo/_data, or more mount magic... But if you are doing hard links in there, that suggests to me it's more of a normal directory. At any rate, I remember hearing that, from a security standpoint, it's better to rely on a data container than to mount host devices directly. Some people may care about that.

I was already playing with this idea when you mentioned it, and it seems to work well for me :)
I used a Makefile with:

```
install:
	docker build -t nvidia_driver -f Dockerfile_nvidia_driver .
	if docker inspect nvidia_driver_${NVIDIA_VERSION} > /dev/null 2>&1; then \
		docker rm nvidia_driver_${NVIDIA_VERSION}; \
	fi
	docker run -v /usr/bin:/hostbin:ro -v /usr/lib64/nvidia:/hostlib64 --name nvidia_driver_${NVIDIA_VERSION} nvidia_driver

run:
	docker run -it --rm \
		--volumes-from nvidia_driver_${NVIDIA_VERSION}:ro \
		$$(ls /dev/nvidia* | sed 's|^|--device |') \
		cuda_example
```
And a `Dockerfile_nvidia_driver` of:

```
FROM centos:7
VOLUME /usr/local/nvidia
CMD mkdir -p /usr/local/nvidia/bin && \
    cp -a /hostbin/nvidia* /usr/local/nvidia/bin/ && \
    cp -ra /hostlib64 /usr/local/nvidia/lib64
```

Sorry it's a little messy, but it was just a quick PoC to prove to myself that it would work.
Data containers and volumes are exactly the same thing under the hood. Using volumes directly makes more sense because that's where Docker is headed with persistent volumes and the new volume CLI. It also keeps things unified between the standalone version and the plugin version (i.e. nvidia-docker standalone uses a local driver).
Creating an image and a container for the sake of having a volume referenced is not ideal. Besides, you still have to make sure that the container is not deleted.
This really is a Docker issue and will be fixed upstream eventually. In the meantime, I suggest you run your container without `--rm`, or use nvidia-docker-plugin.
If you really want to lock the volume with a data container, it's just a matter of doing:

```
volume="$(sudo nvidia-docker volume setup)"
nvidia-docker create --name=LOCK -v $volume:/data:ro tianon/true
nvidia-docker run --rm nvidia/cuda nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi
```
Regarding copy vs. hard link: we chose hard links to keep the ecosystem as light as possible. Copying around MBs of driver files in order to launch a container is not an option.
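The space argument is easy to verify with plain files (a standalone sketch, unrelated to the actual driver files; GNU `stat` is assumed, and `libfoo.so` is just a stand-in name):

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for a multi-megabyte driver file.
dd if=/dev/zero of="$tmp/libfoo.so" bs=1M count=4 2>/dev/null

ln "$tmp/libfoo.so" "$tmp/link.so"   # hard link: shares the same inode, no data copied
cp "$tmp/libfoo.so" "$tmp/copy.so"   # copy: 4 MiB duplicated on disk

stat -c '%n %i' "$tmp/libfoo.so" "$tmp/link.so"   # same inode number
stat -c '%n %i' "$tmp/copy.so"                    # different inode

rm -r "$tmp"
```

Since the link and the original share one inode, the volume directory full of hard links costs essentially no extra disk space, which is what the comment above is getting at.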
Closing since it's an issue with upstream Docker.
I updated the documentation accordingly.
Fixed in Docker 1.10, the documentation has been updated.
Just a note: I ran the install instructions from the README and tried the test, and it failed with this error:

```
docker: Error response from daemon: create nvidia_driver_361.28: create nvidia_driver_361.28: Error looking up volume plugin nvidia-docker: plugin not found.
```

The solution was:

```
sudo ./nvidia-docker volume setup
```

Installed version: nvidia-docker_1.0.0.beta.3-1_amd64.deb
Are you running Ubuntu? If so, can you show me the output of:

```
cat /var/log/upstart/nvidia-docker.log
```
Hi @3XX0, I have a similar problem as @orian.
I am running Ubuntu 14.04.
The install follows the instructions on the wiki:
```
# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-beta.3/nvidia-docker_1.0.0.beta.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker_1.0.0.beta.3-1_amd64.deb && rm /tmp/nvidia-docker*.deb
```
and when I test it,

```
# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi
```

it gives me the error (which brought me to this issue):

```
Error response from daemon: Error looking up volume plugin nvidia-docker: Plugin Error: Plugin.Activate, 400 Bad Request: malformed Host header
```
My nvidia-docker.log looks like this:

```
$ sudo cat /var/log/upstart/nvidia-docker.log
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Loading NVIDIA management library
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Loading NVIDIA unified memory
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Discovering GPU devices
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Provisioning volumes at /var/lib/nvidia-docker/volumes
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Serving plugin API at /var/lib/nvidia-docker
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Serving remote API at localhost:3476
```
@guoquan see #83
I had this error after upgrading my nvidia driver to the latest version (I wanted to use CUDA 8):

```
nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_367.44: create nvidia_driver_367.44: Error looking up volume plugin nvidia-docker: plugin not found.
See 'docker run --help'.
```

Running on CentOS 7.
After a reboot, upgrading Docker and nvidia-docker-plugin, and another reboot, I realised that the plugin wasn't running.

```
sudo systemctl start nvidia-docker
```

fixed my issues.
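When this bites, a few quick checks usually show whether the plugin is actually up (a hedged sketch; the service name and paths match the logs quoted in this thread, but may differ per distro and install method):

```
# Is the service running? (systemd systems)
sudo systemctl status nvidia-docker

# Is the plugin socket where Docker looks for it?
ls -l /run/docker/plugins/nvidia-docker.sock

# Is the plugin's remote API answering on its default port?
curl -s http://localhost:3476/
```

If the socket is missing or curl gets no response, Docker has nothing to activate and reports "plugin not found".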
Running on an AWS Linux AMI, nvidia-docker fails to launch the container nvidia/cuda:7.5-devel on the first attempt:

```
nvidia-docker run --rm nvidia/cuda:7.5-devel nvidia-smi
Error response from daemon: create nvidia_driver_352.99: Post http://%2Frun%2Fdocker%2Fplugins%2Fnvidia-docker.sock/VolumeDriver.Create: http: ContentLength=44 with Body length 0.

nvidia-docker volume ls
DRIVER              VOLUME NAME
nvidia-docker       nvidia_driver_352.99
```

When I try to launch the container again, it succeeds.
Currently using Docker version 1.11.2, build b9f10c9/1.11.2.
```
cat /tmp/nvidia-docker.log
nvidia-docker-plugin | 2016/11/09 23:25:35 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/11/09 23:25:36 Loading NVIDIA management library
nvidia-docker-plugin | 2016/11/09 23:25:36 Discovering GPU devices
nvidia-docker-plugin | 2016/11/09 23:25:40 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2016/11/09 23:25:40 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2016/11/09 23:25:40 Serving remote API at localhost:3476
nvidia-docker-plugin | 2016/11/09 23:32:06 Received activate request
nvidia-docker-plugin | 2016/11/09 23:32:06 Plugins activated [VolumeDriver]
nvidia-docker-plugin | 2016/11/09 23:32:07 Received create request for volume 'nvidia_driver_352.99'
```