Nvidia-docker: Only works once

Created on 13 Jan 2016 · 11 comments · Source: NVIDIA/nvidia-docker

I'm not very familiar with Docker volumes, but this one appears to be good for only a single use.

  1. sudo ./nvidia-docker volume setup

nvidia_driver_352.55

  2. docker volume ls

DRIVER              VOLUME NAME
local               nvidia_driver_352.55

  3. ./nvidia-docker run --rm nvidia/cuda nvidia-smi

```
+------------------------------------------------------+
| NVIDIA-SMI 352.55 Driver Version: 352.55 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 680 Off | 0000:01:00.0 N/A | N/A |
| 34% 54C P8 N/A / N/A | 653MiB / 4093MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 580 Off | 0000:02:00.0 N/A | N/A |
| 46% 54C P12 N/A / N/A | 7MiB / 3071MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20c Off | 0000:03:00.0 Off | Off |
| 37% 49C P0 48W / 225W | 96MiB / 5119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
```

  4. ./nvidia-docker run --rm nvidia/cuda nvidia-smi

Error response from daemon: Error looking up volume plugin nvidia-docker: Plugin not found

  5. docker volume ls

DRIVER VOLUME NAME

Tested on Ubuntu 14.04 running Docker 1.9.1 and CentOS 7 running Docker 1.9.0.

It just seems that if sudo nvidia-docker volume setup is in the "Initial setup" section, then it shouldn't need to be run every time I create a new container, or am I missing something?
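For reference, the volume is named after the installed driver version (hence nvidia_driver_352.55 above), which is why a driver upgrade produces a new volume. A minimal sketch of deriving that name, using a hard-coded nvidia-smi banner line so it runs without a GPU (the sed expression is my own illustration, not the plugin's actual code, which queries NVML):

```shell
# Derive the nvidia_driver_<version> volume name from an nvidia-smi banner.
# The banner line is hard-coded so this runs without a GPU.
banner="| NVIDIA-SMI 352.55     Driver Version: 352.55         |"
version=$(echo "$banner" | sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p')
echo "nvidia_driver_${version}"   # prints nvidia_driver_352.55
```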

wontfix

Most helpful comment

I had this error after upgrading my nvidia driver to the latest version (I wanted to use CUDA 8):

```
nvidia-docker run --rm nvidia/cuda nvidia-smi

docker: Error response from daemon: create nvidia_driver_367.44: create nvidia_driver_367.44: Error looking up volume plugin nvidia-docker: plugin not found.
See 'docker run --help'.
```

Running on CentOS 7.
After a reboot, upgrading Docker and nvidia-docker-plugin, and another reboot, I realized that the plugin wasn't running.

sudo systemctl start nvidia-docker

fixed my issue.

All 11 comments

Yes, this is one of our limitations, documented here.
This is due to --rm removing the volumes attached to a container (equivalent to docker rm -v).
Here is the corresponding Docker issue: https://github.com/docker/docker/issues/17907

A workaround would be to change volume setup to create a data container referencing the volume.
I'm not thrilled by this solution though...

I'm actually a little bit of a fan of the data container idea.

  1. I'm not 100% sure how the volumes work. Are they actually mounts of /var/lib/docker/volumes/foo/_data, or is there more mount magic involved? If you are creating hard links in there, that suggests it's more of a normal directory. At any rate, I remember hearing that from a security standpoint it's better to rely on a data container than to mount host devices directly. Some people may care about that.
  2. If the driver files are copied into a data container, that should alleviate this.
  3. You no longer need root to set it up; you just need docker group permissions.

I was already playing with this idea when you mentioned it, and it seems to work well for me :)

I used a Makefile with:

```
install:
	docker build -t nvidia_driver -f Dockerfile_nvidia_driver .
	if docker inspect nvidia_driver_${NVIDIA_VERSION} > /dev/null 2>&1; then \
	  docker rm nvidia_driver_${NVIDIA_VERSION}; \
	fi
	docker run -v /usr/bin:/hostbin:ro -v /usr/lib64/nvidia:/hostlib64 --name nvidia_driver_${NVIDIA_VERSION} nvidia_driver

run:
	docker run -it --rm \
	           --volumes-from nvidia_driver_${NVIDIA_VERSION}:ro \
	           $$(ls /dev/nvidia* | sed 's|^|--device |') \
	           cuda_example
```

And a Dockerfile_nvidia_driver of:

```
FROM centos:7

VOLUME /usr/local/nvidia

CMD mkdir -p /usr/local/nvidia/bin && \
    cp -a /hostbin/nvidia* /usr/local/nvidia/bin/ && \
    cp -ra /hostlib64 /usr/local/nvidia/lib64
```

Sorry, it's a little messy, but it was just a quick PoC to prove to myself that it would work.
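The $$(ls /dev/nvidia* | sed ...) part of the run target can be sketched on its own; here the device list is hard-coded so the example runs on a machine without a GPU:

```shell
# Build one --device flag per NVIDIA device node, as the Makefile's run
# target does with $$(ls /dev/nvidia* | sed 's|^|--device |').
# Hard-coded device list so this runs without a GPU.
devices="/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia0"
flags=$(echo "$devices" | sed 's|^|--device |' | tr '\n' ' ')
echo "$flags"   # --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0
```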

Data containers and volumes are exactly the same thing under the hood. Using volumes directly makes more sense because that's where Docker is headed with persistent volumes and the new volume CLI. It also keeps things unified between the standalone version and the plugin version (i.e. nvidia-docker standalone uses a local driver).

Creating an image and a container just for the sake of keeping a volume referenced is not ideal. Besides, you still have to make sure that the container is never deleted.

This really is a Docker issue and will be fixed upstream eventually. In the meantime, I suggest you run your container without --rm, or use nvidia-docker-plugin.
If you really want to lock the volume with a data container, it's just a matter of doing:

```
volume="$(sudo nvidia-docker volume setup)"
nvidia-docker create --name=LOCK -v $volume:/data:ro tianon/true
nvidia-docker run --rm nvidia/cuda nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi
```

Regarding copy vs. hardlink, we chose hardlinks to keep the ecosystem as light as possible. Copying around MBs of driver files just to launch a container is not an option.
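The hardlink point can be illustrated with a standalone sketch (file names here are made up, and GNU stat is assumed): a hardlink adds a second directory entry for the same inode, so no driver data is duplicated:

```shell
# Hardlinks vs copies: both names point at the same inode, so populating a
# volume this way costs no extra disk space. File names are illustrative.
tmp=$(mktemp -d)
echo "pretend this is a driver blob" > "$tmp/libcuda.so"
ln "$tmp/libcuda.so" "$tmp/linked.so"    # hardlink, no data copied
stat -c '%h' "$tmp/libcuda.so"           # prints 2: two names, one inode
cmp -s "$tmp/libcuda.so" "$tmp/linked.so" && echo "identical contents"
rm -rf "$tmp"
```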

Closing since it's an issue with upstream Docker.
I updated the documentation accordingly.

Fixed in Docker 1.10, the documentation has been updated.

Just a note: I ran the install instructions from the README and tried to test; it failed with this error:

docker: Error response from daemon: create nvidia_driver_361.28: create nvidia_driver_361.28: Error looking up volume plugin nvidia-docker: plugin not found.

The solution was:

sudo ./nvidia-docker volume setup

Installed version: nvidia-docker_1.0.0.beta.3-1_amd64.deb

Are you running Ubuntu? If so, can you show me the output of:

cat /var/log/upstart/nvidia-docker.log

Hi @3XX0, I have a similar problem to @orian's.
I am running Ubuntu 14.04.
The install followed the instructions on the wiki:

```
# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-beta.3/nvidia-docker_1.0.0.beta.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker_1.0.0.beta.3-1_amd64.deb && rm /tmp/nvidia-docker*.deb
```

and when I test it,

```
# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi
```

it gives me the error (which took me to this issue):

Error response from daemon: Error looking up volume plugin nvidia-docker: Plugin Error: Plugin.Activate, 400 Bad Request: malformed Host header

My nvidia-docker.log looks like this:

```
$ sudo cat /var/log/upstart/nvidia-docker.log
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Loading NVIDIA management library
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Loading NVIDIA unified memory
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:41 Discovering GPU devices
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Provisioning volumes at /var/lib/nvidia-docker/volumes
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Serving plugin API at /var/lib/nvidia-docker
/usr/bin/nvidia-docker-plugin | 2016/04/25 15:36:43 Serving remote API at localhost:3476
```

@guoquan see #83

I had this error after upgrading my nvidia driver to the latest version (I wanted to use CUDA 8):

```
nvidia-docker run --rm nvidia/cuda nvidia-smi

docker: Error response from daemon: create nvidia_driver_367.44: create nvidia_driver_367.44: Error looking up volume plugin nvidia-docker: plugin not found.
See 'docker run --help'.
```

Running on CentOS 7.
After a reboot, upgrading Docker and nvidia-docker-plugin, and another reboot, I realized that the plugin wasn't running.

sudo systemctl start nvidia-docker

fixed my issue.

Running on an AWS Linux AMI, nvidia-docker fails to launch the nvidia/cuda:7.5-devel container on the first try:

```
nvidia-docker run --rm nvidia/cuda:7.5-devel nvidia-smi
Error response from daemon: create nvidia_driver_352.99: Post http://%2Frun%2Fdocker%2Fplugins%2Fnvidia-docker.sock/VolumeDriver.Create: http: ContentLength=44 with Body length 0.
```

```
nvidia-docker volume ls
DRIVER              VOLUME NAME
nvidia-docker       nvidia_driver_352.99
```

Then, when I try to launch the container again, it succeeds.

Currently using docker version 1.11.2, build b9f10c9/1.11.2

```
cat /tmp/nvidia-docker.log
nvidia-docker-plugin | 2016/11/09 23:25:35 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/11/09 23:25:36 Loading NVIDIA management library
nvidia-docker-plugin | 2016/11/09 23:25:36 Discovering GPU devices
nvidia-docker-plugin | 2016/11/09 23:25:40 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2016/11/09 23:25:40 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2016/11/09 23:25:40 Serving remote API at localhost:3476
nvidia-docker-plugin | 2016/11/09 23:32:06 Received activate request
nvidia-docker-plugin | 2016/11/09 23:32:06 Plugins activated [VolumeDriver]
nvidia-docker-plugin | 2016/11/09 23:32:07 Received create request for volume 'nvidia_driver_352.99'
```

