root@Computor:/home/alex# nvidia-docker-plugin
nvidia-docker-plugin | 2017/04/03 15:24:24 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/04/03 15:24:24 Loading NVIDIA management library
nvidia-docker-plugin | 2017/04/03 15:24:24 Discovering GPU devices
nvidia-docker-plugin | 2017/04/03 15:24:24 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2017/04/03 15:24:24 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2017/04/03 15:24:24 Serving remote API at localhost:3476
nvidia-docker-plugin | 2017/04/03 15:24:24 Error: listen tcp 127.0.0.1:3476: bind: address already in use
alex@Computor:~$ systemctl status nvidia-docker
● nvidia-docker.service - NVIDIA Docker plugin
Loaded: loaded (/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: https://github.com/NVIDIA/nvidia-docker/wiki
alex@Computor:~$ nvidia-docker run --rm -ti crisbal/torch-rnn:cuda7.5 bash
docker: Error response from daemon: create nvidia_driver_375.39: create nvidia_driver_375.39: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.
See 'docker run --help'.
alex@Computor:~$
alex@Computor:~$ ldconfig -p | grep nvidia-ml
libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/nvidia-375/libnvidia-ml.so.1
libnvidia-ml.so.1 (libc6) => /usr/lib32/nvidia-375/libnvidia-ml.so.1
libnvidia-ml.so (libc6,x86-64) => /usr/lib/nvidia-375/libnvidia-ml.so
libnvidia-ml.so (libc6) => /usr/lib32/nvidia-375/libnvidia-ml.so
Thank you in advance. I have been reading through the other issues, but I can't seem to get this to work. I'm trying to use it for torch-rnn.
You shouldn't start nvidia-docker-plugin manually, it should be started by systemd.
After restarting the machine, what's the output of journalctl -n -u nvidia-docker?
I have the same issue.
armor@armor2x:~$ service nvidia-docker status
● nvidia-docker.service - NVIDIA Docker plugin
Loaded: loaded (/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: https://github.com/NVIDIA/nvidia-docker/wiki
armor@armor2x:~$ nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_375.39: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.
See 'docker run --help'.
armor@armor2x:~$ journalctl -n -u nvidia-docker
-- No entries --
Also note that while installing I got this (error code 9):
armor@armor2x:~$ sudo dpkg -i /tmp/nvidia-docker*.deb
Configuring user
useradd: group nvidia-docker exists - if you want to add this user to that group, use -g.
dpkg: error processing package nvidia-docker (--install):
subprocess installed post-installation script returned error exit status 9
Seems like a problem with the user or group?
Same output, @flx42. Should I do a fresh reinstall?
@armoreal uninstall nvidia-docker, remove the leftover nvidia-docker files, then try a fresh reinstall.
@NeoZeromus are you saying you still have listen tcp 127.0.0.1:3476: bind: address already in use? And you didn't run nvidia-docker-plugin manually this time?
@flx42 exactly
@NeoZeromus Do you have another service running on this port?
$ sudo lsof -i :3476
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nvidia-do 8318 nvidia-docker 19u IPv4 81085 0t0 TCP localhost:3476 (LISTEN)
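One way to check this without lsof is to try connecting to the port directly. A minimal sketch, assuming bash (which provides the /dev/tcp pseudo-device) and the plugin's default remote-API port 3476 seen in the logs above:

```shell
# Check whether something is already bound to the plugin's port.
# 3476 is nvidia-docker-plugin's default remote-API port.
port=3476
if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
  echo "port $port is in use"
else
  echo "port $port is free"
fi
```

If the port is in use by a stray manually started plugin, killing that process should let the systemd-managed one bind successfully.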
@flx42 I tried to reinstall several times, but got the same result.
But I notice that my situation is different from @NeoZeromus's: I am able to manually run sudo nvidia-docker-plugin & and then nvidia-docker run works properly.
Having the same error here.
I have
$ modinfo -F version nvidia
367.57
I did
docker volume create --driver=nvidia-docker --name=nvidia_driver_$(modinfo -F version nvidia)
and the volume is there
sudo docker volume ls | grep nvidia
nvidia-docker nvidia_driver_367.57
I run with
docker run -it --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --volume-driver nvidia-docker -v nvidia_driver_367.57:/usr/local/nvidia:ro $IMAGE bash
and I have
$ journalctl -n -u nvidia-docker
-- Logs begin at Thu 2017-02-09 23:42:51 UTC, end at Thu 2017-04-27 17:36:00 UTC. --
Apr 12 19:31:39 nvidia-docker-loreto nvidia-docker-plugin[8831]: /usr/bin/nvidia-docker-plugin | 2017/04/12 19:31:39 Received mount request for volume 'nvidia_driver_367.57'
Apr 12 19:36:45 nvidia-docker-loreto nvidia-docker-plugin[8831]: /usr/bin/nvidia-docker-plugin | 2017/04/12 19:36:45 Received unmount request for volume 'nvidia_driver_367.57'
So it should be fine, since it is listening:
ubuntu@nvidia-docker-loreto:~$ sudo lsof -i :3476
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nvidia-do 8831 nvidia-docker 13u IPv4 63280 0t0 TCP localhost:3476 (LISTEN)
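The manual-volume workaround above can be collected into one script. A sketch for nvidia-docker 1.0 only, assuming the plugin is running; it falls back to the driver version quoted in this thread if the nvidia kernel module is not loaded (the docker commands are left commented so nothing is launched by accident):

```shell
# Build the volume name nvidia-docker-plugin expects from the driver version.
driver_version=$(modinfo -F version nvidia 2>/dev/null)
driver_version=${driver_version:-367.57}   # fallback: version from this thread
volume="nvidia_driver_${driver_version}"
echo "using volume: $volume"
# docker volume create --driver=nvidia-docker --name="$volume"
# docker run -it --device=/dev/nvidiactl --device=/dev/nvidia-uvm \
#   --device=/dev/nvidia0 --volume-driver=nvidia-docker \
#   -v "$volume":/usr/local/nvidia:ro "$IMAGE" bash
```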
I just hit a problem on a fresh install: nvidia-modprobe was not installed. I installed it and rebooted, and everything was fine.
This may or may not help with the specific problems mentioned above, but it did take care of things when I got the error message "Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found".
Reboot after install solved it for me too.
@dbkinghorn how did you install the driver? If you used the cuda package, it should include nvidia-modprobe.
In my experience, installing the drivers and CUDA using NVIDIA's apt PPA works really well. Just make sure you don't have a preexisting runfile installation of either the driver or CUDA. Those should be uninstallable with /usr/bin/nvidia-uninstall (for the driver) and /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl (for CUDA 8.0, e.g.)
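For reference, the PPA route might look like the following sketch. It only prints the commands (via a heredoc) so they can be reviewed before running; the graphics-drivers PPA and the nvidia-375 package name are assumptions that may differ on your system:

```shell
# Print (don't run) a typical PPA-based driver install on Ubuntu.
# Assumed names: ppa:graphics-drivers/ppa, nvidia-375 -- adjust for your setup.
cat <<'EOF'
sudo add-apt-repository -y ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install -y nvidia-375 nvidia-modprobe
EOF
```

Note that nvidia-modprobe is included explicitly, since a missing nvidia-modprobe caused the same "plugin not found" error above.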
@grisaitis
When I do a Docker setup I don't necessarily install CUDA on the host at all ... I use a Docker container for that.
I do Ubuntu installs from the server image, then add a desktop and the NVIDIA drivers from the graphics-drivers PPA. If you want to see the setup, look for the HPC blog at "Puget Systems"; I have a series of docker and nvidia-docker posts with all the details of getting a nice setup working. There is enough detail that you can script the install. I was showing this stuff at GTC and it was a big hit ... best wishes
Did you find the problem on your setup?
After I run nvidia-docker run -it <image-id> /bin/bash, I get this error:
-- Logs begin at Wed 2017-06-21 15:12:23 CST, end at Wed 2017-06-21 16:27:18 CST. --
Jun 21 16:27:04 guiyang sudo[6056]: pam_unix(sudo:session): session closed for user root
Jun 21 16:27:18 guiyang sudo[6075]: guiyang : TTY=pts/17 ; PWD=/home/guiyang ; USER=root ; COMMAND=/usr/bin/nvidia-docker run -it df300f686ea4 /bin/bash
Jun 21 16:27:18 guiyang sudo[6075]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.443791696+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.456286981+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.464654980+08:00" level=error msg="Handler for GET /v1.27/volumes/nvidia_driver_375.66 returned error: get nvidia_driver_375.66: no such volume"
Jun 21 16:27:18 guiyang nvidia-docker-plugin[5513]: /usr/bin/nvidia-docker-plugin | 2017/06/21 16:27:18 Received create request for volume 'nvidia_driver_375.66'
Jun 21 16:27:18 guiyang nvidia-docker-plugin[5513]: /usr/bin/nvidia-docker-plugin | 2017/06/21 16:27:18 Error: mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/375.66: permission denied
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.624923075+08:00" level=error msg="Handler for POST /v1.27/containers/create returned error: create nvidia_driver_375.66: VolumeDriver.Create: internal error, che
Jun 21 16:27:18 guiyang sudo[6075]: pam_unix(sudo:session): session closed for user root
@liuguiyangnwpu hm, a permissions error with mkdir.
What happens when you try creating that directory as root (/var/lib/nvidia-docker/volumes/nvidia_driver/375.66)? Does the parent directory /var/lib/nvidia-docker/volumes/nvidia_driver exist?
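A quick sketch of that check, assuming the plugin runs as the nvidia-docker user created at install time and needs write access under /var/lib/nvidia-docker (the chown fix in the comments is a common remedy, not something confirmed in this thread):

```shell
# Check whether the volumes directory exists and who owns it.
volumes_dir=/var/lib/nvidia-docker/volumes
if [ -d "$volumes_dir" ]; then
  ls -ld "$volumes_dir"
else
  echo "$volumes_dir does not exist"
fi
# Typical fix, run as root:
#   mkdir -p /var/lib/nvidia-docker/volumes
#   chown -R nvidia-docker:nvidia-docker /var/lib/nvidia-docker
```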
hi @grisaitis ,
The first problem is:
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.443791696+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.456286981+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
The system log says No such container: df300f686ea4, but I am passing an image ID listed by docker images.
After I use
systemctl restart nvidia-docker
it does create /var/lib/nvidia-docker/volumes/nvidia_driver
and there is some content in it:
bin lib lib64
But when I use
nvidia-docker run -it image-id /bin/bash
some other errors occur.
Try removing /var/lib/nvidia-docker and restarting nvidia-docker.
Then run nvidia-docker run --rm nvidia/cuda nvidia-smi to check your install.
If it fails, paste the logs given by journalctl -u nvidia-docker
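The reset sequence above, gathered into one sketch. Since the rm is destructive, this version defaults to a dry run that only prints the commands; set DRY_RUN=0 and run as root to actually execute them:

```shell
# Dry-run wrapper around the suggested reset sequence.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }
run rm -rf /var/lib/nvidia-docker
run systemctl restart nvidia-docker
run nvidia-docker run --rm nvidia/cuda nvidia-smi
```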
@3XX0 thanks !
FYI, after installing 2.0 and finding it didn't work, I wanted to go back to 1.0, but the install failed even after a purge, etc. I had to
sudo rm -rf /usr/bin/nvidia-docker /var/lib/nvidia-docker/
# remove nvidia-docker group entry in /etc/group
Then reinstall 1.0. Otherwise the install actually failed and I got an error similar to the one described here.
@pseudotensor please file a new issue for the 2.0 problem you faced. We have some improvements to do in the installation guide.