root@Computor:/home/alex# nvidia-docker-plugin
nvidia-docker-plugin | 2017/04/03 15:24:24 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/04/03 15:24:24 Loading NVIDIA management library
nvidia-docker-plugin | 2017/04/03 15:24:24 Discovering GPU devices
nvidia-docker-plugin | 2017/04/03 15:24:24 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2017/04/03 15:24:24 Serving plugin API at /run/docker/plugins
nvidia-docker-plugin | 2017/04/03 15:24:24 Serving remote API at localhost:3476
nvidia-docker-plugin | 2017/04/03 15:24:24 Error: listen tcp 127.0.0.1:3476: bind: address already in use
alex@Computor:~$ systemctl status nvidia-docker
● nvidia-docker.service - NVIDIA Docker plugin
Loaded: loaded (/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: https://github.com/NVIDIA/nvidia-docker/wiki
alex@Computor:~$ nvidia-docker run --rm -ti crisbal/torch-rnn:cuda7.5 bash
docker: Error response from daemon: create nvidia_driver_375.39: create nvidia_driver_375.39: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.
See 'docker run --help'.
alex@Computor:~$
alex@Computor:~$ ldconfig -p | grep nvidia-ml
libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/nvidia-375/libnvidia-ml.so.1
libnvidia-ml.so.1 (libc6) => /usr/lib32/nvidia-375/libnvidia-ml.so.1
libnvidia-ml.so (libc6,x86-64) => /usr/lib/nvidia-375/libnvidia-ml.so
libnvidia-ml.so (libc6) => /usr/lib32/nvidia-375/libnvidia-ml.so
Thank you in advance. I have been reading through the other issues, but I can't seem to get this to work. I'm trying to use it for torch-rnn.
You shouldn't start nvidia-docker-plugin manually, it should be started by systemd.
After restarting the machine, what's the output of journalctl -n -u nvidia-docker?
I have the same issue.
armor@armor2x:~$ service nvidia-docker status
● nvidia-docker.service - NVIDIA Docker plugin
Loaded: loaded (/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: https://github.com/NVIDIA/nvidia-docker/wiki
armor@armor2x:~$ nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_375.39: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.
See 'docker run --help'.
armor@armor2x:~$ journalctl -n -u nvidia-docker
-- No entries --
Also note that while installing I got this (error code 9):
armor@armor2x:~$ sudo dpkg -i /tmp/nvidia-docker*.deb
Configuring user
useradd: group nvidia-docker exists - if you want to add this user to that group, use -g.
dpkg: error processing package nvidia-docker (--install):
subprocess installed post-installation script returned error exit status 9
Seems like a problem with the user or group?
Same output, @flx42. Should I do a fresh reinstall?
@armoreal uninstall nvidia-docker, remove the leftover nvidia-docker files, then try a fresh reinstall.
@NeoZeromus are you saying you still have listen tcp 127.0.0.1:3476: bind: address already in use? And you didn't run nvidia-docker-plugin manually this time?
@flx42 exactly
@NeoZeromus Do you have another service running on this port?
$ sudo lsof -i :3476
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nvidia-do 8318 nvidia-docker 19u IPv4 81085 0t0 TCP localhost:3476 (LISTEN)
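One way to check this without lsof is to try connecting to the port directly. A minimal sketch, assuming bash (which provides the /dev/tcp pseudo-device) and the plugin's default remote-API port 3476 seen in the logs above:

```shell
# Check whether something is already bound to the plugin's port.
# 3476 is nvidia-docker-plugin's default remote-API port.
port=3476
if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
  echo "port $port is in use"
else
  echo "port $port is free"
fi
```

If the port is in use by a stray manually started plugin, killing that process should let the systemd-managed one bind successfully.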
@flx42 I tried to reinstall several times, but got the same result.
But I notice that my situation is different from @NeoZeromus's: I am able to manually run sudo nvidia-docker-plugin & and then nvidia-docker run works properly.
Having the same error here.
I have
$ modinfo -F version nvidia
367.57
I did
docker volume create --driver=nvidia-docker --name=nvidia_driver_$(modinfo -F version nvidia)
and the volume is there
sudo docker volume ls | grep nvidia
nvidia-docker nvidia_driver_367.57
I run with
docker run -it --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --volume-driver nvidia-docker -v nvidia_driver_367.57:/usr/local/nvidia:ro $IMAGE bash
and I have
$ journalctl -n -u nvidia-docker
-- Logs begin at Thu 2017-02-09 23:42:51 UTC, end at Thu 2017-04-27 17:36:00 UTC. --
Apr 12 19:31:39 nvidia-docker-loreto nvidia-docker-plugin[8831]: /usr/bin/nvidia-docker-plugin | 2017/04/12 19:31:39 Received mount request for volume 'nvidia_driver_367.57'
Apr 12 19:36:45 nvidia-docker-loreto nvidia-docker-plugin[8831]: /usr/bin/nvidia-docker-plugin | 2017/04/12 19:36:45 Received unmount request for volume 'nvidia_driver_367.57'
So it should be fine, since it is listening:
ubuntu@nvidia-docker-loreto:~$ sudo lsof -i :3476
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nvidia-do 8831 nvidia-docker 13u IPv4 63280 0t0 TCP localhost:3476 (LISTEN)
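The manual-volume workaround above can be collected into one script. A sketch for nvidia-docker 1.0 only, assuming the plugin is running; it falls back to the driver version quoted in this thread if the nvidia kernel module is not loaded (the docker commands are left commented so nothing is launched by accident):

```shell
# Build the volume name nvidia-docker-plugin expects from the driver version.
driver_version=$(modinfo -F version nvidia 2>/dev/null)
driver_version=${driver_version:-367.57}   # fallback: version from this thread
volume="nvidia_driver_${driver_version}"
echo "using volume: $volume"
# docker volume create --driver=nvidia-docker --name="$volume"
# docker run -it --device=/dev/nvidiactl --device=/dev/nvidia-uvm \
#   --device=/dev/nvidia0 --volume-driver=nvidia-docker \
#   -v "$volume":/usr/local/nvidia:ro "$IMAGE" bash
```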
I just hit a problem on a fresh install: nvidia-modprobe was not installed. I installed it and rebooted, and everything was fine.
This may or may not help with the specific problems mentioned above, but it did take care of things when I got the error message "Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found".
Reboot after install solved it for me too.
@dbkinghorn how did you install the driver? If you used the cuda package, it should include nvidia-modprobe.
In my experience, installing the drivers and CUDA using NVIDIA's apt PPA works really well. Just make sure you don't have a preexisting runfile installation of either the driver or CUDA. Those should be uninstallable with /usr/bin/nvidia-uninstall (for the driver) and /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl (for CUDA 8.0, e.g.)
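For reference, the PPA route might look like the following sketch. It only prints the commands (via a heredoc) so they can be reviewed before running; the graphics-drivers PPA and the nvidia-375 package name are assumptions that may differ on your system:

```shell
# Print (don't run) a typical PPA-based driver install on Ubuntu.
# Assumed names: ppa:graphics-drivers/ppa, nvidia-375 -- adjust for your setup.
cat <<'EOF'
sudo add-apt-repository -y ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install -y nvidia-375 nvidia-modprobe
EOF
```

Note that nvidia-modprobe is included explicitly, since a missing nvidia-modprobe caused the same "plugin not found" error above.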
@grisaitis
When I do a Docker setup I don't necessarily install CUDA on the host at all ... I use a Docker container for that.
I do Ubuntu installs from the server image, then add a desktop and the NVIDIA drivers from the graphics-drivers PPA. If you want to see the setup, look for the HPC blog at "Puget Systems"; I have a series of docker and nvidia-docker posts with all the details of getting a nice setup working. There is enough detail that you can script the install. I was showing this stuff at GTC and it was a big hit ... best wishes
Did you find the problem on your setup?
After I run nvidia-docker run -it <image-id> /bin/bash, I get this error:
-- Logs begin at Wed 2017-06-21 15:12:23 CST, end at Wed 2017-06-21 16:27:18 CST. --
Jun 21 16:27:04 guiyang sudo[6056]: pam_unix(sudo:session): session closed for user root
Jun 21 16:27:18 guiyang sudo[6075]: guiyang : TTY=pts/17 ; PWD=/home/guiyang ; USER=root ; COMMAND=/usr/bin/nvidia-docker run -it df300f686ea4 /bin/bash
Jun 21 16:27:18 guiyang sudo[6075]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.443791696+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.456286981+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.464654980+08:00" level=error msg="Handler for GET /v1.27/volumes/nvidia_driver_375.66 returned error: get nvidia_driver_375.66: no such volume"
Jun 21 16:27:18 guiyang nvidia-docker-plugin[5513]: /usr/bin/nvidia-docker-plugin | 2017/06/21 16:27:18 Received create request for volume 'nvidia_driver_375.66'
Jun 21 16:27:18 guiyang nvidia-docker-plugin[5513]: /usr/bin/nvidia-docker-plugin | 2017/06/21 16:27:18 Error: mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/375.66: permission denied
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.624923075+08:00" level=error msg="Handler for POST /v1.27/containers/create returned error: create nvidia_driver_375.66: VolumeDriver.Create: internal error, che
Jun 21 16:27:18 guiyang sudo[6075]: pam_unix(sudo:session): session closed for user root
@liuguiyangnwpu hm, a permissions error with mkdir.
What happens when you try creating that directory as root (/var/lib/nvidia-docker/volumes/nvidia_driver/375.66)? Does the parent directory /var/lib/nvidia-docker/volumes/nvidia_driver exist?
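A quick sketch of that check, assuming the plugin runs as the nvidia-docker user created at install time and needs write access under /var/lib/nvidia-docker (the chown fix in the comments is a common remedy, not something confirmed in this thread):

```shell
# Check whether the volumes directory exists and who owns it.
volumes_dir=/var/lib/nvidia-docker/volumes
if [ -d "$volumes_dir" ]; then
  ls -ld "$volumes_dir"
else
  echo "$volumes_dir does not exist"
fi
# Typical fix, run as root:
#   mkdir -p /var/lib/nvidia-docker/volumes
#   chown -R nvidia-docker:nvidia-docker /var/lib/nvidia-docker
```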
hi @grisaitis ,
The first problem is:
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.443791696+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
Jun 21 16:27:18 guiyang dockerd[1330]: time="2017-06-21T16:27:18.456286981+08:00" level=error msg="Handler for GET /v1.27/containers/df300f686ea4/json returned error: No such container: df300f686ea4"
The system log says No such container: df300f686ea4, but I am passing an image ID listed by docker images.
After I use
systemctl restart nvidia-docker
it does create /var/lib/nvidia-docker/volumes/nvidia_driver
and there is some content in it:
bin lib lib64
But when I use
nvidia-docker run -it image-id /bin/bash
some other errors occur.
Try removing /var/lib/nvidia-docker and restarting nvidia-docker.
Then run nvidia-docker run --rm nvidia/cuda nvidia-smi to check your install.
If it fails, paste the logs given by journalctl -u nvidia-docker
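The reset sequence above, gathered into one sketch. Since the rm is destructive, this version defaults to a dry run that only prints the commands; set DRY_RUN=0 and run as root to actually execute them:

```shell
# Dry-run wrapper around the suggested reset sequence.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }
run rm -rf /var/lib/nvidia-docker
run systemctl restart nvidia-docker
run nvidia-docker run --rm nvidia/cuda nvidia-smi
```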
@3XX0 thanks !
FYI, after installing 2.0 and finding it didn't work, I wanted to go back to 1.0, but the install failed even after a purge, etc. I had to
sudo rm -rf /usr/bin/nvidia-docker /var/lib/nvidia-docker/
# remove nvidia-docker group entry in /etc/group
Then reinstall 1.0. Otherwise the install actually failed and I got an error similar to the one described here.
@pseudotensor please file a new issue for the 2.0 problem you faced. We have some improvements to do in the installation guide.