Nvidia-docker: Cannot (re)start a container after rebooting the host OS

Created on 10 Jan 2017 · 4 Comments · Source: NVIDIA/nvidia-docker

I cannot (re)start an exited container after rebooting the host OS.

hyeon0145@titan:~$ nvidia-docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                      PORTS               NAMES
bdebabac201b        hyeon0145/notebook       "/run-jupyter-noteboo"   21 minutes ago      Exited (1) 11 minutes ago                       happy_jang
b6db902570bc        hyeon0145/notebook       "/run-jupyter-noteboo"   22 minutes ago      Exited (1) 11 minutes ago                       gloomy_williams
hyeon0145@titan:~$ nvidia-docker start bdeb
Error response from daemon: linux runtime spec devices: error gathering device information while adding custom device "/dev/nvidia-uvm-tools": lstat /dev/nvidia-uvm-tools: no such file or directory
Error: failed to start containers: bdeb

However, I can (re)start an exited container if I do so right after it exits, without rebooting in between.

hyeon0145@titan:~$ nvidia-docker run ... hyeon0145/notebook
94438c8888dc373e8276a6c4a045525ea34bfe235ebfd25a97a88ec27e6dab57
hyeon0145@titan:~$ nvidia-docker stop 94438
94438
hyeon0145@titan:~$ nvidia-docker start 94438
94438

All 4 comments

Ah, I think I know what's going on: the NVIDIA driver is probably not loaded at this point. We didn't think of this use case.
I believe that if you reboot, use nvidia-docker run to start a new container, and then use nvidia-docker start on the container that exited before the reboot, it will work. Can you check?
Please also try the following: after the reboot, run nvidia-modprobe -u -c=0 and then try nvidia-docker start again. Thank you.

I am so sorry. When I retried this today, it worked without any problem.

I had the same problem and the solution suggested by @flx42 (nvidia-modprobe -u -c=0 before nvidia-docker start) did work! Thanks.

The use of the nvidia-docker run that @flx42 proposed worked great. I just ran nvidia-docker run hello-world and it works again just fine.
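
The nvidia-modprobe workaround reported above can be wrapped in a small shell helper. This is only a sketch under stated assumptions: the function name restart_after_reboot and the override variables NVIDIA_MODPROBE and NVIDIA_DOCKER are hypothetical and are not part of nvidia-docker. The only real commands it runs are the two quoted in this thread.

```shell
#!/bin/sh
# Sketch of the workaround from this thread: run nvidia-modprobe after a
# reboot, then start containers that exited before the reboot.
# NVIDIA_MODPROBE and NVIDIA_DOCKER are hypothetical override variables
# (handy for dry runs); they are not options of nvidia-docker itself.
NVIDIA_MODPROBE="${NVIDIA_MODPROBE:-nvidia-modprobe}"
NVIDIA_DOCKER="${NVIDIA_DOCKER:-nvidia-docker}"

# restart_after_reboot is a hypothetical helper name.
restart_after_reboot() {
    if [ "$#" -eq 0 ]; then
        echo "usage: restart_after_reboot CONTAINER..." >&2
        return 2
    fi
    # Exact invocation suggested in the thread; stop if it fails.
    "$NVIDIA_MODPROBE" -u -c=0 || return 1
    # Start every container ID or name passed as an argument.
    "$NVIDIA_DOCKER" start "$@"
}
```

After sourcing the file, calling restart_after_reboot bdeb would run the same two commands as in the comments above. Depending on the system, nvidia-modprobe may need to be run with elevated privileges.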
