I cannot (re)start an exited container after rebooting the host OS.
hyeon0145@titan:~$ nvidia-docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bdebabac201b hyeon0145/notebook "/run-jupyter-noteboo" 21 minutes ago Exited (1) 11 minutes ago happy_jang
b6db902570bc hyeon0145/notebook "/run-jupyter-noteboo" 22 minutes ago Exited (1) 11 minutes ago gloomy_williams
hyeon0145@titan:~$ nvidia-docker start bdeb
Error response from daemon: linux runtime spec devices: error gathering device information while adding custom device "/dev/nvidia-uvm-tools": lstat /dev/nvidia-uvm-tools: no such file or directory
Error: failed to start containers: bdeb
However, I can (re)start an exited container right after it exits, as long as the host has not been rebooted.
hyeon0145@titan:~$ nvidia-docker run ... hyeon0145/notebook
94438c8888dc373e8276a6c4a045525ea34bfe235ebfd25a97a88ec27e6dab57
hyeon0145@titan:~$ nvidia-docker stop 94438
94438
hyeon0145@titan:~$ nvidia-docker start 94438
94438
Ah, I think I know what's going on: the NVIDIA driver is probably not loaded at this point. We didn't think of this use case.
I believe that if you reboot, use nvidia-docker run to start a new container, and then use nvidia-docker start on the container that exited before the reboot, it will work. Can you check?
Please also try the following: after the reboot, run nvidia-modprobe -u -c=0 and then try nvidia-docker start again. Thank you.
I am so sorry. When I retried this today, it worked without any problem.
I had the same problem and the solution suggested by @flx42 (nvidia-modprobe -u -c=0 before nvidia-docker start) did work! Thanks.
The use of the nvidia-docker run that @flx42 proposed worked great. I just ran nvidia-docker run hello-world and it works again just fine.
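For anyone who wants to script the nvidia-modprobe workaround reported above, here is a minimal sketch. The function name start_after_reboot and the variables NVIDIA_MODPROBE, NVIDIA_DOCKER and UVM_TOOLS_DEV are hypothetical and not part of nvidia-docker; they only exist so the commands can be swapped out for a dry run. It assumes the missing /dev/nvidia-uvm-tools node from the error message is the only thing blocking the start.

```shell
#!/bin/sh
# Sketch only: start_after_reboot is a hypothetical helper, not an nvidia-docker feature.
# NVIDIA_MODPROBE, NVIDIA_DOCKER and UVM_TOOLS_DEV are illustrative overrides.

start_after_reboot() {
    container="$1"
    dev="${UVM_TOOLS_DEV:-/dev/nvidia-uvm-tools}"

    # The error in this thread was an lstat failure on this device node,
    # so run the suggested modprobe command only when the node is missing.
    if [ ! -e "$dev" ]; then
        "${NVIDIA_MODPROBE:-nvidia-modprobe}" -u -c=0 || return 1
    fi

    # Start the container that exited before the reboot.
    "${NVIDIA_DOCKER:-nvidia-docker}" start "$container"
}

# Usage (container ID prefix taken from the report above):
#   start_after_reboot bdeb
```

The other workaround reported here, running a throwaway container such as nvidia-docker run hello-world once after the reboot, needs no script at all.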