Fresh EC2 p2 instance
Running Ubuntu 16
Installed docker-ce, tested OK
Downloaded nvidia-docker using wget ... as per the Ubuntu section of this repo's README
Installing the deb then gives me this:
ubuntu@ip-10-0-1-190:~$ sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
(Reading database ... 76860 files and directories currently installed.)
Preparing to unpack .../nvidia-docker_1.0.1-1_amd64.deb ...
Unpacking nvidia-docker (1.0.1-1) over (1.0.1-1) ...
Setting up nvidia-docker (1.0.1-1) ...
Setting up permissions
Job for nvidia-docker.service failed because the control process exited with error code. See "systemctl status nvidia-docker.service" and "journalctl -xe" for details.
nvidia-docker.service couldn't start.
Processing triggers for ureadahead (0.100.0-19) ...
It seems this is quite a common problem and has something to do with the prerequisite installation being rather unclear. Perhaps a more detailed set of instructions could be provided for installing all of the prerequisites.
Hello @merl-dev,
Unfortunately, user issues often arise from the Docker installation or our driver installation, and we can't control that.
What's the output of journalctl -n -u nvidia-docker and nvidia-smi?
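For anyone hitting the same failure, the diagnostics requested above can be gathered in one pass. This is only a rough sketch: `run_diag` is a hypothetical helper name, and the extra `-n 20`, `--no-pager`, and `systemctl status` call are assumptions added for convenience, not something the maintainers asked for.

```shell
#!/bin/sh
# Hypothetical helper: run a diagnostic command if its binary exists,
# otherwise report that it is missing instead of aborting.
run_diag() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        # Show stderr too; a failing command is reported, not fatal.
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "== $1: not found on PATH =="
    fi
}

# Last log lines of the plugin service (unit name taken from this thread).
run_diag journalctl -n 20 -u nvidia-docker --no-pager
# Driver and GPU status.
run_diag nvidia-smi
# Service state, as suggested by the dpkg error message.
run_diag systemctl status nvidia-docker.service --no-pager
```

Pasting the full output into the issue should cover both things asked for here, and a "not found on PATH" line for `nvidia-smi` would already point at a missing driver.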
Hi @flx42 ,
I have the same problem. Output of journalctl -n -u nvidia-docker:
akalinin@nur190dhcp188:~/software$ journalctl -n -u nvidia-docker
-- Logs begin at Tue 2017-06-06 07:55:18 EDT, end at Tue 2017-06-06 10:45:42 EDT. --
Jun 06 10:43:07 nur190dhcp188 systemd[1]: Stopped NVIDIA Docker plugin.
Jun 06 10:43:07 nur190dhcp188 systemd[1]: Starting NVIDIA Docker plugin...
Jun 06 10:43:07 nur190dhcp188 systemd[1]: nvidia-docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 06 10:43:07 nur190dhcp188 systemd[1]: Failed to start NVIDIA Docker plugin.
Jun 06 10:43:07 nur190dhcp188 systemd[1]: nvidia-docker.service: Unit entered failed state.
Jun 06 10:43:07 nur190dhcp188 systemd[1]: nvidia-docker.service: Failed with result 'exit-code'.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: nvidia-docker.service: Service hold-off time over, scheduling restart.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: Stopped NVIDIA Docker plugin.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: nvidia-docker.service: Start request repeated too quickly.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: Failed to start NVIDIA Docker plugin.
And the output of nvidia-smi:
akalinin@nur190dhcp188:~/software$ nvidia-smi
Tue Jun 6 10:47:21 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:02:00.0 Off | N/A |
| 22% 47C P0 70W / 250W | 0MiB / 12207MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) Off | 0000:82:00.0 Off | N/A |
| 0% 48C P0 57W / 250W | 0MiB / 12189MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I don't see anything unusual. Try launching the plugin manually and paste the output: sudo -u nvidia-docker /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker
nvidia-modprobe was missing. Separate installation solved the issue. Thanks!
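A quick way to spot this kind of gap before installing the deb is to check that the expected binaries are on the PATH. The sketch below is an assumption-laden convenience, not an official prerequisite check: `missing_cmds` is a hypothetical helper, and the list of commands is simply the set mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Assumed prerequisite list, taken from this thread only.
missing=$(missing_cmds docker nvidia-smi nvidia-modprobe)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All listed prerequisites found on PATH"
fi
```

If `nvidia-modprobe` shows up as missing, installing it separately is what resolved the problem in the comment above.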
Any update @merl-dev ?
OK, so I got it running. I was missing these steps:
sudo apt install nvidia-375 nvidia-settings
I also installed cuda and nvidia-cuda-toolkit (and cudnn?)
To be honest, there should be a slightly more straightforward install instruction set, as I am not sure I have an optimal setup and would likely spend a day getting there. An appropriate quote:
Yet another Docker tool
Since this solution deals with nitty-gritty details and is quite different from the common Docker use cases, we provide two tools for convenience: an alternative Docker CLI and a Docker plugin. While we understand that using supplementary command line tools can be frustrating, we tried to stay as close as possible to the Docker philosophy.
We hope that in the future Docker will enhance its plugin system to allow devices and parameters to be injected more easily in the command-line, rendering our tools obsolete.
I don't see how this quote is relevant.
Anyway:
https://github.com/NVIDIA/nvidia-docker/wiki/Installation#prerequisites
There are multiple ways of preparing a machine and installing the NVIDIA drivers, which is why it's difficult to provide generic install steps.
If you want a step-by-step install, we have the AWS tutorial.