nvidia-docker fails on Ubuntu 16

Created on 4 Jun 2017 · 7 Comments · Source: NVIDIA/nvidia-docker

Fresh EC2 p2 instance
Running Ubuntu 16
Installed docker-ce, tested OK
Downloaded nvidia-docker using wget ... as per the Ubuntu section of this repo's README
Installing the deb then gives me this:

ubuntu@ip-10-0-1-190:~$ sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
(Reading database ... 76860 files and directories currently installed.)
Preparing to unpack .../nvidia-docker_1.0.1-1_amd64.deb ...
Unpacking nvidia-docker (1.0.1-1) over (1.0.1-1) ...
Setting up nvidia-docker (1.0.1-1) ...
Setting up permissions
Job for nvidia-docker.service failed because the control process exited with error code. See "systemctl status nvidia-docker.service" and "journalctl -xe" for details.
nvidia-docker.service couldn't start.
Processing triggers for ureadahead (0.100.0-19) ...

This seems to be quite a common problem, and it has something to do with the prerequisite installation being rather unclear. Perhaps a more detailed set of instructions could be provided for installing all of the prerequisites.

All 7 comments

Hello @merl-dev,

Unfortunately, user issues often arise from the Docker installation or our driver installation, and we can't control that.

What's the output of journalctl -n -u nvidia-docker and nvidia-smi?
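
If it helps, both outputs can be gathered into one file to paste here. This is only a rough sketch: the `log_cmd` helper and the report path are made up for illustration, and the two commands are the ones asked for above.

```shell
#!/bin/sh
# Run a command and append its output (stdout and stderr) to a report file,
# preceded by a header line naming the command. A failing command is recorded
# with its exit status instead of aborting the script.
log_cmd() {
    out=$1
    shift
    printf '$ %s\n' "$*" >> "$out"
    "$@" >> "$out" 2>&1 || printf '(exit status %s)\n' "$?" >> "$out"
}

# Hypothetical report path (an assumption); override it with REPORT=... if needed.
report=${REPORT:-/tmp/nvidia-docker-report.txt}
log_cmd "$report" journalctl -n -u nvidia-docker
log_cmd "$report" nvidia-smi
```

The file at `$report` then holds both outputs, each under a `$ command` header.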

Hi @flx42 ,

I have the same problem. Output of journalctl -n -u nvidia-docker:

akalinin@nur190dhcp188:~/software$ journalctl -n -u nvidia-docker
-- Logs begin at Tue 2017-06-06 07:55:18 EDT, end at Tue 2017-06-06 10:45:42 EDT. --
Jun 06 10:43:07 nur190dhcp188 systemd[1]: Stopped NVIDIA Docker plugin.
Jun 06 10:43:07 nur190dhcp188 systemd[1]: Starting NVIDIA Docker plugin...
Jun 06 10:43:07 nur190dhcp188 systemd[1]: nvidia-docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 06 10:43:07 nur190dhcp188 systemd[1]: Failed to start NVIDIA Docker plugin.
Jun 06 10:43:07 nur190dhcp188 systemd[1]: nvidia-docker.service: Unit entered failed state.
Jun 06 10:43:07 nur190dhcp188 systemd[1]: nvidia-docker.service: Failed with result 'exit-code'.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: nvidia-docker.service: Service hold-off time over, scheduling restart.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: Stopped NVIDIA Docker plugin.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: nvidia-docker.service: Start request repeated too quickly.
Jun 06 10:43:08 nur190dhcp188 systemd[1]: Failed to start NVIDIA Docker plugin.

and nvidia-smi

akalinin@nur190dhcp188:~/software$ nvidia-smi
Tue Jun  6 10:47:21 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   47C    P0    70W / 250W |      0MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
|  0%   48C    P0    57W / 250W |      0MiB / 12189MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I don't see anything special, try launching the plugin manually and paste the output: sudo -u nvidia-docker /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

nvidia-modprobe was missing. Separate installation solved the issue. Thanks!
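
For anyone hitting the same thing, a quick way to spot a missing tool is to check which of the commands mentioned in this thread are on the PATH. A minimal sketch, assuming the list below is what matters for your setup (it is drawn from this thread, not from an official prerequisite list; `missing_tools` is a made-up helper name):

```shell
#!/bin/sh
# Print each named command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Commands mentioned in this thread (an assumed list; adjust as needed).
missing_tools docker nvidia-smi nvidia-modprobe
```

If nvidia-modprobe shows up in the output, installing it separately and restarting the service (for example with sudo systemctl restart nvidia-docker) is what worked here. The exact package name depends on how the driver was installed.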

Any update @merl-dev ?

ok so I got it running, was missing steps:
sudo apt install nvidia-375 nvidia-settings

I also installed cuda and nvidia-cuda-toolkit (and cudnn?)

To be honest, there should be a slightly more straightforward install instruction set, as I am not sure I have an optimal setup and would likely spend a day getting there. An appropriate quote:

Yet another Docker tool

Since this solution deals with nitty-gritty details and is quite different from the common Docker use cases, we provide two tools for convenience: an alternative Docker CLI and a Docker plugin. While we understand that using supplementary command line tools can be frustrating, we tried to stay as close as possible to the Docker philosophy.

We hope that in the future Docker will enhance its plugin system to allow devices and parameters to be injected more easily in the command-line, rendering our tools obsolete.

I don't see how this quote is relevant.

Anyway:
https://github.com/NVIDIA/nvidia-docker/wiki/Installation#prerequisites
There are multiple ways of preparing a machine and installing the NVIDIA drivers, that's why it's difficult to provide generic install steps.

If you want a step-by-step install, we have the AWS tutorial.
