Nvidia-docker: Cannot restart docker after configuring /etc/docker/daemon.json

Created on 29 May 2018 · 5 Comments · Source: NVIDIA/nvidia-docker

Hi everyone.
I ran into trouble today installing this plugin.
Here is my environment:
AWS Ubuntu Server 16.04
docker 18.03.1-ce
NVIDIA Docker: 2.0.3
CUDA Version 9.1.85

I have already installed nvidia-docker2. I then tested it with the following command, and it ran successfully:
docker run --runtime=nvidia -it -p 8888:8888 tensorflow/tensorflow:latest-gpu

Then I followed the guide to install this plugin. I edited /etc/docker/daemon.json and ran the following command:
sudo systemctl daemon-reload && sudo systemctl restart docker

My configuration in daemon.json is:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
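A quick way to rule out the file itself as the cause is to parse it and check that the runtime entries are consistent. The following is a minimal sketch, not an official Docker tool: the function name check_daemon_config and the BUILTIN_RUNTIMES set are made up for illustration, and the default-runtime check is only a heuristic, since a runtime can also be registered on the dockerd command line.

```python
import json

# Assumption for this sketch: "runc" is the only runtime available without
# being declared under "runtimes".
BUILTIN_RUNTIMES = {"runc"}


def check_daemon_config(json_text):
    """Return a list of human-readable problems found in a daemon.json string."""
    try:
        config = json.loads(json_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    runtimes = config.get("runtimes", {})
    for name, spec in runtimes.items():
        if "path" not in spec:
            problems.append(f"runtime {name!r} has no 'path'")

    # Heuristic only: the default runtime could also come from --add-runtime.
    default = config.get("default-runtime")
    if default is not None and default not in runtimes and default not in BUILTIN_RUNTIMES:
        problems.append(f"default-runtime {default!r} is not defined under 'runtimes'")
    return problems


if __name__ == "__main__":
    # The configuration quoted in the question above.
    sample = """
    { "default-runtime": "nvidia",
      "runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime",
                                "runtimeArgs": [] } } }
    """
    print(check_daemon_config(sample))
```

An empty list means the sketch found nothing wrong with the file on its own, in which case the cause has to be looked for elsewhere, for example in the systemd unit.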

But this step failed, and I got the following output:
job for docker.service failed because the control process exited with error code

Who can help me?
Thank you!

yeya24

All 5 comments

What's the output of systemctl status docker?

flx42 on 29 May 2018

Thanks @flx42

Here is the output of the command:

systemctl status docker

● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─http-proxy.conf, override.conf
Active: inactive (dead) (Result: exit-code) since Wed 2018-05-30 01:32:02 UTC; 14s ago
Docs: https://docs.docker.com
Process: 16624 ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime (code=exited, status=1/FAILURE)
Main PID: 16624 (code=exited, status=1/FAILURE)
Tasks: 32
Memory: 142.8M
CPU: 45ms
CGroup: /system.slice/docker.service
├─15751 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/9664540dddbb4e2de3de2f4215e714f86a466c9f750919657b210469f8f4cb1d -addres
├─15822 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3e09fafe7e558fe7ccdb43e395a46bffe92ff4bd7d955535ae7cc33a439faf53 -addres
├─15899 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/cde449970767382ab34b782c32110d5ea641bf0eb33370c82d5704e5e7867a28 -addres
└─16050 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/e8f941916018debee089c511125a1424c36d8968f1c7eb940d8262978120a4b0 -addres

May 30 01:32:02 ip-172-31-3-138 systemd[1]: Failed to start Docker Application Container Engine.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Unit entered failed state.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Failed with result 'exit-code'.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: Stopped Docker Application Container Engine.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Start request repeated too quickly.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: Failed to start Docker Application Container Engine.

yeya24 on 30 May 2018

Oh, I see, you are registering the NVIDIA runtime twice:

In /etc/systemd/system/docker.service.d/override.conf with --add-runtime=nvidia
In /etc/docker/daemon.json with runtimes:

You should probably edit the systemd file.

flx42 on 30 May 2018
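The double registration described above can be checked mechanically. Below is a minimal sketch, assuming the text of the systemd drop-in and of daemon.json have already been read into strings; the function names are hypothetical, and the sample override.conf is modelled on the ExecStart line shown in the status output rather than copied from the poster's actual file.

```python
import json
import re


def runtimes_from_execstart(unit_text):
    """Return runtime names passed to dockerd via --add-runtime in ExecStart lines."""
    names = set()
    for line in unit_text.splitlines():
        line = line.strip()
        if not line.startswith("ExecStart="):
            continue
        # Matches both "--add-runtime=name=path" and "--add-runtime name=path".
        for match in re.finditer(r"--add-runtime[=\s]+([^=\s]+)=", line):
            names.add(match.group(1))
    return names


def runtimes_from_daemon_json(json_text):
    """Return runtime names declared under "runtimes" in daemon.json."""
    config = json.loads(json_text)
    return set(config.get("runtimes", {}))


def duplicated_runtimes(unit_text, json_text):
    """Runtime names registered both on the command line and in daemon.json."""
    return sorted(runtimes_from_execstart(unit_text) & runtimes_from_daemon_json(json_text))


if __name__ == "__main__":
    # Hypothetical sample inputs modelled on the files discussed in this thread.
    override_conf = """[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
"""
    daemon_json = """
    {"default-runtime": "nvidia",
     "runtimes": {"nvidia": {"path": "/usr/bin/nvidia-container-runtime",
                             "runtimeArgs": []}}}
    """
    print(duplicated_runtimes(override_conf, daemon_json))
```

Any name in the result is registered in both places; removing it from one of the two files is the fix suggested in this thread.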

@flx42 Yes, I'm sorry, I configured it twice.
Thank you so much, it helped me a lot!

yeya24 on 30 May 2018

Team, I hit the same issue. I did not register it manually, but it got registered twice by itself. Anyway, I edited /etc/systemd/system/docker.service.d/override.conf and removed --add-runtime=nvidia. However, I still get this message and Docker does not start.

sudo cat /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd://

sudo cat /etc/docker/daemon.json
{
    "dns": ["8.8.8.8"],
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

LOGS ------
Jun 09 00:49:54 dtlu16 dockerd[3262]: time="2018-06-09T00:49:54-07:00" level=info msg="containerd successfully booted in 0.004476s" module=containerd
Jun 09 00:49:54 dtlu16 dockerd[3262]: Error starting daemon: error initializing graphdriver: /var/lib/docker contains several valid graphdrivers: overlay2, aufs; Please cle
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 09 00:49:54 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Unit entered failed state.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 09 00:49:54 dtlu16 systemd[1]: Stopped Docker Application Container Engine.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Start request repeated too quickly.
Jun 09 00:49:54 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 00:56:39 dtlu16 systemd[1]: Stopped Docker Application Container Engine.
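
Note that the dockerd line in the log above mentions graphdrivers, not runtimes, so this may be a different failure from the duplicate registration. When the systemd messages drown out the real cause, it can help to pull out only the lines where dockerd states why it exited. The following is a minimal sketch, assuming the journal text (for example, captured from journalctl -u docker) is already in a string; daemon_errors is a hypothetical helper name, and the marker string is taken from the log excerpt in this thread.

```python
# Marker taken from the dockerd line in the log excerpt quoted in this thread.
MARKER = "Error starting daemon: "


def daemon_errors(log_text):
    """Return the message of every 'Error starting daemon' line, in order."""
    errors = []
    for line in log_text.splitlines():
        _, found, message = line.partition(MARKER)
        if found:
            errors.append(message.strip())
    return errors


if __name__ == "__main__":
    # Lines copied from the log above; the truncation is kept as-is.
    log = """\
Jun 09 00:49:54 dtlu16 dockerd[3262]: Error starting daemon: error initializing graphdriver: /var/lib/docker contains several valid graphdrivers: overlay2, aufs; Please cle
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 09 00:49:54 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
"""
    for message in daemon_errors(log):
        print(message)
```

If the graphdriver conflict is indeed the cause, picking a single driver, for example with the "storage-driver" key in daemon.json, is a commonly suggested remedy; that is a suggestion rather than something confirmed in this thread.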