Hi everyone.
I ran into trouble today installing this plugin.
Here is my environment:
AWS Ubuntu Server 16.04
docker 18.03.1-ce
NVIDIA Docker: 2.0.3
CUDA Version 9.1.85
I have already installed nvidia-docker2. I then tested it with the following command, which ran successfully:
docker run --runtime=nvidia -it -p 8888:8888 tensorflow/tensorflow:latest-gpu
Then I followed the guide to install this plugin. I configured /etc/docker/daemon.json and
ran the following command:
sudo systemctl daemon-reload && sudo systemctl restart docker
Here is my configuration in daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
But this step failed, and I got the following output:
job for docker.service failed because the control process exited with error code
Can anyone help me?
Thank you!
What's the output of systemctl status docker?
Thanks @flx42
Here is the output of the command:
systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─http-proxy.conf, override.conf
Active: inactive (dead) (Result: exit-code) since Wed 2018-05-30 01:32:02 UTC; 14s ago
Docs: https://docs.docker.com
Process: 16624 ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime (code=exited, status=1/FAILURE)
Main PID: 16624 (code=exited, status=1/FAILURE)
Tasks: 32
Memory: 142.8M
CPU: 45ms
CGroup: /system.slice/docker.service
├─15751 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/9664540dddbb4e2de3de2f4215e714f86a466c9f750919657b210469f8f4cb1d -addres
├─15822 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3e09fafe7e558fe7ccdb43e395a46bffe92ff4bd7d955535ae7cc33a439faf53 -addres
├─15899 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/cde449970767382ab34b782c32110d5ea641bf0eb33370c82d5704e5e7867a28 -addres
└─16050 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/e8f941916018debee089c511125a1424c36d8968f1c7eb940d8262978120a4b0 -addres
May 30 01:32:02 ip-172-31-3-138 systemd[1]: Failed to start Docker Application Container Engine.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Unit entered failed state.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Failed with result 'exit-code'.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: Stopped Docker Application Container Engine.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: docker.service: Start request repeated too quickly.
May 30 01:32:02 ip-172-31-3-138 systemd[1]: Failed to start Docker Application Container Engine.
Oh, I see, you are registering the NVIDIA runtime twice:
- /etc/systemd/system/docker.service.d/override.conf with --add-runtime=nvidia
- /etc/docker/daemon.json with runtimes:
You should probably edit the systemd file.
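A rough way to check for this kind of double registration is to compare the runtime names under "runtimes" in daemon.json with the --add-runtime flags on the ExecStart= lines of the systemd drop-in. The sketch below is only an illustration: runtime_sources and duplicated_runtimes are hypothetical helper names, and the code looks at the text of one drop-in file without merging it with the base unit the way systemd does.

```python
import json
import shlex
from collections import defaultdict


def runtime_sources(daemon_json_text, unit_text):
    """Map each runtime name to the places that register it (hypothetical helper)."""
    sources = defaultdict(list)

    # Runtimes declared under the "runtimes" key of daemon.json.
    for name in json.loads(daemon_json_text).get("runtimes", {}):
        sources[name].append("daemon.json")

    # Runtimes passed as --add-runtime on an ExecStart= line.
    for line in unit_text.splitlines():
        line = line.strip()
        if not line.startswith("ExecStart="):
            continue
        args = shlex.split(line[len("ExecStart="):])
        for i, arg in enumerate(args):
            if arg.startswith("--add-runtime="):
                spec = arg[len("--add-runtime="):]
            elif arg == "--add-runtime" and i + 1 < len(args):
                spec = args[i + 1]
            else:
                continue
            # The flag value has the form NAME=PATH; keep only NAME.
            sources[spec.split("=", 1)[0]].append("systemd ExecStart")

    return dict(sources)


def duplicated_runtimes(daemon_json_text, unit_text):
    """Return the runtime names registered in more than one place."""
    found = runtime_sources(daemon_json_text, unit_text)
    return sorted(name for name, where in found.items() if len(where) > 1)


if __name__ == "__main__":
    # Sample inputs modelled on the configuration posted in this thread.
    daemon_json = """
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
        }
    }
    """
    unit = """
    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
    """
    print(duplicated_runtimes(daemon_json, unit))
```

If a name shows up in the result, removing it from one of the two places (for example the --add-runtime flag in the drop-in) should leave a single registration.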
@flx42 Yes, sorry, I configured it twice.
Thank you so much, that helped me a lot!
Team, I hit the same issue. I did not register it manually, but it got registered twice by itself. Anyway, I edited /etc/systemd/system/docker.service.d/override.conf and removed --add-runtime=nvidia. However, I still get this message and Docker does not start.
sudo cat /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd://
sudo cat /etc/docker/daemon.json
{
    "dns": ["8.8.8.8"],
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
LOGS ------
Jun 09 00:49:54 dtlu16 dockerd[3262]: time="2018-06-09T00:49:54-07:00" level=info msg="containerd successfully booted in 0.004476s" module=containerd
Jun 09 00:49:54 dtlu16 dockerd[3262]: Error starting daemon: error initializing graphdriver: /var/lib/docker contains several valid graphdrivers: overlay2, aufs; Please cle
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 09 00:49:54 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Unit entered failed state.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 09 00:49:54 dtlu16 systemd[1]: Stopped Docker Application Container Engine.
Jun 09 00:49:54 dtlu16 systemd[1]: docker.service: Start request repeated too quickly.
Jun 09 00:49:54 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 00:56:39 dtlu16 systemd[1]: Stopped Docker Application Container Engine.
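In output like this, most lines are systemd restart noise, and the one that states the actual cause is the dockerd line beginning with "Error starting daemon:". In the log above, that line names graphdrivers (overlay2, aufs) rather than the nvidia runtime. A small sketch for pulling those lines out of journal text follows; daemon_errors is a hypothetical helper name, and the sample line is copied from the log above, truncation included.

```python
def daemon_errors(journal_text):
    """Return the messages that follow 'Error starting daemon:' in journal text."""
    marker = "Error starting daemon:"
    errors = []
    for line in journal_text.splitlines():
        # partition() leaves `found` empty when the marker is absent.
        _, found, message = line.partition(marker)
        if found:
            errors.append(message.strip())
    return errors


if __name__ == "__main__":
    sample = (
        'Jun 09 00:49:54 dtlu16 dockerd[3262]: Error starting daemon: '
        'error initializing graphdriver: /var/lib/docker contains several '
        'valid graphdrivers: overlay2, aufs; Please cle\n'
        'Jun 09 00:49:54 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.\n'
    )
    for message in daemon_errors(sample):
        print(message)
```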