My setup was working fine, and suddenly docker stopped working.
I had just run "sudo apt-get update".
My files look like this now:
dtlu@dtlu16:~$ sudo cat /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
dtlu@dtlu16:~$ sudo cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
{
“dns”: [“172.20.130.181”]
}
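dtlu@dtlu16:~$ uname -a
Linux dtlu16 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
dtlu@dtlu16:~$ systemctl daemon-reload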
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: dtlu,,, (dtlu)
Password:
==== AUTHENTICATION COMPLETE ===
dtlu@dtlu16:~$ sudo service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─override.conf
Active: inactive (dead) (Result: exit-code) since Sat 2018-06-09 01:16:10 PDT; 21min ago
Docs: https://docs.docker.com
Main PID: 2299 (code=exited, status=1/FAILURE)
Jun 09 01:16:10 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Unit entered failed state.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 09 01:16:10 dtlu16 systemd[1]: Stopped Docker Application Container Engine.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Start request repeated too quickly.
Jun 09 01:16:10 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
dtlu@dtlu16:~$ sudo docker version
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:17:20 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
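One thing that may be worth ruling out: as pasted above, daemon.json contains two top-level objects and curly quotes around the dns entry. It is not clear whether the file on disk really looks like that or whether this is a copy/paste artifact, so the following is only a sketch under that assumption. It uses Python's standard json module to show how the pasted text parses compared with a hypothetical merged version; the names PASTED, MERGED and daemon_json_error are made up for illustration.

```python
import json
from typing import Optional

# daemon.json exactly as pasted in the report (assumption: this mirrors the file on disk).
PASTED = """{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
{
“dns”: [“172.20.130.181”]
}
"""

# Hypothetical merged version: both settings inside one top-level object, straight quotes.
MERGED = """{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "dns": ["172.20.130.181"]
}
"""


def daemon_json_error(text: str) -> Optional[str]:
    """Return a description of the first JSON problem, or None if the text parses."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(parsed, dict):
        return "top-level value is not a JSON object"
    return None


if __name__ == "__main__":
    print("pasted:", daemon_json_error(PASTED))
    print("merged:", daemon_json_error(MERGED))
```

If the real file parses cleanly, this check can be ignored and the problem lies elsewhere.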
Check docker status
https://github.com/NVIDIA/nvidia-docker/issues/749#issuecomment-393002402
Did you manually edit the override.conf file?
Make sure to run systemctl daemon-reload and systemctl reload docker.
I had a similar issue today after having to reboot my computer.
uname -a
Linux pcvp19 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:00:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
systemctl status nvidia-docker
Warning: nvidia-docker.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● nvidia-docker.service
Loaded: masked (/dev/null; bad)
Active: inactive (dead)
pcvp@pcvp19:/var$ sudo gvim /lib/systemd/system/docker.service
pcvp@pcvp19:/var$ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─override.conf
Active: inactive (dead) (Result: exit-code) since Tue 2018-06-12 10:04:57 PDT; 12s ago
Docs: https://docs.docker.com
Process: 2910 ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime (code=exited, status=1/FAILURE)
Main PID: 2910 (code=exited, status=1/FAILURE)
Jun 12 10:04:57 pcvp19 systemd[1]: Failed to start Docker Application Container Engine.
Jun 12 10:04:57 pcvp19 systemd[1]: docker.service: Unit entered failed state.
Jun 12 10:04:57 pcvp19 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 12 10:04:57 pcvp19 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 12 10:04:57 pcvp19 systemd[1]: Stopped Docker Application Container Engine.
Jun 12 10:04:57 pcvp19 systemd[1]: docker.service: Start request repeated too quickly.
Jun 12 10:04:57 pcvp19 systemd[1]: Failed to start Docker Application Container Engine.
Running systemctl daemon-reload and systemctl reload docker produced the following message:
docker.service is not active, cannot reload.
Did you edit the override.conf file?
No.
I was not even aware of override.conf until looking at the error message. I assume the file got created as part of the nvidia-docker install?
Here are the contents of override.conf:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
@artificialbrains this is weird, did you set up the machine yourself?
What is the output of dpkg -S /etc/systemd/system/docker.service.d/override.conf?
I was given the machine a while ago, but have been maintaining it since.
Here is what I got from running the dpkg command:
sudo dpkg -S /etc/systemd/system/docker.service.d/override.conf
dpkg-query: no path found matching pattern /etc/systemd/system/docker.service.d/override.conf
However, the dpkg result may have been affected by the fact that I had just tried to remove nvidia-docker2 using sudo apt-get update and sudo apt-get remove nvidia-docker2. I want to uninstall and reinstall nvidia-docker2 to see if the problem gets fixed.
The following actions fixed the issue for me.
I uninstalled nvidia-docker using:
sudo apt-get update
sudo apt-get remove nvidia-docker2
Rebooted the computer.
Docker started to work again. Previously, docker did not start correctly after a reboot alone.
Reinstalled nvidia-docker2:
sudo apt-get update
sudo apt-get install nvidia-docker2
Nvidia-docker is now operable.
Sadly, I still have the same issue with override.conf. I pushed my luck and rebooted my machine, only to find that docker fails to start because of override.conf.
If I remove override.conf, the docker and nvidia-docker services start correctly; however, nvidia-docker fails to open previously created docker containers.
As a temporary solution, I removed nvidia-docker2, rebooted my machine, and reinstalled nvidia-docker2. This will work till I have to reboot my machine again.
The override.conf file is not installed by nvidia-docker. It's likely that someone manually set up the machine this way. You can run systemctl edit docker and remove the --add-runtime=... part.
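A likely reason this fix works, though no log in this thread confirms it, is that dockerd refuses to start when the same setting is given both as a command-line flag and in /etc/docker/daemon.json. Here the nvidia runtime appears in both places: as --add-runtime in override.conf and as "runtimes" in daemon.json. The sketch below illustrates that overlap check and the edited ExecStart line. The flag-to-key table is a partial assumption, the helper names are hypothetical, and the ExecStart handling is simplified (the last ExecStart= line wins, an empty one clears it).

```python
import json
import shlex
from typing import Dict, List, Set

# Assumed, partial mapping from dockerd flags to daemon.json keys.
FLAG_TO_KEY: Dict[str, str] = {"--add-runtime": "runtimes", "--dns": "dns"}

# Drop-in contents as shown in this thread.
OVERRIDE = """[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
"""

# The "runtimes" part of daemon.json as shown in this thread.
DAEMON_JSON = """{
  "runtimes": {
    "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
  }
}
"""


def exec_start_args(dropin_text: str) -> List[str]:
    """Return the argv of the effective ExecStart= line (simplified: last one wins)."""
    argv: List[str] = []
    for line in dropin_text.splitlines():
        line = line.strip()
        if line.startswith("ExecStart="):
            # An empty assignment yields [], which models the reset line.
            argv = shlex.split(line[len("ExecStart="):])
    return argv


def conflicting_keys(argv: List[str], daemon_json_text: str) -> Set[str]:
    """Return daemon.json keys that are also set through a known flag in argv."""
    config = json.loads(daemon_json_text)
    keys: Set[str] = set()
    for arg in argv:
        key = FLAG_TO_KEY.get(arg.split("=", 1)[0])
        if key is not None and key in config:
            keys.add(key)
    return keys


def without_flag(argv: List[str], flag: str) -> List[str]:
    """Drop both the `flag=value` and the `flag value` forms from argv."""
    result: List[str] = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False
        elif arg == flag:
            skip_next = True
        elif not arg.startswith(flag + "="):
            result.append(arg)
    return result


if __name__ == "__main__":
    argv = exec_start_args(OVERRIDE)
    print("conflicts:", conflicting_keys(argv, DAEMON_JSON))
    print("ExecStart=" + " ".join(without_flag(argv, "--add-runtime")))
```

With the flag removed, the runtime is still registered through daemon.json, which matches the report below that both docker and nvidia-docker work after the edit.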
Thanks!
That fixed the problem. I am now able to reboot the machine and nvidia-docker/docker both function properly.
I had the same problem and the fix proposed by @flx42 worked for me!
Thanks!
Closing this issue for now, but I'm surprised so many people are facing this issue. I'm wondering if another package conflicts with our settings.