Steps to reproduce the issue:
docker run -d --name=swam-agent --restart=always swarm join --advertise=<LocalHostName>:2375 consul://<Consul-hostname>:8500
docker stop swam-agent
docker start swam-agent
Describe the results you received:
Error response from daemon: Container command '/swarm' not found or does not exist.
Error: failed to start containers: swam-agent
From the Docker logs:
level=error msg="containerd: start container" error="oci runtime error: fork/exec /usr/bin/docker (deleted): no such file or directory: "
Describe the results you expected:
Swarm container should be able to launch successfully
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:58:52 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:58:52 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 6
Running: 6
Paused: 0
Stopped: 0
Images: 22
Server Version: 1.11.2
Storage Driver: devicemapper
Pool Name: docker-202:2-4456533-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 76.6 GB
Data Space Total: 107.4 GB
Data Space Available: 30.78 GB
Metadata Space Used: 59.82 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.088 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /dev/home/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
Metadata loop file: /dev/home/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.117-RHEL6 (2016-08-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host overlay
Kernel Version: 4.1.12-61.1.22.el6uek.x86_64
Operating System: <unknown>
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 16.49 GiB
Name: bus00bbd
ID: YBAG:V4YC:JTGP:54K4:ZGPB:RCTP:XCAN:JQZA:THU6:DTSD:SVZ7:HUDD
Docker Root Dir: /dev/home/docker
Debug mode (client): false
Debug mode (server): false
Http Proxy: http://www-host50.com:80
Https Proxy: http://www-host50.com:80
No Proxy: localhost,docker-registry.local,/var/run/docker.sock
Registry: https://index.docker.io/v1/
Labels:
pool=south
Cluster store: consul://consul.host.com:8500
Cluster advertise: myhost1.host.com:2375
Additional environment details (AWS, VirtualBox, physical, etc.):
OS: Oracle Enterprise Linux 6
Does /usr/bin/docker still exist on your system? Are you sure you didn't replace the binary while docker kept running?
This looks like a continuation of the discussions in https://github.com/docker/docker/issues/23411#issuecomment-267813370 and https://github.com/docker/docker/issues/25381#issuecomment-268239370
Here is an update after trying 1.12.3: I'm seeing a similar error. The reproduction steps are the same.
ERROR: for swarm-node Cannot start service swarm-node: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:334: running prestart hook 0 caused \\\"fork/exec /usr/bin/dockerd (deleted): no such file or directory\\\"\"\n"
Docker Info:
Containers: 5
Running: 3
Paused: 0
Stopped: 2
Images: 15
Server Version: 1.12.3
Storage Driver: devicemapper
Pool Name: docker-202:2-11665420-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file: /dev/loop2
Metadata file: /dev/loop3
Data Space Used: 67.06 GB
Data Space Total: 107.4 GB
Data Space Available: 40.31 GB
Metadata Space Used: 50.95 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.097 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /home/node1/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /home/node1/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.117-RHEL6 (2016-08-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host null bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.1.12-61.1.17.el6uek.x86_64
Operating System: Oracle Linux Server 6.8
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 16.49 GiB
Name: node1
ID: MSEU:AELX:FWZ3:VJUK:JHIS:6AVE:RILK:A3CP:3Q2I:JTO3:4JRZ:7KBY
Docker Root Dir: /home/node1/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://www-host.com:80
Https Proxy: http://www-host.com:80
No Proxy: localhost,docker-registry.local,/var/run/docker.sock
Registry: https://index.docker.io/v1/
Labels:
group=test1
Cluster Store: consul://myconsul.host.com:8500
Cluster Advertise: node1.host.com:2375
Insecure Registries:
127.0.0.0/8
@MUI-Fazy Can you check whether dockerd still has the same inode? Compare `stat $(which dockerd)` after a daemon restart and again after the problem appears. It still seems that something on your system is replacing the binary.
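The effect described here can be sketched with a throwaway file. Package managers typically stage the new binary under a temporary name and rename it over the old path, so the path keeps its name but points at a fresh inode, while the running process still holds the deleted original. A minimal demonstration, with all paths hypothetical stand-ins:

```shell
# Sketch of why a replaced binary shows up as "(deleted)": the path
# survives an upgrade, but its inode does not.
tmp=$(mktemp -d)
printf 'old' > "$tmp/dockerd"            # stand-in for /usr/bin/dockerd
before=$(stat -c %i "$tmp/dockerd")      # inode of the original binary

printf 'new' > "$tmp/dockerd.new"        # package manager stages the new file...
mv "$tmp/dockerd.new" "$tmp/dockerd"     # ...and renames it over the old path
after=$(stat -c %i "$tmp/dockerd")       # same path, different inode

[ "$before" != "$after" ] && echo "inode changed: binary was replaced"
rm -rf "$tmp"
```

A process started from the old inode keeps it alive until it exits, which is exactly why `/proc/<pid>/exe` shows `/usr/bin/dockerd (deleted)` after an upgrade.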
@tonistiigi
I'll verify and report back. Thanks
@tonistiigi
You were right: the inode value has changed, but I'm not sure what is changing it.
Is there any alternative way to prevent this from happening?
> Any alternative way that I could do to prevent it from happening ?
I think that's hard for us to tell; you need to find out what other software is running on that host that installs/overwrites dockerd
Not sure if this is the same cause, but this might help someone.
I noticed this same error today (2017 Jan 18) on Ubuntu Xenial (with docker installed from the default repository via the package "docker.io"). Restarting the docker service (systemctl stop docker.service && systemctl start docker.service) fixed my problem.
Checking /var/log/dpkg.log showed that docker had been updated, and I hadn't noticed it when I did my usual system updating.
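A quick way to confirm this cause is to check the package log for a docker upgrade. A sketch, assuming a Debian/Ubuntu host with the default dpkg log location; the log path and package name may differ on your system:

```shell
# Hypothetical check: did a package upgrade silently replace docker?
# If so, the binaries on disk no longer match the running daemon.
logfile=${1:-/var/log/dpkg.log}           # default Debian/Ubuntu dpkg log
if grep -q ' upgrade docker' "$logfile"; then
    echo "docker was upgraded; restart the service to pick up the new binary"
    # sudo systemctl restart docker       # uncomment to apply the fix
fi
```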
Thanks for that @CraigKelly, it worked for me. By the way, your command can be shortened to `systemctl restart docker` 👍
Let me close this issue; it looks like there's nothing actionable here.
Can confirm that this problem occurred after a package upgrade on Arch Linux. Fixed with a manual restart of the docker service, as per @CraigKelly / @denysvitali 's suggestions. It seems package-manager updates may be a primary cause of this error condition.
Potential solutions:
- Improve the error message so the cause (a replaced binary) is obvious.
- Give docker a "live patching" functionality allowing it to cut over to a new binary when it detects this condition.
I suspect the latter option is not the right way to go - it could interfere with service delivery and management frameworks. I'm not sure about the best practices for upgrading docker in production, but I certainly suspect running the OS's automatic update is not it. For comparison, it looks like docker-machine's `upgrade` function simply shuts the daemon down before performing an upgrade - if a user manages docker through that frontend, the service interruption/restart is communicated up front at upgrade time.
I certainly think an improved error message is a reasonable step. But the simplest way to address this issue, as it pertains to automatic upgrades, may be to implement docker-machine's behavior at the package manager level - restarting the docker service as part of an automatic upgrade.
That may put the fix out of the scope of this issue tracker/repository. I'll look into fixing the upgrade behavior in Arch. @MUI-Fazy, I'm not sure who maintains the Oracle Linux package, but you may be able to push a fix their way.
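For Arch specifically, one way to encode restart-on-upgrade behavior is an alpm hook. A sketch, assuming the package is named `docker` and the hook file is dropped into `/etc/pacman.d/hooks/docker-restart.hook`:

```ini
# Hypothetical pacman hook: restart the docker service whenever the
# docker package is upgraded, so the daemon never runs a deleted binary.
[Trigger]
Operation = Upgrade
Type = Package
Target = docker

[Action]
Description = Restarting docker.service after package upgrade...
When = PostTransaction
Exec = /usr/bin/systemctl restart docker.service
```

Note that this restarts the daemon unconditionally on every upgrade, which interrupts running containers unless live-restore is enabled.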
Suggestions for best practices to cut docker over to the new binary during an upgrade are welcome. Mostly I'm not sure whether this could destabilize complex docker environments. Even though the current error state prevents the user from starting new containers after an upgrade, it does not interrupt the execution or state of any containers that were running during the upgrade. Any thoughts there?
> Give docker a "live patching" functionality allowing it to provide the option to cut over to a new binary when it detects this condition
Docker has a "live-restore" daemon option, which allows upgrading the daemon while keeping the containers running. This is _not_ compatible with running "swarm mode" though.
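For reference, enabling live-restore is a one-line daemon configuration. A sketch, assuming the default config path `/etc/docker/daemon.json`; note the swarm-mode caveat above:

```json
{
  "live-restore": true
}
```

With this set, containers keep running while the daemon is stopped and are picked up again when the (possibly upgraded) daemon starts.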
If you run services, the "cut over" can be handled by swarm (but of course depends on _what_ exactly you're running);
- `docker node update --availability=drain` (this will make swarm put all tasks on other nodes)
- upgrade the daemon
- `docker node update --availability=active`

> Can confirm that this problem occurred after a package upgrade in Arch Linux.
Unattended updates of a production server are always tricky; I don't think docker can really do anything about that - it's a more general issue. You can "pin" packages to a specific version so that they don't get updated unless wanted.
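Pinning looks different per distro. A sketch that just prints the relevant commands/settings rather than running them; package names such as `docker.io` vs `docker-ce` are assumptions, so check what your distro actually ships:

```shell
# Hypothetical per-distro ways to pin docker so unattended upgrades
# cannot replace the running daemon's binary. Printed, not executed:
# apply the one matching your system.
cat <<'EOF'
Debian/Ubuntu:  sudo apt-mark hold docker.io        (or docker-ce)
Arch Linux:     add "IgnorePkg = docker" to /etc/pacman.conf
RHEL/CentOS:    add "exclude=docker*" to /etc/yum.conf
EOF
```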
I just got this issue with 18.06.1-ce without having done any system updates whatsoever.
Stopping all containers, then doing a `sudo systemctl restart docker`, then bringing the containers back up solved it, although MySQL then hit an error and wouldn't start anymore.