Steps to reproduce the issue:
docker run -d --name=swam-agent --restart=always swarm join --advertise=<LocalHostName>:2375 consul://<Consul-hostname>:8500
docker stop swam-agent
docker start swam-agent
Describe the results you received:
Error response from daemon: Container command '/swarm' not found or does not exist.
Error: failed to start containers: swam-agent
From the Docker logs:
level=error msg="containerd: start container" error="oci runtime error: fork/exec /usr/bin/docker (deleted): no such file or directory: "
Describe the results you expected:
Swarm container should be able to launch successfully
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:58:52 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: b9f10c9
Built: Wed Jun 1 21:58:52 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 6
Running: 6
Paused: 0
Stopped: 0
Images: 22
Server Version: 1.11.2
Storage Driver: devicemapper
Pool Name: docker-202:2-4456533-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 76.6 GB
Data Space Total: 107.4 GB
Data Space Available: 30.78 GB
Metadata Space Used: 59.82 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.088 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /dev/home/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
Metadata loop file: /dev/home/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.117-RHEL6 (2016-08-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host overlay
Kernel Version: 4.1.12-61.1.22.el6uek.x86_64
Operating System: <unknown>
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 16.49 GiB
Name: bus00bbd
ID: YBAG:V4YC:JTGP:54K4:ZGPB:RCTP:XCAN:JQZA:THU6:DTSD:SVZ7:HUDD
Docker Root Dir: /dev/home/docker
Debug mode (client): false
Debug mode (server): false
Http Proxy: http://www-host50.com:80
Https Proxy: http://www-host50.com:80
No Proxy: localhost,docker-registry.local,/var/run/docker.sock
Registry: https://index.docker.io/v1/
Labels:
pool=south
Cluster store: consul://consul.host.com:8500
Cluster advertise: myhost1.host.com:2375
Additional environment details (AWS, VirtualBox, physical, etc.):
OS: Oracle Enterprise Linux 6
Does /usr/bin/docker still exist on your system? Are you sure you didn't replace the binary while docker kept running?
This looks like a continuation of the discussions in https://github.com/docker/docker/issues/23411#issuecomment-267813370 and https://github.com/docker/docker/issues/25381#issuecomment-268239370
Here is an update after trying 1.12.3: I'm seeing a similar error. The reproduction steps are the same.
ERROR: for swarm-node Cannot start service swarm-node: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:334: running prestart hook 0 caused \\\"fork/exec /usr/bin/dockerd (deleted): no such file or directory\\\"\"\n"
Docker Info:
Containers: 5
Running: 3
Paused: 0
Stopped: 2
Images: 15
Server Version: 1.12.3
Storage Driver: devicemapper
Pool Name: docker-202:2-11665420-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file: /dev/loop2
Metadata file: /dev/loop3
Data Space Used: 67.06 GB
Data Space Total: 107.4 GB
Data Space Available: 40.31 GB
Metadata Space Used: 50.95 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.097 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /home/node1/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /home/node1/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.117-RHEL6 (2016-08-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host null bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.1.12-61.1.17.el6uek.x86_64
Operating System: Oracle Linux Server 6.8
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 16.49 GiB
Name: node1
ID: MSEU:AELX:FWZ3:VJUK:JHIS:6AVE:RILK:A3CP:3Q2I:JTO3:4JRZ:7KBY
Docker Root Dir: /home/node1/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://www-host.com:80
Https Proxy: http://www-host.com:80
No Proxy: localhost,docker-registry.local,/var/run/docker.sock
Registry: https://index.docker.io/v1/
Labels:
group=test1
Cluster Store: consul://myconsul.host.com:8500
Cluster Advertise: node1.host.com:2375
Insecure Registries:
127.0.0.0/8
@MUI-Fazy Can you check whether dockerd still has the same inode? Compare `stat $(which dockerd)` after a daemon restart and again after the problem appears. It still seems that something on your system is replacing the binary.
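The effect described here can be sketched with a throwaway file. Package managers typically stage the new binary under a temporary name and rename it over the old path, so the path keeps its name but points at a fresh inode, while the running process still holds the deleted original. A minimal demonstration, with all paths hypothetical stand-ins:

```shell
# Sketch of why a replaced binary shows up as "(deleted)": the path
# survives an upgrade, but its inode does not.
tmp=$(mktemp -d)
printf 'old' > "$tmp/dockerd"            # stand-in for /usr/bin/dockerd
before=$(stat -c %i "$tmp/dockerd")      # inode of the original binary

printf 'new' > "$tmp/dockerd.new"        # package manager stages the new file...
mv "$tmp/dockerd.new" "$tmp/dockerd"     # ...and renames it over the old path
after=$(stat -c %i "$tmp/dockerd")       # same path, different inode

[ "$before" != "$after" ] && echo "inode changed: binary was replaced"
rm -rf "$tmp"
```

A process started from the old inode keeps it alive until it exits, which is exactly why `/proc/<pid>/exe` shows `/usr/bin/dockerd (deleted)` after an upgrade.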
@tonistiigi
I'll verify and report back. Thanks
@tonistiigi
You were right: the inode value has changed, but I'm not sure what is changing it.
Is there any alternative way to prevent this from happening?
> Any alternative way that I could do to prevent it from happening ?
I think that's hard for us to tell; you need to find out what other software is running on that host that installs/overwrites dockerd
Not sure if this is the same cause, but this might help someone.
I noticed this same error today (2017 Jan 18) on Ubuntu Xenial (with docker installed from the default repository via the package "docker.io"). Restarting the docker service (systemctl stop docker.service && systemctl start docker.service) fixed my problem.
Checking /var/log/dpkg.log showed that docker had been updated, and I hadn't noticed it when I did my usual system updating.
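A quick way to confirm this cause is to check the package log for a docker upgrade. A sketch, assuming a Debian/Ubuntu host with the default dpkg log location; the log path and package name may differ on your system:

```shell
# Hypothetical check: did a package upgrade silently replace docker?
# If so, the binaries on disk no longer match the running daemon.
logfile=${1:-/var/log/dpkg.log}           # default Debian/Ubuntu dpkg log
if grep -q ' upgrade docker' "$logfile"; then
    echo "docker was upgraded; restart the service to pick up the new binary"
    # sudo systemctl restart docker       # uncomment to apply the fix
fi
```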
Thanks for that @CraigKelly, it worked for me. By the way, your command can be shortened to `systemctl restart docker` 👍
Let me close this issue; it looks like there's nothing actionable here.
Can confirm that this problem occurred after a package upgrade on Arch Linux. Fixed with a manual restart of the docker service, as per @CraigKelly / @denysvitali 's suggestions. It seems package-manager updates may be a primary cause of this error condition.
Potential solutions:
- Improve the error message so the cause (a replaced binary) is obvious.
- Give docker a "live patching" functionality allowing it to cut over to a new binary when it detects this condition.
I suspect the latter option is not the right way to go - it could interfere with service delivery and management frameworks. I'm not sure about the best practices for upgrading docker in production, but I certainly suspect running the OS's automatic update is not it. For comparison, it looks like docker-machine's `upgrade` function simply shuts the daemon down before performing an upgrade - if a user manages docker through that frontend, the service interruption/restart is communicated up front at upgrade time.
I certainly think an improved error message is a reasonable step. But the simplest way to address this issue, as it pertains to automatic upgrades, may be to implement docker-machine's behavior at the package manager level - restarting the docker service as part of an automatic upgrade.
That may put the fix out of the scope of this issue tracker/repository. I'll look into fixing the upgrade behavior in Arch. @MUI-Fazy, I'm not sure who maintains the Oracle Linux package, but you may be able to push a fix their way.
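For Arch specifically, one way to encode restart-on-upgrade behavior is an alpm hook. A sketch, assuming the package is named `docker` and the hook file is dropped into `/etc/pacman.d/hooks/docker-restart.hook`:

```ini
# Hypothetical pacman hook: restart the docker service whenever the
# docker package is upgraded, so the daemon never runs a deleted binary.
[Trigger]
Operation = Upgrade
Type = Package
Target = docker

[Action]
Description = Restarting docker.service after package upgrade...
When = PostTransaction
Exec = /usr/bin/systemctl restart docker.service
```

Note that this restarts the daemon unconditionally on every upgrade, which interrupts running containers unless live-restore is enabled.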
Suggestions for best practices to cut docker over to the new binary during an upgrade are welcome. Mostly I'm not sure whether this could destabilize complex docker environments. Even though the current error state prevents the user from starting new containers after an upgrade, it does not interrupt the execution or state of any containers that were running during the upgrade. Any thoughts there?
> Give docker a "live patching" functionality allowing it to provide the option to cut over to a new binary when it detects this condition
Docker has a "live-restore" daemon option, which allows upgrading the daemon while keeping the containers running. This is _not_ compatible with running "swarm mode" though.
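For reference, enabling live-restore is a one-line daemon configuration. A sketch, assuming the default config path `/etc/docker/daemon.json`; note the swarm-mode caveat above:

```json
{
  "live-restore": true
}
```

With this set, containers keep running while the daemon is stopped and are picked up again when the (possibly upgraded) daemon starts.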
If you run services, the "cut over" can be handled by swarm (but of course depends on _what_ exactly you're running);
- `docker node update --availability=drain` (this will make swarm put all tasks on other nodes)
- upgrade the daemon
- `docker node update --availability=active`

> Can confirm that this problem occurred after a package upgrade in Arch Linux.
Unattended updates of a production server are always tricky; I don't think docker can really do anything about that - it's a more general issue. You can "pin" packages to a specific version so that they don't get updated unless wanted.
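Pinning looks different per distro. A sketch that just prints the relevant commands/settings rather than running them; package names such as `docker.io` vs `docker-ce` are assumptions, so check what your distro actually ships:

```shell
# Hypothetical per-distro ways to pin docker so unattended upgrades
# cannot replace the running daemon's binary. Printed, not executed:
# apply the one matching your system.
cat <<'EOF'
Debian/Ubuntu:  sudo apt-mark hold docker.io        (or docker-ce)
Arch Linux:     add "IgnorePkg = docker" to /etc/pacman.conf
RHEL/CentOS:    add "exclude=docker*" to /etc/yum.conf
EOF
```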
I just got this issue with 18.06.1-ce without having done any system updates whatsoever.
Stopping all containers, then doing a `sudo systemctl restart docker`, then bringing the containers back up solved it, although MySQL then hit an error and wouldn't start anymore.