RKE version:
rke v1.1.12, v1.2.3 and v1.2.4-rc9
Docker version: (docker version,docker info preferred)
docker-1.13.1-203.git0be3e21.el7
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
CentOS 7.9 with Linux kernel 3.10.0-1160.11.1.el7.x86_64
SELinux enabled.
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
VMware VMs
cluster.yml:
Default (stock) cluster.yml generated by rke, for Kubernetes v1.18.6, v1.18.12 and v1.18.14.
Steps to Reproduce:
Results:
rke works OK if Docker on the nodes is downgraded to the CentOS 7.8 version (docker-1.13.1-162.git64e9980.el7) before running rke. The problem seems to occur only when the nodes run the latest native Docker RPM (docker-1.13.1-203.git0be3e21.el7).
Docker on the nodes otherwise seems to work until the moment rke starts kubelet; at that point the Docker daemon appears to crash.
rke output when nodes are running "docker-1.13.1-203.git0be3e21":
INFO[0409] Starting container [kubelet] on host [10.10.10.12], try #1
INFO[0409] Starting container [kubelet] on host [10.10.10.13], try #1
INFO[0409] Starting container [kubelet] on host [10.10.10.11], try #1
DEBU[0459] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0459] Can't start Docker container [kubelet] on host [10.10.10.12]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0459] Starting container [kubelet] on host [10.10.10.12], try #2
DEBU[0459] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0459] Can't start Docker container [kubelet] on host [10.10.10.13]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0459] Starting container [kubelet] on host [10.10.10.13], try #2
DEBU[0459] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0459] Can't start Docker container [kubelet] on host [10.10.10.11]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0459] Starting container [kubelet] on host [10.10.10.11], try #2
DEBU[0510] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0510] Can't start Docker container [kubelet] on host [10.10.10.12]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0510] Starting container [kubelet] on host [10.10.10.12], try #3
DEBU[0510] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0510] Can't start Docker container [kubelet] on host [10.10.10.11]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0510] Starting container [kubelet] on host [10.10.10.11], try #3
DEBU[0510] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0510] Can't start Docker container [kubelet] on host [10.10.10.13]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0510] Starting container [kubelet] on host [10.10.10.13], try #3
DEBU[0510] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0510] Can't start Docker container [kubelet] on host [10.10.10.11]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0510] Starting container [kubelet] on host [10.10.10.11], try #3
DEBU[0510] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0561] Can't start Docker container [kubelet] on host [10.10.10.12]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
DEBU[0561] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0561] Can't start Docker container [kubelet] on host [10.10.10.13]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
DEBU[0561] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0561] Can't start Docker container [kubelet] on host [10.10.10.11]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
gz#15619
I've also run into this and have not been able to find a workaround or fix.
I've also hit this and haven't found a workaround.
Well, the workaround I mentioned above works, i.e. downgrade to version docker-1.13.1-162.git64e9980.el7.
It would be nice to figure out, though, why rke/kubelet does not work with the latest el7.9 Docker version.
Can confirm that downgrading Docker to 1.13.1-162 fixes this.
I can also confirm.
Installing docker-1.13.1-162.git64e9980.el7_8 on RHEL 7.9 fixes this.
I'm not sure if this is a Docker or RKE issue.
Was able to recreate on Digital Ocean:
yum update -y # to get to 7.9, looks like 7.6 is shipped
yum install docker -y
systemctl enable docker
reboot # for kernel update
Follow the docs for dockerroot group ownership.
rke up reveals the following:
INFO[0102] Starting container [kubelet] on host [68.183.116.42], try #1
WARN[0152] Can't start Docker container [kubelet] on host [68.183.116.42]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0152] Starting container [kubelet] on host [68.183.116.42], try #2
WARN[0203] Can't start Docker container [kubelet] on host [68.183.116.42]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0203] Starting container [kubelet] on host [68.183.116.42], try #3
WARN[0253] Can't start Docker container [kubelet] on host [68.183.116.42]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
FATA[0253] [workerPlane] Failed to bring up Worker Plane: [Failed to start [kubelet] container on host [68.183.116.42]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]
Docker commands also seem to go unresponsive.
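For anyone trying to tell a hung daemon apart from one that is simply not running, a small probe like the following may help. This is only a sketch: the helper name `probe_cmd` is made up for illustration, and it relies on `timeout` from GNU coreutils, which exits with status 124 when the time limit is hit.

```shell
# probe_cmd LIMIT CMD [ARGS...]
# Runs CMD with a time limit and prints one of:
#   responsive  - CMD exited 0 within LIMIT seconds
#   hung        - CMD was killed by timeout (exit status 124)
#   failed      - CMD exited non-zero on its own
# probe_cmd is a hypothetical helper, not part of rke or Docker.
probe_cmd() {
    probe_limit="$1"
    shift
    probe_rc=0
    timeout "$probe_limit" "$@" >/dev/null 2>&1 || probe_rc=$?
    case "$probe_rc" in
        0)   echo "responsive" ;;
        124) echo "hung" ;;
        *)   echo "failed" ;;
    esac
}

# Example on a node (not executed here):
#   probe_cmd 10 docker info
```

A result of `hung` would match the "commands go unresponsive" symptom, while `failed` would be expected when the daemon is down and the client cannot connect to the socket.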
To add to @bentastic27's comment: I followed the same process and also disabled SELinux for the engine, and I got the same result.
It seems like the same kubelet / docker crash problem still happens with the latest native el7 docker version: docker-1.13.1-204.git0be3e21.el7.x86_64
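To summarize the builds reported so far in this thread, here is a rough pre-flight check that could be run on each node before `rke up`. It is a sketch only: the function name `classify_docker_build` is invented for illustration, and the lists contain nothing beyond what people reported here (162 works; 203 and 204 fail). Any other build is reported as unknown rather than guessed at.

```shell
# classify_docker_build PACKAGE
# PACKAGE is a name-version-release string such as the output of `rpm -q docker`.
# Prints "known-good", "affected" or "unknown", based only on reports in this thread.
# classify_docker_build is a hypothetical helper, not part of rke.
classify_docker_build() {
    case "$1" in
        docker-1.13.1-162.*)
            echo "known-good" ;;
        docker-1.13.1-203.*|docker-1.13.1-204.*)
            echo "affected" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example on a node (not executed here):
#   classify_docker_build "$(rpm -q docker)"
```

If a node reports `affected`, the workaround from the earlier comments is to downgrade to the 162 build before running rke.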
I believe I reproduced the problem in Red Hat Bugzilla 1943700 and found it likely to be the issue described in Red Hat Bugzilla 1896883, so I closed the former in favor of the latter. You can follow along there. Thanks!