I created a Kubernetes cluster with Kops on AWS and it was working as expected.
Last week, when I tried to run any kubectl command, it failed with Unable to connect to the server: EOF.
At the time, it seemed that the problem was the disks (we had used a bit of the IOPS burst), but even after some time it was still not working. So I restarted the docker service on the master, and it came back.
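For reference, the recovery was roughly this (just a sketch, assuming SSH access to the master and a systemd-based image; the hostname and user are placeholders):

    ssh admin@<master-hostname>      # log in to the Kops master node
    sudo systemctl status docker     # check whether the Docker daemon is still responding
    sudo systemctl restart docker    # restart Docker; kubelet reconnects shortly afterwards
    kubectl get nodes                # back on the workstation, confirm the API server answers again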
Today, something similar happened. All of a sudden the same error is back, but this time no IOPS burst was used:



The master machine itself also seemed ok to me:

I checked the logs, and I was able to see this:
/var/lib/docker/containers/6734f01d588781260fa3c01b0643311db41ec13009345d245e4ff46bfe8a1b4b]
Mar 26 06:20:29 ip-172-16-32-224 kubelet[2227]: I0326 06:19:08.132889 2227 fsHandler.go:131] du and find on following dirs took 1m6.97018256s: [ /var/lib/docker/containers/5ff458d3fddf4404a204147e142eac76c07535e00d32987a4a2b64bc7287b562]
Mar 26 06:20:34 ip-172-16-32-224 kubelet[2227]: I0326 06:19:06.493021 2227 fsHandler.go:131] du and find on following dirs took 1m39.710066636s: [ /var/lib/docker/containers/c0e9e0b6cbf455654e672a0471d595374de0cbe8de2ce370cceaae9feae17c2a]
Mar 26 06:20:39 ip-172-16-32-224 kubelet[2227]: I0326 06:19:06.532812 2227 fsHandler.go:131] du and find on following dirs took 46.069627684s: [ /var/lib/docker/containers/07c5cf4f012ed4777a5ddc082e74ee5d6be3cd3fea94454841b580022bad6348]
Mar 26 06:21:06 ip-172-16-32-224 kubelet[2227]: W0326 06:17:31.503046 2227 status_manager.go:451] Failed to update status for pod "_()": Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/pods/kube-apiserver-ip-172-16-32-224.ec2.internal: dial tcp 127.0.0.1:8080: getsockopt: connection refused
Mar 26 06:21:15 ip-172-16-32-224 kubelet[2227]: E0326 06:19:45.143103 2227 reflector.go:188] pkg/kubelet/config/apiserver.go:44: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-16-32-224.ec2.internal&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
Mar 26 07:02:11 ip-172-16-32-224 systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
Mar 26 07:02:13 ip-172-16-32-224 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Mar 26 07:02:14 ip-172-16-32-224 systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Mar 26 07:03:22 ip-172-16-32-224 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Mar 26 07:03:27 ip-172-16-32-224 systemd[1]: Stopping Kubernetes Kubelet Server...
Mar 26 07:03:27 ip-172-16-32-224 systemd[1]: Starting Kubernetes Kubelet Server...
Mar 26 07:03:27 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 07:04:22 ip-172-16-32-224 kubelet[27498]: Flag --api-servers has been deprecated, Use --kubeconfig instead. Will be removed in a future version.
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: Flag --babysit-daemons has been deprecated, Will be removed in a future version.
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: Flag --config has been deprecated, Use --pod-manifest-path instead. Will be removed in a future version.
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: I0326 07:04:22.685995 27498 feature_gate.go:181] feature gates: map[]
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: I0326 07:04:24.206273 27498 aws.go:746] Building AWS cloudprovider
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: I0326 07:04:24.206341 27498 aws.go:700] Zone not specified in configuration file; querying AWS metadata service
Mar 26 07:04:32 ip-172-16-32-224 kubelet[27498]: I0326 07:04:32.675473 27498 aws.go:832] AWS cloud filtering on tags: map[KubernetesCluster:dev.contaazul.local]
Mar 26 07:04:33 ip-172-16-32-224 kubelet[27498]: I0326 07:04:32.935859 27498 server.go:369] Successfully initialized cloud provider: "aws" from the config file: ""
Mar 26 07:04:37 ip-172-16-32-224 kubelet[27498]: I0326 07:04:37.205748 27498 docker.go:356] Connecting to docker on unix:///var/run/docker.sock
Mar 26 07:04:37 ip-172-16-32-224 kubelet[27498]: I0326 07:04:37.425953 27498 docker.go:376] Start docker client with request timeout=2m0s
Mar 26 07:04:47 ip-172-16-32-224 kubelet[27498]: I0326 07:04:47.616207 27498 iptables.go:176] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
Mar 26 07:04:49 ip-172-16-32-224 kubelet[27498]: I0326 07:04:47.926977 27498 iptables.go:176] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
Mar 26 07:04:49 ip-172-16-32-224 kubelet[27498]: I0326 07:04:48.645576 27498 server.go:511] cloud provider determined current node name to be ip-172-16-32-224.ec2.internal
Mar 26 07:04:49 ip-172-16-32-224 kubelet[27498]: I0326 07:04:49.675486 27498 manager.go:143] cAdvisor running in container: "/"
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: protokube.service holdoff time over, scheduling restart.
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: Stopping Kubernetes Protokube Service...
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: Starting Kubernetes Protokube Service...
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: Started Kubernetes Protokube Service.
Mar 26 09:18:56 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:19:57 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:20:57 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:21:58 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:22:58 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:23:59 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:24:59 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:26:01 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:27:01 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
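(For what it's worth, the lines above come from the systemd journal on the master; I believe something along the lines of the following can pull them, though I don't remember the exact invocation I used:)

    journalctl -u kubelet --since today --no-pager    # dump today's kubelet log from the journal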
The Started Kubernetes Kubelet Server. line repeats every minute for hours (which, after looking into the issues here, seems to be normal).
The Could not connect to D-Bus message, on the other hand, seems odd. Is it some kind of bug? How can I fix it? Any other debugging tips?
Thanks!
At our company we are having the same problem. After some digging, it seems to be a problem with Docker version 1.12.3; the Docker 1.12.4 release notes mention fixes for some deadlock issues.
Looking at this issue: https://github.com/kubernetes/kops/issues/1962
It seems that upgrading the Docker version to 1.12.6 via kops is not possible at the moment, so we are waiting for new Kubernetes and kops releases that change the Docker version. Until then, you sometimes have to restart the Docker daemon to keep things working, which also restarts all the containers on that node.
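For completeness, the restart we do is roughly this (a sketch, assuming a systemd-based node with SSH access; <node-name> and the admin user are placeholders, and cordoning first is optional but limits disruption):

    kubectl cordon <node-name>                                 # keep new pods off the node during the restart
    ssh admin@<node-name> sudo systemctl restart docker        # restart the Docker daemon (this restarts its containers)
    kubectl uncordon <node-name>                               # allow scheduling on the node again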
@casrlos0 beta for 1.6 is out, can you test?
@chrislovecnm I think it was on alpha2 already, wasn't it?
Anyway, the problem never occurred again...
closing :)