Kops: Unable to connect to server, fixed only after a docker restart on server

Created on 27 Mar 2017 · 5 Comments · Source: kubernetes/kops

I created a Kubernetes cluster with Kops on AWS and it was working as expected.

Last week, when I tried to run a kubectl command, it failed with Unable to connect to the server: EOF.

At the time, the problem seemed to be the disks (we had used a little of the IOPS burst), but after some time it was still not working. So I restarted the docker service on the master, and it came back.
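For anyone hitting the same symptom, here is a rough sketch of the check-then-restart step described above. It assumes the API server still answers /healthz on the local insecure port 127.0.0.1:8080 (the address the kubelet uses in the logs below), that curl is installed on the master, and that Docker runs as a systemd unit named docker. The function name apiserver_healthy is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the API server answers its health
# endpoint. The default URL is an assumption based on the kubelet log lines
# that point at http://127.0.0.1:8080.
apiserver_healthy() {
    url="${1:-http://127.0.0.1:8080/healthz}"
    # --fail makes curl return non-zero on HTTP errors; --max-time avoids
    # hanging on a wedged daemon.
    curl --silent --fail --max-time 5 "$url" >/dev/null 2>&1
}

# Usage on the master (commented out on purpose; restarting Docker also
# restarts every container on the node):
#   apiserver_healthy || sudo systemctl restart docker
```

The restart is left commented out so the check can be run on its own first.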

Today, something similar happened. All of a sudden the same error is back, but no IOPS burst was used:

image
image
image

The master machine itself also seemed ok to me:

image

I checked the logs, and I was able to see this:

/var/log/daemon.log

/var/lib/docker/containers/6734f01d588781260fa3c01b0643311db41ec13009345d245e4ff46bfe8a1b4b]
Mar 26 06:20:29 ip-172-16-32-224 kubelet[2227]: I0326 06:19:08.132889    2227 fsHandler.go:131] du and find on following dirs took 1m6.97018256s: [ /var/lib/docker/containers/5ff458d3fddf4404a204147e142eac76c07535e00d32987a4a2b64bc7287b562]
Mar 26 06:20:34 ip-172-16-32-224 kubelet[2227]: I0326 06:19:06.493021    2227 fsHandler.go:131] du and find on following dirs took 1m39.710066636s: [ /var/lib/docker/containers/c0e9e0b6cbf455654e672a0471d595374de0cbe8de2ce370cceaae9feae17c2a]
Mar 26 06:20:39 ip-172-16-32-224 kubelet[2227]: I0326 06:19:06.532812    2227 fsHandler.go:131] du and find on following dirs took 46.069627684s: [ /var/lib/docker/containers/07c5cf4f012ed4777a5ddc082e74ee5d6be3cd3fea94454841b580022bad6348]
Mar 26 06:21:06 ip-172-16-32-224 kubelet[2227]: W0326 06:17:31.503046    2227 status_manager.go:451] Failed to update status for pod "_()": Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/pods/kube-apiserver-ip-172-16-32-224.ec2.internal: dial tcp 127.0.0.1:8080: getsockopt: connection refused
Mar 26 06:21:15 ip-172-16-32-224 kubelet[2227]: E0326 06:19:45.143103    2227 reflector.go:188] pkg/kubelet/config/apiserver.go:44: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-16-32-224.ec2.internal&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
Mar 26 07:02:11 ip-172-16-32-224 systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
Mar 26 07:02:13 ip-172-16-32-224 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Mar 26 07:02:14 ip-172-16-32-224 systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Mar 26 07:03:22 ip-172-16-32-224 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Mar 26 07:03:27 ip-172-16-32-224 systemd[1]: Stopping Kubernetes Kubelet Server...
Mar 26 07:03:27 ip-172-16-32-224 systemd[1]: Starting Kubernetes Kubelet Server...
Mar 26 07:03:27 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 07:04:22 ip-172-16-32-224 kubelet[27498]: Flag --api-servers has been deprecated, Use --kubeconfig instead. Will be removed in a future version.
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: Flag --babysit-daemons has been deprecated, Will be removed in a future version.
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: Flag --config has been deprecated, Use --pod-manifest-path instead. Will be removed in a future version.
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: I0326 07:04:22.685995   27498 feature_gate.go:181] feature gates: map[]
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: I0326 07:04:24.206273   27498 aws.go:746] Building AWS cloudprovider
Mar 26 07:04:24 ip-172-16-32-224 kubelet[27498]: I0326 07:04:24.206341   27498 aws.go:700] Zone not specified in configuration file; querying AWS metadata service
Mar 26 07:04:32 ip-172-16-32-224 kubelet[27498]: I0326 07:04:32.675473   27498 aws.go:832] AWS cloud filtering on tags: map[KubernetesCluster:dev.contaazul.local]
Mar 26 07:04:33 ip-172-16-32-224 kubelet[27498]: I0326 07:04:32.935859   27498 server.go:369] Successfully initialized cloud provider: "aws" from the config file: ""
Mar 26 07:04:37 ip-172-16-32-224 kubelet[27498]: I0326 07:04:37.205748   27498 docker.go:356] Connecting to docker on unix:///var/run/docker.sock
Mar 26 07:04:37 ip-172-16-32-224 kubelet[27498]: I0326 07:04:37.425953   27498 docker.go:376] Start docker client with request timeout=2m0s
Mar 26 07:04:47 ip-172-16-32-224 kubelet[27498]: I0326 07:04:47.616207   27498 iptables.go:176] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
Mar 26 07:04:49 ip-172-16-32-224 kubelet[27498]: I0326 07:04:47.926977   27498 iptables.go:176] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
Mar 26 07:04:49 ip-172-16-32-224 kubelet[27498]: I0326 07:04:48.645576   27498 server.go:511] cloud provider determined current node name to be ip-172-16-32-224.ec2.internal
Mar 26 07:04:49 ip-172-16-32-224 kubelet[27498]: I0326 07:04:49.675486   27498 manager.go:143] cAdvisor running in container: "/"
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: protokube.service holdoff time over, scheduling restart.
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: Stopping Kubernetes Protokube Service...
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: Starting Kubernetes Protokube Service...
Mar 26 09:17:58 ip-172-16-32-224 systemd[1]: Started Kubernetes Protokube Service.
Mar 26 09:18:56 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:19:57 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:20:57 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:21:58 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:22:58 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:23:59 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:24:59 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:26:01 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.
Mar 26 09:27:01 ip-172-16-32-224 systemd[1]: Started Kubernetes Kubelet Server.

The Started Kubernetes Kubelet Server. line repeats every minute for hours (which, after looking into the issues here, seems to be normal).
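To see how often that line shows up, something like the following could be run against the log. This is only a sketch: it assumes the syslog-style layout shown above (month, day, time as the first three fields) and the /var/log/daemon.log path; both function names are hypothetical.

```shell
#!/bin/sh
# Count every "Started Kubernetes Kubelet Server" line in a log file.
count_kubelet_starts() {
    log_file="${1:-/var/log/daemon.log}"
    grep -c 'Started Kubernetes Kubelet Server' "$log_file"
}

# Break the same count down per hour, keyed as "Mon DD HH".
# Assumes the first three fields are month, day and HH:MM:SS.
kubelet_starts_per_hour() {
    log_file="${1:-/var/log/daemon.log}"
    awk '/Started Kubernetes Kubelet Server/ {
             split($3, t, ":")              # t[1] is the hour
             starts[$1 " " $2 " " t[1]]++
         }
         END { for (k in starts) print k, starts[k] }' "$log_file" | sort
}

# Usage (commented out):
#   count_kubelet_starts /var/log/daemon.log
#   kubelet_starts_per_hour /var/log/daemon.log
```

A steady count of about 60 per hour would match the once-a-minute pattern in the excerpt.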

The Could not connect to D-Bus message, on the other hand, seems odd. Is it some kind of bug? How can I fix it? Any other debug tips?

Thanks!

All 5 comments

At our company we are having the same problem. After some looking around, it seems to be a problem with Docker version 1.12.3. The Docker release notes for 1.12.4 list fixes for several deadlock issues.
Looking at this issue: https://github.com/kubernetes/kops/issues/1962
It seems that upgrading Docker to 1.12.6 via kops is not possible at the moment. So we are waiting for new Kubernetes and kops releases that change the Docker version. Until then, you sometimes have to restart the Docker daemon to keep things working, which also restarts all the containers on that node.
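A quick way to tell whether a node is running a Docker version older than 1.12.4 might look like the sketch below. The helper name docker_older_than_fix is invented for this example, the 1.12.4 threshold comes only from the release-notes remark above, and the comparison relies on sort -V (version sort), which is assumed to be available on the node.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (true) when the given version is older
# than 1.12.4, the release whose notes mention deadlock fixes.
docker_older_than_fix() {
    version="$1"
    fixed="1.12.4"
    # Equal versions are not "older".
    [ "$version" = "$fixed" ] && return 1
    # sort -V orders version strings; the first line is the older one.
    oldest=$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)
    [ "$oldest" = "$version" ]
}

# Usage on a node (commented out; needs access to the Docker daemon):
#   v=$(docker version --format '{{.Server.Version}}')
#   docker_older_than_fix "$v" && echo "Docker $v predates the 1.12.4 fixes"
```

This only reports the version; it does not confirm that the deadlock is what is actually happening on the node.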

#2283 may have fixed that. Waiting for a new build of kops to test it.

@casrlos0 beta for 1.6 is out, can you test?

@chrislovecnm I think it was on alpha2 already, wasn't it?

anyways, the problem never occurred again...

closing :)
