What happened:
We started using KinD in our prow integration tests for Istio, and are occasionally seeing errors like:
Error from server: error when creating "/logs/artifacts/galley-test-a99cf400cb3343eeac7/_suite_context/istio-deployment-577548603/istio-config-only.yaml": etcdserver: request timed out
What you expected to happen:
etcd doesn't time out.
How to reproduce it (as minimally and precisely as possible):
This is the hard part; I am not sure how to reproduce this consistently. I do, however, have a bunch of logs from where it occurred - attached below.
I realize this is probably not a very actionable bug report; my main question is what info we need to collect to root-cause this.
Environment:
- kind version (kind version): v0.3.0
- Kubernetes version (kubectl version): I think 1.14? That is what comes with 0.3.0, right?
- Docker version (docker info): 18.06.1-ce
- OS (/etc/os-release): Ubuntu 16.04
Logs:
I would look at resource usage on the host; this is probably a CPU exhaustion or disk I/O issue?
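For instance, watching the host during a run with something like the following would show whether CPU or disk is the bottleneck (a sketch, assuming the sysstat package is installed; the 5-second interval is arbitrary):

# Per-device utilization, throughput (MB/s) and queue depth, refreshed every 5s
iostat -xm 5
# Overall CPU usage and run-queue pressure
vmstat 5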
Around the time this happened the node was at ~60% CPU utilization and disk I/O was peaking at 108 MB/s and 648 operations/s (write). The disk claims to support 122.8 MB/s sustained throughput (256 GB GCP SSD).
So it does seem like it was potentially hitting the limit?
That might be it; we had problems with I/O throttling breaking various prowjobs (kind or otherwise) on prow.k8s.io because jobs were heavily exceeding the limits. I believe prow.k8s.io switched to appropriately sized pd-ssd on the nodes to avoid throttling (IIRC 250 GB pd-ssd?).
There should be a graph of I/O throttling in the VM details, IIRC, though it averages, so you can miss it if you're looking at broad timespans.
You are right: heavily throttled, with throttling spikes at 100 MB/s. We just switched to 256 GB SSDs (to match yours) because of semi-related issues where our Docker instances were getting killed. The problem is that we build a bunch of Docker images at the start of all our tests, which probably uses tons of disk. So if we can optimize that to do less work, this problem may resolve itself.
That may help; another option is to try to keep disk-heavy jobs from being co-scheduled.
Unfortunately Kubernetes / Prow are not terribly helpful there. Pod anti-affinity could help (or, at least in the past, a cheaper option was identical hostPorts on the pods, e.g. hostPort: 9999 on all pods that shouldn't be co-scheduled), but currently disk I/O is not really something that is accounted for :/
As far as I know, the standard pattern for disk-heavy workloads is to have dedicated nodes with taints and labels for those specific workloads and to schedule a limited number of them (say, one per node) onto those nodes, with other workloads elsewhere. That may not be the most practical for Prow... :disappointed:
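A rough sketch of that pattern; the node name and the taint key/value here are made up for illustration, and the disk-heavy pod specs would then carry a matching toleration plus nodeSelector:

# Reserve a node for disk-heavy jobs
kubectl taint nodes prow-node-disk-1 dedicated=disk-heavy:NoSchedule
kubectl label nodes prow-node-disk-1 dedicated=disk-heavy
# Disk-heavy pods add a toleration for dedicated=disk-heavy:NoSchedule and
# nodeSelector "dedicated: disk-heavy" so only they land on that node.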
We are getting this one a lot too; I suspect the same problem. It looks like kubeadm is health-checking and timing out:
+ kind create cluster --name=e2e-suite --loglevel debug
time="23:24:28" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}}]"
Creating cluster "e2e-suite" ...
 • Ensuring node image (kindest/node:v1.14.2) 🖼 ...
time="23:24:28" level=debug msg="Running: /usr/bin/docker [docker inspect --type=image kindest/node:v1.14.2]"
time="23:24:28" level=info msg="Pulling image: kindest/node:v1.14.2 ..."
time="23:24:28" level=debug msg="Running: /usr/bin/docker [docker pull kindest/node:v1.14.2]"
 ✓ Ensuring node image (kindest/node:v1.14.2) 🖼
 • Preparing nodes 📦 ...
time="23:24:54" level=debug msg="Running: /usr/bin/docker [docker info --format '{{json .SecurityOptions}}']"
time="23:24:54" level=debug msg="Running: /usr/bin/docker [docker run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname e2e-suite-control-plane --name e2e-suite-control-plane --label io.k8s.sigs.kind.cluster=e2e-suite --label io.k8s.sigs.kind.role=control-plane --expose 41951 -p 127.0.0.1:41951:6443 kindest/node:v1.14.2@sha256:33539d830a6cf20e3e0a75d0c46a4e94730d78c7375435e6b49833d81448c319]"
 ✓ Preparing nodes 📦
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}} --filter label=io.k8s.sigs.kind.cluster=e2e-suite]"
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} e2e-suite-control-plane]"
 • Creating kubeadm config 📜 ...
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker exec --privileged e2e-suite-control-plane cat /kind/version]"
time="23:25:32" level=debug msg="Using kubeadm config:\napiServer:\n certSANs:\n - localhost\napiVersion: kubeadm.k8s.io/v1beta1\nclusterName: e2e-suite\ncontrollerManager:\n extraArgs:\n enable-hostpath-provisioner: \"true\"\nkind: ClusterConfiguration\nkubernetesVersion: v1.14.2\nnetworking:\n podSubnet: 10.244.0.0/16\n---\napiVersion: kubeadm.k8s.io/v1beta1\nbootstrapTokens:\n- token: abcdef.0123456789abcdef\nkind: InitConfiguration\nlocalAPIEndpoint:\n bindPort: 6443\nnodeRegistration:\n criSocket: /run/containerd/containerd.sock\n---\napiVersion: kubeadm.k8s.io/v1beta1\nkind: JoinConfiguration\nnodeRegistration:\n criSocket: /run/containerd/containerd.sock\n---\napiVersion: kubelet.config.k8s.io/v1beta1\nevictionHard:\n imagefs.available: 0%\n nodefs.available: 0%\n nodefs.inodesFree: 0%\nimageGCHighThresholdPercent: 100\nkind: KubeletConfiguration\n---\napiVersion: kubeproxy.config.k8s.io/v1alpha1\nkind: KubeProxyConfiguration\n"
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker exec --privileged e2e-suite-control-plane mkdir -p /kind]"
time="23:25:33" level=debug msg="Running: /usr/bin/docker [docker exec --privileged -i e2e-suite-control-plane cp /dev/stdin /kind/kubeadm.conf]"
 ✓ Creating kubeadm config 📜
 • Starting control-plane 🕹️ ...
time="23:25:33" level=debug msg="Running: /usr/bin/docker [docker exec --privileged e2e-suite-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6]"
time="23:27:34" level=debug msg="I0718 23:25:34.203518 101 initconfiguration.go:186] loading configuration from \"/kind/kubeadm.conf\"\n[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta1, Kind=JoinConfiguration\nI0718 23:25:34.207689 101 interface.go:384] Looking for default routes with IPv4 addresses\nI0718 23:25:34.207708 101 interface.go:389] Default route transits interface \"eth0\"\nI0718 23:25:34.207838 101 interface.go:196] Interface eth0 is up\nI0718 23:25:34.207913 101 interface.go:244] Interface \"eth0\" has 1 addresses :[172.17.0.2/16].\nI0718 23:25:34.207947 101 interface.go:211] Checking addr 172.17.0.2/16.\nI0718 23:25:34.207957 101 interface.go:218] IP found 172.17.0.2\nI0718 23:25:34.207970 101 interface.go:250] Found valid IPv4 address 172.17.0.2 for interface \"eth0\".\nI0718 23:25:34.207978 101 interface.go:395] Found active IP 172.17.0.2 \nI0718 23:25:34.208228 101 feature_gate.go:226] feature gates: &{map[]}\n[init] Using Kubernetes version: v1.14.2\n[preflight] Running pre-flight checks\nI0718 23:25:34.208507 101 checks.go:581] validating Kubernetes and kubeadm version\nI0718 23:25:34.208532 101 checks.go:172] validating if the firewall is enabled and active\nI0718 23:25:34.232891 101 checks.go:209] validating availability of port 6443\nI0718 23:25:34.233091 101 checks.go:209] validating availability of port 10251\nI0718 23:25:34.233117 101 checks.go:209] validating availability of port 10252\nI0718 23:25:34.233145 101 checks.go:292] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml\nI0718 23:25:34.233157 101 checks.go:292] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml\nI0718 23:25:34.233164 101 checks.go:292] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml\nI0718 23:25:34.233171 101 checks.go:292] validating the existence of file /etc/kubernetes/manifests/etcd.yaml\nI0718 23:25:34.233180 101 checks.go:439] validating if the connectivity type is via proxy or direct\nI0718 23:25:34.233231 101 checks.go:475] validating http connectivity to first IP address in the CIDR\nI0718 23:25:34.233248 101 checks.go:475] validating http connectivity to first IP address in the CIDR\nI0718 23:25:34.233258 101 checks.go:105] validating the container runtime\nI0718 23:25:34.288326 101 checks.go:382] validating the presence of executable crictl\nI0718 23:25:34.288396 101 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables\n\t[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist\nI0718 23:25:34.288473 101 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward\nI0718 23:25:34.288521 101 checks.go:653] validating whether swap is enabled or not\nI0718 23:25:34.288574 101 checks.go:382] validating the presence of executable ip\nI0718 23:25:34.288605 101 checks.go:382] validating the presence of executable iptables\nI0718 23:25:34.288733 101 checks.go:382] validating the presence of executable mount\nI0718 23:25:34.288764 101 checks.go:382] validating the presence of executable nsenter\nI0718 23:25:34.288857 101 checks.go:382] validating the presence of executable ebtables\nI0718 23:25:34.288922 101 checks.go:382] validating the presence of executable ethtool\nI0718 23:25:34.288959 101 checks.go:382] validating the presence of executable socat\nI0718 23:25:34.289022 101 checks.go:382] validating the presence of executable 
tc\nI0718 23:25:34.289058 101 checks.go:382] validating the presence of executable touch\nI0718 23:25:34.289111 101 checks.go:524] running all checks\nI0718 23:25:34.298118 101 checks.go:412] checking whether the given node name is reachable using net.LookupHost\nI0718 23:25:34.298357 101 checks.go:622] validating kubelet version\nI0718 23:25:34.478800 101 checks.go:131] validating if the service is enabled and active\nI0718 23:25:34.497175 101 checks.go:209] validating availability of port 10250\nI0718 23:25:34.497266 101 checks.go:209] validating availability of port 2379\nI0718 23:25:34.497290 101 checks.go:209] validating availability of port 2380\nI0718 23:25:34.497315 101 checks.go:254] validating the existence and emptiness of directory /var/lib/etcd\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'\nI0718 23:25:34.510225 101 checks.go:842] image exists: k8s.gcr.io/kube-apiserver:v1.14.2\nI0718 23:25:34.583347 101 checks.go:842] image exists: k8s.gcr.io/kube-controller-manager:v1.14.2\nI0718 23:25:34.595777 101 checks.go:842] image exists: k8s.gcr.io/kube-scheduler:v1.14.2\nI0718 23:25:34.609813 101 checks.go:842] image exists: k8s.gcr.io/kube-proxy:v1.14.2\nI0718 23:25:34.627592 101 checks.go:842] image exists: k8s.gcr.io/pause:3.1\nI0718 23:25:34.642628 101 checks.go:842] image exists: k8s.gcr.io/etcd:3.3.10\nI0718 23:25:34.659836 101 checks.go:842] image exists: k8s.gcr.io/coredns:1.3.1\nI0718 23:25:34.659876 101 kubelet.go:61] Stopping the kubelet\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\nI0718 23:25:34.676713 101 kubelet.go:79] Starting the kubelet\n[kubelet-start] Activating the kubelet service\n[certs] Using certificateDir folder \"/etc/kubernetes/pki\"\nI0718 23:25:34.872872 101 certs.go:110] creating a new certificate authority for front-proxy-ca\n[certs] Generating \"front-proxy-ca\" certificate and key\n[certs] Generating \"front-proxy-client\" certificate and key\nI0718 23:25:35.492604 101 certs.go:110] creating a new certificate authority for etcd-ca\n[certs] Generating \"etcd/ca\" certificate and key\n[certs] Generating \"etcd/server\" certificate and key\n[certs] etcd/server serving cert is signed for DNS names [e2e-suite-control-plane localhost] and IPs [172.17.0.2 127.0.0.1 ::1]\n[certs] Generating \"etcd/healthcheck-client\" certificate and key\n[certs] Generating \"apiserver-etcd-client\" certificate and key\n[certs] Generating \"etcd/peer\" certificate and key\n[certs] etcd/peer serving cert is signed for DNS names [e2e-suite-control-plane localhost] and IPs [172.17.0.2 127.0.0.1 ::1]\nI0718 23:25:37.308288 101 certs.go:110] creating a new certificate authority for ca\n[certs] Generating \"ca\" certificate and key\n[certs] Generating \"apiserver\" certificate and key\n[certs] apiserver serving cert is signed for DNS names [e2e-suite-control-plane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost] and IPs [10.96.0.1 172.17.0.2]\nI0718 23:25:37.839914 101 certs.go:69] creating a new public/private key files for signing service account users\n[certs] Generating \"apiserver-kubelet-client\" certificate and key\n[certs] Generating \"sa\" key and public 
key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\nI0718 23:25:37.927876 101 kubeconfig.go:94] creating kubeconfig file for admin.conf\n[kubeconfig] Writing \"admin.conf\" kubeconfig file\nI0718 23:25:38.125191 101 kubeconfig.go:94] creating kubeconfig file for kubelet.conf\n[kubeconfig] Writing \"kubelet.conf\" kubeconfig file\nI0718 23:25:38.445718 101 kubeconfig.go:94] creating kubeconfig file for controller-manager.conf\n[kubeconfig] Writing \"controller-manager.conf\" kubeconfig file\nI0718 23:25:38.635879 101 kubeconfig.go:94] creating kubeconfig file for scheduler.conf\n[kubeconfig] Writing \"scheduler.conf\" kubeconfig file\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\nI0718 23:25:38.905193 101 manifests.go:114] [control-plane] getting StaticPodSpecs\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\nI0718 23:25:38.911844 101 manifests.go:130] [control-plane] wrote static Pod manifest for component \"kube-apiserver\" to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"\nI0718 23:25:38.911874 101 manifests.go:114] [control-plane] getting StaticPodSpecs\nI0718 23:25:38.914252 101 manifests.go:130] [control-plane] wrote static Pod manifest for component \"kube-controller-manager\" to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"\nI0718 23:25:38.915626 101 manifests.go:114] [control-plane] getting StaticPodSpecs\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[etcd] Creating static Pod manifest for local etcd in \"/etc/kubernetes/manifests\"\nI0718 23:25:38.916525 101 manifests.go:130] [control-plane] wrote static Pod manifest for component \"kube-scheduler\" to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"\nI0718 23:25:38.917370 101 local.go:60] [etcd] wrote Static Pod manifest for a local etcd member to \"/etc/kubernetes/manifests/etcd.yaml\"\nI0718 23:25:38.917403 101 waitcontrolplane.go:80] [wait-control-plane] Waiting for the API server to be healthy\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". 
This can take up to 4m0s\nI0718 23:25:38.938873 101 loader.go:359] Config loaded from file /etc/kubernetes/admin.conf\nI0718 23:25:38.940563 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:39.442098 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:39.941195 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:40.441748 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:40.941833 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:41.441258 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:41.941248 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:42.441143 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:42.941231 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:43.441190 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:43.941150 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:44.441200 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:44.943183 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:45.441798 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:45.941973 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:46.441309 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:46.941184 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:47.441169 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:47.941224 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:48.441602 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:48.943495 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 1 milliseconds\nI0718 23:25:49.441970 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:49.941200 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:50.441597 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:50.941187 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:51.442974 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:51.942905 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:52.441242 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:52.941245 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:53.442853 101 round_trippers.go:438] GET 
https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:53.941709 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:54.441271 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:54.941646 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:55.441535 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:55.941133 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:56.441116 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:56.941155 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:57.441164 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:57.941526 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:58.441244 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:58.941124 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:59.442062 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:25:59.941546 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:00.442171 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:00.941180 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:01.441096 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:01.941416 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:02.442609 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:02.941626 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:03.441891 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:03.941597 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:04.442628 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:04.941575 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:05.442278 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:05.941456 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:06.441183 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:06.942243 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:07.441599 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:07.941608 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:08.441116 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 
23:26:08.941216 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:09.441416 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:09.941592 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:10.441123 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:10.941237 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:11.441500 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:11.941154 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:12.441575 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:12.941122 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:13.441267 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:13.941119 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:14.442407 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:14.941178 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:15.443057 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:15.941616 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:16.441580 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:16.941527 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:17.441163 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:17.941243 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:18.441273 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\n[kubelet-check] Initial timeout of 40s passed.\nI0718 23:26:18.942217 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.\nI0718 23:26:19.442259 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:19.943322 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 1 milliseconds\nI0718 23:26:20.441180 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:20.941137 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:21.441427 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:21.941404 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:22.441487 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 
milliseconds\nI0718 23:26:22.941178 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:23.441194 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:23.941496 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.\nI0718 23:26:24.441190 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:24.941524 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:25.441150 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:25.942183 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 1 milliseconds\nI0718 23:26:26.441159 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:26.941131 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:27.441144 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:27.941265 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:28.441190 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:28.941229 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:29.441608 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:29.941629 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:30.441545 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:30.941143 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:31.442358 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:31.941189 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:32.441507 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:32.941431 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:33.441453 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:33.941191 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.\nI0718 23:26:34.441183 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:34.941633 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:35.441771 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 
milliseconds\nI0718 23:26:35.941225 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:36.441421 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:36.941285 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:37.441285 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:37.941254 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:38.441586 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:38.941178 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:39.441229 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:39.941357 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:40.442364 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 1 milliseconds\nI0718 23:26:40.941448 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:41.441597 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:41.942121 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:42.441359 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:42.941243 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:43.443731 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 2 milliseconds\nI0718 23:26:43.941355 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:44.441211 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:44.941148 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:45.441456 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:45.941423 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:46.441184 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:46.941557 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:47.441595 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:47.941660 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:48.441492 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:48.942219 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:49.441888 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:49.941138 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:50.441227 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:50.941206 101 round_trippers.go:438] GET 
https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:51.441281 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:51.941265 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:52.441209 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:52.941503 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:53.442424 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 1 milliseconds\nI0718 23:26:53.941183 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.\nI0718 23:26:54.441155 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:54.941167 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:55.442248 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:55.941566 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:56.442139 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:56.941176 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:57.441263 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:57.941169 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:58.442079 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:58.942095 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:59.441187 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:26:59.941098 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:00.441237 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:00.941420 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:01.441554 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:01.941572 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:02.441141 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:02.941166 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:03.441317 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:03.941503 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:04.441265 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:04.941919 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s 
in 0 milliseconds\nI0718 23:27:05.441284 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:05.941471 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:06.441137 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:06.941644 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:07.441546 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:07.941644 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:08.441594 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:08.941041 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:09.441554 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:09.941179 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:10.441061 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:10.941225 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:11.441598 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:11.941198 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:12.441136 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:12.941212 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:13.441271 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:13.941497 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:14.441194 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:14.941198 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:15.441290 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:15.941492 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:16.441185 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:16.941478 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:17.441308 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:17.941339 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:18.441611 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:18.941981 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:19.441188 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:19.941149 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:20.441284 101 round_trippers.go:438] GET 
https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:20.941182 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:21.441198 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:21.941600 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:22.441214 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:22.941131 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:23.441145 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:23.941478 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:24.441264 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:24.941605 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:25.441214 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:25.941405 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:26.441574 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:26.941179 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:27.441560 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:27.941219 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:28.441138 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:28.942359 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:29.441557 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:29.942183 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:30.441394 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:30.941548 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:31.441261 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:31.941410 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:32.441513 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:32.941500 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:33.441518 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 0 milliseconds\nI0718 23:27:33.943599 101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s in 1 milliseconds\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.\n\nUnfortunately, an error has occurred:\n\ttimed out waiting for the condition\n\nThis error is 
likely caused by:\n\t- The kubelet is not running\n\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)\n\nIf you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:\n\t- 'systemctl status kubelet'\n\t- 'journalctl -xeu kubelet'\n\nAdditionally, a control plane component may have crashed or exited when started by the container runtime.\nTo troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.\nHere is one example how you may list all Kubernetes containers running in docker:\n\t- 'docker ps -a | grep kube | grep -v pause'\n\tOnce you have found the failing container, you can inspect its logs with:\n\t- 'docker logs CONTAINERID'\nerror execution phase wait-control-plane: couldn't initialize a Kubernetes cluster"
 ✗ Starting control-plane 🕹️
time="23:27:34" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}} --filter label=io.k8s.sigs.kind.cluster=e2e-suite]"
time="23:27:34" level=debug msg="Running: /usr/bin/docker [docker rm -f -v e2e-suite-control-plane]"
Error: failed to create cluster: failed to init node with kubeadm: exit status 1
+ attempts=1
+ echo 'Could not setup KinD environment. Something wrong with KinD setup. Trying again.'
Could not setup KinD environment. Something wrong with KinD setup. Trying again.
So we are seeing this a fair amount, so I added some retries to the setup in https://github.com/istio/istio/pull/15637 (and set loglevel=debug now for further debugging); a sketch of the retry loop is below. Any other ideas to mitigate this?
:disappointed:
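The retry wrapper is roughly the following (a simplified sketch, not the exact script from the PR above):

attempts=0
until kind create cluster --name=e2e-suite --loglevel debug; do
  attempts=$((attempts + 1))
  if [ "${attempts}" -ge 3 ]; then
    echo "Could not setup KinD environment after ${attempts} attempts." >&2
    exit 1
  fi
  echo 'Could not setup KinD environment. Something wrong with KinD setup. Trying again.'
  # Remove any half-created cluster before retrying
  kind delete cluster --name=e2e-suite || true
done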
Symptomatically, the API server is not coming up healthy; the most common cause is that it was killed or evicted. That could be due to this.
kind / kubeadm can't do much there; if the host doesn't have enough capacity to bring up the API server healthily within that time frame (controlled by kubeadm), it's expected that the cluster won't come up. IIRC they've tuned the timeout to work even on Raspberry Pis.
If you run create with --retain (so it doesn't clean up on failure) and run kind export logs before kind delete cluster (cleaning up in some exit handler), we can tell from the kubelet logs whether the kubelet evicted the API server and which eviction threshold was crossed.
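Something like the following (a sketch; the cluster name and output directory are whatever your harness uses):

# Keep the node containers around on failure so their state can be inspected
kind create cluster --name=e2e-suite --retain
# Dump kubelet/containerd/pod logs from the node containers into ./kind-logs
kind export logs ./kind-logs --name=e2e-suite
# Clean up afterwards, e.g. in a trap/exit handler
kind delete cluster --name=e2e-suite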
I would add that we don't see either of these on prow.k8s.io and I've not seen these locally*; I would still guess a resource management issue on the underlying hosts, i.e. the prow cluster nodes are overloaded / throttled :grimacing: Are the nodes still being I/O throttled?
Additional notes:
We have most of the conformance tests running continuously on prow.k8s.io (a run completes every ~15m or so) with very few flakes: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance-parallel/1151917042656350210 (though you will see some occasional upstream breakage).
* There used to be an issue where, if the host was low on disk, you'd see the more recent error you posted, but we avoid that now by disabling disk-based eviction.
OK, so I dug into it a bit more. The retry PR doesn't seem to help: either kind comes up, or it fails 3 times - it never seems to fail once and then succeed.
I ran another test now that the cluster is pretty much empty. The node had no pods running on it until the test was scheduled on it. The first step of the test is setting up the kind cluster: https://gubernator.k8s.io/build/istio-prow/pr-logs/pull/istio_istio/15642/integ-galley-k8s-presubmit-tests-master/3435/
Looking at the node metrics, the throttled IO write bytes are peaking at 15 MB/s. So even just the kind create cluster is causing it to get throttled to some extent - not sure how normal that is.
I did the retaining + log dump, hopefully that can help. An example failure is at https://prow.istio.io/view/gcs/istio-prow/pr-logs/pull/istio_istio/15637/integ-telemetry-k8s-presubmit-tests-master/2039; artifacts will have all the logs.
From these logs, the most obvious-looking error is:
Jul 19 03:43:15 e2e-suite-control-plane containerd[46]: time="2019-07-19T03:43:15.596990905Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
There is also this:
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897914 231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897963 231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897975 231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897988 231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: F0719 03:42:12.898008 231 kubelet.go:1344] Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
We had some other tests where we got kind running, but then one of our containers running in kind failed to start a server with "no space left on device". I am pretty sure this is not literal disk space, but the inotify limit? It repeats that "no space left" error about 10x before it finally exits, which leads me to believe that is the root cause.
So it seems like increasing fs.inotify.max_user_watches may resolve this?
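For reference, checking and bumping the limit on a host looks roughly like this (524288 matches the value used in the DaemonSet further down; the persistence step is optional):

# See the current limit (older distros often default to something like 8192)
sysctl fs.inotify.max_user_watches
# Raise it for the running kernel
sudo sysctl -w fs.inotify.max_user_watches=524288
# Optionally persist it across reboots
echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/99-inotify.conf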
Either kind comes up, or it fails 3 times - it never seems to fail once then succeed.
That sounds _very_ suspicious.
Looking at the node metrics, IO write bytes throttle is peaking at 15mb/s. So even just the kind create cluster is causing it to get throttled to some extent - not sure how normal that is.
I'm not sure either.
Jul 19 03:43:15 e2e-suite-control-plane containerd[46]: time="2019-07-19T03:43:15.596990905Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
That's normal; there's a period during startup where the CNI config is not written out yet. This error will show up on many, many clusters while the CNI daemonset is still coming up and writing it out. As long as it eventually gets the CNI config, this is normal and fine.
I am pretty sure this is not literal disk space, but inotify limit? It seems it repeats that no space left error about 10x then it finally exits, which leads me to believe that is the root cause.
Yes! Now this error I have seen before, but not much on prow.k8s.io. This is indeed an issue with running out of inotify watches.
So seems like maybe increasing fs.inotify.max_user_watches may resolve this?
Yes, and we should be able to write a daemonset to do this.
This should go in our known-issues guide as well, and prow.k8s.io should probably bump its inotify watches too; IIRC they're relatively low on the cluster nodes by default.
Great, thanks for all the help! Now that we have the logs being dumped, hopefully we can track down future issues faster too.
@Katharine can you help us get the limit raised? I don't have access to the cluster
adapted to stable APIs from https://github.com/Azure/AKS/issues/772#issuecomment-477760184
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: tune-sysctls
  namespace: kube-system
  labels:
    app: tune-sysctls
spec:
  selector:
    matchLabels:
      name: tune-sysctls
  template:
    metadata:
      labels:
        name: tune-sysctls
    spec:
      hostNetwork: true
      hostPID: true
      hostIPC: true
      initContainers:
      - name: setsysctls
        command:
        - sh
        - -c
        - sysctl -w fs.inotify.max_user_watches=524288;
        image: alpine:3.6
        imagePullPolicy: IfNotPresent
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - name: sys
          mountPath: /sys
      containers:
      - name: sleepforever
        resources:
          requests:
            cpu: 0.01
        image: alpine:3.6
        command: ["tail"]
        args: ["-f", "/dev/null"]
      volumes:
      - name: sys
        hostPath:
          path: /sys
(Note: that's an arbitrary-ish number of watches; the default on some distros is around ~8k, IIRC.)
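A quick way to confirm the DaemonSet rolled out and the new limit took effect (namespace and label taken from the manifest above):

kubectl -n kube-system rollout status daemonset/tune-sysctls
kubectl -n kube-system get pods -l name=tune-sysctls -o wide
# On any node (or from a privileged pod on it), the value should now read 524288:
sysctl fs.inotify.max_user_watches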
Filed https://github.com/kubernetes/test-infra/pull/13515 to check this in somewhere.
I've applied the above across the istio test cluster. We should check it in to istio/test-infra so it accurately reflects reality.
We've deployed the daemonset today, the current version for prow is at https://github.com/kubernetes/test-infra/blob/f96fa91682a29c57838d4df17d9ef8d4ecf7260f/prow/cluster/tune-sysctls_daemonset.yaml (more or less the same, minor tweaks).
With the fix, things are going smoothly; thanks for the help. Feel free to close this.
Excellent! :smile:
/close
@BenTheElder: Closing this issue.
In response to this:
Excellent! :smile:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.