kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Environment:
OS (e.g. from /etc/os-release):
CentOS 7.1
Kernel (e.g. uname -a):
Linux master1 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Docker version:
Docker version 17.03.1-ce, build c6d412e
When I used kubeadm init to create a single-master cluster, it ended with the following error.
[root@master1 kubeadm]# kubeadm init --apiserver-advertise-address=172.16.6.64 --kubernetes-version=v1.11.1 --pod-network-cidr=192.168.0.0/16
[init] using Kubernetes version: v1.11.1
[preflight] running pre-flight checks
I0904 14:29:33.474299 28529 kernel_validator.go:81] Validating kernel version
I0904 14:29:33.474529 28529 kernel_validator.go:96] Validating kernel config
[preflight/images] Pulling images required for setting up a Kubernetes cluster
[preflight/images] This might take a minute or two, depending on the speed of your internet connection
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[preflight] Activating the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.6.64]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [master1 localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [master1 localhost] and IPs [172.16.6.64 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] this might take a minute or longer if the control plane images have to be pulled
[apiclient] All control plane components are healthy after 23.503472 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.11" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node master1 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node master1 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
error marking master: timed out waiting for the condition
However, all of the Docker containers appeared to be running fine.
[root@master1 kubeadm]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
53886ee1db02 272b3a60cd68 "kube-scheduler --..." 5 minutes ago Up 5 minutes k8s_kube-scheduler_kube-schedu
05f9e74cb1ae b8df3b177be2 "etcd --advertise-..." 5 minutes ago Up 5 minutes k8s_etcd_etcd-master1_kube-sys
ac00773b050d 52096ee87d0e "kube-controller-m..." 5 minutes ago Up 5 minutes k8s_kube-controller-manager_ku
ebeae2ea255b 816332bd9d11 "kube-apiserver --..." 5 minutes ago Up 5 minutes k8s_kube-apiserver_kube-apiser
74a0d0b1346e k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_etcd-master1_kube-syst
b693b16e39cc k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_kube-scheduler-master1
0ce92c0afa62 k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_kube-controller-manage
c43f05f27c01 k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_kube-apiserver-master
Strangely, when I added the --dry-run option, it completed successfully.
[root@master1 kubeadm]# kubeadm init --apiserver-advertise-address 172.16.6.64 --pod-network-cidr=192.168.0.0/16 --node-name=master1 --dry-run --kubernetes-version=v1.11.1
[init] using Kubernetes version: v1.11.1
[preflight] running pre-flight checks
I0904 16:07:56.101221 23703 kernel_validator.go:81] Validating kernel version
I0904 16:07:56.101565 23703 kernel_validator.go:96] Validating kernel config
[preflight/images] Would pull the required images (like 'kubeadm config images pull')
[kubelet] Writing kubelet environment file with flags to file "/tmp/kubeadm-init-dryrun016982898/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/tmp/kubeadm-init-dryrun016982898/config.yaml"
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.6.64]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [master1 localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [master1 localhost] and IPs [172.16.6.64 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] valid certificates and keys now exist in "/tmp/kubeadm-init-dryrun016982898"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/tmp/kubeadm-init-dryrun016982898/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/tmp/kubeadm-init-dryrun016982898/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/tmp/kubeadm-init-dryrun016982898/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/tmp/kubeadm-init-dryrun016982898/etcd.yaml"
[dryrun] wrote certificates, kubeconfig files and control plane manifests to the "/tmp/kubeadm-init-dryrun016982898" directory
[dryrun] the certificates or kubeconfig files would not be printed due to their sensitive nature
[dryrun] please examine the "/tmp/kubeadm-init-dryrun016982898" directory for details about what would be written
[dryrun] Would write file "/etc/kubernetes/manifests/kube-apiserver.yaml" with content:
...
[markmaster] Marking the node master1 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node master1 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Would perform action PATCH on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Attached patch:
{"metadata":{"labels":{"node-role.kubernetes.io/master":""}},"spec":{"taints":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}]}}
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master1" as an annotation
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Would perform action PATCH on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Attached patch:
{"metadata":{"annotations":{"kubeadm.alpha.kubernetes.io/cri-socket":"/var/run/dockershim.sock"}}}
[bootstraptoken] using token: 3gvy0t.amka3xc9u1oljlla
[dryrun] Would perform action GET on resource "secrets" in API group "core/v1"
[dryrun] Resource name: "bootstrap-token-3gvy0t"
[dryrun] Would perform action CREATE on resource "secrets" in API group "core/v1"
[dryrun] Attached object:
apiVersion: v1
data:
auth-extra-groups: c3lzdGVtOmJvb3RzdHJhcHBlcnM6a3ViZWFkbTpkZWZhdWx0LW5vZGUtdG9rZW4=
description: VGhlIGRlZmF1bHQgYm9vdHN0cmFwIHRva2VuIGdlbmVyYXRlZCBieSAna3ViZWFkbSBpbml0Jy4=
expiration: MjAxOC0wOS0wNVQxNjowODowNSswODowMA==
token-id: M2d2eTB0
token-secret: YW1rYTN4Yzl1MW9samxsYQ==
usage-bootstrap-authentication: dHJ1ZQ==
usage-bootstrap-signing: dHJ1ZQ==
kind: Secret
metadata:
creationTimestamp: null
name: bootstrap-token-3gvy0t
namespace: kube-system
type: bootstrap.kubernetes.io/token
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[dryrun] Would perform action CREATE on resource "clusterrolebindings" in API group "rbac.authorization.k8s.io/v1"
[dryrun] Attached object:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: null
name: kubeadm:kubelet-bootstrap
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:node-bootstrapper
subjects:
- kind: Group
name: system:bootstrappers:kubeadm:default-node-token
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[dryrun] Would perform action CREATE on resource "clusterrolebindings" in API group "rbac.authorization.k8s.io/v1"
[dryrun] Attached object:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: null
name: kubeadm:node-autoapprove-bootstrap
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
subjects:
- kind: Group
name: system:bootstrappers:kubeadm:default-node-token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[dryrun] Would perform action CREATE on resource "clusterrolebindings" in API group "rbac.authorization.k8s.io/v1"
[dryrun] Attached object:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: null
name: kubeadm:node-autoapprove-certificate-rotation
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
subjects:
- kind: Group
name: system:nodes
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
...
[dryrun] Attached object:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: null
name: kubeadm:node-proxier
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:node-proxier
subjects:
- kind: ServiceAccount
name: kube-proxy
namespace: kube-system
[addons] Applied essential addon: kube-proxy
[dryrun] finished dry-running successfully. Above are the resources that would be created
How can I solve this problem and create a single-master cluster with kubeadm?
Hi @heng-Yuan and thanks for filing this issue!
Can you check the state and logs of kubelet and the API server container (of course you can filter out any information you deem sensitive):
systemctl status kubelet
journalctl -xeu kubelet
docker logs ebeae2ea255b
Note that ebeae2ea255b is your API server container ID.
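If the container ID has changed after a reset, you can look it up again with something like this (a sketch; the name filter is based on how the kubelet names the container in your docker ps output above):
$ docker ps --filter name=k8s_kube-apiserver --format '{{.ID}}'
$ docker logs $(docker ps -q --filter name=k8s_kube-apiserver)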
@rosti Thank you sincerely for your reply. I reset the master using kubeadm reset and reinitialized it, but it still failed with the errors above. I have also checked the state and logs as you mentioned.
[root@master1 kubeadm]# systemctl status kubelet.service -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf, 20-etcd-service-manager.conf
Active: active (running) since Tue 2018-09-04 16:47:14 CST; 3min 3s ago
Docs: http://kubernetes.io/docs/
Main PID: 32505 (kubelet)
Memory: 43.0M
CGroup: /system.slice/kubelet.service
└─32505 /usr/bin/kubelet --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true
Sep 04 16:49:26 master1 kubelet[32505]: I0904 16:49:26.817311 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:36 master1 kubelet[32505]: I0904 16:49:36.850186 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:37 master1 kubelet[32505]: I0904 16:49:37.053103 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:46 master1 kubelet[32505]: I0904 16:49:46.880508 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:56 master1 kubelet[32505]: I0904 16:49:56.910928 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:06 master1 kubelet[32505]: I0904 16:50:06.941318 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:09 master1 kubelet[32505]: I0904 16:50:09.053222 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:09 master1 kubelet[32505]: I0904 16:50:09.053483 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:10 master1 kubelet[32505]: I0904 16:50:10.053315 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:16 master1 kubelet[32505]: I0904 16:50:16.979911 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Running docker logs on the API server container outputs lots of TLS handshake errors like this:
[root@master1 kubeadm]# docker logs 23bb9ca0598b
I0905 01:06:56.928270 1 logs.go:49] http: TLS handshake error from 172.16.6.65:37562: read tcp 172.16.6.64:6443->172.16.6.65:37562: read: connection reset by peer
I0905 01:07:01.930357 1 logs.go:49] http: TLS handshake error from 172.16.6.65:37565: read tcp 172.16.6.64:6443->172.16.6.65:37565: read: connection reset by peer
I0905 01:07:06.931092 1 logs.go:49] http: TLS handshake error from 172.16.6.65:37568: read tcp 172.16.6.64:6443->172.16.6.65:37568: read: connection reset by peer
I0905 01:07:11.932974 1 logs.go:49] http: TLS handshake error from 172.16.6.65:37571: read tcp 172.16.6.64:6443->172.16.6.65:37571: read: connection reset by peer
172.16.6.64 is the server where I ran kubeadm init, and 172.16.6.65 is another server that I intend to use as a worker node.
From my Docker daemon log, I also got some errors like the following:
[root@master1 ~]# journalctl -u docker.service -f
-- Logs begin at Mon 2018-09-03 04:20:53 CST. --
Sep 04 16:47:14 master1 dockerd[28979]: time="2018-09-04T16:47:14.583834446+08:00" level=error msg="Handler for GET /v1.27/containers/k8s.gcr.io/pause:3.1/json returned error: No such container: k8s.gcr.io/pause:3.1"
Sep 04 16:47:14 master1 dockerd[28979]: time="2018-09-04T16:47:14.617667342+08:00" level=error msg="Handler for GET /v1.27/containers/k8s.gcr.io/etcd-amd64:3.2.18/json returned error: No such container: k8s.gcr.io/etcd-amd64:3.2.18"
Sep 04 16:47:14 master1 dockerd[28979]: time="2018-09-04T16:47:14.652892678+08:00" level=error msg="Handler for GET /v1.27/containers/k8s.gcr.io/coredns:1.1.3/json returned error: No such container: k8s.gcr.io/coredns:1.1.3"
Sep 04 16:47:16 master1 dockerd[28979]: time="2018-09-04T16:47:16.198158654+08:00" level=error msg="Handler for GET /containers/13430f7e8177925ec6f51b5881f9e27cae98868256c83653be03e8dc6467bf18/json returned error: No such container: 13430f7e8177925ec6f51b5881f9e27cae98868256c83653be03e8dc6467bf18"
Sep 04 16:47:24 master1 dockerd[28979]: time="2018-09-04T16:47:24.729158296+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container 23bb9ca0598bd9183e0a289cfd128367f261f2673e93b675a530e9a66ff4bc37"
Sep 04 16:47:24 master1 dockerd[28979]: time="2018-09-04T16:47:24.953014165+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container 30ba9554ad122c5317e54ba11d6e4b44ca50fe5bb497716a93f7a2fb85b9c808"
Sep 04 16:47:25 master1 dockerd[28979]: time="2018-09-04T16:47:25.754823422+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container d27dde76000f2261051ff29277658c09471810cae72802087554bef12b54bc1e"
Sep 04 16:47:26 master1 dockerd[28979]: time="2018-09-04T16:47:26.215033314+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container 45c934c67f9c555c8e3dbe3311f01766485d9a09859d96798e744a5830545b23"
Sep 04 16:55:12 master1 dockerd[28979]: time="2018-09-04T16:55:12.034057676+08:00" level=error msg="Error setting up exec command in container http:: No such container: http:"
Sep 04 16:55:12 master1 dockerd[28979]: time="2018-09-04T16:55:12.034173274+08:00" level=error msg="Handler for POST /v1.27/containers/http:/exec returned error: No such container: http:"
However, all of those images are present locally.
[root@master1 kubeadm]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy-amd64 v1.11.1 d5c25579d0ff 7 weeks ago 97.8 MB
k8s.gcr.io/kube-apiserver-amd64 v1.11.1 816332bd9d11 7 weeks ago 187 MB
k8s.gcr.io/kube-controller-manager-amd64 v1.11.1 52096ee87d0e 7 weeks ago 155 MB
k8s.gcr.io/kube-scheduler-amd64 v1.11.1 272b3a60cd68 7 weeks ago 56.8 MB
k8s.gcr.io/coredns 1.1.3 b3b94275d97c 3 months ago 45.6 MB
k8s.gcr.io/etcd-amd64 3.2.18 b8df3b177be2 4 months ago 219 MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 8 months ago 742 kB
It's best to also include:
journalctl -xeu kubelet
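If that only shows the repeating annotation messages, a longer, unpaged slice of the kubelet log may reveal more context, e.g. (a sketch):
$ journalctl -u kubelet --no-pager --since "1 hour ago" | tail -n 200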
@neolit123 Thanks. As shown in the systemctl status kubelet.service -l output above, journalctl -xeu kubelet keeps printing Setting node annotation to enable volume controller attach/detach.
[root@master1 kubeadm]# journalctl -xeu kubelet
Sep 05 10:25:30 master1 kubelet[32505]: I0905 10:25:30.053407 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:39 master1 kubelet[32505]: I0905 10:25:39.537799 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:49 master1 kubelet[32505]: I0905 10:25:49.575277 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:58 master1 kubelet[32505]: I0905 10:25:58.057656 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:59 master1 kubelet[32505]: I0905 10:25:59.613457 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:09 master1 kubelet[32505]: I0905 10:26:09.641498 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:19 master1 kubelet[32505]: I0905 10:26:19.674790 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:29 master1 kubelet[32505]: I0905 10:26:29.716115 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:39 master1 kubelet[32505]: I0905 10:26:39.749659 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:43 master1 kubelet[32505]: I0905 10:26:43.053217 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:44 master1 kubelet[32505]: I0905 10:26:44.053360 32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Are you sure you can advertise the API server on 172.16.6.64?
Please check connectivity to that address.
I0905 01:06:56.928270 1 logs.go:49] http: TLS handshake error from 172.16.6.65:37562: read tcp 172.16.6.64:6443->172.16.6.65:37562: read: connection reset by peer
These seem to be logs from another time.
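To check connectivity, something along these lines from both the master and the prospective node might help (a sketch; assumes nc, curl and openssl are available):
$ nc -vz 172.16.6.64 6443
$ curl -k https://172.16.6.64:6443/healthz
$ openssl s_client -connect 172.16.6.64:6443 </dev/null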
@neolit123 Yes, I can reach this address from another node (172.16.6.71).
[root@node1 ~]# ping -c 2 172.16.6.64
PING 172.16.6.64 (172.16.6.64) 56(84) bytes of data.
64 bytes from 172.16.6.64: icmp_seq=1 ttl=64 time=0.473 ms
64 bytes from 172.16.6.64: icmp_seq=2 ttl=64 time=0.346 ms
--- 172.16.6.64 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.346/0.409/0.473/0.066 ms
And these TLS handshake errors have been printed continuously ever since the apiserver container was created.
Furthermore, I was able to initialize a master on another server (172.16.6.65) with the same configuration.
[root@master kubernetes]# kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=172.16.6.65 --kubernetes-version=v1.11.1 --node-name=master.XXX.com
[init] using Kubernetes version: v1.11.1
[preflight] running pre-flight checks
I0905 10:40:52.988876 16823 kernel_validator.go:81] Validating kernel version
I0905 10:40:52.989249 16823 kernel_validator.go:96] Validating kernel config
...
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
...
You can now join any number of machines by running the following on each node
as root:
kubeadm join 172.16.6.65:6443 --token okt9xh.s12faifwcsXXXXXX --discovery-token-ca-cert-hash sha256:86b132ac50ffc055dacca29f86077d5fc09c5b6eb26f51696740a5d309b08351
Therefore, I think there is something I have overlooked.
@neolit123 Hi, when I then used this server (172.16.6.64) as a worker node and tried to join it to the master, it also failed with the error timed out waiting for the condition.
[root@node2 ~]# kubeadm join 172.16.6.65:6443 --token okt9xh.s12faifwcsqa1ly3 --discovery-token-ca-cert-hash sha256:86b132ac50ffc055dacca29f86077d5fc09c5b6eb26f51696740a5d309b08351
[preflight] running pre-flight checks
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
I0905 11:48:51.819871 32477 kernel_validator.go:81] Validating kernel version
I0905 11:48:51.820073 32477 kernel_validator.go:96] Validating kernel config
[discovery] Trying to connect to API Server "172.16.6.65:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.6.65:6443"
[discovery] Requesting info from "https://172.16.6.65:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.6.65:6443"
[discovery] Successfully established connection with API Server "172.16.6.65:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
timed out waiting for the condition
And the kubelet logs show:
[root@node2 ~]# journalctl -xeu kubelet
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057056 32577 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057114 32577 status_manager.go:148] Kubernetes client is nil, not starting status manager.
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057155 32577 kubelet.go:1758] Starting kubelet main sync loop.
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057214 32577 kubelet.go:1775] skipping pod synchronization - [container runtime is down PLEG is not he
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057256 32577 volume_manager.go:247] Starting Kubelet Volume Manager
Sep 05 11:48:53 node2 kubelet[32577]: E0905 11:48:53.057444 32577 kubelet.go:1261] Image garbage collection failed once. Stats initialization may not have
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057534 32577 server.go:302] Adding debug handlers to kubelet server.
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057814 32577 desired_state_of_world_populator.go:130] Desired state populator starts to run
Sep 05 11:48:53 node2 kubelet[32577]: E0905 11:48:53.095989 32577 factory.go:340] devicemapper filesystem stats will not be reported: RHEL/Centos 7.x kerne
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.158177 32577 kubelet.go:1775] skipping pod synchronization - [container runtime is down]
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.294942 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.298652 32577 cpu_manager.go:155] [cpumanager] starting with none policy
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.298700 32577 cpu_manager.go:156] [cpumanager] reconciling every 10s
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.298732 32577 policy_none.go:42] [cpumanager] none policy: Start
Sep 05 11:48:53 node2 kubelet[32577]: Starting Device Plugin manager
Sep 05 11:48:53 node2 kubelet[32577]: W0905 11:48:53.299950 32577 manager.go:496] Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoi
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.300381 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:03 node2 kubelet[32577]: I0905 11:49:03.324526 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:13 node2 kubelet[32577]: I0905 11:49:13.014114 32577 reconciler.go:154] Reconciler: start to sync state
Sep 05 11:49:13 node2 kubelet[32577]: I0905 11:49:13.354442 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:23 node2 kubelet[32577]: I0905 11:49:23.385356 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:33 node2 kubelet[32577]: I0905 11:49:33.415169 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:43 node2 kubelet[32577]: I0905 11:49:43.445193 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:53 node2 kubelet[32577]: I0905 11:49:53.478646 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:03 node2 kubelet[32577]: I0905 11:50:03.517594 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:13 node2 kubelet[32577]: I0905 11:50:13.542841 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:23 node2 kubelet[32577]: I0905 11:50:23.578388 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:33 node2 kubelet[32577]: I0905 11:50:33.605220 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:43 node2 kubelet[32577]: I0905 11:50:43.629862 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:53 node2 kubelet[32577]: I0905 11:50:53.668571 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:51:03 node2 kubelet[32577]: I0905 11:51:03.701641 32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Hi @heng-Yuan, I think the TLS handshake error is caused by using an FQDN for the --node-name parameter to kubeadm init. Can you reset your cluster and try specifying a simple host name there?
You can also try supplying the host name via --apiserver-cert-extra-sans if you want to keep the FQDN for --node-name.
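For instance, something along these lines (hypothetical host names; adjust the addresses, version and names to your environment):
$ kubeadm reset
$ kubeadm init --apiserver-advertise-address=172.16.6.64 --pod-network-cidr=192.168.0.0/16 --kubernetes-version=v1.11.1 --node-name=master1
or, keeping the FQDN:
$ kubeadm init --apiserver-advertise-address=172.16.6.64 --pod-network-cidr=192.168.0.0/16 --kubernetes-version=v1.11.1 --node-name=master1.example.com --apiserver-cert-extra-sans=master1.example.com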
@rosti I had already noticed this issue and used a simple host name to init the master, but it still had the same problem.
@heng-Yuan Can you verify that you have IP forwarding enabled?
cat /proc/sys/net/ipv4/ip_forward
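If that prints 0, it can usually be enabled like this (a sketch; the sysctl.d file name is just an example):
$ sysctl -w net.ipv4.ip_forward=1
$ echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-kubernetes.conf   # persist across reboots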
@heng-Yuan - I'd make certain SELinux is disabled FWIW.
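Roughly (a sketch; permissive is also fine for testing):
$ getenforce
$ setenforce 0   # takes effect until the next reboot
then set SELINUX=disabled (or SELINUX=permissive) in /etc/selinux/config for a permanent change.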
@rosti Yes, I have checked it, and IP forwarding is already enabled.
[root@master1 ~]# cat /proc/sys/net/ipv4/ip_forward
1
@timothysc Also, SELinux is disabled.
[root@master1 ~]# getenforce
Disabled
$ kubeadm reset
$ ifconfig cni0 down && ip link delete cni0
$ ifconfig flannel.1 down && ip link delete flannel.1
$ rm -rf /var/lib/cni/
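If things still hang after that, it may also be worth flushing iptables rules left over from previous attempts (a sketch; be careful, this clears all rules on the host):
$ iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X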
good luck!
We are adding a separate timeout to the config in 1.13.
Closing this issue.
@heng-Yuan Hi, did you fix the issue?
@timothysc Perhaps you can share the issue/PR which prompted this to be closed?
I'm also having this issue. What is the fix?
I tried what @zt706 suggested and it worked.
In my case, there wasn't a cni0 interface, just flannel.1.
I deleted /var/lib/cni and it worked for me. Thanks, @zt706!
$ kubeadm reset
$ ifconfig cni0 down && ip link delete cni0
$ ifconfig flannel.1 down && ip link delete flannel.1
$ rm -rf /var/lib/cni/
good luck!
Thanks, it worked for me!
$ kubeadm reset
$ ifconfig cni0 down && ip link delete cni0
$ ifconfig flannel.1 down && ip link delete flannel.1
$ rm -rf /var/lib/cni/
good luck!
Thanks!
But why do these actions work?