Environment:
three Fedora 31 instances in OpenStack, 4 CPU / 8 GB memory each
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
[fedora@fcos-master-dev ~]$ uname -srm
Linux 5.3.7-301.fc31.x86_64 x86_64
[fedora@fcos-master-dev ~]$ cat /etc/os-release
NAME=Fedora
VERSION="31 (Cloud Edition)"
ID=fedora
VERSION_ID=31
VERSION_CODENAME=""
PLATFORM_ID="platform:f31"
PRETTY_NAME="Fedora 31 (Cloud Edition)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:31"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f31/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=31
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=31
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Cloud Edition"
VARIANT_ID=cloud
Version of Ansible (ansible --version): ansible 2.9.6
config file = None
configured module search path = ['/Users/hgye/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /Users/hgye/Envs/kubespray/lib/python3.7/site-packages/ansible
executable location = /Users/hgye/Envs/kubespray/bin/ansible
python version = 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
Version of Python (python --version): Python 3.7.4
Kubespray version (commit) (git rev-parse --short HEAD):
current master
5e225744
Network plugin used:
calico
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
https://gist.github.com/hgye/b57553036f908493be4df59e64f0ca1c
Command used to invoke ansible:
ansible-playbook -i inventory/onuware-dev/inventory.ini --become --become-user=root upgrade-cluster.yml --private-key=/Users/hgye/Downloads/openstack-key/onu-k8s-dev.pem
Output of ansible run:
https://gist.github.com/hgye/3831c78add843dc048359b286d53e01c
Anything else we need to know:
hgye@HuaiguangYedeMacBook-Pro ~/git/kubespray (master) $ kubectl get all -n kube-system
NAME READY STATUS RESTARTS AGE
pod/calico-kube-controllers-fbcb89fbd-v9j6f 1/1 Running 1 87m
pod/calico-node-4ddpr 1/1 Running 1 87m
pod/calico-node-kwpdw 1/1 Running 5 93m
pod/calico-node-xw44x 1/1 Running 4 89m
pod/coredns-dff8fc7d-4665f 1/1 Running 1 77m
pod/coredns-dff8fc7d-t46cx 1/1 Running 1 77m
pod/dns-autoscaler-66498f5c5f-nftlf 1/1 Running 1 87m
pod/kube-apiserver-fcos-master-dev 1/1 Running 17 4h29m
pod/kube-controller-manager-fcos-master-dev 0/1 CrashLoopBackOff 50 121m
pod/kube-proxy-5x2hs 1/1 Running 1 3h1m
pod/kube-proxy-hsb8n 1/1 Running 1 165m
pod/kube-proxy-x5j57 1/1 Running 1 175m
pod/kube-scheduler-fcos-master-dev 0/1 CrashLoopBackOff 19 4h28m
pod/kubernetes-dashboard-57777fbdcb-7nvq2 1/1 Running 5 87m
pod/kubernetes-metrics-scraper-54fbb4d595-7d2n5 1/1 Running 1 87m
pod/nginx-proxy-fcos-slave1-dev 1/1 Running 5 8h
pod/nginx-proxy-fcos-slave2-dev 1/1 Running 5 8h
pod/nodelocaldns-52w6z 1/1 Running 5 8h
pod/nodelocaldns-8pr77 1/1 Running 5 8h
pod/nodelocaldns-lmjz5 1/1 Running 5 8h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 8h
service/dashboard-metrics-scraper ClusterIP 10.233.18.165 <none> 8000/TCP 8h
service/kubernetes-dashboard ClusterIP 10.233.49.42 <none> 443/TCP 8h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/calico-node 3 3 3 3 3 <none> 8h
daemonset.apps/kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 8h
daemonset.apps/nodelocaldns 3 3 3 3 3 <none> 8h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/calico-kube-controllers 1/1 1 1 8h
deployment.apps/coredns 2/2 2 2 84m
deployment.apps/dns-autoscaler 1/1 1 1 8h
deployment.apps/kubernetes-dashboard 1/1 1 1 8h
deployment.apps/kubernetes-metrics-scraper 1/1 1 1 8h
NAME DESIRED CURRENT READY AGE
replicaset.apps/calico-kube-controllers-fbcb89fbd 1 1 1 8h
replicaset.apps/coredns-dff8fc7d 2 2 2 77m
replicaset.apps/dns-autoscaler-66498f5c5f 1 1 1 8h
replicaset.apps/kubernetes-dashboard-57777fbdcb 1 1 1 8h
replicaset.apps/kubernetes-metrics-scraper-54fbb4d595 1 1 1 8h
$ kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
hgye@HuaiguangYedeMacBook-Pro ~/git/kubespray (master) $ kubectl describe pod kube-scheduler-fcos-master-dev -n kube-system
Name: kube-scheduler-fcos-master-dev
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: fcos-master-dev/192.168.30.118
Start Time: Wed, 15 Jul 2020 17:36:05 +0800
Labels: component=kube-scheduler
tier=control-plane
Annotations: kubernetes.io/config.hash: 22e409dccec0667ed93153f76c5b2111
kubernetes.io/config.mirror: 22e409dccec0667ed93153f76c5b2111
kubernetes.io/config.seen: 2020-07-15T08:52:06.525395751Z
kubernetes.io/config.source: file
Status: Running
IP: 192.168.30.118
IPs:
IP: 192.168.30.118
Controlled By: Node/fcos-master-dev
Containers:
kube-scheduler:
Container ID: docker://8471885ba8976eadb792ef4509efd012b3adfa823caf97491119bb9ce0e00b93
Image: k8s.gcr.io/kube-scheduler:v1.18.5
Image ID: docker-pullable://k8s.gcr.io/kube-scheduler@sha256:6c8327a5a2527cc990e2bcfb4386220a025c42351546f3cc577efbac2d0b0efe
Port: <none>
Host Port: <none>
Command:
kube-scheduler
--authentication-kubeconfig=/etc/kubernetes/scheduler.conf
--authorization-kubeconfig=/etc/kubernetes/scheduler.conf
--bind-address=0.0.0.0
--kubeconfig=/etc/kubernetes/scheduler.conf
--leader-elect=true
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 15 Jul 2020 21:21:22 +0800
Finished: Wed, 15 Jul 2020 21:25:49 +0800
Ready: False
Restart Count: 20
Requests:
cpu: 100m
Liveness: http-get https://:10259/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment:
HTTP_PROXY: http://192.168.20.1:9091/
http_proxy: http://192.168.20.1:9091/
NO_PROXY: 192.168.30.118,fcos-master-dev,fcos-master-dev.cluster.onuware-dev.loc,192.168.30.239,fcos-slave1-dev,fcos-slave1-dev.cluster.onuware-dev.loc,192.168.30.250,fcos-slave2-dev,fcos-slave2-dev.cluster.onuware-dev.loc,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18
no_proxy: 192.168.30.118,fcos-master-dev,fcos-master-dev.cluster.onuware-dev.loc,192.168.30.239,fcos-slave1-dev,fcos-slave1-dev.cluster.onuware-dev.loc,192.168.30.250,fcos-slave2-dev,fcos-slave2-dev.cluster.onuware-dev.loc,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18
HTTPS_PROXY: http://192.168.20.1:9091/
https_proxy: http://192.168.20.1:9091/
Mounts:
/etc/kubernetes/scheduler.conf from kubeconfig (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubeconfig:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/scheduler.conf
HostPathType: FileOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m24s (x13 over 70m) kubelet, fcos-master-dev Liveness probe failed: Get https://192.168.30.118:10259/healthz: dial tcp 192.168.30.118:10259: connect: connection refused
Warning BackOff 92s (x205 over 76m) kubelet, fcos-master-dev Back-off restarting failed container
hgye@HuaiguangYedeMacBook-Pro ~/git/kubespray (master) $ kubectl describe pod kube-controller-manager-fcos-master-dev -n kube-system
Name: kube-controller-manager-fcos-master-dev
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: fcos-master-dev/192.168.30.118
Start Time: Wed, 15 Jul 2020 15:57:19 +0800
Labels: component=kube-controller-manager
tier=control-plane
Annotations: kubernetes.io/config.hash: 1e3b725cfad100018d51ae8adef941d1
kubernetes.io/config.mirror: 1e3b725cfad100018d51ae8adef941d1
kubernetes.io/config.seen: 2020-07-15T10:17:08.798542813Z
kubernetes.io/config.source: file
Status: Running
IP: 192.168.30.118
IPs:
IP: 192.168.30.118
Controlled By: Node/fcos-master-dev
Containers:
kube-controller-manager:
Container ID: docker://a6289d72cfd3e27bfd1c0cb50ce47a3d2cd150125de54e820f9870289bd1b9c2
Image: k8s.gcr.io/kube-controller-manager:v1.18.5
Image ID: docker-pullable://k8s.gcr.io/kube-controller-manager@sha256:74927dcab888f1ddec67bca3bb8fa5e682d5ed427a9de6ba77304555bf458865
Port: <none>
Host Port: <none>
Command:
kube-controller-manager
--allocate-node-cidrs=true
--authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
--authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
--bind-address=0.0.0.0
--client-ca-file=/etc/kubernetes/ssl/ca.crt
--cluster-cidr=10.233.64.0/18
--cluster-name=cluster.onuware-dev.loc
--cluster-signing-cert-file=/etc/kubernetes/ssl/ca.crt
--cluster-signing-key-file=/etc/kubernetes/ssl/ca.key
--configure-cloud-routes=false
--controllers=*,bootstrapsigner,tokencleaner
--kubeconfig=/etc/kubernetes/controller-manager.conf
--leader-elect=true
--node-cidr-mask-size=24
--node-monitor-grace-period=40s
--node-monitor-period=5s
--pod-eviction-timeout=5m0s
--profiling=False
--requestheader-client-ca-file=/etc/kubernetes/ssl/front-proxy-ca.crt
--root-ca-file=/etc/kubernetes/ssl/ca.crt
--service-account-private-key-file=/etc/kubernetes/ssl/sa.key
--service-cluster-ip-range=10.233.0.0/18
--terminated-pod-gc-threshold=12500
--use-service-account-credentials=true
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 15 Jul 2020 21:24:58 +0800
Finished: Wed, 15 Jul 2020 21:25:57 +0800
Ready: False
Restart Count: 51
Requests:
cpu: 200m
Liveness: http-get https://:10257/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment:
HTTP_PROXY: http://192.168.20.1:9091/
http_proxy: http://192.168.20.1:9091/
NO_PROXY: 192.168.30.118,fcos-master-dev,fcos-master-dev.cluster.onuware-dev.loc,192.168.30.239,fcos-slave1-dev,fcos-slave1-dev.cluster.onuware-dev.loc,192.168.30.250,fcos-slave2-dev,fcos-slave2-dev.cluster.onuware-dev.loc,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18
no_proxy: 192.168.30.118,fcos-master-dev,fcos-master-dev.cluster.onuware-dev.loc,192.168.30.239,fcos-slave1-dev,fcos-slave1-dev.cluster.onuware-dev.loc,192.168.30.250,fcos-slave2-dev,fcos-slave2-dev.cluster.onuware-dev.loc,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18
HTTPS_PROXY: http://192.168.20.1:9091/
https_proxy: http://192.168.20.1:9091/
Mounts:
/etc/kubernetes/controller-manager.conf from kubeconfig (ro)
/etc/kubernetes/ssl from k8s-certs (ro)
/etc/pki from etc-pki (ro)
/etc/ssl/certs from ca-certs (ro)
/usr/libexec/kubernetes/kubelet-plugins/volume/exec from flexvolume-dir (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
ca-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
HostPathType: DirectoryOrCreate
etc-pki:
Type: HostPath (bare host directory volume)
Path: /etc/pki
HostPathType: DirectoryOrCreate
flexvolume-dir:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
HostPathType: DirectoryOrCreate
k8s-certs:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/ssl
HostPathType: DirectoryOrCreate
kubeconfig:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/controller-manager.conf
HostPathType: FileOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 46m (x8 over 78m) kubelet, fcos-master-dev Container image "k8s.gcr.io/kube-controller-manager:v1.18.5" already present on machine
Warning Unhealthy 40m (x14 over 75m) kubelet, fcos-master-dev Liveness probe failed: Get https://192.168.30.118:10257/healthz: dial tcp 192.168.30.118:10257: connect: connection refused
Warning BackOff 114s (x181 over 66m) kubelet, fcos-master-dev Back-off restarting failed container
[fedora@fcos-master-dev ~]$ sudo docker logs k8s_kube-scheduler_kube-scheduler-fcos-master-dev_kube-system_22e409dccec0667ed93153f76c5b2111_20
I0715 13:21:22.857794 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0715 13:21:22.857894 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0715 13:21:23.768574 1 serving.go:313] Generated self-signed cert in-memory
I0715 13:21:24.990596 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
I0715 13:21:24.990625 1 registry.go:150] Registering EvenPodsSpread predicate and priority function
W0715 13:21:24.993078 1 authorization.go:47] Authorization is disabled
W0715 13:21:24.993097 1 authentication.go:40] Authentication is disabled
I0715 13:21:24.993134 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0715 13:21:24.994572 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0715 13:21:24.994606 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0715 13:21:24.994624 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0715 13:21:24.994624 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0715 13:21:24.995000 1 secure_serving.go:178] Serving securely on [::]:10259
I0715 13:21:24.995041 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0715 13:21:25.095000 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0715 13:21:25.095021 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0715 13:21:25.095227 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-scheduler...
I0715 13:21:46.534795 1 leaderelection.go:252] successfully acquired lease kube-system/kube-scheduler
E0715 13:25:49.761395 1 leaderelection.go:356] Failed to update lock: resource name may not be empty
I0715 13:25:49.763133 1 leaderelection.go:277] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F0715 13:25:49.763212 1 server.go:244] leaderelection lost
[fedora@fcos-master-dev ~]$
[fedora@fcos-master-dev ~]$ sudo docker logs k8s_kube-controller-manager_kube-controller-manager-fcos-master-dev_kube-system_1e3b725cfad100018d51ae8adef941d1_51
I0715 13:24:59.347185 1 serving.go:313] Generated self-signed cert in-memory
I0715 13:24:59.810406 1 controllermanager.go:161] Version: v1.18.5
I0715 13:24:59.811800 1 dynamic_cafile_content.go:167] Starting request-header::/etc/kubernetes/ssl/front-proxy-ca.crt
I0715 13:24:59.812047 1 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/ssl/ca.crt
I0715 13:24:59.812194 1 secure_serving.go:178] Serving securely on [::]:10257
I0715 13:24:59.812371 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0715 13:24:59.812988 1 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
I0715 13:24:59.813116 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-controller-manager...
I0715 13:25:22.788036 1 leaderelection.go:252] successfully acquired lease kube-system/kube-controller-manager
I0715 13:25:22.790408 1 event.go:278] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"5ecf7d67-c833-41f5-8ce7-ebc317f624c4", APIVersion:"v1", ResourceVersion:"32728", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' fcos-master-dev.novalocal_e2924774-3c97-4259-9b5d-e91f1a1cd5a7 became leader
I0715 13:25:22.790479 1 event.go:278] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"kube-controller-manager", UID:"04946580-bd36-4ed5-a78d-409baab838b5", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"32729", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' fcos-master-dev.novalocal_e2924774-3c97-4259-9b5d-e91f1a1cd5a7 became leader
I0715 13:25:23.718755 1 plugins.go:100] No cloud provider specified.
I0715 13:25:23.721213 1 shared_informer.go:223] Waiting for caches to sync for tokens
I0715 13:25:23.822412 1 shared_informer.go:230] Caches are synced for tokens
I0715 13:25:23.844064 1 controllermanager.go:533] Started "namespace"
I0715 13:25:23.844233 1 namespace_controller.go:200] Starting namespace controller
I0715 13:25:23.844246 1 shared_informer.go:223] Waiting for caches to sync for namespace
I0715 13:25:24.080937 1 controllermanager.go:533] Started "replicaset"
I0715 13:25:24.081125 1 replica_set.go:181] Starting replicaset controller
I0715 13:25:24.081140 1 shared_informer.go:223] Waiting for caches to sync for ReplicaSet
I0715 13:25:24.347794 1 controllermanager.go:533] Started "csrcleaner"
I0715 13:25:24.347880 1 cleaner.go:82] Starting CSR cleaner controller
I0715 13:25:24.357656 1 node_lifecycle_controller.go:384] Sending events to api server.
I0715 13:25:24.358009 1 taint_manager.go:163] Sending events to api server.
I0715 13:25:24.358134 1 node_lifecycle_controller.go:512] Controller will reconcile labels.
I0715 13:25:24.358221 1 controllermanager.go:533] Started "nodelifecycle"
I0715 13:25:24.358443 1 node_lifecycle_controller.go:546] Starting node controller
I0715 13:25:24.358468 1 shared_informer.go:223] Waiting for caches to sync for taint
I0715 13:25:24.369712 1 node_lifecycle_controller.go:78] Sending events to api server
E0715 13:25:24.369788 1 core.go:229] failed to start cloud node lifecycle controller: no cloud provider provided
W0715 13:25:24.369808 1 controllermanager.go:525] Skipping "cloud-node-lifecycle"
I0715 13:25:24.378549 1 controllermanager.go:533] Started "attachdetach"
I0715 13:25:24.378886 1 attach_detach_controller.go:348] Starting attach detach controller
I0715 13:25:24.378943 1 shared_informer.go:223] Waiting for caches to sync for attach detach
I0715 13:25:24.386571 1 controllermanager.go:533] Started "clusterrole-aggregation"
I0715 13:25:24.386735 1 clusterroleaggregation_controller.go:149] Starting ClusterRoleAggregator
I0715 13:25:24.386846 1 shared_informer.go:223] Waiting for caches to sync for ClusterRoleAggregator
I0715 13:25:24.393561 1 controllermanager.go:533] Started "daemonset"
I0715 13:25:24.393758 1 daemon_controller.go:257] Starting daemon sets controller
I0715 13:25:24.393986 1 shared_informer.go:223] Waiting for caches to sync for daemon sets
I0715 13:25:24.401043 1 controllermanager.go:533] Started "csrapproving"
I0715 13:25:24.401086 1 certificate_controller.go:119] Starting certificate controller "csrapproving"
I0715 13:25:24.401262 1 shared_informer.go:223] Waiting for caches to sync for certificate-csrapproving
I0715 13:25:24.408226 1 controllermanager.go:533] Started "persistentvolume-binder"
I0715 13:25:24.408378 1 pv_controller_base.go:295] Starting persistent volume controller
I0715 13:25:24.408391 1 shared_informer.go:223] Waiting for caches to sync for persistent volume
I0715 13:25:24.415913 1 controllermanager.go:533] Started "replicationcontroller"
I0715 13:25:24.416043 1 replica_set.go:181] Starting replicationcontroller controller
I0715 13:25:24.416060 1 shared_informer.go:223] Waiting for caches to sync for ReplicationController
I0715 13:25:24.955559 1 controllermanager.go:533] Started "garbagecollector"
I0715 13:25:24.956128 1 garbagecollector.go:133] Starting garbage collector controller
I0715 13:25:24.956232 1 shared_informer.go:223] Waiting for caches to sync for garbage collector
I0715 13:25:24.956332 1 graph_builder.go:282] GraphBuilder running
I0715 13:25:24.967173 1 controllermanager.go:533] Started "cronjob"
I0715 13:25:24.967242 1 cronjob_controller.go:97] Starting CronJob Manager
I0715 13:25:24.976608 1 controllermanager.go:533] Started "bootstrapsigner"
I0715 13:25:24.976678 1 shared_informer.go:223] Waiting for caches to sync for bootstrap_signer
E0715 13:25:24.984131 1 core.go:89] Failed to start service controller: WARNING: no cloud provider provided, services of type LoadBalancer will fail
W0715 13:25:24.984160 1 controllermanager.go:525] Skipping "service"
I0715 13:25:25.045966 1 controllermanager.go:533] Started "pv-protection"
I0715 13:25:25.046065 1 pv_protection_controller.go:83] Starting PV protection controller
I0715 13:25:25.046257 1 shared_informer.go:223] Waiting for caches to sync for PV protection
I0715 13:25:25.192772 1 controllermanager.go:533] Started "podgc"
I0715 13:25:25.192853 1 gc_controller.go:89] Starting GC controller
I0715 13:25:25.192871 1 shared_informer.go:223] Waiting for caches to sync for GC
I0715 13:25:25.343639 1 controllermanager.go:533] Started "csrsigning"
I0715 13:25:25.343733 1 certificate_controller.go:119] Starting certificate controller "csrsigning"
I0715 13:25:25.343744 1 shared_informer.go:223] Waiting for caches to sync for certificate-csrsigning
I0715 13:25:25.343767 1 dynamic_serving_content.go:130] Starting csr-controller::/etc/kubernetes/ssl/ca.crt::/etc/kubernetes/ssl/ca.key
I0715 13:25:25.492841 1 controllermanager.go:533] Started "tokencleaner"
I0715 13:25:25.492879 1 core.go:239] Will not configure cloud provider routes for allocate-node-cidrs: true, configure-cloud-routes: false.
W0715 13:25:25.492920 1 controllermanager.go:525] Skipping "route"
W0715 13:25:25.492934 1 controllermanager.go:525] Skipping "ttl-after-finished"
W0715 13:25:25.492953 1 controllermanager.go:525] Skipping "root-ca-cert-publisher"
I0715 13:25:25.493041 1 tokencleaner.go:118] Starting token cleaner controller
I0715 13:25:25.493057 1 shared_informer.go:223] Waiting for caches to sync for token_cleaner
I0715 13:25:25.493089 1 shared_informer.go:230] Caches are synced for token_cleaner
I0715 13:25:25.643329 1 controllermanager.go:533] Started "endpoint"
I0715 13:25:25.643360 1 endpoints_controller.go:182] Starting endpoint controller
I0715 13:25:25.643401 1 shared_informer.go:223] Waiting for caches to sync for endpoint
I0715 13:25:25.992701 1 request.go:621] Throttling request took 1.032016097s, request: GET:https://192.168.30.118:6443/apis/autoscaling/v1?timeout=32s
I0715 13:25:26.044487 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for events.events.k8s.io
I0715 13:25:26.044531 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for horizontalpodautoscalers.autoscaling
I0715 13:25:26.044551 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.networking.k8s.io
I0715 13:25:26.044576 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for controllerrevisions.apps
I0715 13:25:26.044595 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for deployments.apps
I0715 13:25:26.044900 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for roles.rbac.authorization.k8s.io
I0715 13:25:26.044930 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for leases.coordination.k8s.io
I0715 13:25:26.045108 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for endpoints
I0715 13:25:26.045175 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for jobs.batch
I0715 13:25:26.045313 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for statefulsets.apps
I0715 13:25:26.045358 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for limitranges
I0715 13:25:26.045374 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for replicasets.apps
I0715 13:25:26.045427 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for cronjobs.batch
I0715 13:25:26.045454 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networkpolicies.networking.k8s.io
I0715 13:25:26.045551 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for poddisruptionbudgets.policy
I0715 13:25:26.045693 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for serviceaccounts
I0715 13:25:26.045743 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.extensions
I0715 13:25:26.045847 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for daemonsets.apps
I0715 13:25:26.045897 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for rolebindings.rbac.authorization.k8s.io
I0715 13:25:26.045987 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for podtemplates
I0715 13:25:26.046082 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for endpointslices.discovery.k8s.io
I0715 13:25:26.046108 1 controllermanager.go:533] Started "resourcequota"
I0715 13:25:26.046193 1 resource_quota_controller.go:272] Starting resource quota controller
I0715 13:25:26.046297 1 shared_informer.go:223] Waiting for caches to sync for resource quota
I0715 13:25:26.046340 1 resource_quota_monitor.go:303] QuotaMonitor running
I0715 13:25:26.393302 1 controllermanager.go:533] Started "horizontalpodautoscaling"
I0715 13:25:26.393557 1 horizontal.go:169] Starting HPA controller
I0715 13:25:26.393571 1 shared_informer.go:223] Waiting for caches to sync for HPA
I0715 13:25:26.692426 1 controllermanager.go:533] Started "disruption"
I0715 13:25:26.692513 1 disruption.go:331] Starting disruption controller
I0715 13:25:26.692541 1 shared_informer.go:223] Waiting for caches to sync for disruption
I0715 13:25:26.842858 1 controllermanager.go:533] Started "statefulset"
I0715 13:25:26.843027 1 stateful_set.go:146] Starting stateful set controller
I0715 13:25:26.843046 1 shared_informer.go:223] Waiting for caches to sync for stateful set
I0715 13:25:26.993160 1 controllermanager.go:533] Started "endpointslice"
I0715 13:25:26.993286 1 endpointslice_controller.go:213] Starting endpoint slice controller
I0715 13:25:26.993318 1 shared_informer.go:223] Waiting for caches to sync for endpoint_slice
I0715 13:25:27.143635 1 controllermanager.go:533] Started "job"
I0715 13:25:27.143705 1 job_controller.go:144] Starting job controller
I0715 13:25:27.143713 1 shared_informer.go:223] Waiting for caches to sync for job
I0715 13:25:27.293272 1 controllermanager.go:533] Started "ttl"
I0715 13:25:27.293386 1 ttl_controller.go:118] Starting TTL controller
I0715 13:25:27.293398 1 shared_informer.go:223] Waiting for caches to sync for TTL
I0715 13:25:27.442916 1 node_ipam_controller.go:94] Sending events to api server.
I0715 13:25:37.752535 1 range_allocator.go:82] Sending events to api server.
I0715 13:25:37.752782 1 range_allocator.go:116] No Secondary Service CIDR provided. Skipping filtering out secondary service addresses.
I0715 13:25:37.753109 1 controllermanager.go:533] Started "nodeipam"
I0715 13:25:37.753495 1 node_ipam_controller.go:162] Starting ipam controller
I0715 13:25:37.753530 1 shared_informer.go:223] Waiting for caches to sync for node
I0715 13:25:37.773147 1 controllermanager.go:533] Started "serviceaccount"
I0715 13:25:37.773474 1 serviceaccounts_controller.go:117] Starting service account controller
I0715 13:25:37.773490 1 shared_informer.go:223] Waiting for caches to sync for service account
I0715 13:25:39.682830 1 controllermanager.go:533] Started "deployment"
I0715 13:25:39.683163 1 deployment_controller.go:153] Starting deployment controller
I0715 13:25:39.683244 1 shared_informer.go:223] Waiting for caches to sync for deployment
I0715 13:25:43.730355 1 controllermanager.go:533] Started "persistentvolume-expander"
I0715 13:25:43.730579 1 expand_controller.go:319] Starting expand controller
I0715 13:25:43.730608 1 shared_informer.go:223] Waiting for caches to sync for expand
I0715 13:25:45.141496 1 controllermanager.go:533] Started "pvc-protection"
I0715 13:25:45.141534 1 pvc_protection_controller.go:101] Starting PVC protection controller
I0715 13:25:45.141551 1 shared_informer.go:223] Waiting for caches to sync for PVC protection
I0715 13:25:45.144940 1 shared_informer.go:223] Waiting for caches to sync for garbage collector
I0715 13:25:45.146805 1 shared_informer.go:223] Waiting for caches to sync for resource quota
W0715 13:25:45.168537 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="fcos-master-dev" does not exist
W0715 13:25:45.168760 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="fcos-slave1-dev" does not exist
W0715 13:25:45.170562 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="fcos-slave2-dev" does not exist
I0715 13:25:45.173662 1 shared_informer.go:230] Caches are synced for service account
I0715 13:25:45.176900 1 shared_informer.go:230] Caches are synced for bootstrap_signer
I0715 13:25:45.179162 1 shared_informer.go:230] Caches are synced for attach detach
I0715 13:25:45.187102 1 shared_informer.go:230] Caches are synced for ClusterRoleAggregator
I0715 13:25:45.192996 1 shared_informer.go:230] Caches are synced for GC
I0715 13:25:45.193476 1 shared_informer.go:230] Caches are synced for endpoint_slice
I0715 13:25:45.194527 1 shared_informer.go:230] Caches are synced for HPA
I0715 13:25:45.194567 1 shared_informer.go:230] Caches are synced for TTL
I0715 13:25:45.194768 1 shared_informer.go:230] Caches are synced for daemon sets
I0715 13:25:45.207187 1 shared_informer.go:230] Caches are synced for certificate-csrapproving
I0715 13:25:45.216229 1 shared_informer.go:230] Caches are synced for ReplicationController
I0715 13:25:45.241793 1 shared_informer.go:230] Caches are synced for PVC protection
I0715 13:25:45.243355 1 shared_informer.go:230] Caches are synced for stateful set
I0715 13:25:45.243875 1 shared_informer.go:230] Caches are synced for certificate-csrsigning
I0715 13:25:45.244362 1 shared_informer.go:230] Caches are synced for namespace
I0715 13:25:45.244444 1 shared_informer.go:230] Caches are synced for job
I0715 13:25:45.246481 1 shared_informer.go:230] Caches are synced for PV protection
I0715 13:25:45.253727 1 shared_informer.go:230] Caches are synced for node
I0715 13:25:45.253771 1 range_allocator.go:172] Starting range CIDR allocator
I0715 13:25:45.253795 1 shared_informer.go:223] Waiting for caches to sync for cidrallocator
I0715 13:25:45.253801 1 shared_informer.go:230] Caches are synced for cidrallocator
I0715 13:25:45.258720 1 shared_informer.go:230] Caches are synced for taint
I0715 13:25:45.258944 1 node_lifecycle_controller.go:1433] Initializing eviction metric for zone:
I0715 13:25:45.258902 1 taint_manager.go:187] Starting NoExecuteTaintManager
W0715 13:25:45.259154 1 node_lifecycle_controller.go:1048] Missing timestamp for Node fcos-slave1-dev. Assuming now as a timestamp.
W0715 13:25:45.259263 1 node_lifecycle_controller.go:1048] Missing timestamp for Node fcos-slave2-dev. Assuming now as a timestamp.
I0715 13:25:45.259215 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"fcos-slave1-dev", UID:"bb924c54-0697-432f-a9f3-ea51dbc8bf97", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node fcos-slave1-dev event: Registered Node fcos-slave1-dev in Controller
I0715 13:25:45.259342 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"fcos-slave2-dev", UID:"8727b383-f0f8-4c2f-a2ca-2aac710085a8", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node fcos-slave2-dev event: Registered Node fcos-slave2-dev in Controller
I0715 13:25:45.259355 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"fcos-master-dev", UID:"dc3d11d9-b56d-4787-9590-eaa61f8cc253", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node fcos-master-dev event: Registered Node fcos-master-dev in Controller
W0715 13:25:45.259324 1 node_lifecycle_controller.go:1048] Missing timestamp for Node fcos-master-dev. Assuming now as a timestamp.
I0715 13:25:45.259472 1 node_lifecycle_controller.go:1249] Controller detected that zone is now in state Normal.
I0715 13:25:45.343598 1 shared_informer.go:230] Caches are synced for endpoint
I0715 13:25:45.382536 1 shared_informer.go:230] Caches are synced for ReplicaSet
I0715 13:25:45.583482 1 shared_informer.go:230] Caches are synced for deployment
I0715 13:25:45.592706 1 shared_informer.go:230] Caches are synced for disruption
I0715 13:25:45.592741 1 disruption.go:339] Sending events to api server.
I0715 13:25:45.808565 1 shared_informer.go:230] Caches are synced for persistent volume
I0715 13:25:45.830840 1 shared_informer.go:230] Caches are synced for expand
I0715 13:25:46.745143 1 shared_informer.go:230] Caches are synced for garbage collector
I0715 13:25:46.746586 1 shared_informer.go:230] Caches are synced for resource quota
I0715 13:25:46.746984 1 shared_informer.go:230] Caches are synced for resource quota
I0715 13:25:46.756490 1 shared_informer.go:230] Caches are synced for garbage collector
I0715 13:25:46.756533 1 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
W0715 13:25:55.198189 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: Error creating EndpointSlice for Service kube-system/coredns: Internal error occurred: resource quota evaluates timeout
W0715 13:25:55.198435 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/dashboard-metrics-scraper", retrying. Error: Error creating EndpointSlice for Service kube-system/dashboard-metrics-scraper: Internal error occurred: resource quota evaluates timeout
W0715 13:25:55.198774 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/kubernetes-dashboard", retrying. Error: Error creating EndpointSlice for Service kube-system/kubernetes-dashboard: Internal error occurred: resource quota evaluates timeout
W0715 13:25:55.199017 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "gitlab-managed-apps/tiller-deploy", retrying. Error: Error creating EndpointSlice for Service gitlab-managed-apps/tiller-deploy: Internal error occurred: resource quota evaluates timeout
I0715 13:25:55.199215 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"dashboard-metrics-scraper", UID:"53a1397a-0a38-4379-a3da-349327eace3f", APIVersion:"v1", ResourceVersion:"2285", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/dashboard-metrics-scraper: Error creating EndpointSlice for Service kube-system/dashboard-metrics-scraper: Internal error occurred: resource quota evaluates timeout
I0715 13:25:55.199244 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"946a067c-0f31-4097-b010-9f59bc5c9ac1", APIVersion:"v1", ResourceVersion:"2244", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: Error creating EndpointSlice for Service kube-system/coredns: Internal error occurred: resource quota evaluates timeout
I0715 13:25:55.199256 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"kubernetes-dashboard", UID:"50017748-677e-4099-a35d-5049e2360da2", APIVersion:"v1", ResourceVersion:"2282", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/kubernetes-dashboard: Error creating EndpointSlice for Service kube-system/kubernetes-dashboard: Internal error occurred: resource quota evaluates timeout
I0715 13:25:55.199272 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"gitlab-managed-apps", Name:"tiller-deploy", UID:"97e0de27-cb08-4fc7-a508-f24336a57fe0", APIVersion:"v1", ResourceVersion:"21317", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service gitlab-managed-apps/tiller-deploy: Error creating EndpointSlice for Service gitlab-managed-apps/tiller-deploy: Internal error occurred: resource quota evaluates timeout
E0715 13:25:57.133041 1 leaderelection.go:320] error retrieving resource lock kube-system/kube-controller-manager: Get https://192.168.30.118:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: context deadline exceeded
I0715 13:25:57.133127 1 leaderelection.go:277] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
F0715 13:25:57.133220 1 controllermanager.go:279] leaderelection lost
Endpoints:
$ kubectl get endpoints --all-namespaces
NAMESPACE NAME ENDPOINTS AGE
default kubernetes 192.168.30.118:6443 9h
gitlab-managed-apps tiller-deploy 10.233.79.40:44134 3h20m
kube-system coredns 10.233.68.0:53,10.233.79.44:53,10.233.68.0:9153 + 3 more... 9h
kube-system dashboard-metrics-scraper 10.233.79.41:8000 9h
kube-system kube-controller-manager <none> 9h
kube-system kube-scheduler <none> 9h
kube-system kubernetes-dashboard 10.233.79.43:8443 9h
I'm not sure why kube-controller-manager and kube-scheduler are crashing.
I'm not sure this can be pinned on kubespray; it looks like there is already a Kubernetes issue about this weird one:
https://github.com/kubernetes/kubernetes/issues/74340
I finally found the reason: it is a network issue, which causes the leader election lease to be lost.
I0716 11:08:05.725721 1 leaderelection.go:287] failed to renew lease kube-system/kube-controller-manager: failed to tryAcquireOrRenew context deadline exceeded
F0716 11:08:05.726270 1 controllermanager.go:279] leaderelection lost
According to the Kubernetes documentation:
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
--leader-elect-lease-duration duration Default: 15s
--leader-elect-renew-deadline duration Default: 10s
After updating /etc/kubernetes/manifests/kube-controller-manager.yaml with larger values for --leader-elect-lease-duration and --leader-elect-renew-deadline, the problem was fixed.
So I think kubespray needs the ability to tune these values in the k8s-cluster.yaml file; a possible shape is sketched below.
If you care to submit a PR, this would be very welcome.
/close
@floryut: Closing this issue.
In response to this:
/close