stacked method results in anomalous restarts of etcd, kube-controller-manager, and kube-scheduler.
Choose one: BUG REPORT or FEATURE REQUEST
/kind bug
kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:51:33Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Kubernetes version (use kubectl version): kubectl version --kubeconfig=/etc/kubernetes/admin.conf
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Ubuntu Xenial on bare metal and on KVM virtual machines
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (uname -a): Linux testymaster1 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
After starting a stacked master cluster following this method I get a few anomalous restarts. This does not happen when I follow the external etcd method. Is this normal? The cluster has been up for 19 hours now and the restarts have certainly stopped, but I'm left wondering if something else is wrong. Each master has just 2 GB of RAM (as per the minimum specs here). I am going to rebuild the cluster with more RAM to see if this solves the issue.
I expected the pods to be listed with zero restarts, like my cluster built with the external etcd method.
To reproduce: follow the instructions here and give the masters the minimum amount of RAM.
EDIT: I get fewer restarts if I add more RAM, but perhaps more interestingly I can force many more if I lower a master's RAM to 1.5 GB.
kubectl get pods --all-namespaces:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-576cbf47c7-pnwvk 1/1 Running 0 19h
kube-system coredns-576cbf47c7-x9zhp 1/1 Running 0 19h
kube-system etcd-stackedmaster1 1/1 Running 0 19h
kube-system etcd-stackedmaster2 1/1 Running 4 19h
kube-system etcd-stackedmaster3 1/1 Running 4 19h
kube-system kube-apiserver-stackedmaster1 1/1 Running 0 19h
kube-system kube-apiserver-stackedmaster2 1/1 Running 0 19h
kube-system kube-apiserver-stackedmaster3 1/1 Running 0 19h
kube-system kube-controller-manager-stackedmaster1 1/1 Running 2 19h
kube-system kube-controller-manager-stackedmaster2 1/1 Running 0 19h
kube-system kube-controller-manager-stackedmaster3 1/1 Running 0 19h
kube-system kube-flannel-ds-amd64-8w2gb 1/1 Running 0 19h
kube-system kube-flannel-ds-amd64-bzmqh 1/1 Running 0 19h
kube-system kube-flannel-ds-amd64-hlthp 1/1 Running 0 19h
kube-system kube-flannel-ds-amd64-jwrgv 1/1 Running 0 19h
kube-system kube-flannel-ds-amd64-l2fxp 1/1 Running 0 19h
kube-system kube-flannel-ds-amd64-sr4r8 1/1 Running 0 19h
kube-system kube-flannel-ds-amd64-wkblh 1/1 Running 0 19h
kube-system kube-proxy-f687t 1/1 Running 0 19h
kube-system kube-proxy-nq27d 1/1 Running 0 19h
kube-system kube-proxy-qfx5d 1/1 Running 0 19h
kube-system kube-proxy-scp7w 1/1 Running 0 19h
kube-system kube-proxy-t7pvf 1/1 Running 0 19h
kube-system kube-proxy-t9q92 1/1 Running 0 19h
kube-system kube-proxy-x7rzh 1/1 Running 0 19h
kube-system kube-scheduler-stackedmaster1 1/1 Running 2 19h
kube-system kube-scheduler-stackedmaster2 1/1 Running 0 19h
kube-system kube-scheduler-stackedmaster3 1/1 Running 0 19h
logs from an exited kube-scheduler:
I1125 21:33:32.383721 1 server.go:128] Version: v1.12.2
W1125 21:33:32.383901 1 defaults.go:210] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W1125 21:33:32.385234 1 authorization.go:47] Authorization is disabled
W1125 21:33:32.385259 1 authentication.go:55] Authentication is disabled
I1125 21:33:32.385276 1 deprecated_insecure_serving.go:48] Serving healthz insecurely on 127.0.0.1:10251
I1125 21:33:33.287412 1 controller_utils.go:1027] Waiting for caches to sync for scheduler controller
I1125 21:33:33.387778 1 controller_utils.go:1034] Caches are synced for scheduler controller
I1125 21:33:33.387876 1 leaderelection.go:187] attempting to acquire leader lease kube-system/kube-scheduler...
E1125 21:33:43.388731 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-scheduler: Get https://10.0.23.138:6443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I1125 21:34:07.809823 1 leaderelection.go:196] successfully acquired lease kube-system/kube-scheduler
I1125 21:34:46.084808 1 leaderelection.go:231] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E1125 21:34:46.084955 1 server.go:207] lost master
lost lease
and logs from an exited kube-controller:
Flag --address has been deprecated, see --bind-address instead.
I1125 21:33:33.748218 1 serving.go:293] Generated self-signed cert (/var/run/kubernetes/kube-controller-manager.crt, /var/run/kubernetes/kube-controller-manager.key)
I1125 21:33:34.551311 1 controllermanager.go:143] Version: v1.12.2
I1125 21:33:34.552092 1 secure_serving.go:116] Serving securely on [::]:10257
I1125 21:33:34.552732 1 deprecated_insecure_serving.go:50] Serving insecurely on 127.0.0.1:10252
I1125 21:33:34.553016 1 leaderelection.go:187] attempting to acquire leader lease kube-system/kube-controller-manager...
E1125 21:33:44.553786 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.0.23.138:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I1125 21:34:05.473148 1 leaderelection.go:196] successfully acquired lease kube-system/kube-controller-manager
I1125 21:34:05.473303 1 event.go:221] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"86603655-f0f9-11e8-8876-525400e29e11", APIVersion:"v1", ResourceVersion:"577", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' stackedmaster1_c3192bc3-f0f9-11e8-98e3-525400e29e11 became leader
I1125 21:34:05.495117 1 plugins.go:99] No cloud provider specified.
I1125 21:34:05.498011 1 controller_utils.go:1027] Waiting for caches to sync for tokens controller
I1125 21:34:05.527405 1 controllermanager.go:497] Started "tokencleaner"
I1125 21:34:05.527586 1 tokencleaner.go:116] Starting token cleaner controller
I1125 21:34:05.527617 1 controller_utils.go:1027] Waiting for caches to sync for token_cleaner controller
I1125 21:34:05.598302 1 controller_utils.go:1034] Caches are synced for tokens controller
I1125 21:34:05.614620 1 controllermanager.go:497] Started "horizontalpodautoscaling"
I1125 21:34:05.614692 1 horizontal.go:156] Starting HPA controller
I1125 21:34:05.614749 1 controller_utils.go:1027] Waiting for caches to sync for HPA controller
I1125 21:34:05.627913 1 controller_utils.go:1034] Caches are synced for token_cleaner controller
I1125 21:34:05.642177 1 controllermanager.go:497] Started "persistentvolume-expander"
I1125 21:34:05.642357 1 expand_controller.go:153] Starting expand controller
I1125 21:34:05.642400 1 controller_utils.go:1027] Waiting for caches to sync for expand controller
I1125 21:34:05.667740 1 controllermanager.go:497] Started "persistentvolume-binder"
I1125 21:34:05.667945 1 pv_controller_base.go:271] Starting persistent volume controller
I1125 21:34:05.667979 1 controller_utils.go:1027] Waiting for caches to sync for persistent volume controller
I1125 21:34:05.712740 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {extensions deployments}
I1125 21:34:05.712838 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {apps deployments}
I1125 21:34:05.712890 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {rbac.authorization.k8s.io rolebindings}
I1125 21:34:05.713085 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {extensions replicasets}
I1125 21:34:05.713153 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for { podtemplates}
I1125 21:34:05.713201 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {extensions ingresses}
I1125 21:34:05.713262 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {events.k8s.io events}
I1125 21:34:05.713323 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {autoscaling horizontalpodautoscalers}
I1125 21:34:05.713375 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {extensions daemonsets}
I1125 21:34:05.713430 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {batch jobs}
I1125 21:34:05.713514 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {apps statefulsets}
I1125 21:34:05.713642 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {apps controllerrevisions}
I1125 21:34:05.713705 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {networking.k8s.io networkpolicies}
I1125 21:34:05.713765 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {coordination.k8s.io leases}
I1125 21:34:05.713821 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for { limitranges}
I1125 21:34:05.713878 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {apps daemonsets}
I1125 21:34:05.713949 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {apps replicasets}
I1125 21:34:05.714017 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {batch cronjobs}
I1125 21:34:05.714105 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for { endpoints}
W1125 21:34:05.714148 1 shared_informer.go:311] resyncPeriod 58614244363215 is smaller than resyncCheckPeriod 73824011016736 and the informer has already started. Changing it to 73824011016736
I1125 21:34:05.714264 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for { serviceaccounts}
I1125 21:34:05.714331 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {rbac.authorization.k8s.io roles}
I1125 21:34:05.714415 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for {policy poddisruptionbudgets}
E1125 21:34:05.714538 1 resource_quota_controller.go:173] initial monitor sync has error: couldn't start monitor for resource {"extensions" "v1beta1" "networkpolicies"}: unable to monitor quota for resource "extensions/v1beta1, Resource=networkpolicies"
I1125 21:34:05.714581 1 controllermanager.go:497] Started "resourcequota"
I1125 21:34:05.714638 1 resource_quota_controller.go:278] Starting resource quota controller
I1125 21:34:05.714766 1 controller_utils.go:1027] Waiting for caches to sync for resource quota controller
I1125 21:34:05.714831 1 resource_quota_monitor.go:301] QuotaMonitor running
W1125 21:34:05.752778 1 garbagecollector.go:649] failed to discover preferred resources: the cache has not been filled yet
I1125 21:34:05.753360 1 garbagecollector.go:133] Starting garbage collector controller
I1125 21:34:05.753410 1 controller_utils.go:1027] Waiting for caches to sync for garbage collector controller
I1125 21:34:05.753423 1 controllermanager.go:497] Started "garbagecollector"
I1125 21:34:05.753449 1 graph_builder.go:308] GraphBuilder running
I1125 21:34:05.787386 1 controllermanager.go:497] Started "disruption"
I1125 21:34:05.787434 1 disruption.go:288] Starting disruption controller
I1125 21:34:05.787690 1 controller_utils.go:1027] Waiting for caches to sync for disruption controller
I1125 21:34:05.837700 1 controllermanager.go:497] Started "csrsigning"
I1125 21:34:05.837764 1 certificate_controller.go:113] Starting certificate controller
I1125 21:34:05.837805 1 controller_utils.go:1027] Waiting for caches to sync for certificate controller
I1125 21:34:05.986368 1 taint_manager.go:190] Sending events to api server.
I1125 21:34:05.986513 1 node_lifecycle_controller.go:349] Controller will taint node by condition.
I1125 21:34:05.986572 1 controllermanager.go:497] Started "nodelifecycle"
I1125 21:34:05.986678 1 node_lifecycle_controller.go:386] Starting node controller
I1125 21:34:05.986703 1 controller_utils.go:1027] Waiting for caches to sync for taint controller
I1125 21:34:06.135096 1 controllermanager.go:497] Started "endpoint"
I1125 21:34:06.135206 1 endpoints_controller.go:149] Starting endpoint controller
I1125 21:34:06.135224 1 controller_utils.go:1027] Waiting for caches to sync for endpoint controller
I1125 21:34:06.286285 1 controllermanager.go:497] Started "replicaset"
I1125 21:34:06.286404 1 replica_set.go:182] Starting replicaset controller
I1125 21:34:06.286425 1 controller_utils.go:1027] Waiting for caches to sync for ReplicaSet controller
I1125 21:34:06.435850 1 controllermanager.go:497] Started "job"
I1125 21:34:06.435904 1 job_controller.go:143] Starting job controller
I1125 21:34:06.435929 1 controller_utils.go:1027] Waiting for caches to sync for job controller
I1125 21:34:06.576722 1 node_ipam_controller.go:99] Sending events to api server.
I1125 21:34:16.593119 1 range_allocator.go:78] Sending events to api server.
I1125 21:34:16.593302 1 range_allocator.go:99] No Service CIDR provided. Skipping filtering out service addresses.
I1125 21:34:16.593326 1 range_allocator.go:108] Node stackedmaster1 has CIDR 10.244.0.0/24, occupying it in CIDR map
I1125 21:34:16.593357 1 range_allocator.go:108] Node stackedmaster2 has CIDR 10.244.1.0/24, occupying it in CIDR map
I1125 21:34:16.593411 1 controllermanager.go:497] Started "nodeipam"
I1125 21:34:16.593580 1 node_ipam_controller.go:168] Starting ipam controller
I1125 21:34:16.593620 1 controller_utils.go:1027] Waiting for caches to sync for node controller
I1125 21:34:16.634339 1 controllermanager.go:497] Started "attachdetach"
I1125 21:34:16.634398 1 attach_detach_controller.go:315] Starting attach detach controller
I1125 21:34:16.634427 1 controller_utils.go:1027] Waiting for caches to sync for attach detach controller
I1125 21:34:16.672925 1 controllermanager.go:497] Started "namespace"
I1125 21:34:16.673076 1 namespace_controller.go:186] Starting namespace controller
I1125 21:34:16.673108 1 controller_utils.go:1027] Waiting for caches to sync for namespace controller
I1125 21:34:16.696433 1 controllermanager.go:497] Started "daemonset"
I1125 21:34:16.696473 1 daemon_controller.go:270] Starting daemon sets controller
I1125 21:34:16.696516 1 controller_utils.go:1027] Waiting for caches to sync for daemon sets controller
I1125 21:34:16.720609 1 controllermanager.go:497] Started "deployment"
I1125 21:34:16.720843 1 deployment_controller.go:152] Starting deployment controller
I1125 21:34:16.720883 1 controller_utils.go:1027] Waiting for caches to sync for deployment controller
I1125 21:34:16.750427 1 controllermanager.go:497] Started "statefulset"
I1125 21:34:16.750669 1 stateful_set.go:151] Starting stateful set controller
I1125 21:34:16.750705 1 controller_utils.go:1027] Waiting for caches to sync for stateful set controller
I1125 21:34:16.773852 1 controllermanager.go:497] Started "ttl"
I1125 21:34:16.774049 1 ttl_controller.go:116] Starting TTL controller
I1125 21:34:16.774080 1 controller_utils.go:1027] Waiting for caches to sync for TTL controller
I1125 21:34:16.797242 1 controllermanager.go:497] Started "bootstrapsigner"
I1125 21:34:16.797347 1 controller_utils.go:1027] Waiting for caches to sync for bootstrap_signer controller
I1125 21:34:16.820402 1 controllermanager.go:497] Started "replicationcontroller"
I1125 21:34:16.820572 1 replica_set.go:182] Starting replicationcontroller controller
I1125 21:34:16.820609 1 controller_utils.go:1027] Waiting for caches to sync for ReplicationController controller
I1125 21:34:16.844411 1 controllermanager.go:497] Started "serviceaccount"
I1125 21:34:16.844453 1 serviceaccounts_controller.go:115] Starting service account controller
I1125 21:34:16.844521 1 controller_utils.go:1027] Waiting for caches to sync for service account controller
W1125 21:34:16.844463 1 controllermanager.go:489] Skipping "ttl-after-finished"
I1125 21:34:16.866398 1 controllermanager.go:497] Started "csrcleaner"
I1125 21:34:16.866603 1 cleaner.go:81] Starting CSR cleaner controller
E1125 21:34:16.905582 1 core.go:76] Failed to start service controller: WARNING: no cloud provider provided, services of type LoadBalancer will fail
W1125 21:34:16.905636 1 controllermanager.go:489] Skipping "service"
I1125 21:34:17.047774 1 controllermanager.go:497] Started "csrapproving"
W1125 21:34:17.047827 1 core.go:154] configure-cloud-routes is set, but no cloud provider specified. Will not configure cloud provider routes.
W1125 21:34:17.047844 1 controllermanager.go:489] Skipping "route"
I1125 21:34:17.047914 1 certificate_controller.go:113] Starting certificate controller
I1125 21:34:17.047946 1 controller_utils.go:1027] Waiting for caches to sync for certificate controller
I1125 21:34:17.208130 1 controllermanager.go:497] Started "clusterrole-aggregation"
I1125 21:34:17.208295 1 clusterroleaggregation_controller.go:148] Starting ClusterRoleAggregator
I1125 21:34:17.208335 1 controller_utils.go:1027] Waiting for caches to sync for ClusterRoleAggregator controller
I1125 21:34:17.355897 1 controllermanager.go:497] Started "pvc-protection"
I1125 21:34:17.355954 1 pvc_protection_controller.go:99] Starting PVC protection controller
I1125 21:34:17.355981 1 controller_utils.go:1027] Waiting for caches to sync for PVC protection controller
I1125 21:34:17.505765 1 controllermanager.go:497] Started "pv-protection"
I1125 21:34:17.505808 1 pv_protection_controller.go:81] Starting PV protection controller
I1125 21:34:17.505845 1 controller_utils.go:1027] Waiting for caches to sync for PV protection controller
I1125 21:34:17.658047 1 controllermanager.go:497] Started "podgc"
I1125 21:34:17.658254 1 gc_controller.go:76] Starting GC controller
I1125 21:34:17.660931 1 controller_utils.go:1027] Waiting for caches to sync for GC controller
I1125 21:34:17.805125 1 controllermanager.go:497] Started "cronjob"
I1125 21:34:17.805223 1 cronjob_controller.go:94] Starting CronJob Manager
E1125 21:34:17.805802 1 resource_quota_controller.go:460] failed to sync resource monitors: couldn't start monitor for resource {"extensions" "v1beta1" "networkpolicies"}: unable to monitor quota for resource "extensions/v1beta1, Resource=networkpolicies"
I1125 21:34:17.823381 1 controller_utils.go:1027] Waiting for caches to sync for garbage collector controller
I1125 21:34:17.835417 1 controller_utils.go:1034] Caches are synced for endpoint controller
I1125 21:34:17.836135 1 controller_utils.go:1034] Caches are synced for job controller
I1125 21:34:17.838072 1 controller_utils.go:1034] Caches are synced for certificate controller
I1125 21:34:17.842641 1 controller_utils.go:1034] Caches are synced for expand controller
I1125 21:34:17.844657 1 controller_utils.go:1034] Caches are synced for service account controller
I1125 21:34:17.848336 1 controller_utils.go:1034] Caches are synced for certificate controller
I1125 21:34:17.856202 1 controller_utils.go:1034] Caches are synced for PVC protection controller
I1125 21:34:17.861338 1 controller_utils.go:1034] Caches are synced for GC controller
I1125 21:34:17.873455 1 controller_utils.go:1034] Caches are synced for namespace controller
I1125 21:34:17.886678 1 controller_utils.go:1034] Caches are synced for ReplicaSet controller
I1125 21:34:17.897670 1 controller_utils.go:1034] Caches are synced for bootstrap_signer controller
I1125 21:34:17.906062 1 controller_utils.go:1034] Caches are synced for PV protection controller
I1125 21:34:17.908679 1 controller_utils.go:1034] Caches are synced for ClusterRoleAggregator controller
I1125 21:34:17.915224 1 controller_utils.go:1034] Caches are synced for HPA controller
I1125 21:34:17.921092 1 controller_utils.go:1034] Caches are synced for deployment controller
I1125 21:34:18.021021 1 controller_utils.go:1034] Caches are synced for ReplicationController controller
I1125 21:34:18.051122 1 controller_utils.go:1034] Caches are synced for stateful set controller
W1125 21:34:18.059647 1 actual_state_of_world.go:491] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="stackedmaster1" does not exist
W1125 21:34:18.060318 1 actual_state_of_world.go:491] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="stackedmaster2" does not exist
I1125 21:34:18.068269 1 controller_utils.go:1034] Caches are synced for persistent volume controller
I1125 21:34:18.074430 1 controller_utils.go:1034] Caches are synced for TTL controller
I1125 21:34:18.093880 1 controller_utils.go:1034] Caches are synced for node controller
I1125 21:34:18.093946 1 range_allocator.go:157] Starting range CIDR allocator
I1125 21:34:18.093963 1 controller_utils.go:1027] Waiting for caches to sync for cidrallocator controller
I1125 21:34:18.096882 1 controller_utils.go:1034] Caches are synced for daemon sets controller
I1125 21:34:18.134837 1 controller_utils.go:1034] Caches are synced for attach detach controller
I1125 21:34:18.188114 1 controller_utils.go:1034] Caches are synced for disruption controller
I1125 21:34:18.188155 1 disruption.go:296] Sending events to api server.
I1125 21:34:18.194225 1 controller_utils.go:1034] Caches are synced for cidrallocator controller
I1125 21:34:18.215179 1 controller_utils.go:1034] Caches are synced for resource quota controller
I1125 21:34:18.287058 1 controller_utils.go:1034] Caches are synced for taint controller
I1125 21:34:18.287276 1 node_lifecycle_controller.go:1165] Initializing eviction metric for zone:
I1125 21:34:18.287295 1 taint_manager.go:211] Starting NoExecuteTaintManager
W1125 21:34:18.287382 1 node_lifecycle_controller.go:852] Missing timestamp for Node stackedmaster1. Assuming now as a timestamp.
W1125 21:34:18.287783 1 node_lifecycle_controller.go:852] Missing timestamp for Node stackedmaster2. Assuming now as a timestamp.
I1125 21:34:18.287850 1 node_lifecycle_controller.go:1065] Controller detected that zone is now in state Normal.
I1125 21:34:18.423845 1 controller_utils.go:1034] Caches are synced for garbage collector controller
I1125 21:34:18.453816 1 controller_utils.go:1034] Caches are synced for garbage collector controller
I1125 21:34:18.453870 1 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I1125 21:34:45.737644 1 leaderelection.go:231] failed to renew lease kube-system/kube-controller-manager: failed to tryAcquireOrRenew context deadline exceeded
E1125 21:34:45.737800 1 leaderelection.go:252] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.0.23.138:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
F1125 21:34:45.737842 1 controllermanager.go:238] leaderelection lost
I1125 21:34:45.820653 1 daemon_controller.go:284] Shutting down daemon sets controller
logs from an exited etcd container:
2018-11-25 21:32:57.947201 I | etcdmain: etcd Version: 3.2.24
2018-11-25 21:32:57.947494 I | etcdmain: Git SHA: 420a45226
2018-11-25 21:32:57.947577 I | etcdmain: Go Version: go1.8.7
2018-11-25 21:32:57.947665 I | etcdmain: Go OS/Arch: linux/amd64
2018-11-25 21:32:57.947711 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2018-11-25 21:32:57.948089 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-11-25 21:32:57.948143 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true
2018-11-25 21:32:57.950757 I | embed: listening for peers on https://10.0.23.139:2380
2018-11-25 21:32:57.950829 I | embed: listening for client requests on 10.0.23.139:2379
2018-11-25 21:32:57.950876 I | embed: listening for client requests on 127.0.0.1:2379
2018-11-25 21:32:58.000472 C | etcdmain: error validating peerURLs {ClusterID:8b3ed401f61ac4ba Members:[&{ID:5c12b785e3ca3e16 RaftAttributes:{PeerURLs:[https://10.0.23.138:2380]} Attributes:{Name:stackedmaster1 ClientURLs:[https://10.0.23.138:2379]}}] RemovedMemberIDs:[]}: member count is unequal
I am going to rebuild the cluster with more RAM to see if this solves the issue.
that would be a good test case.
cc @fabriziopandini
@neolit123 @fabriziopandini I tried again with 2.8 GB on the primary and 2.5 GB on the two other masters, with the same results. I then added another gig (3.8 and 3.5 GB respectively) and gave it another go, but still got a good 4 restarts for etcd and 1 restart each for the scheduler and controller, with pretty much the same logs as above.
in general this doesn't seem like a kubeadm specific problem, but rather a kubernetes problem.
@neolit123 (thanks for the correction), as for it being a kubernetes issue, possibly so; it might also just be a timing issue. I'm still investigating when and why this happens. It is also something that appears to work itself out and stop, so I might just be nitpicking.
For HA this is unsurprising to me.
For HA clustering, we should update the documentation on memory limits
/kind documentation
@timothysc adding memory did not seem to help, though, and my investigations so far haven't turned up a true culprit as to what is going wrong.
lazy GC. possibly memory leaks too.
@fabriziopandini how much RAM were you giving for a CP node in your test setup?
@neolit123 I give 2 GB, but I usually throw away the cluster after completing a test case (max 2 hours).
As soon as I have a little bit of time I will create a cluster somewhere in cloud and try to keep it up for a longer time.
@joshuacox the errors to be investigated are the etcd errors (when etcd fails, the scheduler and controller manager then fail "by design").
@fabriziopandini
after completing a test case
might be a good idea to see the number of pod restarts right before the cluster terminates.
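A quick way to check that (just a sketch; the custom-columns expression is an example, not something from the thread) would be:
kubectl get pods -n kube-system \
  -o custom-columns='NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount'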
@joshuacox
Pods to be listed with zero restarts like my cluster with the external etcd method.
something else to note: FYI, from time to time I see random pod restarts even with a lot more RAM for a single control plane node.
@fabriziopandini the freshest errors I get from an exited etcd container on a freshly provisioned cluster are:
2018-11-27 15:09:14.044375 I | etcdmain: etcd Version: 3.2.24
2018-11-27 15:09:14.044590 I | etcdmain: Git SHA: 420a45226
2018-11-27 15:09:14.044603 I | etcdmain: Go Version: go1.8.7
2018-11-27 15:09:14.044614 I | etcdmain: Go OS/Arch: linux/amd64
2018-11-27 15:09:14.044625 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-11-27 15:09:14.044737 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-11-27 15:09:14.044803 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true
2018-11-27 15:09:14.046379 I | embed: listening for peers on https://10.0.23.140:2380
2018-11-27 15:09:14.046468 I | embed: listening for client requests on 10.0.23.140:2379
2018-11-27 15:09:14.046531 I | embed: listening for client requests on 127.0.0.1:2379
2018-11-27 15:09:14.082082 C | etcdmain: error validating peerURLs {ClusterID:8b3ed401f61ac4ba Members:[&{ID:5c12b785e3ca3e16 RaftAttributes:{PeerURLs:[https://10.0.23.138:2380]} Attributes:{Name:stackedmaster1 ClientURLs:[https://10.0.23.138:2379]}} &{ID:bae4d2c7059d21f5 RaftAttributes:{PeerURLs:[https://10.0.23.139:2380]} Attributes:{Name:stackedmaster2 ClientURLs:[https://10.0.23.139:2379]}}] RemovedMemberIDs:[]}: member count is unequal
with the manifest for that etcd:
flux ~/.kubash ‹1.12.3› » ssh [email protected] 'cat /etc/kubernetes/manifests/etcd.yaml'
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
component: etcd
tier: control-plane
name: etcd
namespace: kube-system
spec:
containers:
- command:
- etcd
- --advertise-client-urls=https://10.0.23.140:2379
- --initial-advertise-peer-urls=https://10.0.23.140:2380
- --initial-cluster=stackedmaster1=https://10.0.23.138:2380,stackedmaster2=https://10.0.23.139:2380,stackedmaster3=https://10.0.23.140:2380
- --initial-cluster-state=existing
- --listen-client-urls=https://127.0.0.1:2379,https://10.0.23.140:2379
- --listen-peer-urls=https://10.0.23.140:2380
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --client-cert-auth=true
- --data-dir=/var/lib/etcd
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --name=stackedmaster3
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-client-cert-auth=true
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --snapshot-count=10000
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
image: k8s.gcr.io/etcd:3.2.24
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- /bin/sh
- -ec
- ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
get foo
failureThreshold: 8
initialDelaySeconds: 15
timeoutSeconds: 15
name: etcd
resources: {}
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
name: etcd-certs
hostNetwork: true
priorityClassName: system-cluster-critical
volumes:
- hostPath:
path: /var/lib/etcd
type: DirectoryOrCreate
name: etcd-data
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
status: {}
generated from this kubeadmcfg.yaml:
flux ~/.kubash ‹1.12.3› » ssh [email protected] 'cat /etc/kubernetes/kubeadmcfg.yaml'
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
apiServerCertSANs:
- "127.0.0.1"
- "10.0.23.158"
- "10.0.23.138"
- "10.0.23.139"
- "10.0.23.140"
- "10.0.23.144"
- "10.0.23.145"
- "10.0.23.146"
controlPlaneEndpoint: "10.0.23.138:6443"
etcd:
local:
serverCertSANs:
- "10.0.23.140"
- "stackedmaster3"
peerCertSANs:
- "10.0.23.140"
- "stackedmaster3"
extraArgs:
listen-client-urls: "https://127.0.0.1:2379,https://10.0.23.140:2379"
advertise-client-urls: "https://10.0.23.140:2379"
listen-peer-urls: "https://10.0.23.140:2380"
initial-advertise-peer-urls: "https://10.0.23.140:2380"
initial-cluster: "stackedmaster1=https://10.0.23.138:2380,stackedmaster2=https://10.0.23.139:2380,stackedmaster3=https://10.0.23.140:2380"
initial-cluster-state: existing
networking:
podSubnet: 10.244.0.0/16
seems like an etcd issue to me:
2018-11-27 15:09:14.082082 C | etcdmain: error validating peerURLs ... member count is unequal
this post mentions that this can happen from time to time:
https://crewjam.com/etcd-aws/
I’ve observed cases where new nodes fail to join existing clusters with a message like this:
etcdmain: error validating peerURLs {ClusterID:500f903265bef4ea Members:[&{ID:7452025f0b7cee3e RaftAttributes:{PeerURLs:[http://10.0.133.146:2380]} Attributes:{Name:i-c8ccfa12 ClientURLs:[http://10.0.133.146:2379]}}] RemovedMemberIDs:[]}: member count is unequal
This can be resolved by telling an existing node of the cluster about the new node just before starting the new etcd. We can do this by manually joining the node to the cluster by making a POST request to the /v2/members endpoint on one of the existing nodes.
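For reference, the v2 call that post describes would look roughly like this against the cluster above (only a sketch: it assumes the v2 members API is still enabled on etcd 3.2 and reuses the healthcheck client cert from the kubeadm manifest):
# tell the existing member on stackedmaster1 about the new peer before starting etcd on the new node
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  -H "Content-Type: application/json" \
  -X POST https://10.0.23.138:2379/v2/members \
  -d '{"peerURLs": ["https://10.0.23.140:2380"]}'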
adding members using the v3 API can be done using protocol buffers:
https://github.com/etcd-io/etcd/blob/master/etcdserver/etcdserverpb/rpc.proto#L136-L143
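The same registration can also be done with the v3 etcdctl client instead of raw gRPC; a rough sketch reusing the cert paths from the manifests above:
ETCDCTL_API=3 etcdctl --endpoints=https://10.0.23.138:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  member add stackedmaster3 --peer-urls=https://10.0.23.140:2380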
@joshuacox
there is something strange in the error message:
etcdmain: error validating peerURLs {
ClusterID:8b3ed401f61ac4ba
Members:[
&{ID:5c12b785e3ca3e16
RaftAttributes:{PeerURLs:[https://10.0.23.138:2380]}
Attributes:{Name:stackedmaster1 ClientURLs:[https://10.0.23.138:2379]}
}
&{ID:bae4d2c7059d21f5
RaftAttributes:{PeerURLs:[https://10.0.23.139:2380]}
Attributes:{Name:stackedmaster2 ClientURLs:[https://10.0.23.139:2379]}
}
] RemovedMemberIDs:[]}: member count is unequal
I don't see the third member stackedmaster3 (https://10.0.23.140:2379). Might it be that you forgot to run etcd member add on the last node?
Ohh... I noticed just now that @neolit123 is on the same page too...
@fabriziopandini "etcd member add"? I am following the steps here
EDIT: possibly you mean the line here:
kubectl exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CP0_IP}:2379 member add ${CP2_HOSTNAME} https://${CP2_IP}:2380
which is added by me (or more specifically by this script).
@joshuacox
why are you creating the etcd manifest twice (L19 and L27)? IMO L19 should go away...
I'm also not sure about the sleep command in L23, because this leaves the cluster without quorum for some time when moving from one etcd member to two...
PS. be aware that things are changing in v1.13 (part of the complexity of the stacked etcd setup is going away; phases are graduating). If you are starting now it might be better to take a look at the new release due this week (otherwise you can leverage the upgrade procedure at any time).
@fabriziopandini thanks for helping me find that mistake; eliminating lines 19 and 23 results in:
kube-system etcd-stackedmaster1 1/1 Running 0 3m
kube-system etcd-stackedmaster2 0/1 CrashLoopBackOff 5 3m
kube-system etcd-stackedmaster3 0/1 CrashLoopBackOff 5 3m
kube-system kube-apiserver-stackedmaster1 1/1 Running 0 2m
kube-system kube-apiserver-stackedmaster2 0/1 Error 4 3m
etcd is indeed still dying with the same messages:
root@stackedmaster2:~# docker logs a83
2018-11-28 16:06:40.722112 I | etcdmain: etcd Version: 3.2.24
2018-11-28 16:06:40.722473 I | etcdmain: Git SHA: 420a45226
2018-11-28 16:06:40.722511 I | etcdmain: Go Version: go1.8.7
2018-11-28 16:06:40.722543 I | etcdmain: Go OS/Arch: linux/amd64
2018-11-28 16:06:40.722576 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-11-28 16:06:40.722765 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-11-28 16:06:40.722867 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true
2018-11-28 16:06:40.724508 I | embed: listening for peers on https://10.0.23.139:2380
2018-11-28 16:06:40.724700 I | embed: listening for client requests on 10.0.23.139:2379
2018-11-28 16:06:40.724832 I | embed: listening for client requests on 127.0.0.1:2379
2018-11-28 16:06:40.759702 C | etcdmain: error validating peerURLs {ClusterID:8b3ed401f61ac4ba Members:[&{ID:5c12b785e3ca3e16 RaftAttributes:{PeerURLs:[https://10.0.23.138:2380]} Attributes:{Name:stackedmaster1 ClientURLs:[https://10.0.23.138:2379]}}] RemovedMemberIDs:[]}: member count is unequal
the script is shortened to just:
#!/bin/bash
export CP0_IP=$1
export CP0_HOSTNAME=$2
export CP1_IP=$3
export CP1_HOSTNAME=$4
export KUBECONFIG=/etc/kubernetes/admin.conf
mkdir -p ~/.kube
cp -v /etc/kubernetes/admin.conf ~/.kube/config
kubeadm alpha phase certs all --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubelet config write-to-disk --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubelet write-env-file --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubeconfig kubelet --config /etc/kubernetes/kubeadmcfg.yaml
systemctl restart kubelet
echo "kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CP0_IP}:2379 member add ${CP1_HOSTNAME} https://${CP1_IP}:2380"
kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CP0_IP}:2379 member add ${CP1_HOSTNAME} https://${CP1_IP}:2380
kubeadm alpha phase etcd local --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubeconfig all --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase controlplane all --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubelet config annotate-cri --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase mark-master --config /etc/kubernetes/kubeadmcfg.yaml
Is there a version of this page for v1.13 yet? I'll gladly start prepping for it.
thanks for confirming the problem @joshuacox
Is there a version of this page for v1.13 yet? I'll gladly start prepping for it.
the PR is in flight and will be merged before the release date. (EDIT: already merged)
wrong button.
I will note that this works:
#!/bin/bash
#export CP0_IP=10.0.0.7
#export CP0_HOSTNAME=cp0
#export CP1_IP=10.0.0.8
#export CP1_HOSTNAME=cp1
export CP0_IP=$1
export CP0_HOSTNAME=$2
export CP1_IP=$3
export CP1_HOSTNAME=$4
export KUBECONFIG=/etc/kubernetes/admin.conf
mkdir -p ~/.kube
cp -v /etc/kubernetes/admin.conf ~/.kube/config
kubeadm alpha phase certs all --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubelet config write-to-disk --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubelet write-env-file --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubeconfig kubelet --config /etc/kubernetes/kubeadmcfg.yaml
systemctl restart kubelet
kubeadm alpha phase etcd local --config /etc/kubernetes/kubeadmcfg.yaml
echo "kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CP0_IP}:2379 member add ${CP1_HOSTNAME} https://${CP1_IP}:2380"
sleep 66
kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CP0_IP}:2379 member add ${CP1_HOSTNAME} https://${CP1_IP}:2380
#sleep 66
#kubeadm alpha phase etcd local --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubeconfig all --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase controlplane all --config /etc/kubernetes/kubeadmcfg.yaml
kubeadm alpha phase kubelet config annotate-cri --config /etc/kubernete
where the etcd config line is before the exec, and the wait is 66 seconds. If I shorten the wait to 36 seconds, it fails into CrashLoopBackOff.
EDIT: success down to a 46-second wait, with a total run time of 6 minutes 9 seconds, which is getting close to the 5:57 of the external etcd method (though that method spins up three more VMs).
Further edit: down to 41 seconds now. watch -n1 docker ps -a on the second master shows the dance:
root@stackedmaster2:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d632e998822a 6b54f7bebd72 "kube-apiserver --..." 56 seconds ago Up 54 seconds k8s_kube-apiserver_kube-apiserver-stackedmaster2_kube-system_dfc54f21359ea31af9b1fe95ba083027_0
4d6400997016 5e75513787b1 "kube-scheduler --..." 56 seconds ago Up 55 seconds k8s_kube-scheduler_kube-scheduler-stackedmaster2_kube-system_7f99b6875de942b000954351c4ac09b5_0
27a68e37292e c79022eb8bc9 "kube-controller-m..." 56 seconds ago Up 55 seconds k8s_kube-controller-manager_kube-controller-manager-stackedmaster2_kube-system_680244d4e443bbd91d73dd339764015a_0
e081f994bbc6 k8s.gcr.io/pause:3.1 "/pause" 57 seconds ago Up 55 seconds k8s_POD_kube-apiserver-stackedmaster2_kube-system_dfc54f21359ea31af9b1fe95ba083027_0
565b285ae7ff k8s.gcr.io/pause:3.1 "/pause" 57 seconds ago Up 55 seconds k8s_POD_kube-scheduler-stackedmaster2_kube-system_7f99b6875de942b000954351c4ac09b5_0
4c5e78cce062 k8s.gcr.io/pause:3.1 "/pause" 57 seconds ago Up 56 seconds k8s_POD_kube-controller-manager-stackedmaster2_kube-system_680244d4e443bbd91d73dd339764015a_0
e649f891ae9c 3cab8e1b9802 "etcd --advertise-..." About a minute ago Up About a minute k8s_etcd_etcd-stackedmaster2_kube-system_68eaa0fd2a8b1629677785a2912c2809_3
5ddf1033d1f4 3cab8e1b9802 "etcd --advertise-..." About a minute ago Exited (1) About a minute ago k8s_etcd_etcd-stackedmaster2_kube-system_68eaa0fd2a8b1629677785a2912c2809_2
8a48140a45c4 367cdc8433a4 "/coredns -conf /e..." About a minute ago Up About a minute k8s_coredns_coredns-576cbf47c7-t9q2f_kube-system_77ab7548-f33d-11e8-a993-525400e29e11_0
ac34e606190b 367cdc8433a4 "/coredns -conf /e..." About a minute ago Up About a minute k8s_coredns_coredns-576cbf47c7-qk85h_kube-system_77adb574-f33d-11e8-a993-525400e29e11_0
7481b2f5d4ac ab97fa69b926 "/usr/local/bin/ku..." About a minute ago Up About a minute k8s_kube-proxy_kube-proxy-7cgjm_kube-system_79a74327-f33d-11e8-a993-525400e29e11_0
c3394f46a1fb k8s.gcr.io/pause:3.1 "/pause" About a minute ago Up About a minute k8s_POD_kube-proxy-7cgjm_kube-system_79a74327-f33d-11e8-a993-525400e29e11_0
20b4ac28e5bb k8s.gcr.io/pause:3.1 "/pause" About a minute ago Up About a minute k8s_POD_coredns-576cbf47c7-qk85h_kube-system_77adb574-f33d-11e8-a993-525400e29e11_0
e310515d2600 k8s.gcr.io/pause:3.1 "/pause" About a minute ago Up About a minute k8s_POD_coredns-576cbf47c7-t9q2f_kube-system_77ab7548-f33d-11e8-a993-525400e29e11_0
dc6cb91aa176 k8s.gcr.io/pause:3.1 "/pause" About a minute ago Up About a minute k8s_POD_etcd-stackedmaster2_kube-system_68eaa0fd2a8b1629677785a2912c2809_0
Here the etcd container finally succeeds, and everything goes online a few seconds later. I guess the likely candidates for the race condition are either CoreDNS or the etcd on the primary master.
@joshuacox
I think this is expected because the kubelet needs time to pick up the new manifest, except it's hard to document that in the old instructions.
in the most recent docs we've added an instruction to wait for all the pods to come up before joining new members:
https://github.com/kubernetes/website/pull/11094
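For example, something along these lines on the first control plane node before running the join script on the others (a sketch only; the label selector and timeout are guesses, not taken from the docs):
kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system wait \
  --for=condition=Ready pod -l tier=control-plane --timeout=300s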
do you think we can close this issue now?
closing; I'm implementing a wait sort of like the one here.
@joshuacox
yes, a deterministic wait is even better.
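Something like this could stand in for the fixed sleep in the script above (only a sketch, and only one interpretation of where the race is: it polls the first member's etcd for health before issuing the member add; the retry count is arbitrary and the cert paths are the ones from the manifest):
for i in $(seq 1 60); do
  kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n kube-system etcd-${CP0_HOSTNAME} -- \
    sh -ec "ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
      --key=/etc/kubernetes/pki/etcd/healthcheck-client.key endpoint health" && break
  sleep 5
done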