Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Environment:
printf "$(uname -srm)\n$(cat /etc/os-release)\n"):Linux 4.9.0-3-amd64 x86_64PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Version of Ansible (ansible --version): ansible 2.4.1.0
Kubespray version (commit) (git rev-parse --short HEAD):
97c68810
Network plugin used:
Copy of your inventory file:
Command used to invoke ansible:
vagrant up
Output of ansible run:
Whole output:
https://gist.github.com/guenhter/289ba384f2f35c050bf1c9f928f6514b
Lines with error:
TASK [kubernetes-apps/rotate_tokens : Rotate Tokens | Get default token name] ***
Monday 30 October 2017 15:44:52 +0100 (0:00:00.090) 0:11:33.423 ********
fatal: [k8s-01]: FAILED! => {"changed": true, "cmd": "/usr/local/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers | grep -m1 default-token", "delta": "0:00:00.332860", "end": "2017-10-30 14:44:51.717376", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2017-10-30 14:44:51.384516", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit @/home/guenthgr/workspaces/kubespray/cluster.retry
PLAY RECAP *********************************************************************
k8s-01 : ok=327 changed=103 unreachable=0 failed=1
k8s-02 : ok=296 changed=97 unreachable=0 failed=0
k8s-03 : ok=267 changed=83 unreachable=0 failed=0
Anything else do we need to know:
I can reproduce this every time the Vagrant cluster is newly created.
When I SSH into the affected node and execute the failing command manually, it works, so it seems to be some sort of timing issue (maybe the k8s secret isn't available yet).
I got the same thing while running the latest version from master, using the AWS cloud provider with CoreOS on the nodes.
TASK [kubernetes-apps/rotate_tokens : Rotate Tokens | Get default token name] **************************************************************************************************************************************************************************************************
Tuesday 31 October 2017 11:54:33 +0200 (0:00:00.078) 0:12:25.722 *******
fatal: [ip-172-17-32-207.us-west-2.compute.internal]: FAILED! => {"changed": true, "cmd": "/opt/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers | grep -m1 default-token", "delta": "0:00:02.190834", "end": "2017-10-31 09:54:35.734989", "failed": true, "rc": 1, "start": "2017-10-31 09:54:33.544155", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
Controller manager was in CrashLoopBackOff status:
$ kubectl logs -f kube-controller-manager-ip-172-17-32-207.us-west-2.compute.internal -n kube-system --tail=200
I1031 10:05:46.503005 1 feature_gate.go:156] feature gates: map[Initializers:true]
I1031 10:05:46.503098 1 controllermanager.go:109] Version: v1.8.1+coreos.0
I1031 10:05:46.507338 1 leaderelection.go:174] attempting to acquire leader lease...
I1031 10:06:01.992020 1 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager
W1031 10:06:01.992144 1 controllermanager.go:138] --use-service-account-credentials was specified without providing a --service-account-private-key-file
I1031 10:06:01.992517 1 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"70385c8a-be1f-11e7-b5a8-06a3d3d1ddba", APIVersion:"v1", ResourceVersion:"2683", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-172-17-32-207.us-west-2.compute.internal became leader
I1031 10:06:02.012260 1 aws.go:847] Building AWS cloudprovider
I1031 10:06:02.012297 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
E1031 10:06:07.342231 1 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
W1031 10:06:07.342257 1 tags.go:78] AWS cloud - no clusterID filtering applied for shared resources; do not run multiple clusters in this AZ.
F1031 10:06:07.342372 1 controllermanager.go:151] error building controller context: no ClusterID Found. A ClusterID is required for the cloud provider to function properly. This check can be bypassed by setting the allow-untagged-cloud option
I just hit the same issue with master branch commit ffbdf31ac4809fd08bd351a7f31139a84ec71cec from January 9th, using AWS.
It looks like the issue was that the instances were not tagged in the form kubernetes.io/cluster/$clusterid.
Could a check for these tags be added, since they are required?
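For anyone else hitting this, here is roughly what the tagging could look like with Ansible's ec2_tag module. This is a sketch only: the region, instance ID, and cluster name are placeholders, and per the AWS documentation the same tag is also needed on subnets, etc.

```yaml
# Hypothetical illustration: tag an instance so the AWS cloud provider
# can derive a ClusterID. Replace region, instance ID, and cluster name
# with real values. The legacy alternative is a single
# "KubernetesCluster" tag, as the controller-manager log above mentions.
- name: Tag EC2 instance with the cluster ID
  ec2_tag:
    region: us-west-2
    resource: i-0123456789abcdef0
    state: present
    tags:
      kubernetes.io/cluster/my-cluster: owned
```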
I've got the same issue with the latest commit, using the cloud_provider: aws flag.
Same error here:
TASK [kubernetes-apps/rotate_tokens : Rotate Tokens | Get default token name] ***
Wednesday 07 February 2018 11:40:57 +0100 (0:00:00.029) 0:15:04.850 ****
fatal: [k8s-01]: FAILED! => {"changed": false, "cmd": "/usr/local/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers | grep -m1 default-token", "delta": "0:00:00.126882", "end": "2018-02-07 10:40:58.038802", "msg": "non-zero return code", "rc": 1, "start": "2018-02-07 10:40:57.911920", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit @/home/leseb/kubespray/cluster.retry
PLAY RECAP *********************************************************************
k8s-01 : ok=343 changed=110 unreachable=0 failed=1
k8s-02 : ok=232 changed=65 unreachable=0 failed=0
k8s-03 : ok=233 changed=65 unreachable=0 failed=0
I initially proposed a commit for this, but it was rejected because @chapsuk could not reproduce it. Since the error exists, and not just for me, I'm pushing a fix: https://github.com/kubernetes-incubator/kubespray/pull/2271
Can anyone remove the grep from Rotate Tokens | Get default token name and reproduce the error?
What is the full kubectl get secrets output?
The kube-apiserver log is also needed.
kubectl get secrets output:
TASK [kubernetes-apps/rotate_tokens : Rotate Tokens | Get default token name] ***
task path: /home/leseb/kubespray/roles/kubernetes-apps/rotate_tokens/tasks/main.yml:2
Wednesday 07 February 2018 14:49:24 +0100 (0:00:00.028) 0:04:56.435 ****
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<192.168.121.101> ESTABLISH SSH CONNECTION FOR USER: vagrant
<192.168.121.101> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=22 -o 'IdentityFile="/home/leseb/.vagrant.d/insecure_private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,g
ssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=30 -o ControlPath=/home/leseb/.ansible/cp/05cef265d7 192.168.121.101 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-cfprwezjbhpveiqjhosyotcuvulovlmg; /usr/bin/python'"'"'"'"'"'"'"'"'
&& sleep 0'"'"''
<192.168.121.101> (0, '\n{"changed": true, "end": "2018-02-07 13:49:24.806838", "stdout": "", "cmd": "/usr/local/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers", "rc": 0, "start": "2018-02-07 13:49:24.686463", "stderr": "", "delta": "0:00:00.120375", "invocation": {"module_args": {"warn"
: true, "executable": null, "_uses_shell": true, "_raw_params": "/usr/local/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers", "removes": null, "creates": null, "chdir": null, "stdin": null}}}\n', 'OpenSSH_7.6p1, OpenSSL 1.1.0g-fips 2 Nov 2017\r\ndebug1: Reading configuration data /etc/ss
h/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 2: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\nde
bug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [[email protected],ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchan
ge-sha1,diffie-hellman-group14-sha1]\r\ndebug1: /etc/ssh/ssh_config.d/05-redhat.conf line 8: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 4 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remo
te\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 27278\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header
failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
ok: [k8s-01] => {
"changed": false,
"cmd": "/usr/local/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers",
"delta": "0:00:00.120375",
"end": "2018-02-07 13:49:24.806838",
"invocation": {
"module_args": {
"_raw_params": "/usr/local/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers",
"_uses_shell": true,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"rc": 0,
"start": "2018-02-07 13:49:24.686463",
"stderr": "",
"stderr_lines": [],
"stdout": "",
"stdout_lines": []
}
@chapsuk ^
Thanks!
I'm trying to understand when the default token is created; I think this would be a better place for a task that checks the default token exists.
@chapsuk what's unsatisfying about the approach in https://github.com/kubernetes-incubator/kubespray/pull/2271?
If we add a new role before rotate_tokens, or replace it with another role that makes a kubectl get secret call, we would need to add the same do-until condition there as well, and that can lead to new errors.
I think we should check that the default token has been created right after the controller-manager (apiserver?) is running, if it is critical for other roles.
If other roles can be affected by this bug, then yes, we should wait for the token to be present after its creation. I'm not sure where that check belongs, though.
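Something like the following sketch could run right after the control plane comes up, instead of duplicating the do-until condition in every role. This is only an illustration: it assumes Kubespray's {{ bin_dir }} variable for the kubectl path, and the task name, register variable, and retry counts are made up.

```yaml
# Sketch, not an actual Kubespray task: retry until the token
# controller has attached a token secret to the "default"
# service account. If the secrets list is still empty, the jsonpath
# lookup errors out, failed_when suppresses that, and the task retries.
- name: Wait for default service account token to exist
  command: "{{ bin_dir }}/kubectl get serviceaccount default -o jsonpath={.secrets[0].name}"
  register: default_sa_token
  until: default_sa_token.stdout != ""
  retries: 10
  delay: 6
  failed_when: false
  changed_when: false
```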
@chapsuk any updates on this? It's really critical, as it is preventing me from deploying any cluster at the moment, even though I already applied #2271. I wish we had this either merged or had found another approach, but the clock is ticking and the playbook is unusable as is. I don't want to maintain my own fork, so it would be nice if we could come to a conclusion soon. Why don't we merge #2271 now to at least unblock the situation and revisit this later? Thanks.
Can you please provide the controller-manager log for the failed deployment?
logs from: docker logs k8s_kube-controller-manager_kube-controller-manager-k8s-01_kube-system_f94a72f577b50f9785ca3978705767f3_0:
I0208 11:23:23.352394 1 feature_gate.go:184] feature gates: map[Initializers:false PersistentLocalVolumes:false]
I0208 11:23:23.352465 1 controllermanager.go:108] Version: v1.9.2+coreos.0
I0208 11:23:23.425958 1 leaderelection.go:174] attempting to acquire leader lease...
I0208 11:23:23.501038 1 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager
I0208 11:23:23.508158 1 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"7965c03d-0cc2-11e8-a4a6-52540094aa14", APIVersion:"v1", ResourceVersion:"278", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k8s-01 became leader
I0208 11:23:23.541020 1 plugins.go:101] No cloud provider specified.
I0208 11:23:23.542553 1 controllermanager.go:483] Starting "pvc-protection"
W0208 11:23:23.542563 1 controllermanager.go:490] Skipping "pvc-protection"
W0208 11:23:23.542566 1 controllermanager.go:477] "tokencleaner" is disabled
I0208 11:23:23.542569 1 controllermanager.go:483] Starting "node"
I0208 11:23:23.542959 1 controller_utils.go:1019] Waiting for caches to sync for tokens controller
I0208 11:23:23.643172 1 controller_utils.go:1026] Caches are synced for tokens controller
I0208 11:23:23.675044 1 node_controller.go:246] Sending events to api server.
I0208 11:23:33.681708 1 range_allocator.go:78] Sending events to api server.
I0208 11:23:33.681773 1 range_allocator.go:105] Node k8s-01 has no CIDR, ignoring
I0208 11:23:33.681777 1 range_allocator.go:105] Node k8s-02 has no CIDR, ignoring
I0208 11:23:33.681779 1 range_allocator.go:105] Node k8s-03 has no CIDR, ignoring
I0208 11:23:33.681803 1 taint_controller.go:158] Sending events to api server.
I0208 11:23:33.684178 1 controllermanager.go:493] Started "node"
I0208 11:23:33.684199 1 controllermanager.go:483] Starting "clusterrole-aggregation"
I0208 11:23:33.684348 1 node_controller.go:558] Starting node controller
I0208 11:23:33.684367 1 controller_utils.go:1019] Waiting for caches to sync for node controller
I0208 11:23:33.700327 1 controllermanager.go:493] Started "clusterrole-aggregation"
I0208 11:23:33.700337 1 controllermanager.go:483] Starting "serviceaccount"
I0208 11:23:33.700419 1 clusterroleaggregation_controller.go:148] Starting ClusterRoleAggregator
I0208 11:23:33.700424 1 controller_utils.go:1019] Waiting for caches to sync for ClusterRoleAggregator controller
I0208 11:23:33.719289 1 controllermanager.go:493] Started "serviceaccount"
W0208 11:23:33.719298 1 controllermanager.go:477] "bootstrapsigner" is disabled
I0208 11:23:33.719302 1 controllermanager.go:483] Starting "persistentvolume-expander"
W0208 11:23:33.719307 1 controllermanager.go:490] Skipping "persistentvolume-expander"
I0208 11:23:33.719310 1 controllermanager.go:483] Starting "daemonset"
I0208 11:23:33.719393 1 serviceaccounts_controller.go:115] Starting service account controller
I0208 11:23:33.719398 1 controller_utils.go:1019] Waiting for caches to sync for service account controller
I0208 11:23:33.734768 1 controllermanager.go:493] Started "daemonset"
I0208 11:23:33.734778 1 controllermanager.go:483] Starting "deployment"
I0208 11:23:33.734847 1 daemon_controller.go:232] Starting daemon sets controller
I0208 11:23:33.734852 1 controller_utils.go:1019] Waiting for caches to sync for daemon sets controller
I0208 11:23:33.748611 1 controllermanager.go:493] Started "deployment"
I0208 11:23:33.748620 1 controllermanager.go:483] Starting "cronjob"
I0208 11:23:33.748684 1 deployment_controller.go:153] Starting deployment controller
I0208 11:23:33.748688 1 controller_utils.go:1019] Waiting for caches to sync for deployment controller
I0208 11:23:33.762842 1 controllermanager.go:493] Started "cronjob"
I0208 11:23:33.762850 1 controllermanager.go:483] Starting "podgc"
I0208 11:23:33.762950 1 cronjob_controller.go:103] Starting CronJob Manager
I0208 11:23:33.779102 1 controllermanager.go:493] Started "podgc"
I0208 11:23:33.779111 1 controllermanager.go:483] Starting "garbagecollector"
I0208 11:23:33.779176 1 gc_controller.go:76] Starting GC controller
I0208 11:23:33.779180 1 controller_utils.go:1019] Waiting for caches to sync for GC controller
I0208 11:23:35.114845 1 controllermanager.go:493] Started "garbagecollector"
I0208 11:23:35.114914 1 controllermanager.go:483] Starting "job"
I0208 11:23:35.115545 1 garbagecollector.go:135] Starting garbage collector controller
I0208 11:23:35.115568 1 controller_utils.go:1019] Waiting for caches to sync for garbage collector controller
I0208 11:23:35.115636 1 graph_builder.go:322] GraphBuilder running
I0208 11:23:35.166808 1 controllermanager.go:493] Started "job"
I0208 11:23:35.166820 1 controllermanager.go:483] Starting "disruption"
I0208 11:23:35.167128 1 job_controller.go:138] Starting job controller
I0208 11:23:35.167135 1 controller_utils.go:1019] Waiting for caches to sync for job controller
I0208 11:23:35.181380 1 controllermanager.go:493] Started "disruption"
I0208 11:23:35.181389 1 controllermanager.go:483] Starting "attachdetach"
I0208 11:23:35.181447 1 disruption.go:288] Starting disruption controller
I0208 11:23:35.181452 1 controller_utils.go:1019] Waiting for caches to sync for disruption controller
W0208 11:23:35.225940 1 probe.go:215] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating.
I0208 11:23:35.226250 1 plugins.go:453] Loaded volume plugin "kubernetes.io/aws-ebs"
I0208 11:23:35.226260 1 plugins.go:453] Loaded volume plugin "kubernetes.io/gce-pd"
I0208 11:23:35.226266 1 plugins.go:453] Loaded volume plugin "kubernetes.io/cinder"
I0208 11:23:35.226272 1 plugins.go:453] Loaded volume plugin "kubernetes.io/portworx-volume"
I0208 11:23:35.226277 1 plugins.go:453] Loaded volume plugin "kubernetes.io/vsphere-volume"
I0208 11:23:35.226283 1 plugins.go:453] Loaded volume plugin "kubernetes.io/azure-disk"
I0208 11:23:35.226288 1 plugins.go:453] Loaded volume plugin "kubernetes.io/photon-pd"
I0208 11:23:35.226294 1 plugins.go:453] Loaded volume plugin "kubernetes.io/scaleio"
I0208 11:23:35.226301 1 plugins.go:453] Loaded volume plugin "kubernetes.io/storageos"
I0208 11:23:35.226307 1 plugins.go:453] Loaded volume plugin "kubernetes.io/fc"
I0208 11:23:35.226312 1 plugins.go:453] Loaded volume plugin "kubernetes.io/iscsi"
I0208 11:23:35.226317 1 plugins.go:453] Loaded volume plugin "kubernetes.io/rbd"
I0208 11:23:35.226376 1 controllermanager.go:493] Started "attachdetach"
I0208 11:23:35.226381 1 controllermanager.go:483] Starting "resourcequota"
I0208 11:23:35.226470 1 attach_detach_controller.go:258] Starting attach detach controller
I0208 11:23:35.226474 1 controller_utils.go:1019] Waiting for caches to sync for attach detach controller
I0208 11:23:35.311686 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {extensions daemonsets}
I0208 11:23:35.311709 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {apps deployments}
I0208 11:23:35.311720 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {apps statefulsets}
I0208 11:23:35.311729 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {apps daemonsets}
I0208 11:23:35.311740 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {networking.k8s.io networkpolicies}
I0208 11:23:35.311760 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {extensions ingresses}
W0208 11:23:35.311767 1 shared_informer.go:310] resyncPeriod 56033966745631 is smaller than resyncCheckPeriod 83829992603544 and the informer has already started. Changing it to 83829992603544
I0208 11:23:35.311802 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {apps replicasets}
I0208 11:23:35.311816 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {events.k8s.io events}
W0208 11:23:35.311823 1 shared_informer.go:310] resyncPeriod 80582472561667 is smaller than resyncCheckPeriod 83829992603544 and the informer has already started. Changing it to 83829992603544
I0208 11:23:35.311840 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for { serviceaccounts}
I0208 11:23:35.311850 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for { limitranges}
I0208 11:23:35.311860 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {batch cronjobs}
I0208 11:23:35.311880 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {rbac.authorization.k8s.io roles}
I0208 11:23:35.311893 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for { podtemplates}
I0208 11:23:35.311907 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {apps controllerrevisions}
I0208 11:23:35.311922 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {batch jobs}
I0208 11:23:35.311937 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {rbac.authorization.k8s.io rolebindings}
I0208 11:23:35.311960 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {extensions deployments}
I0208 11:23:35.311970 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {extensions replicasets}
I0208 11:23:35.311992 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {autoscaling horizontalpodautoscalers}
I0208 11:23:35.312004 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for {policy poddisruptionbudgets}
I0208 11:23:35.312027 1 resource_quota_monitor.go:215] QuotaMonitor created object count evaluator for { endpoints}
I0208 11:23:35.312037 1 controllermanager.go:493] Started "resourcequota"
I0208 11:23:35.312042 1 controllermanager.go:483] Starting "namespace"
I0208 11:23:35.312209 1 resource_quota_controller.go:272] Starting resource quota controller
I0208 11:23:35.312214 1 controller_utils.go:1019] Waiting for caches to sync for resource quota controller
I0208 11:23:35.312225 1 resource_quota_monitor.go:288] QuotaMonitor running
I0208 11:23:35.333944 1 controllermanager.go:493] Started "namespace"
I0208 11:23:35.333953 1 controllermanager.go:483] Starting "horizontalpodautoscaling"
I0208 11:23:35.333978 1 namespace_controller.go:186] Starting namespace controller
I0208 11:23:35.333983 1 controller_utils.go:1019] Waiting for caches to sync for namespace controller
I0208 11:23:36.038120 1 controllermanager.go:493] Started "horizontalpodautoscaling"
I0208 11:23:36.038140 1 controllermanager.go:483] Starting "route"
W0208 11:23:36.038148 1 core.go:136] configure-cloud-routes is set, but no cloud provider specified. Will not configure cloud provider routes.
W0208 11:23:36.038154 1 controllermanager.go:490] Skipping "route"
I0208 11:23:36.038160 1 controllermanager.go:483] Starting "persistentvolume-binder"
I0208 11:23:36.038203 1 horizontal.go:128] Starting HPA controller
I0208 11:23:36.038212 1 controller_utils.go:1019] Waiting for caches to sync for HPA controller
I0208 11:23:36.265045 1 plugins.go:453] Loaded volume plugin "kubernetes.io/host-path"
I0208 11:23:36.265087 1 plugins.go:453] Loaded volume plugin "kubernetes.io/nfs"
I0208 11:23:36.265113 1 plugins.go:453] Loaded volume plugin "kubernetes.io/glusterfs"
I0208 11:23:36.265132 1 plugins.go:453] Loaded volume plugin "kubernetes.io/rbd"
I0208 11:23:36.265154 1 plugins.go:453] Loaded volume plugin "kubernetes.io/quobyte"
I0208 11:23:36.265177 1 plugins.go:453] Loaded volume plugin "kubernetes.io/azure-file"
I0208 11:23:36.265199 1 plugins.go:453] Loaded volume plugin "kubernetes.io/flocker"
I0208 11:23:36.265219 1 plugins.go:453] Loaded volume plugin "kubernetes.io/portworx-volume"
I0208 11:23:36.265245 1 plugins.go:453] Loaded volume plugin "kubernetes.io/scaleio"
I0208 11:23:36.265298 1 plugins.go:453] Loaded volume plugin "kubernetes.io/local-volume"
I0208 11:23:36.265318 1 plugins.go:453] Loaded volume plugin "kubernetes.io/storageos"
I0208 11:23:36.265336 1 plugins.go:453] Loaded volume plugin "kubernetes.io/aws-ebs"
I0208 11:23:36.265354 1 plugins.go:453] Loaded volume plugin "kubernetes.io/gce-pd"
I0208 11:23:36.265378 1 plugins.go:453] Loaded volume plugin "kubernetes.io/cinder"
I0208 11:23:36.265397 1 plugins.go:453] Loaded volume plugin "kubernetes.io/vsphere-volume"
I0208 11:23:36.265415 1 plugins.go:453] Loaded volume plugin "kubernetes.io/azure-disk"
I0208 11:23:36.265433 1 plugins.go:453] Loaded volume plugin "kubernetes.io/photon-pd"
I0208 11:23:36.265501 1 controllermanager.go:493] Started "persistentvolume-binder"
I0208 11:23:36.265520 1 controllermanager.go:483] Starting "endpoint"
I0208 11:23:36.265635 1 pv_controller_base.go:264] Starting persistent volume controller
I0208 11:23:36.265653 1 controller_utils.go:1019] Waiting for caches to sync for persistent volume controller
I0208 11:23:36.457799 1 garbagecollector.go:182] syncing garbage collector with updated resources from discovery: map[{apiregistration.k8s.io v1beta1 apiservices}:{} {extensions v1beta1 replicasets}:{} {apps v1 deployments}:{} {autoscaling v1 horizontalpodautoscalers}:{} { v1 pods}:{} { v1 resourcequotas}:{} {extensions v1beta1 deployments}:{} {networking.k8s.io v1 networkpolicies}:{} {rbac.authorization.k8s.io v1 clusterroles}:{} {rbac.authorization.k8s.io v1 roles}:{} {extensions v1beta1 ingresses}:{} {apps v1 controllerrevisions}:{} {batch v1beta1 cronjobs}:{} {certificates.k8s.io v1beta1 certificatesigningrequests}:{} {rbac.authorization.k8s.io v1 rolebindings}:{} {apiextensions.k8s.io v1beta1 customresourcedefinitions}:{} {apps v1 statefulsets}:{} {apps v1 daemonsets}:{} { v1 configmaps}:{} { v1 replicationcontrollers}:{} { v1 podtemplates}:{} {apps v1 replicasets}:{} {admissionregistration.k8s.io v1beta1 validatingwebhookconfigurations}:{} { v1 serviceaccounts}:{} { v1 nodes}:{} { v1 persistentvolumes}:{} { v1 events}:{} { v1 secrets}:{} {batch v1 jobs}:{} {rbac.authorization.k8s.io v1 clusterrolebindings}:{} {storage.k8s.io v1 storageclasses}:{} {admissionregistration.k8s.io v1beta1 mutatingwebhookconfigurations}:{} { v1 namespaces}:{} { v1 services}:{} { v1 persistentvolumeclaims}:{} {extensions v1beta1 daemonsets}:{} {extensions v1beta1 podsecuritypolicies}:{} {policy v1beta1 poddisruptionbudgets}:{} {admissionregistration.k8s.io v1alpha1 initializerconfigurations}:{} { v1 limitranges}:{} {extensions v1beta1 networkpolicies}:{} {events.k8s.io v1beta1 events}:{} { v1 endpoints}:{}]
I0208 11:23:36.468946 1 resource_quota_controller.go:434] syncing resource quota controller with updated resources from discovery: map[{extensions v1beta1 replicasets}:{} {apps v1 replicasets}:{} {apps v1 statefulsets}:{} { v1 endpoints}:{} { v1 serviceaccounts}:{} { v1 pods}:{} {extensions v1beta1 daemonsets}:{} {rbac.authorization.k8s.io v1 rolebindings}:{} { v1 replicationcontrollers}:{} { v1 configmaps}:{} { v1 secrets}:{} {apps v1 daemonsets}:{} {apps v1 controllerrevisions}:{} {batch v1 jobs}:{} { v1 podtemplates}:{} { v1 events}:{} { v1 persistentvolumeclaims}:{} {rbac.authorization.k8s.io v1 roles}:{} { v1 limitranges}:{} {apps v1 deployments}:{} {autoscaling v1 horizontalpodautoscalers}:{} {batch v1beta1 cronjobs}:{} {extensions v1beta1 ingresses}:{} {extensions v1beta1 networkpolicies}:{} {events.k8s.io v1beta1 events}:{} { v1 resourcequotas}:{} { v1 services}:{} {extensions v1beta1 deployments}:{} {networking.k8s.io v1 networkpolicies}:{} {policy v1beta1 poddisruptionbudgets}:{}]
I0208 11:23:36.508476 1 controllermanager.go:493] Started "endpoint"
I0208 11:23:36.508493 1 controllermanager.go:483] Starting "csrapproving"
I0208 11:23:36.508534 1 endpoints_controller.go:153] Starting endpoint controller
I0208 11:23:36.508550 1 controller_utils.go:1019] Waiting for caches to sync for endpoint controller
I0208 11:23:36.765347 1 controllermanager.go:493] Started "csrapproving"
I0208 11:23:36.765384 1 controllermanager.go:483] Starting "ttl"
I0208 11:23:36.765466 1 certificate_controller.go:113] Starting certificate controller
I0208 11:23:36.765484 1 controller_utils.go:1019] Waiting for caches to sync for certificate controller
I0208 11:23:37.015454 1 controllermanager.go:493] Started "ttl"
I0208 11:23:37.015498 1 controllermanager.go:483] Starting "statefulset"
I0208 11:23:37.015631 1 ttl_controller.go:116] Starting TTL controller
I0208 11:23:37.015650 1 controller_utils.go:1019] Waiting for caches to sync for TTL controller
I0208 11:23:37.258630 1 controllermanager.go:493] Started "statefulset"
I0208 11:23:37.258647 1 controllermanager.go:483] Starting "csrcleaner"
I0208 11:23:37.258690 1 stateful_set.go:150] Starting stateful set controller
I0208 11:23:37.258699 1 controller_utils.go:1019] Waiting for caches to sync for stateful set controller
I0208 11:23:37.408457 1 controllermanager.go:493] Started "csrcleaner"
I0208 11:23:37.408490 1 controllermanager.go:483] Starting "service"
I0208 11:23:37.408603 1 cleaner.go:81] Starting CSR cleaner controller
E0208 11:23:37.661352 1 core.go:73] Failed to start service controller: WARNING: no cloud provider provided, services of type LoadBalancer will fail
W0208 11:23:37.661376 1 controllermanager.go:490] Skipping "service"
I0208 11:23:37.661392 1 controllermanager.go:483] Starting "replicationcontroller"
I0208 11:23:37.913069 1 controllermanager.go:493] Started "replicationcontroller"
I0208 11:23:37.913105 1 controllermanager.go:483] Starting "replicaset"
I0208 11:23:37.913203 1 replica_set.go:183] Starting replicationcontroller controller
I0208 11:23:37.913221 1 controller_utils.go:1019] Waiting for caches to sync for ReplicationController controller
I0208 11:23:38.163077 1 controllermanager.go:493] Started "replicaset"
I0208 11:23:38.163112 1 controllermanager.go:483] Starting "csrsigning"
I0208 11:23:38.163219 1 replica_set.go:183] Starting replicaset controller
I0208 11:23:38.163237 1 controller_utils.go:1019] Waiting for caches to sync for ReplicaSet controller
I0208 11:23:38.310571 1 controllermanager.go:493] Started "csrsigning"
I0208 11:23:38.312189 1 controller_utils.go:1019] Waiting for caches to sync for garbage collector controller
I0208 11:23:38.312338 1 certificate_controller.go:113] Starting certificate controller
I0208 11:23:38.312355 1 controller_utils.go:1019] Waiting for caches to sync for certificate controller
I0208 11:23:38.344616 1 controller_utils.go:1019] Waiting for caches to sync for resource quota controller
I0208 11:23:38.412381 1 controller_utils.go:1026] Caches are synced for HPA controller
I0208 11:23:38.412504 1 controller_utils.go:1026] Caches are synced for certificate controller
I0208 11:23:38.413944 1 controller_utils.go:1026] Caches are synced for certificate controller
I0208 11:23:38.413973 1 controller_utils.go:1026] Caches are synced for persistent volume controller
E0208 11:23:38.414218 1 actual_state_of_world.go:491] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="k8s-02" does not exist
E0208 11:23:38.414663 1 actual_state_of_world.go:491] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="k8s-03" does not exist
E0208 11:23:38.414671 1 actual_state_of_world.go:491] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="k8s-01" does not exist
I0208 11:23:38.422355 1 controller_utils.go:1026] Caches are synced for service account controller
I0208 11:23:38.422558 1 controller_utils.go:1026] Caches are synced for TTL controller
I0208 11:23:38.435942 1 controller_utils.go:1026] Caches are synced for namespace controller
I0208 11:23:38.441749 1 range_allocator.go:228] Node k8s-01 is already in a process of CIDR assignment.
I0208 11:23:38.441925 1 ttl_controller.go:271] Changed ttl annotation for node k8s-01 to 0 seconds
I0208 11:23:38.445311 1 range_allocator.go:228] Node k8s-02 is already in a process of CIDR assignment.
I0208 11:23:38.445345 1 range_allocator.go:228] Node k8s-03 is already in a process of CIDR assignment.
I0208 11:23:38.445473 1 ttl_controller.go:271] Changed ttl annotation for node k8s-03 to 0 seconds
I0208 11:23:38.445630 1 ttl_controller.go:271] Changed ttl annotation for node k8s-02 to 0 seconds
I0208 11:23:38.500756 1 controller_utils.go:1026] Caches are synced for ClusterRoleAggregator controller
I0208 11:23:38.509243 1 controller_utils.go:1026] Caches are synced for endpoint controller
I0208 11:23:38.514297 1 controller_utils.go:1026] Caches are synced for ReplicationController controller
I0208 11:23:38.527419 1 controller_utils.go:1026] Caches are synced for attach detach controller
I0208 11:23:38.534984 1 controller_utils.go:1026] Caches are synced for daemon sets controller
E0208 11:23:38.551678 1 clusterroleaggregation_controller.go:180] admin failed with : Operation cannot be fulfilled on clusterroles.rbac.authorization.k8s.io "admin": the object has been modified; please apply your changes to the latest version and try again
E0208 11:23:38.553285 1 clusterroleaggregation_controller.go:180] view failed with : Operation cannot be fulfilled on clusterroles.rbac.authorization.k8s.io "view": the object has been modified; please apply your changes to the latest version and try again
I0208 11:23:38.558806 1 controller_utils.go:1026] Caches are synced for stateful set controller
I0208 11:23:38.567320 1 controller_utils.go:1026] Caches are synced for job controller
I0208 11:23:38.579352 1 controller_utils.go:1026] Caches are synced for GC controller
I0208 11:23:38.584561 1 controller_utils.go:1026] Caches are synced for node controller
I0208 11:23:38.584625 1 range_allocator.go:157] Starting range CIDR allocator
I0208 11:23:38.584635 1 controller_utils.go:1019] Waiting for caches to sync for cidrallocator controller
I0208 11:23:38.584682 1 node_controller.go:633] Controller observed a new Node: "k8s-02"
I0208 11:23:38.584695 1 controller_utils.go:197] Recording Registered Node k8s-02 in Controller event message for node k8s-02
I0208 11:23:38.584737 1 node_controller.go:611] Initializing eviction metric for zone:
I0208 11:23:38.584754 1 node_controller.go:633] Controller observed a new Node: "k8s-03"
I0208 11:23:38.584760 1 controller_utils.go:197] Recording Registered Node k8s-03 in Controller event message for node k8s-03
I0208 11:23:38.584770 1 node_controller.go:633] Controller observed a new Node: "k8s-01"
I0208 11:23:38.584777 1 controller_utils.go:197] Recording Registered Node k8s-01 in Controller event message for node k8s-01
W0208 11:23:38.584814 1 node_controller.go:964] Missing timestamp for Node k8s-02. Assuming now as a timestamp.
W0208 11:23:38.584841 1 node_controller.go:964] Missing timestamp for Node k8s-03. Assuming now as a timestamp.
W0208 11:23:38.584859 1 node_controller.go:964] Missing timestamp for Node k8s-01. Assuming now as a timestamp.
I0208 11:23:38.584897 1 node_controller.go:830] Controller detected that all Nodes are not-Ready. Entering master disruption mode.
I0208 11:23:38.584950 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"k8s-01", UID:"72b38251-0cc2-11e8-a4a6-52540094aa14", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node k8s-01 event: Registered Node k8s-01 in Controller
I0208 11:23:38.584975 1 taint_controller.go:181] Starting NoExecuteTaintManager
I0208 11:23:38.585013 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"k8s-02", UID:"73317d35-0cc2-11e8-a4a6-52540094aa14", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node k8s-02 event: Registered Node k8s-02 in Controller
I0208 11:23:38.585030 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"k8s-03", UID:"7332a80c-0cc2-11e8-a4a6-52540094aa14", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node k8s-03 event: Registered Node k8s-03 in Controller
I0208 11:23:38.684788 1 controller_utils.go:1026] Caches are synced for cidrallocator controller
I0208 11:23:38.705526 1 range_allocator.go:307] Set node k8s-02 PodCIDR to 10.233.64.0/24
I0208 11:23:38.708481 1 range_allocator.go:307] Set node k8s-01 PodCIDR to 10.233.66.0/24
I0208 11:23:38.708598 1 range_allocator.go:307] Set node k8s-03 PodCIDR to 10.233.65.0/24
I0208 11:23:38.712439 1 controller_utils.go:1026] Caches are synced for resource quota controller
I0208 11:23:38.744930 1 controller_utils.go:1026] Caches are synced for resource quota controller
I0208 11:23:38.748997 1 controller_utils.go:1026] Caches are synced for deployment controller
I0208 11:23:38.763410 1 controller_utils.go:1026] Caches are synced for ReplicaSet controller
I0208 11:23:38.781739 1 controller_utils.go:1026] Caches are synced for disruption controller
I0208 11:23:38.781790 1 disruption.go:296] Sending events to api server.
I0208 11:23:38.812543 1 controller_utils.go:1026] Caches are synced for garbage collector controller
I0208 11:23:38.812577 1 garbagecollector.go:219] synced garbage collector
I0208 11:23:38.815909 1 controller_utils.go:1026] Caches are synced for garbage collector controller
I0208 11:23:38.815941 1 garbagecollector.go:144] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0208 11:28:38.610468 1 node_controller.go:699] Node is NotReady. Adding Pods on Node k8s-02 to eviction queue: 2018-02-08 11:28:38.610430623 +0000 UTC m=+315.702892000 is later than 2018-02-08 11:23:38.584818118 +0000 UTC m=+15.677279387 + 5m0s
I0208 11:28:38.610590 1 node_controller.go:699] Node is NotReady. Adding Pods on Node k8s-03 to eviction queue: 2018-02-08 11:28:38.610571049 +0000 UTC m=+315.703032427 is later than 2018-02-08 11:23:38.584844246 +0000 UTC m=+15.677305518 + 5m0s
I0208 11:28:38.610648 1 node_controller.go:699] Node is NotReady. Adding Pods on Node k8s-01 to eviction queue: 2018-02-08 11:28:38.610631929 +0000 UTC m=+315.703093306 is later than 2018-02-08 11:23:38.584862622 +0000 UTC m=+15.677323892 + 5m0s
W0208 11:33:14.414906 1 reflector.go:341] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:86: watch of *v1beta1.Event ended with: The resourceVersion for the provided watch is too old.
Thanks very much!
The time of the failed step would be useful.
@chapsuk that's the full log already.
Feel free to reopen if the #2271 fix did not solve this problem.
I am still getting this same error in the latest version, although with a deployment on AWS, not Vagrant.
@chapsuk I'm getting the same error, but this seems to be because /opt/bin/kubectl get secrets returns No resources found. Is this response expected?
Where should these secrets be generated?
Same error here as reported by @AronT-TLV
UPDATE:
Re-running the whole playbook somehow resolved my issue.
@gsaslis did you rerun the playbook without doing anything else, i.e. without recreating the cluster?
UPDATE:
Rerunning the playbook alone did not solve my problem.
The actual problem was not following the instructions in the AWS documentation about adding tags to instances, subnets, etc. Adding the tags solved the problem.
Getting the same error with Vagrant, on a newly created cluster with the latest code.
TASK [kubernetes-apps/rotate_tokens : Rotate Tokens | Get default token name] *
Monday 13 August 2018 13:16:54 +0300 (0:00:00.070) 0:08:50.772 ***
FAILED - RETRYING: Rotate Tokens | Get default token name (5 retries left).
FAILED - RETRYING: Rotate Tokens | Get default token name (4 retries left).
FAILED - RETRYING: Rotate Tokens | Get default token name (3 retries left).
FAILED - RETRYING: Rotate Tokens | Get default token name (2 retries left).
FAILED - RETRYING: Rotate Tokens | Get default token name (1 retries left).
fatal: [k8s-01]: FAILED! => {"attempts": 5, "changed": false, "cmd": "/usr/local/bin/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers | grep -m1 default-token", "delta": "0:00:00.178995", "end": "2018-08-13 10:17:00.655016", "msg": "non-zero return code", "rc": 1, "start": "2018-08-13 10:17:00.476021", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
I got the same problem when running vagrant up on v2.6.0.
I found that if I increased the wait time it would work for me. I think it's an issue on slower systems. Change the task to use:
delay: 5
retries: 20
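In context, the adjusted task would look roughly like this. It is a sketch modeled on the command from the logs above; the exact contents of roles/kubernetes-apps/rotate_tokens/tasks/main.yml may differ between Kubespray versions, and the register variable name is assumed.

```yaml
# Sketch of the rotate_tokens lookup with longer retry settings for
# slower systems: keep retrying until grep finds a default-token secret.
- name: Rotate Tokens | Get default token name
  shell: "{{ bin_dir }}/kubectl get secrets -o custom-columns=name:{.metadata.name} --no-headers | grep -m1 default-token"
  register: default_token
  until: default_token.rc == 0
  retries: 20
  delay: 5
```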