I'm reopening bug #4148; it's still happening for me.
I followed the steps below:
cp -rfp inventory/sample inventory/mycluster
declare -a IPS=(192.168.1.15 192.168.1.71 192.168.1.72 192.168.1.73 192.168.1.74)
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
and it hangs.
The issue appears if you have an "/etc/calico" folder left over from a previous installation. After removing this folder on all hosts, the issue was fixed.
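For reference, the leftover folder can be removed on every host in one go with an Ansible ad-hoc command. This is only a sketch, assuming the same inventory file as in the steps above; double-check before running it, since the folder may contain certs and keys you still need:

```shell
# Remove the stale /etc/calico folder on all hosts in the inventory.
# Assumption: inventory/mycluster/hosts.yaml is the inventory used above.
# WARNING: this deletes certs/keys left over from the previous installation.
ansible -i inventory/mycluster/hosts.yaml all --become \
  -m file -a "path=/etc/calico state=absent"
```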
Looking at that issue, it seems the commit was merged in early 2019, but its changes aren't in the current tree.
The commit adds the following parameters to both of the 'name: "Get current version of calico cluster version"' tasks:
# roles/kubernetes/preinstall/tasks/0020-verify-settings.yml
...
- name: "Get current version of calico cluster version"
  shell: "{{ bin_dir }}/calicoctl version | grep 'Cluster Version:' | awk '{ print $3}'"
  register: calico_version_on_server
  async: 10
  poll: 3
  run_once: yes
  delegate_to: "{{ groups['kube-master'][0] }}"
  when:
    - kube_network_plugin == 'calico'
...
# roles/network_plugin/calico/tasks/check.yml
...
- name: "Get current version of calico cluster version"
  shell: "{{ bin_dir }}/calicoctl version | grep 'Cluster Version:' | awk '{ print $3}'"
  register: calico_version_on_server
  run_once: yes
  delegate_to: "{{ groups['kube-master'][0] }}"
  async: 10
  poll: 3
...
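As an aside, the shell pipeline in those tasks just extracts the third field of the 'Cluster Version:' line. A quick local sketch (the calicoctl output below is assumed for illustration, not captured from a real cluster):

```shell
# Simulate calicoctl's version output (format assumed) and extract
# the cluster version the same way the task's pipeline does.
printf 'Client Version:    v3.1.3\nCluster Version:   v3.1.3\n' \
  | grep 'Cluster Version:' | awk '{ print $3 }'
# prints: v3.1.3
```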
The current files don't have those async and poll parameters.
If you add them back to the playbooks, the timeout problem stops.
I don't have the git skills to find out why they were removed, though.
I confirm that I reproduced the problem and that the fix in the link works.
Context for the issue:
I needed to install a new Kubernetes cluster with Kubespray v2.13.2, with only one master, one etcd node, and Calico as the CNI, to prepare the migration of an existing Kubernetes cluster to a new one behind a bastion host without using a lot of hardware.
So I downscaled my old cluster to create the new one and moved every service from the old cluster to the new one (to avoid downtime for developers). After checking that everything worked fine, I shut down the old cluster and moved the hardware behind the bastion host to add 2 masters, 2 etcd nodes, and more workers to the new cluster.
I encountered the timeout issue when I ran the cluster.yml playbook. The "/etc/calico" folder was present on the first master, with certs and keys inside, so I didn't try to erase it. Applying the changes from the linked commit fixed the problem.
I hope this will make it into a stable release soon.