I'm reopening bug #4148; it's still happening for me.
I followed the steps below:
cp -rfp inventory/sample inventory/mycluster
declare -a IPS=(192.168.1.15 192.168.1.71 192.168.1.72 192.168.1.73 192.168.1.74)
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
and it hangs.
The issue appears if you have an "/etc/calico" folder left over from a previous installation. After removing this folder on all hosts, the issue was fixed.
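For reference, the leftover folder can be removed on every host in one go with an Ansible ad-hoc command. This is only a sketch, assuming the same inventory file as in the steps above; double-check before running it, since the folder may contain certs and keys you still need:

```shell
# Remove the stale /etc/calico folder on all hosts in the inventory.
# Assumption: inventory/mycluster/hosts.yaml is the inventory used above.
# WARNING: this deletes certs/keys left over from the previous installation.
ansible -i inventory/mycluster/hosts.yaml all --become \
  -m file -a "path=/etc/calico state=absent"
```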
Looking at that issue, it seems the commit was merged in early 2019, but its changes aren't in the current tree.
The commit adds the following parameters to both of the 'name: "Get current version of calico cluster version"' tasks:
# roles/kubernetes/preinstall/tasks/0020-verify-settings.yml
...
- name: "Get current version of calico cluster version"
  shell: "{{ bin_dir }}/calicoctl version | grep 'Cluster Version:' | awk '{ print $3}'"
  register: calico_version_on_server
  async: 10
  poll: 3
  run_once: yes
  delegate_to: "{{ groups['kube-master'][0] }}"
  when:
    - kube_network_plugin == 'calico'
...
# roles/network_plugin/calico/tasks/check.yml
...
- name: "Get current version of calico cluster version"
  shell: "{{ bin_dir }}/calicoctl version | grep 'Cluster Version:' | awk '{ print $3}'"
  register: calico_version_on_server
  run_once: yes
  delegate_to: "{{ groups['kube-master'][0] }}"
  async: 10
  poll: 3
...
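As an aside, the shell pipeline in those tasks just extracts the third field of the 'Cluster Version:' line. A quick local sketch (the calicoctl output below is assumed for illustration, not captured from a real cluster):

```shell
# Simulate calicoctl's version output (format assumed) and extract
# the cluster version the same way the task's pipeline does.
printf 'Client Version:    v3.1.3\nCluster Version:   v3.1.3\n' \
  | grep 'Cluster Version:' | awk '{ print $3 }'
# prints: v3.1.3
```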
The current files don't have those async and poll parameters.
If you add them back to the playbooks, the timeout problem stops.
I don't have the git skills to find out why they were removed, though.
I confirm that I reproduced the problem and that the fix in the link works.
Context for the issue:
I needed to install a new Kubernetes cluster with Kubespray v2.13.2, with only one master, one etcd node, and Calico as the CNI, to prepare the migration of an existing Kubernetes cluster to a new one behind a bastion host without using a lot of hardware.
So I downscaled my old cluster to create the new one and moved every service from the old cluster to the new one (to avoid downtime for developers). After checking that everything worked fine, I shut down the old cluster and moved the hardware behind the bastion host to add 2 masters, 2 etcd nodes, and more workers to the new cluster.
I encountered the timeout issue when I ran the cluster.yml playbook. The "/etc/calico" folder was present on the first master, with certs and keys inside, so I didn't try to erase it. Applying the changes from the linked commit fixed the problem.
I hope this will make it into a stable release soon.