Environment:
Linux 4.4.0-174-generic x86_64
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
# ansible --version
ansible 2.7.10
config file = /root/kubespray-2.11.0/ansible.cfg
configured module search path = [u'/root/kubespray-2.11.0/library']
ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
executable location = /usr/local/bin/ansible
python version = 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609]
# python --version
Python 2.7.12
Kubespray version - 2.11.0
Network plugin used: Calico
Copy of your inventory file:
[calico-rr]
[k8s-cluster:children]
kube-master
kube-node
calico-rr
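(For comparison, a complete kubespray inventory also lists the hosts in each group; the following is an illustrative INI-style sketch with invented host names and addresses, not the reporter's actual inventory:)

[all]
node1 ansible_host=10.0.0.11 ip=10.0.0.11
node2 ansible_host=10.0.0.12 ip=10.0.0.12

[kube-master]
node1

[etcd]
node1

[kube-node]
node1
node2

[calico-rr]

[k8s-cluster:children]
kube-master
kube-node
calico-rr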
Command used to invoke ansible: ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -u root --ask-pass -b --become-user=root -vvvv
Output of ansible run:
https://gist.github.com/mhabicht/452b838a809564ff568a15b0ae052d84
Anything else we need to know:
journalctl -u kubelet.service
Feb 22 00:12:02 avinetarchkm-anp1 kubelet[16949]: I0222 00:12:02.495325 16949 setters.go:73] Using node IP: "10.246.4.152"
Feb 22 00:12:03 avinetarchkm-anp1 kubelet[16949]: E0222 00:12:03.842073 16949 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 22 00:12:03 avinetarchkm-anp1 kubelet[16949]: W0222 00:12:03.950817 16949 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 22 00:12:08 avinetarchkm-anp1 kubelet[16949]: E0222 00:12:08.842801 16949 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
We need more info like the whole ansible output to see what happened during the network CNI deployment. The inventory and the group_vars also would help.
How do I save the full playbook output? There are too many lines to copy, and it appears the replay log (/var/log/ansible/log) was not added until 2.9, and I am running 2.7.10.
Here are the last 20k lines: https://gist.github.com/mhabicht/8e108995fa24dd1be2ad68b4a01f60bc
You can tee the output to a file. There are a lot of lines because you are running the playbook at verbosity level four (-vvvv); you can reduce it to two or at most three:
ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -u root --ask-pass -b --become-user=root -vv | tee output.log
The deployment of Calico in output.log is the most interesting in your case.
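Alternatively, Ansible itself can write a log file regardless of version: the log_path setting in ansible.cfg, or the equivalent ANSIBLE_LOG_PATH environment variable, has been available since well before 2.9, and unlike a plain tee it also captures error output. A minimal sketch, reusing your invocation (the log file path is just an example):

ANSIBLE_LOG_PATH=/tmp/kubespray-run.log \
  ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml \
  -u root --ask-pass -b --become-user=root -vv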
kubespray-2.11.0/roles/network_plugin/calico/tasks# ls
check.yml  install.yml  main.yml  pre.yml  reset.yml  upgrade.yml

- name: Calico | Copy calicoctl binary from download dir
  copy:
    src: "{{ local_release_dir }}/calicoctl"
    dest: "{{ bin_dir }}/calicoctl"
    mode: 0755
    remote_src: yes

- name: Calico | Write Calico cni config
  template:
    src: "cni-calico.conflist.j2"
    dest: "/etc/cni/net.d/{% if calico_version is version('v3.3.0', '>=') %}calico.conflist.template{% else %}10-calico.conflist{% endif %}"
    owner: kube
  register: calico_conflist
  notify: reset_calico_cni

- name: Calico | Create calico certs directory
  file:
    dest: "{{ calico_cert_dir }}"
    state: directory
    mode: 0750
    owner: root
    group: root
  when: calico_datastore == "etcd"

- name: Calico | Link etcd certificates for calico-node
  file:
    src: "{{ etcd_cert_dir }}/{{ item.s }}"
    dest: "{{ calico_cert_dir }}/{{ item.d }}"
    state: hard
    force: yes
  with_items:

- name: Calico | Install calicoctl wrapper script
  template:
    src: "calicoctl.{{ calico_datastore }}.sh.j2"
    dest: "{{ bin_dir }}/calicoctl.sh"
    mode: 0755
    owner: root
    group: root

- name: Calico | wait for etcd
  uri:
    url: "{{ etcd_access_addresses.split(',') | first }}/health"
    validate_certs: no
    client_cert: "{{ calico_cert_dir }}/cert.crt"
    client_key: "{{ calico_cert_dir }}/key.pem"
  register: result
  until: result.status == 200 or result.status == 401
  retries: 10
  delay: 5
  run_once: true
  when: calico_datastore == "etcd"

- name: Calico | Check if calico network pool has already been configured
  shell: >
    {{ bin_dir }}/calicoctl.sh get ippool | grep -w "{{ calico_pool_cidr | default(kube_pods_subnet) }}" | wc -l
  register: calico_conf
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  changed_when: false
  when:

- name: Calico | Ensure that calico_pool_cidr is within kube_pods_subnet when defined
  assert:
    that: "[calico_pool_cidr] | ipaddr(kube_pods_subnet) | length == 1"
    msg: "{{ calico_pool_cidr }} is not within or equal to {{ kube_pods_subnet }}"
  when:

- name: Calico | Create calico manifests for kdd
  template:
    src: "{{ item.file }}.j2"
    dest: "{{ kube_config_dir }}/{{ item.file }}"
  with_items:

- name: Calico | Create Calico Kubernetes datastore resources
  kube:
    name: "{{ item.item.name }}"
    namespace: "kube-system"
    kubectl: "{{ bin_dir }}/kubectl"
    resource: "{{ item.item.type }}"
    filename: "{{ kube_config_dir }}/{{ item.item.file }}"
    state: "latest"
  with_items:

- name: Calico | Configure calico network pool (version < v3.3.0)
  shell: >
    echo "
      { "kind": "IPPool",
        "apiVersion": "projectcalico.org/v3",
        "metadata": {
          "name": "{{ calico_pool_name }}",
        },
        "spec": {
          "cidr": "{{ calico_pool_cidr | default(kube_pods_subnet) }}",
          "ipipMode": "{{ ipip_mode }}",
          "natOutgoing": {{ nat_outgoing|default(false) and not peer_with_router|default(false) }} }} " | {{ bin_dir }}/calicoctl.sh create -f -
  when:

- name: Calico | Configure calico network pool (version >= v3.3.0)
  shell: >
    echo "
      { "kind": "IPPool",
        "apiVersion": "projectcalico.org/v3",
        "metadata": {
          "name": "{{ calico_pool_name }}",
        },
        "spec": {
          "blockSize": "{{ kube_network_node_prefix }}",
          "cidr": "{{ calico_pool_cidr | default(kube_pods_subnet) }}",
          "ipipMode": "{{ ipip_mode }}",
          "natOutgoing": {{ nat_outgoing|default(false) and not peer_with_router|default(false) }} }} " | {{ bin_dir }}/calicoctl.sh create -f -
  when:

- name: "Determine nodeToNodeMesh needed state"
  set_fact:
    nodeToNodeMeshEnabled: "false"
  when:

- name: Calico | Set global as_num
  shell: >
    echo '
      { "kind": "BGPConfiguration",
        "apiVersion": "projectcalico.org/v3",
        "metadata": {
          "name": "default",
        },
        "spec": {
          "logSeverityScreen": "Info",
          "nodeToNodeMeshEnabled": {{ nodeToNodeMeshEnabled|default('true') }} ,
          "asNumber": {{ global_as_num }} }} ' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  changed_when: false
  when:

- name: Calico | Configure peering with router(s) at global scope
  shell: >
    echo '{
      "apiVersion": "projectcalico.org/v3",
      "kind": "BGPPeer",
      "metadata": {
        "name": "global-{{ item.router_id }}"
      },
      "spec": {
        "asNumber": "{{ item.as }}",
        "peerIP": "{{ item.router_id }}"
      }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:

- name: Calico | Configure peering with route reflectors at global scope
  shell: |
    echo '{
      "apiVersion": "projectcalico.org/v3",
      "kind": "BGPPeer",
      "metadata": {
        "name": "peer-to-rrs"
      },
      "spec": {
        "nodeSelector": "!has(i-am-a-route-reflector)",
        "peerSelector": "has(i-am-a-route-reflector)"
      }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:

- name: Calico | Configure route reflectors to peer with each other
  shell: >
    echo '{
      "apiVersion": "projectcalico.org/v3",
      "kind": "BGPPeer",
      "metadata": {
        "name": "rr-mesh"
      },
      "spec": {
        "nodeSelector": "has(i-am-a-route-reflector)",
        "peerSelector": "has(i-am-a-route-reflector)"
      }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:

- name: Calico | Create calico manifests
  template:
    src: "{{ item.file }}.j2"
    dest: "{{ kube_config_dir }}/{{ item.file }}"
  with_items:

- name: Calico | Create calico manifests for typha
  template:
    src: "{{ item.file }}.j2"
    dest: "{{ kube_config_dir }}/{{ item.file }}"
  with_items:

- name: Start Calico resources
  kube:
    name: "{{ item.item.name }}"
    namespace: "kube-system"
    kubectl: "{{ bin_dir }}/kubectl"
    resource: "{{ item.item.type }}"
    filename: "{{ kube_config_dir }}/{{ item.item.file }}"
    state: "latest"
  with_items:

- name: Wait for calico kubeconfig to be created
  wait_for:
    path: /etc/cni/net.d/calico-kubeconfig
  when:

- name: Calico | Configure node asNumber for per node peering
  shell: >
    echo '{
      "apiVersion": "projectcalico.org/v3",
      "kind": "Node",
      "metadata": {
        "name": "{{ inventory_hostname }}"
      },
      "spec": {
        "bgp": {
          "asNumber": "{{ local_as }}"
        },
        "orchRefs":[{"nodeName":"{{ inventory_hostname }}","orchestrator":"k8s"}]
      }}' | {{ bin_dir }}/calicoctl.sh {{ 'apply -f -' if calico_datastore == "kdd" else 'create --skip-exists -f -' }}
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  when:

- name: Calico | Configure peering with router(s) at node scope
  shell: >
    echo '{
      "apiVersion": "projectcalico.org/v3",
      "kind": "BGPPeer",
      "metadata": {
        "name": "{{ inventory_hostname }}-{{ item.router_id }}"
      },
      "spec": {
        "asNumber": "{{ item.as }}",
        "node": "{{ inventory_hostname }}",
        "peerIP": "{{ item.router_id }}"
      }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:
Does the play finish there? Because I need the rest of it to understand what went wrong.
Yes, the playbook stops there. With -vvvv it ended with a broken-pipe error; with -vv it ends with:
TASK [network_plugin/calico : Calico | Write Calico cni config] ************************
task path: /root/kubespray-2.11.0/roles/network_plugin/calico/tasks/install.yml:9
Thursday 05 March 2020 08:00:17 -0500 (0:00:01.464) 0:14:00.558 *
ERROR! The requested handler 'reset_calico_cni' was not found in either the main handlers list nor in the listening handlers list
What do you have under: kubespray-2.11.0/roles/network_plugin/calico/handlers/ ? You should have a main.yml with the 'reset_calico_cni' task.
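For reference, the handler referenced by the notify lives in roles/network_plugin/calico/handlers/main.yml. Its general shape is roughly the following (a sketch, not the verbatim kubespray file; compare with your checkout):

# handlers/main.yml must define (or listen as) a handler named exactly
# 'reset_calico_cni', otherwise the notify in install.yml fails as shown above
- name: reset_calico_cni
  command: /bin/true
  notify:
    - delete 10-calico.conflist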
Also, the ansible version you should use with kubespray 2.11 is 2.7.12.
You are using 2.7.10. It's a minor difference, but it might be problematic. You should run pip install -r requirements.txt from the kubespray folder.
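Concretely, something like this, using the paths from this thread:

cd /root/kubespray-2.11.0
pip install -r requirements.txt   # installs the ansible version kubespray 2.11 expects
ansible --version                 # should now report 2.7.12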
"You should have a main.yml with the 'reset_calico_cni' task." Yes I do.
Upgraded by running pip install -r requirements.txt
Re-running playbook
It appears that upgrading to 2.7.12 fixed the problem.
# ls /etc/cni/net.d/
10-calico.conflist calico.conflist.template calico-kubeconfig
Complete, and all nodes show Ready.
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
avinetarch-anp1 Ready
avinetarch-anp2 Ready
avinetarch-anp3 Ready
avinetarch-anp4 Ready
avinetarch-anp5 Ready
avinetarch-anp6 Ready
avinetarch-anp7 Ready
avinetarch-anp8 Ready
avinetarch-anp9 Ready
avinetarchkm-anp1 Ready master 5m53s v1.15.3
avinetarchkm-anp2 Ready master 5m23s v1.15.3
avinetarchkm-anp3 Ready master 5m25s v1.15.3
Great!
/close
@alijahnas: Closing this issue.
Actually, it's not fixed. It fails after 300 seconds while waiting for the /etc/cni/net.d/calico-kubeconfig file on one host of three.
Timeout when waiting for file /etc/cni/net.d/calico-kubeconfig
Versions:
ansible==2.9.6
jinja2==2.11.1
netaddr==0.7.19
pbr==5.4.4
jmespath==0.9.5
ruamel.yaml==0.16.10
Steps to reproduce:
It fails to copy the file to node02.
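A hedged note for anyone debugging this: Ansible does not copy that file itself. With the kdd datastore, /etc/cni/net.d/calico-kubeconfig is written by the install-cni init container of the calico-node pod, so a timeout usually means calico-node never came up on that host. From a master node:

kubectl -n kube-system get pods -o wide | grep calico-node   # is there a Running pod on node02?
kubectl -n kube-system logs calico-node-xxxxx -c install-cni # replace with the pod scheduled on node02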
For those who may come here from a Google search: for me, it failed on this task:
- name: Wait for calico kubeconfig to be created
wait_for:
path: /etc/cni/net.d/calico-kubeconfig
when:
- inventory_hostname not in groups['kube-master']
- calico_datastore == "kdd" # this guy
I decided to switch to the other datastore in k8s-net-calico.yml (calico_datastore: "etcd"), and it installed everything correctly. Nevertheless, the node status was NotReady:
kubectl describe node node02
# ...
Normal Starting 24m kubelet Starting kubelet.
Warning CheckLimitsForResolvConf 24m kubelet open /run/systemd/resolve/resolv.conf: no such file or directory
Somehow, systemd-resolved.service was not active on the node:
node02$ resolvectl query google.com
google.com: resolve call failed: Unit dbus-org.freedesktop.resolve1.service not found.
So, the solution was just enabling the systemd-resolved service:
systemctl enable systemd-resolved.service
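Note that enable by itself only takes effect at the next boot; on a live node you would also start the unit and verify (the verification steps here are an addition, not from the original report):

systemctl enable --now systemd-resolved.service   # enable at boot and start immediately
systemctl status systemd-resolved.service         # confirm it is active
resolvectl query google.com                       # should now resolve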