Kubespray: Kubespray not adding files to /etc/cni/net.d

Created on 24 Feb 2020 · 14 comments · Source: kubernetes-sigs/kubespray

Environment:

  • Bare Metal: Dell Servers
  • Linux 4.4.0-174-generic x86_64
    NAME="Ubuntu"
    VERSION="16.04.6 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.6 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

  • # ansible --version
    ansible 2.7.10
    config file = /root/kubespray-2.11.0/ansible.cfg
    configured module search path = [u'/root/kubespray-2.11.0/library']
    ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
    executable location = /usr/local/bin/ansible
    python version = 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609]

  • # python --version
    Python 2.7.12

Kubespray version - 2.11.0

Network plugin used: Calico

Copy of your inventory file:
[calico-rr]

[k8s-cluster:children]
kube-master
kube-node
calico-rr

Command used to invoke ansible: ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -u root --ask-pass -b --become-user=root -vvvv

Output of ansible run:
https://gist.github.com/mhabicht/452b838a809564ff568a15b0ae052d84

Anything else we need to know:
journalctl -u kubelet.service

Feb 22 00:12:02 avinetarchkm-anp1 kubelet[16949]: I0222 00:12:02.495325 16949 setters.go:73] Using node IP: "10.246.4.152"
Feb 22 00:12:03 avinetarchkm-anp1 kubelet[16949]: E0222 00:12:03.842073 16949 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 22 00:12:03 avinetarchkm-anp1 kubelet[16949]: W0222 00:12:03.950817 16949 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 22 00:12:08 avinetarchkm-anp1 kubelet[16949]: E0222 00:12:08.842801 16949 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
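For reference, on a node where the Calico role has completed, the CNI directory should be populated roughly like this (file names vary with the Calico version and datastore):

# expected contents once the network plugin is deployed
ls /etc/cni/net.d/
# 10-calico.conflist  calico.conflist.template  calico-kubeconfig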

Label: kind/bug

All 14 comments

We need more info, like the whole Ansible output, to see what happened during the network CNI deployment. The inventory and the group_vars would also help.

How do I save the full playbook output? There are too many lines to copy, and it appears the replay log (/var/log/ansible/log) was not added until 2.9; I am running 2.7.10.
Here are the last 20k lines: https://gist.github.com/mhabicht/8e108995fa24dd1be2ad68b4a01f60bc

You can tee the output to a file. There are a lot of lines because you are running the playbook at four levels of verbosity (-vvvv); you can reduce that to two or at most three.

ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -u root --ask-pass -b --become-user=root -vv | tee output.log

The deployment of Calico in output.log is the most interesting in your case.
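If you prefer to keep the verbosity low and still capture everything, a small sketch: set log_path in the ansible.cfg you run the playbook with (this assumes the target path is writable by the user running Ansible):

# ansible.cfg
[defaults]
# append the complete output of every run to this file
log_path = /var/log/ansible.log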

kubespray-2.11.0/roles/network_plugin/calico/tasks# ls
check.yml  install.yml  main.yml  pre.yml  reset.yml  upgrade.yml

kubespray-2.11.0/roles/network_plugin/calico/tasks# cat install.yml

- name: Calico | Copy calicoctl binary from download dir
  copy:
    src: "{{ local_release_dir }}/calicoctl"
    dest: "{{ bin_dir }}/calicoctl"
    mode: 0755
    remote_src: yes

- name: Calico | Write Calico cni config
  template:
    src: "cni-calico.conflist.j2"
    dest: "/etc/cni/net.d/{% if calico_version is version('v3.3.0', '>=') %}calico.conflist.template{% else %}10-calico.conflist{% endif %}"
    owner: kube
  register: calico_conflist
  notify: reset_calico_cni

- name: Calico | Create calico certs directory
  file:
    dest: "{{ calico_cert_dir }}"
    state: directory
    mode: 0750
    owner: root
    group: root
  when: calico_datastore == "etcd"

- name: Calico | Link etcd certificates for calico-node
  file:
    src: "{{ etcd_cert_dir }}/{{ item.s }}"
    dest: "{{ calico_cert_dir }}/{{ item.d }}"
    state: hard
    force: yes
  with_items:
    - {s: "{{ kube_etcd_cacert_file }}", d: "ca_cert.crt"}
    - {s: "{{ kube_etcd_cert_file }}", d: "cert.crt"}
    - {s: "{{ kube_etcd_key_file }}", d: "key.pem"}
  when: calico_datastore == "etcd"

- name: Calico | Install calicoctl wrapper script
  template:
    src: "calicoctl.{{ calico_datastore }}.sh.j2"
    dest: "{{ bin_dir }}/calicoctl.sh"
    mode: 0755
    owner: root
    group: root

- name: Calico | wait for etcd
  uri:
    url: "{{ etcd_access_addresses.split(',') | first }}/health"
    validate_certs: no
    client_cert: "{{ calico_cert_dir }}/cert.crt"
    client_key: "{{ calico_cert_dir }}/key.pem"
  register: result
  until: result.status == 200 or result.status == 401
  retries: 10
  delay: 5
  run_once: true
  when: calico_datastore == "etcd"

- name: Calico | Check if calico network pool has already been configured
  shell: >
    {{ bin_dir }}/calicoctl.sh get ippool | grep -w "{{ calico_pool_cidr | default(kube_pods_subnet) }}" | wc -l
  register: calico_conf
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  changed_when: false
  when:
    - inventory_hostname == groups['kube-master'][0]

- name: Calico | Ensure that calico_pool_cidr is within kube_pods_subnet when defined
  assert:
    that: "[calico_pool_cidr] | ipaddr(kube_pods_subnet) | length == 1"
    msg: "{{ calico_pool_cidr }} is not within or equal to {{ kube_pods_subnet }}"
  when:
    - inventory_hostname == groups['kube-master'][0]
    - 'calico_conf.stdout == "0"'
    - calico_pool_cidr is defined

- name: Calico | Create calico manifests for kdd
  template:
    src: "{{ item.file }}.j2"
    dest: "{{ kube_config_dir }}/{{ item.file }}"
  with_items:
    - {name: calico, file: kdd-crds.yml, type: kdd}
  register: calico_node_kdd_manifest
  when:
    - inventory_hostname in groups['kube-master']
    - calico_datastore == "kdd"

- name: Calico | Create Calico Kubernetes datastore resources
  kube:
    name: "{{ item.item.name }}"
    namespace: "kube-system"
    kubectl: "{{ bin_dir }}/kubectl"
    resource: "{{ item.item.type }}"
    filename: "{{ kube_config_dir }}/{{ item.item.file }}"
    state: "latest"
  with_items:
    - "{{ calico_node_kdd_manifest.results }}"
  when:
    - inventory_hostname == groups['kube-master'][0]
    - not item is skipped
  loop_control:
    label: "{{ item.item.file }}"

- name: Calico | Configure calico network pool (version < v3.3.0)
  shell: >
    echo "
    { "kind": "IPPool",
      "apiVersion": "projectcalico.org/v3",
      "metadata": {
        "name": "{{ calico_pool_name }}",
      },
      "spec": {
        "cidr": "{{ calico_pool_cidr | default(kube_pods_subnet) }}",
        "ipipMode": "{{ ipip_mode }}",
        "natOutgoing": {{ nat_outgoing|default(false) and not peer_with_router|default(false) }} }} " | {{ bin_dir }}/calicoctl.sh create -f -
  when:
    - inventory_hostname == groups['kube-master'][0]
    - 'calico_conf.stdout == "0"'
    - calico_version is version("v3.3.0", "<")

- name: Calico | Configure calico network pool (version >= v3.3.0)
  shell: >
    echo "
    { "kind": "IPPool",
      "apiVersion": "projectcalico.org/v3",
      "metadata": {
        "name": "{{ calico_pool_name }}",
      },
      "spec": {
        "blockSize": "{{ kube_network_node_prefix }}",
        "cidr": "{{ calico_pool_cidr | default(kube_pods_subnet) }}",
        "ipipMode": "{{ ipip_mode }}",
        "natOutgoing": {{ nat_outgoing|default(false) and not peer_with_router|default(false) }} }} " | {{ bin_dir }}/calicoctl.sh create -f -
  when:
    - inventory_hostname == groups['kube-master'][0]
    - 'calico_conf.stdout == "0"'
    - calico_version is version("v3.3.0", ">=")

- name: "Determine nodeToNodeMesh needed state"
  set_fact:
    nodeToNodeMeshEnabled: "false"
  when:
    - peer_with_router|default(false) or peer_with_calico_rr|default(false)
    - inventory_hostname in groups['k8s-cluster']
  run_once: yes

- name: Calico | Set global as_num
  shell: >
    echo '
    { "kind": "BGPConfiguration",
      "apiVersion": "projectcalico.org/v3",
      "metadata": {
        "name": "default",
      },
      "spec": {
        "logSeverityScreen": "Info",
        "nodeToNodeMeshEnabled": {{ nodeToNodeMeshEnabled|default('true') }} ,
        "asNumber": {{ global_as_num }} }} ' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  changed_when: false
  when:
    - inventory_hostname == groups['kube-master'][0]

- name: Calico | Configure peering with router(s) at global scope
  shell: >
    echo '{
    "apiVersion": "projectcalico.org/v3",
    "kind": "BGPPeer",
    "metadata": {
      "name": "global-{{ item.router_id }}"
    },
    "spec": {
      "asNumber": "{{ item.as }}",
      "peerIP": "{{ item.router_id }}"
    }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:
    - "{{ peers|selectattr('scope','defined')|selectattr('scope','equalto', 'global')|list|default([]) }}"
  when:
    - inventory_hostname == groups['kube-master'][0]
    - peer_with_router|default(false)

- name: Calico | Configure peering with route reflectors at global scope
  shell: |
    echo '{
    "apiVersion": "projectcalico.org/v3",
    "kind": "BGPPeer",
    "metadata": {
      "name": "peer-to-rrs"
    },
    "spec": {
      "nodeSelector": "!has(i-am-a-route-reflector)",
      "peerSelector": "has(i-am-a-route-reflector)"
    }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:
    - "{{ groups['calico-rr'] | default([]) }}"
  when:
    - inventory_hostname == groups['kube-master'][0]
    - peer_with_calico_rr|default(false)

- name: Calico | Configure route reflectors to peer with each other
  shell: >
    echo '{
    "apiVersion": "projectcalico.org/v3",
    "kind": "BGPPeer",
    "metadata": {
      "name": "rr-mesh"
    },
    "spec": {
      "nodeSelector": "has(i-am-a-route-reflector)",
      "peerSelector": "has(i-am-a-route-reflector)"
    }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:
    - "{{ groups['calico-rr'] | default([]) }}"
  when:
    - inventory_hostname == groups['kube-master'][0]
    - peer_with_calico_rr|default(false)

- name: Calico | Create calico manifests
  template:
    src: "{{ item.file }}.j2"
    dest: "{{ kube_config_dir }}/{{ item.file }}"
  with_items:
    - {name: calico-config, file: calico-config.yml, type: cm}
    - {name: calico-node, file: calico-node.yml, type: ds}
    - {name: calico, file: calico-node-sa.yml, type: sa}
    - {name: calico, file: calico-cr.yml, type: clusterrole}
    - {name: calico, file: calico-crb.yml, type: clusterrolebinding}
  register: calico_node_manifests
  when:
    - inventory_hostname in groups['kube-master']
    - rbac_enabled or item.type not in rbac_resources

- name: Calico | Create calico manifests for typha
  template:
    src: "{{ item.file }}.j2"
    dest: "{{ kube_config_dir }}/{{ item.file }}"
  with_items:
    - {name: calico, file: calico-typha.yml, type: typha}
  register: calico_node_typha_manifest
  when:
    - inventory_hostname in groups['kube-master']
    - typha_enabled and calico_datastore == "kdd"

- name: Start Calico resources
  kube:
    name: "{{ item.item.name }}"
    namespace: "kube-system"
    kubectl: "{{ bin_dir }}/kubectl"
    resource: "{{ item.item.type }}"
    filename: "{{ kube_config_dir }}/{{ item.item.file }}"
    state: "latest"
  with_items:
    - "{{ calico_node_manifests.results }}"
    - "{{ calico_node_kdd_manifest.results }}"
    - "{{ calico_node_typha_manifest.results }}"
  when:
    - inventory_hostname == groups['kube-master'][0]
    - not item is skipped
  loop_control:
    label: "{{ item.item.file }}"

- name: Wait for calico kubeconfig to be created
  wait_for:
    path: /etc/cni/net.d/calico-kubeconfig
  when:
    - inventory_hostname not in groups['kube-master']
    - calico_datastore == "kdd"

- name: Calico | Configure node asNumber for per node peering
  shell: >
    echo '{
    "apiVersion": "projectcalico.org/v3",
    "kind": "Node",
    "metadata": {
      "name": "{{ inventory_hostname }}"
    },
    "spec": {
      "bgp": {
        "asNumber": "{{ local_as }}"
      },
      "orchRefs":[{"nodeName":"{{ inventory_hostname }}","orchestrator":"k8s"}]
    }}' | {{ bin_dir }}/calicoctl.sh {{ 'apply -f -' if calico_datastore == "kdd" else 'create --skip-exists -f -' }}
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  when:
    - peer_with_router|default(false)
    - inventory_hostname in groups['k8s-cluster']
    - local_as is defined
    - groups['calico-rr'] | default([]) | length == 0

- name: Calico | Configure peering with router(s) at node scope
  shell: >
    echo '{
    "apiVersion": "projectcalico.org/v3",
    "kind": "BGPPeer",
    "metadata": {
      "name": "{{ inventory_hostname }}-{{ item.router_id }}"
    },
    "spec": {
      "asNumber": "{{ item.as }}",
      "node": "{{ inventory_hostname }}",
      "peerIP": "{{ item.router_id }}"
    }}' | {{ bin_dir }}/calicoctl.sh create --skip-exists -f -
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  with_items:
    - "{{ peers|selectattr('scope','undefined')|list|default([]) | union(peers|selectattr('scope','defined')|selectattr('scope','equalto', 'node')|list|default([])) }}"
  when:
    - peer_with_router|default(false)
    - inventory_hostname in groups['k8s-cluster']

output.log

Does the play finish there? Because I need the rest of it to understand what went wrong.

Yes, the playbook stops there. With -vvvv it ended with a broken pipe error; with -vv it ends with:

TASK [network_plugin/calico : Calico | Write Calico cni config] ************************
task path: /root/kubespray-2.11.0/roles/network_plugin/calico/tasks/install.yml:9
Thursday 05 March 2020 08:00:17 -0500 (0:00:01.464) 0:14:00.558 *
ERROR! The requested handler 'reset_calico_cni' was not found in either the main handlers list nor in the listening handlers list

What do you have under kubespray-2.11.0/roles/network_plugin/calico/handlers/ ? You should have a main.yml with the 'reset_calico_cni' task.
Also, the Ansible version you should use with kubespray 2.11 is 2.7.12.
You are using 2.7.10. It's a minor difference, but it might be problematic. You should run pip install -r requirements.txt (the requirements file is in the kubespray folder).
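A quick sanity check for both points, as a sketch (run from the kubespray-2.11.0 directory; adjust paths to your checkout):

# the handler the task notifies must exist in the role's handlers
grep -rn "reset_calico_cni" roles/network_plugin/calico/handlers/

# align ansible (and the other Python deps) with what this release expects
pip install -r requirements.txt
ansible --version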

"You should have a main.yml with the 'reset_calico_cni' task." Yes I do.

Upgraded by running pip install -r requirements.txt

Re-running playbook

It appears that upgrading to 2.7.12 fixed the problem.
# ls /etc/cni/net.d/
10-calico.conflist calico.conflist.template calico-kubeconfig

The run completed and all nodes show Ready:

NAME                STATUS   ROLES    AGE     VERSION
avinetarch-anp1     Ready    <none>   3m52s   v1.15.3
avinetarch-anp2     Ready    <none>   3m55s   v1.15.3
avinetarch-anp3     Ready    <none>   3m51s   v1.15.3
avinetarch-anp4     Ready    <none>   3m49s   v1.15.3
avinetarch-anp5     Ready    <none>   3m56s   v1.15.3
avinetarch-anp6     Ready    <none>   3m37s   v1.15.3
avinetarch-anp7     Ready    <none>   3m57s   v1.15.3
avinetarch-anp8     Ready    <none>   3m54s   v1.15.3
avinetarch-anp9     Ready    <none>   3m41s   v1.15.3
avinetarchkm-anp1   Ready    master   5m53s   v1.15.3
avinetarchkm-anp2   Ready    master   5m23s   v1.15.3
avinetarchkm-anp3   Ready    master   5m25s   v1.15.3

Great!
/close

@alijahnas: Closing this issue.

In response to this:

Great!
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Actually, it's not fixed. It fails after 300 seconds while waiting for the /etc/cni/net.d/calico-kubeconfig file on one host out of three.

Timeout when waiting for file /etc/cni/net.d/calico-kubeconfig

Versions:

ansible==2.9.6
jinja2==2.11.1
netaddr==0.7.19
pbr==5.4.4
jmespath==0.9.5
ruamel.yaml==0.16.10

Steps to reproduce:

  • install the cluster (node01=master, node02=master, node03=none)
  • reset the cluster
  • try installing the cluster again with (node01=master, node02=none, node03=none)

It fails to copy the file to node02.
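A rough way to narrow this down, as a sketch: with the kdd datastore the calico-kubeconfig file is normally written on each node by the calico-node pod (its install-cni container), so if that pod never becomes ready on node02 the wait_for task simply times out.

# on the first master: is calico-node scheduled and ready on the failing node?
kubectl -n kube-system get pods -o wide | grep calico-node

# on node02: what actually landed in the CNI directory?
ls -l /etc/cni/net.d/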

For those who may come here from a Google search: for me it failed on this task:

- name: Wait for calico kubeconfig to be created
  wait_for:
    path: /etc/cni/net.d/calico-kubeconfig
  when:
    - inventory_hostname not in groups['kube-master']
    - calico_datastore == "kdd"  # this guy

I decided to switch to another datastore in k8s-net-calico.yml (calico_datastore: "etcd"), and it installed everything correctly. The node status was still NotReady, though:

kubectl describe node node02
# ...
  Normal   Starting                  24m   kubelet  Starting kubelet.
  Warning  CheckLimitsForResolvConf  24m   kubelet  open /run/systemd/resolve/resolv.conf: no such file or directory

Somehow, systemd-resolved.service was not active on the node:

node02$ resolvectl query google.com
google.com: resolve call failed: Unit dbus-org.freedesktop.resolve1.service not found.

So, the solution was just enabling the systemd-resolved service:

systemctl enable systemd-resolved.service
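Note that enable alone only takes effect on the next boot; assuming you want it active immediately, something like the following should work, followed by a kubelet restart so it can read /run/systemd/resolve/resolv.conf:

systemctl enable --now systemd-resolved.service
systemctl restart kubelet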