BUG REPORT
Install a Kubernetes cluster with 2 worker nodes, then add one more node using scale.yml.
After the Ansible playbook finishes, the new node is NotReady, and
journalctl -xeu kubelet on that node reports NetworkNotReady.
After comparing with another node, the file
/etc/cni/net.d/10-calico.conflist
turned out to be different on the new node and the old nodes.
Environment:
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Ubuntu 18.04
Version of Ansible (ansible --version):
2.7.5
Kubespray version (commit) (git rev-parse --short HEAD):
master/ adf6a712
Network plugin used:
calico / v3.4.0
Copy of your inventory file:
~~~
[all]
k8s-master-1 ansible_ssh_host=192.168.16.211 access_ip=192.168.16.211
k8s-master-2 ansible_ssh_host=192.168.16.212 access_ip=192.168.16.212
k8s-master-3 ansible_ssh_host=192.168.16.213 access_ip=192.168.16.213
k8s-node-1 ansible_ssh_host=192.168.16.215 access_ip=192.168.16.215
k8s-node-2 ansible_ssh_host=192.168.16.216 access_ip=192.168.16.216
k8s-node-3 ansible_ssh_host=192.168.16.217 access_ip=192.168.16.217

[kube-master]
k8s-master-1
k8s-master-2
k8s-master-3

[kube-master:vars]
vip_address=192.168.16.210

[etcd]
k8s-master-1
k8s-master-2
k8s-master-3

[kube-node]
k8s-node-1
k8s-node-2
k8s-node-3

[k8s-cluster:children]
kube-master
kube-node
~~~
Command used to invoke ansible:
ansible-playbook -i inventory/mine/hosts.ini -u ubuntu -b scale.yml
Output of ansible run:
Playbook succeeded.
Anything else we need to know:
I suspected calico_version was not defined correctly when the playbook runs, so I added a debug task.
The template task for the Calico config:
~~~
- name: Calico | Write Calico cni config
  template:
    src: "cni-calico.conflist.j2"
    dest: "/etc/cni/net.d/{% if calico_version is version('v3.3.0', '>=') %}calico.conflist.template{% else %}10-calico.conflist{% endif %}"
    owner: kube
~~~
The debug task:
~~~
- name: Debug Calico Version
  debug:
    msg: "Debug calico_version : {{ calico_version }} , {{ calico_version is version('v3.3.0', '>=') }}"
~~~
Output:
~~~
ok: [k8s-node-3] => {
    "msg": "Debug calico_version : v3.4.0 , True"
}
~~~
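For reference, the Jinja version test in that task can be reproduced outside Ansible. A minimal sketch (assuming GNU sort with -V) of the same >= comparison and the destination it selects; on v3.3.0+ the task writes calico.conflist.template, not 10-calico.conflist, which matches the debug output above:

```shell
# Rough shell equivalent of: calico_version is version('v3.3.0', '>=')
calico_version="v3.4.0"
threshold="v3.3.0"
# sort -V orders version strings; if the threshold sorts first (or equal),
# calico_version >= threshold holds.
if [ "$(printf '%s\n%s\n' "$threshold" "$calico_version" | sort -V | head -n1)" = "$threshold" ]; then
  dest="calico.conflist.template"
else
  dest="10-calico.conflist"
fi
echo "/etc/cni/net.d/$dest"
```

So for calico v3.4.0, Ansible itself only lays down the template file; something else has to produce the final 10-calico.conflist.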
File on the new node:
~~~
{
  "name": "k8s-pod-network",
  "type": "calico",
  "etcd_endpoints": "",
  "etcd_key_file": "",
  "etcd_cert_file": "",
  "etcd_ca_cert_file": "",
  "log_level": "warn",
  "ipam": {
    "type": "calico-ipam"
  },
  "policy": {
    "type": "k8s",
    "k8s_api_root": "https://10.233.0.1:443",
    "k8s_auth_token": "eyJXxxxxxxxxxxxx"
  },
  "kubernetes": {
    "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
  }
}
~~~
File on a working node:
~~~
{
  "name": "cni0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "nodename": "k8s-node-3",
      "type": "calico",
      "etcd_endpoints": "https://192.168.16.211:2379,https://192.168.16.212:2379,https://192.168.16.213:2379",
      "etcd_cert_file": "/etc/calico/certs/cert.crt",
      "etcd_key_file": "/etc/calico/certs/key.pem",
      "etcd_ca_cert_file": "/etc/calico/certs/ca_cert.crt",
      "log_level": "info",
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv4": "true",
        "ipv4_pools": ["10.233.64.0/18"]
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
~~~
I use calico with the following settings
~~~
peer_with_router: true
peers: []
~~~
And I manually set up routes in the OpenStack router.
kubectl describe on the new node:
~~~
SufficientPID kubelet has sufficient PID available
Ready False Mon, 04 Mar 2019 08:32:05 +0000 Mon, 04 Mar 2019 08:29:04 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
~~~
Kubelet logs:
~~~
W0304 08:52:20.254027 18127 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
W0304 08:52:20.254057 18127 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
E0304 08:52:20.254191 18127 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0304 08:52:25.255735 18127 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
W0304 08:52:25.255774 18127 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
E0304 08:52:25.255961 18127 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I0304 08:52:27.453683 18127 setters.go:72] Using node IP: "192.168.16.218"
W0304 08:52:30.265524 18127 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
W0304 08:52:30.265548 18127 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
E0304 08:52:30.265719 18127 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
~~~
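The "no 'plugins' key" error above can be checked by hand on a node. A minimal sketch that writes a sample file shaped like the broken config from this report into a temporary directory (not /etc/cni) and applies a rough version of the check the kubelet performs:

```shell
# Simulate the broken conflist from this report in a temp dir.
workdir=$(mktemp -d)
cat > "$workdir/10-calico.conflist" <<'EOF'
{
  "name": "k8s-pod-network",
  "type": "calico",
  "ipam": { "type": "calico-ipam" }
}
EOF

check_conflist() {
  # Rough check: the kubelet rejects a .conflist whose JSON lacks a
  # top-level "plugins" array (grep is an approximation of real parsing).
  if grep -q '"plugins"' "$1"; then
    echo "ok"
  else
    echo "broken: no plugins key"
  fi
}

result=$(check_conflist "$workdir/10-calico.conflist")
echo "$result"
```

Running the same check against the working node's file (which has a "plugins" array) prints "ok".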
I have found that if I delete the calico pod on the new node (or restart the node, or restart the calico pod/container on that node), the node becomes healthy and the proper 10-calico.conflist file is put in place.
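That workaround can be sketched as a single command. A hedged sketch (the node name is an example, and the k8s-app=calico-node label is an assumption based on kubespray's calico DaemonSet); the command is printed here rather than executed, since it needs a live cluster:

```shell
# Substitute the name of the NotReady node.
NODE="k8s-node-3"
# -l selects the calico DaemonSet pods (assumed label k8s-app=calico-node);
# --field-selector limits the delete to the pod scheduled on that node.
# The DaemonSet controller recreates the pod, which rewrites the conflist.
fix_cmd="kubectl -n kube-system delete pod -l k8s-app=calico-node --field-selector spec.nodeName=$NODE"
echo "$fix_cmd"
```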
I'm having the same problem
Same error here when adding a new node with scale.yml:
~~~
W0327 18:51:54.542772 59070 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
W0327 18:51:54.542789 59070 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
E0327 18:51:54.542854 59070 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin
~~~
Can confirm that manually deleting the calico-node pod on affected nodes fixes it.
Same issue here, same fix for me.
Cloud Provider: Packet
OS: CentOS 7
Ansible: 2.7.10
Just hit this problem too.
We have the same issue when scaling, but also when upgrading calico from v3.1.3 to v3.4.0 with upgrade-cluster.yml.
This only happens on the worker nodes; the masters are updated OK.
~~~
Apr 15 14:22:45 caas-xavi-dev-lb-01 kubelet[21136]: W0415 14:22:45.150469 21136 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
Apr 15 14:22:45 caas-xavi-dev-lb-01 kubelet[21136]: W0415 14:22:45.150495 21136 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
Apr 15 14:22:45 caas-xavi-dev-lb-01 kubelet[21136]: E0415 14:22:45.150609 21136 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
~~~
Manually deleting the calico-node pod fixes the problem
Before deleting the pod:
~~~
$ ls -l /etc/cni/net.d
total 12
-rw-rw-r-- 1 root root 1315 Apr 15 14:04 10-calico.conflist
-rw------- 1 root root 2566 Apr 15 14:04 calico-kubeconfig
-rw-r--r-- 1 kube root 801 Apr 15 14:06 calico.conflist.template
~~~
After deleting the pod:
~~~
$ ls -l /etc/cni/net.d
total 12
-rw-r--r-- 1 root root 810 Apr 15 15:23 10-calico.conflist
-rw------- 1 root root 2566 Apr 15 15:23 calico-kubeconfig
-rw-r--r-- 1 kube root 801 Apr 15 14:06 calico.conflist.template
~~~
@mirwan @wangxf1987 do you think this can be related with #4102 and the new template name? https://github.com/kubernetes-sigs/kubespray/pull/4102/files#diff-f336e04badfa2e647399ccaeed760b13R5
Also having the issue with improper config in /etc/cni/net.d/10-canal.conflist (having kube_network_plugin: canal). Restarting canal-node-xxxxx pod solves the issue.
kubespray version: 2.10.0
We just hit the problem as well here, upgrading from 2.8.5 to 2.9.0 with canal.
Having the same problem here. Deleting them works fine for me too; posting my command to delete them to give you all a shortcut (note the quotes around the filter and format arguments, which the shell would otherwise mangle):
~~~
sudo docker ps --all --filter name='.*calico.*' --no-trunc --format '{{.ID}}' | xargs sudo docker rm -f
~~~
Also hit this issue; in my case, deleting the calico pod running on the new worker node did not fix it.
I'm hitting the same issue while upgrading from kubespray 2.8.x to 2.9.0 and from k8s 1.12.9 to 1.13.5.
Deleting the pods solves the problem.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale