Amazon-vpc-cni-k8s: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady...

Created on 9 Jan 2019 · 38 Comments · Source: aws/amazon-vpc-cni-k8s

EKS: v1.11.5
CNI: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.3.0
AMI: amazon-eks-node-1.11-v20181210 (ami-0a9006fb385703b54)

We are still seeing these CNI errors in pod events, e.g.:

Events:
  Type     Reason           Age               From                                                Message
  ----     ------           ----              ----                                                -------
  Warning  NetworkNotReady  5s (x3 over 35s)  kubelet, ip-10-0-26-197.eu-west-1.compute.internal  network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]

I tried to run /opt/cni/bin/aws-cni-support.sh on the node with pod aws-node-hhtrt but I get this error:

[root@ip-10-0-25-4 ~]# /opt/cni/bin/aws-cni-support.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1223  100  1223    0     0   1223      0  0:00:01 --:--:--  0:00:01 1194k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   912  100   912    0     0    912      0  0:00:01 --:--:--  0:00:01  890k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   106  100   106    0     0    106      0  0:00:01 --:--:--  0:00:01  103k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    83  100    83    0     0     83      0  0:00:01 --:--:--  0:00:01 83000
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    28  100    28    0     0     28      0  0:00:01 --:--:--  0:00:01 28000
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6268  100  6268    0     0   6268      0  0:00:01 --:--:--  0:00:01 6121k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 10255: Connection refused

bug

max-rocket-internet

👎1 👍1

All 38 comments

Hitting this as well with the same setup as above.

nxf5025 on 11 Jan 2019

> I tried to run /opt/cni/bin/aws-cni-support.sh on the node with pod aws-node-hhtrt but I get this error:

The second one is the same as https://github.com/aws/amazon-vpc-cni-k8s/issues/285

The line in the script should be updated to something like command -v kubectl > /dev/null && kubectl get --kubeconfig=/var/lib/kubelet/kubeconfig --raw=/api/v1/pods.
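
A minimal sketch of that guarded call, wrapped in a helper so it can be reused. The function name list_node_pods is hypothetical and this is not the actual code in aws-cni-support.sh; only the kubectl invocation and the kubeconfig path come from the suggestion above.

```shell
# Hypothetical helper; not the actual code in aws-cni-support.sh.
# $1: kubeconfig path (defaults to the path quoted in the suggestion above).
list_node_pods() {
    kubeconfig="${1:-/var/lib/kubelet/kubeconfig}"
    if command -v kubectl > /dev/null 2>&1; then
        # Query the API server directly instead of the kubelet port 10255
        kubectl get --kubeconfig="$kubeconfig" --raw=/api/v1/pods
    else
        echo "kubectl not found; skipping pod listing" >&2
        return 1
    fi
}
```

Called without arguments on a node, it falls back to /var/lib/kubelet/kubeconfig and returns non-zero when kubectl is not installed.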

nak3 on 12 Jan 2019

👍1

Still seeing this now and again:

  Warning  FailedCreatePodSandBox  7m33s                  kubelet, ip-10-0-25-88.eu-west-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2c600b4c1a8f344393f614e04706dc428ba1467ca67cb1169674807bd830646d" network for pod "ingress02-nginx-ingress-controller-dr84q": NetworkPlugin cni failed to set up pod "ingress02-nginx-ingress-controller-dr84q_default" network: add cmd: failed to assign an IP address to container

max-rocket-internet on 8 May 2019

@max-rocket-internet Hey, are you still hitting the issue of network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]?

With the newer CNI the support script should work since it addresses the change for that kubelet port.

Have you seen network: add cmd: failed to assign an IP address to container again? This went into 1.4 https://github.com/aws/amazon-vpc-cni-k8s/pull/367.

For either of these, if you have, what CNI version are you using? Thanks!

tiffanyfay on 30 May 2019

I'm on EKS 1.12.7 and CNI 1.3.3, and this error actually happened to most of my nodes for about 10 minutes, and resolved itself magically (seemingly). It was right after I re-deployed my ASGs through CloudFormation.
I'm using the latest AMI as of this date.

@tiffanyfay Do you have any insight on why this could happen?

TarekAS on 10 Jun 2019

Recording this here in case it helps others.

I had a MutatingWebhookConfiguration hanging around that was no longer relevant and there were no pods available to service it. This was stopping nodes from becoming Ready. The kubelet logs and describe node messages had the exact same error as recorded here.

In my case, running kubectl delete MutatingWebhookConfiguration <name> and then restarting one of the kubelets caused all nodes to become healthy/ready.

stephenmuss on 14 Jun 2019

👍8

I also had a similar issue today with EKS 1.12 and CNI plugin version 1.4.1.
I also got KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized on a few nodes for about 10-20 minutes, then it worked again.

I didn't find any more debugging info, and the nodes were replaced by the cluster auto-scaler. Is there anything I should look out for if this happens again?

Pharb on 27 Jun 2019

I have been seeing this problem on pods that start immediately after the kube node comes up. If I delete the pods and have them "try again" they get their IPs and there's no warnings. Could this be solved through a node readiness change?

xrl on 7 Jul 2019

Just had the same issue with 1.11.9. The CNI networking failed on one of two new nodes, so the failed node never joined the cluster. A reboot from the AWS Console got it working.

jahoward on 26 Jul 2019

The first two warnings I see are:

Jul 26 08:18:06 ip-10-2-118-4.ap-southeast-2.compute.internal kubelet[4537]: W0726 08:18:06.517655    4537 cni.go:172] Unable to update cni config: No networks found in /etc/
Jul 26 08:18:06 ip-10-2-118-4.ap-southeast-2.compute.internal kubelet[4537]: W0726 08:18:06.521509    4537 cni.go:172] Unable to update cni config: No networks found in /etc/

jahoward on 26 Jul 2019

> Just had the same issue with 1.11.9. The CNI networking failed on one of two new nodes, so the failed node never joined the cluster. A reboot from the AWS Console got it working.

This is the workaround we use as well.

Our environment where we ran into it:

k8s: Kubernetes v1.13.11-eks-5876d6
cni plugin: amazon-k8s-cni:v1.5.3

This has happened only a couple times over half a year (so on older versions too), so it's difficult for us to reproduce.

schahal on 28 Oct 2019

EKS: v1.14.7-eks-1861c5
CNI: amazon-k8s-cni:v1.5.3
AMI: amazon-eks-node-1.14-v20190927 (ami-0e21bc066a9dbabfa)

Same problem on multiple EKS clusters. New VMs cannot join the cluster.

Kubelet error on the nodes:

Oct 29 07:29:27 ip-10-1-21-123.eu-central-1.compute.internal kubelet[3727]: W1029 07:29:27.735403    3727 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 29 07:29:28 ip-10-1-21-123.eu-central-1.compute.internal kubelet[3727]: E1029 07:29:28.262822    3727 kubelet.go:2172] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Events:

runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

temal- on 29 Oct 2019

@temal- Thanks for the report. If the CNI binary and config file are missing, ipamd must have failed to start correctly on the new node. There are a few possible causes: either the calls to the EC2 control plane got throttled and timed out, or there are no more ENIs or IPs available in the subnet. If you could get the log files from ipamd on a node that has this issue, it would be extremely helpful.

(A comprehensive log collector script: amazon-eks-ami/log-collector-script)
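
A quick way to confirm that symptom on a node is to check whether anything has been written to the CNI config and binary directories. This is a minimal sketch, assuming the directories named in this thread (/etc/cni/net.d and /opt/cni/bin); the function name cni_files_present is hypothetical, and it does not replace collecting the ipamd logs.

```shell
# Hypothetical check: succeeds only if both directories contain at least one entry.
# $1: CNI config dir (default /etc/cni/net.d), $2: CNI binary dir (default /opt/cni/bin)
cni_files_present() {
    conf_dir="${1:-/etc/cni/net.d}"
    bin_dir="${2:-/opt/cni/bin}"
    for dir in "$conf_dir" "$bin_dir"; do
        # ls -A prints nothing for an empty directory and fails for a missing one
        if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "nothing found in $dir" >&2
            return 1
        fi
    done
    echo "CNI config and binaries present"
}
```

An empty or missing /etc/cni/net.d matches the kubelet warning "No networks found in /etc/cni/net.d" reported above.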

mogren on 30 Oct 2019

@mogren Thanks for the quick reply.
I think it was related to an ongoing linkerd installation. Sadly, I couldn't reproduce the error afterwards (which is strange, because it happened on two different clusters) and therefore wasn't able to run the collector script. If the issue appears again, I'll come back here with more information.

temal- on 31 Oct 2019

We are trying to create a new cluster using eksctl and face the same error. The cluster is created successfully, but the nodes do not become Ready. Details are below.

EKS: version 1.14
CNI: amazon-k8s-cni:v1.5.3
AMI: amazon-eks-node-1.14-v20190927 (ami-02e124a380df41614)
The cluster is created in an existing VPC and subnets that have sufficient IPs.
Kubelet error on the nodes:

W1101 03:31:48.212631    3705 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
E1101 03:31:48.430668    3705 kubelet.go:2172] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

I ran sudo bash eks-log-collector.sh on the node, but all log files in the ipamd directory are empty.

s-tokutake on 1 Nov 2019

> We are trying to create a new cluster using eksctl and face the same error. The cluster is created successfully, but the nodes do not become Ready. Details are below.
>
> EKS: version 1.14
> CNI: amazon-k8s-cni:v1.5.3
> AMI: amazon-eks-node-1.14-v20190927 (ami-02e124a380df41614)
> The cluster is created in an existing VPC and subnets that have sufficient IPs.
> Kubelet error on the nodes:
>
> W1101 03:31:48.212631    3705 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
> E1101 03:31:48.430668    3705 kubelet.go:2172] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
>
> I ran sudo bash eks-log-collector.sh on the node, but all log files in the ipamd directory are empty.

EKS: 1.13
CNI: amazon-k8s-cni:v1.5.3
AMI: ami-0619d38218e46ef86

Got the same issue while updating the EKS cluster.
Freshly created worker nodes can't reach the Ready state.
Rolling back from amazon-k8s-cni:v1.5.3 to amazon-k8s-cni:v1.5.1 resolved the issue.

UPD:
My main issue was a mess in the SG rules between the worker_node group and the ControlPlane.
After updating the SG rules, everything looks fine with both CNI 1.5.1 and 1.5.3.
Don't forget to check and edit the ControlPlane SG inbound and outbound rules:
Inbound: port 443 from the worker_nodes SG
Outbound: ports 443 and 1025 - 65535 to the worker_nodes SG

It is really strange, though, that with CNI 1.5.1 new worker nodes reach the Ready state even when the ControlPlane SG is missing some of the needed rules.
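
The rules listed above could be expressed with the AWS CLI roughly as follows. This is a dry-run sketch that only prints the commands; the function name and both security group IDs are placeholders, and the choice of authorize-security-group-ingress/egress with --ip-permissions is an assumption about how to apply the rules, not something stated in the comment.

```shell
# Dry-run sketch: prints AWS CLI commands for the rules listed above instead of running them.
# $1: control plane security group ID, $2: worker node security group ID (both placeholders).
print_control_plane_sg_rules() {
    cp_sg="$1"
    node_sg="$2"
    pair="UserIdGroupPairs=[{GroupId=$node_sg}]"
    # Inbound: 443 from the worker node security group
    echo "aws ec2 authorize-security-group-ingress --group-id $cp_sg --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,$pair'"
    # Outbound: 443 and 1025-65535 to the worker node security group
    echo "aws ec2 authorize-security-group-egress --group-id $cp_sg --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,$pair'"
    echo "aws ec2 authorize-security-group-egress --group-id $cp_sg --ip-permissions 'IpProtocol=tcp,FromPort=1025,ToPort=65535,$pair'"
}

print_control_plane_sg_rules sg-CONTROLPLANE sg-WORKERS
```

Printing the commands first makes it easy to review them against the existing rules before applying anything.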

mak-1-sim on 2 Nov 2019

I created a cluster and downgraded v1.5.3 to v1.5.1 ( kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/ffaf737145ab3262b7afd0ddbdf613a2174f30dd/config/v1.5/aws-k8s-cni.yaml ), but the issue was not resolved.
The /etc/cni/ directory does not exist on the node.

s-tokutake on 6 Nov 2019

Had the same issue as @hardcorexcat, rollback to v1.5.1 resolved the issue too, and now after a while upgrading back to v1.5.3 works too with new nodes. Clusters are created with terraform-aws-eks.

hhamalai on 6 Nov 2019

Hi all,

Here is my case:

Environment

EKS: v1.14
CNI: v1.5.3
AMI: ami-082bb518441d3954c

I just created a fresh EKS cluster with 2 worker nodes that join the cluster but with a NotReady status. After downgrading the CNI version from v1.5.3 to v1.5.1, the workers get Ready status, but when I check the subnet IPs that should be assigned to the workers, there is only one IP.

Regards,
Josemi.

ixjosemi on 7 Nov 2019

Hi, I am hitting the same issue with eks cluster created by terraform.

mluscon on 23 Nov 2019

Just in case someone comes across this who is using a g4dn family instance on AWS. I was stuck on this for a while because the version of the CNI plugin I was using didn't support that family. After upgrading the CNI plugin it worked. https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html

eightlimbed on 5 Dec 2019

👍10

For the past few days I've been experimenting with EKS cluster creation. I'm using terraform, actually a terraform module similar to the popular community module.
What I've observed:

Clusters created below version 1.14 have no problems with worker nodes reaching the "Ready" state. I'm using the latest CNI version: amazon-k8s-cni:v1.5.5
BUT,
No matter what I try when creating 1.14 clusters, the worker nodes stay in the "NotReady" state, even though I've applied the aws-auth-cm.yaml configmap and the latest CNI version. On closer inspection (kubectl describe node <node_name>) I see an error that the CNI is uninitialized. When I look at the running pods (kubectl get pods -n kube-system), the core-dns pods are in a "Pending" state and the aws-node pods crash every few seconds.
I then took some steps to see if I could fix it:
a) Downgraded the CNI version to 1.5.3. This got the nodes to the "Ready" state but didn't fix the problem: the core-dns pods were now constantly in "ContainerCreating" status and the aws-node pods showed the same behaviour. Upgrading the CNI back to 1.5.5 didn't change anything.
b) Next, I created a 1.13 cluster first, with nodes using a 1.14 Kubernetes AMI. The nodes had no problems joining the cluster and were ready. I then upgraded the cluster version, which resulted in a working 1.14 cluster with the nodes joined and ready. HOWEVER, if I increased the number of nodes in the auto scaling group, the new nodes had the same old problem of not being ready, no matter what I tried.

To sum up, I've decided to use a 1.13 cluster in which I see no problems with nodes using a 1.14 AMI in hopes of fixing this problem in the near future.

Epilogue: I'm using a full 1.13 version cluster because every once in a while a worker node would briefly become "NotReady" and then after a few seconds revert to Ready. Very strange behaviour.

Erokos on 8 Dec 2019

I have experienced the same as @Erokos. It works with 1.13; with 1.14, nodes fail to become Ready.

I don't think the issue is related to AWS VPC CNI, because I tried replacing it with Calico and got the same problem: the CNI pod (aws-node or calico-node) cannot connect to 10.100.0.1, which is the kubernetes service ClusterIP.

ppaepam on 11 Dec 2019

Coming from AWS support:
It is possible that issue is caused by changes to security group requirements for worker nodes [1] introduced in EKS platform v3 [2]

Another possible cause is my old AWS provider. I use 1.60.0.

Hope this helps

[1] https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

ppaepam on 13 Dec 2019

👍3

Hi @ppaepam, is this still an issue?

mogren on 4 Mar 2020

I am also having this issue.

I tried to add a worker node via CloudFormation.

SarasaGunawardhana on 20 Mar 2020

I fixed this issue by upgrading Kubernetes components. I had the same problem in my AWS EKS cluster, so I ran the commands below using the eksctl CLI tool.

eksctl utils update-kube-proxy --name Your_Cluster_Name --approve
eksctl utils update-aws-node --name Your_Cluster_Name --approve
eksctl utils update-coredns --name Your_Cluster_Name --approve
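
For repeated use, the three updates can be wrapped in a small function that stops at the first failure. This is a sketch under the assumption that the eksctl subcommands and flags work exactly as written in the commands above (flag names can differ between eksctl versions); update_eks_addons is a hypothetical name.

```shell
# Sketch: run the three eksctl updates from the commands above in order,
# stopping at the first failure. Flags are copied from those commands.
update_eks_addons() {
    cluster="$1"
    if [ -z "$cluster" ]; then
        echo "usage: update_eks_addons CLUSTER_NAME" >&2
        return 2
    fi
    for component in kube-proxy aws-node coredns; do
        eksctl utils "update-$component" --name "$cluster" --approve || return 1
    done
}
```

Usage: update_eks_addons Your_Cluster_Name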

SarasaGunawardhana on 21 Mar 2020

👍14 🎉8 ❤2

This issue contains a mix of CNI versions and EKS cluster versions. I think @ppaepam and @SarasaGunawardhana are both right, and if anyone has similar issues please open a new issue to track that specific case.

mogren on 22 Apr 2020

👍1

I experienced this issue after updating EKS to version 1.16 and @SarasaGunawardhana commands did the trick for me.

mlachmish on 15 May 2020

👍1

@mlachmish also struggling with it. Thx for the confirmation :)

Alien2150 on 15 May 2020

Leaving this here as this issue was the first result on Google.

The problem for me was that my kube-proxy daemonset was using the --resource-container flag, which was removed on Kubernetes 1.16, resulting in this "cni config uninitialized" error and nodes getting stuck in the NotReady state.

I had to manually edit this daemonset and remove the flag ($ kubectl edit ds kube-proxy -n kube-system).

For reference, this is the daemonset command I'm using now, with kube-proxy 1.16.8:

      - command:
        - /bin/sh
        - -c
        - kube-proxy --oom-score-adj=-998 --master=https://MYCLUSTER.eks.amazonaws.com
          --kubeconfig=/var/lib/kube-proxy/kubeconfig --proxy-mode=iptables --v=2
          1>>/var/log/kube-proxy.log 2>&1
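
To check whether a cluster is affected before editing anything, the daemonset manifest can be searched for the removed flag. This is a minimal sketch; the helper name has_resource_container_flag is hypothetical, and the kubectl pipeline in the comment assumes access to the cluster.

```shell
# Hypothetical helper: reads a daemonset manifest on stdin and reports whether
# the --resource-container flag (removed in Kubernetes 1.16) is still present.
has_resource_container_flag() {
    # "--" stops grep from treating the pattern as an option
    grep -q -- '--resource-container'
}

# Intended use against a live cluster (requires kubectl access):
#   kubectl get ds kube-proxy -n kube-system -o yaml | has_resource_container_flag \
#       && echo "flag still set; edit the daemonset"
```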

brianstorti on 16 May 2020

👍8 🚀7 ❤4

Thank you @SarasaGunawardhana, this has just worked for me.

brankerd on 10 Jun 2020

> Coming from AWS support:
> It is possible that issue is caused by changes to security group requirements for worker nodes [1] introduced in EKS platform v3 [2]
>
> Another possible cause is my old AWS provider. I use 1.60.0.
>
> Hope this helps
>
> [1] https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
> [2] https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

Just to verify, I've recently created a 1.15 cluster with an additional security group for the EKS control plane and have had no problems. Before, and that worked for 1.13 version, my EKS module used to assign the default VPC security group to the EKS cluster control plane.
Thanks to all of you.

Erokos on 11 Jun 2020

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

These logs still occur on some occasions.

ugurarpaci on 7 Oct 2020

👍2

This just happened to me when upgrading EKS from 1.14 to 1.15 and the CNI from 1.6.0 to 1.7.5.

No matter what we did, 1.7.5 would not put nodes into a Ready state. Our solution (for now) was to revert the daemonset back to 1.6.0.

End state: cluster upgraded to 1.15.11 but AWS CNI is still at 1.6.0

wmcnamee-tunein on 21 Oct 2020

Hi All,

I am still facing the issue while trying to update from 1.14 to 1.15. I am doing the upgrade from the AWS Console.

The cluster version upgraded successfully, but for the nodes I am seeing the same error: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Any help on how to workaround this would be really great, Thanks.

aksharj on 4 Dec 2020

Hi @aksharj

Can you please try the suggestion by @max-rocket-internet? Please see this - https://github.com/aws/amazon-vpc-cni-k8s/issues/284#issuecomment-601987503

Thank you!

jayanthvn on 4 Dec 2020

> Leaving this here as this issue was the first result on Google.
>
> The problem for me was that my kube-proxy daemonset was using the --resource-container flag, which was removed on Kubernetes 1.16, resulting in this "cni config uninitialized" error and nodes getting stuck in the NotReady state.
>
> I had to manually edit this daemonset and remove the flag ($ kubectl edit ds kube-proxy -n kube-system).
>
> For reference, this is the daemonset command I'm using now, with kube-proxy 1.16.8:
>       - command:
>         - /bin/sh
>         - -c
>         - kube-proxy --oom-score-adj=-998 --master=https://MYCLUSTER.eks.amazonaws.com
>           --kubeconfig=/var/lib/kube-proxy/kubeconfig --proxy-mode=iptables --v=2
>           1>>/var/log/kube-proxy.log 2>&1

I tried this method, but kube-proxy still could not start properly, so I referred to this tutorial: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html

The pod security policy admission controller is enabled on Amazon EKS clusters running Kubernetes version 1.13 or later. If you're upgrading your cluster to Kubernetes version 1.13 or later, ensure that the proper pod security policies are in place before you update to avoid any issues. You can check for the default policy with the following command:

Then I installed the default pod security policy (install psp), followed "What you need to do before upgrading to 1.16" in the tutorial, and updated my kube-proxy. Now everything is OK!