Amazon-vpc-cni-k8s: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady...

Created on 9 Jan 2019 · 38 Comments · Source: aws/amazon-vpc-cni-k8s

EKS: v1.11.5
CNI: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.3.0
AMI: amazon-eks-node-1.11-v20181210 (ami-0a9006fb385703b54)

We are still seeing these CNI errors in pod events, e.g.:

Events:
  Type     Reason           Age               From                                                Message
  ----     ------           ----              ----                                                -------
  Warning  NetworkNotReady  5s (x3 over 35s)  kubelet, ip-10-0-26-197.eu-west-1.compute.internal  network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]

I tried to run /opt/cni/bin/aws-cni-support.sh on the node with pod aws-node-hhtrt but I get this error:

[root@ip-10-0-25-4 ~]# /opt/cni/bin/aws-cni-support.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1223  100  1223    0     0   1223      0  0:00:01 --:--:--  0:00:01 1194k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   912  100   912    0     0    912      0  0:00:01 --:--:--  0:00:01  890k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   106  100   106    0     0    106      0  0:00:01 --:--:--  0:00:01  103k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    83  100    83    0     0     83      0  0:00:01 --:--:--  0:00:01 83000
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    28  100    28    0     0     28      0  0:00:01 --:--:--  0:00:01 28000
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6268  100  6268    0     0   6268      0  0:00:01 --:--:--  0:00:01 6121k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 10255: Connection refused
bug

All 38 comments

Hitting this as well with the same setup as above.

I tried to run /opt/cni/bin/aws-cni-support.sh on the node with pod aws-node-hhtrt but I get this error:

The second one is the same as https://github.com/aws/amazon-vpc-cni-k8s/issues/285

The line in the script should be updated to something like:

command -v kubectl > /dev/null && kubectl get --kubeconfig=/var/lib/kubelet/kubeconfig --raw=/api/v1/pods

Still seeing this now and again:

  Warning  FailedCreatePodSandBox  7m33s                  kubelet, ip-10-0-25-88.eu-west-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2c600b4c1a8f344393f614e04706dc428ba1467ca67cb1169674807bd830646d" network for pod "ingress02-nginx-ingress-controller-dr84q": NetworkPlugin cni failed to set up pod "ingress02-nginx-ingress-controller-dr84q_default" network: add cmd: failed to assign an IP address to container

@max-rocket-internet Hey, are you still hitting the issue of network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized?

With the newer CNI the support script should work since it addresses the change for that kubelet port.

Have you seen network: add cmd: failed to assign an IP address to container again? This went into 1.4 https://github.com/aws/amazon-vpc-cni-k8s/pull/367.

For either of these, if you have, what CNI version are you using? Thanks!
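
For anyone answering the version question above, here is a minimal sketch for reading the CNI image tag off the aws-node DaemonSet. It assumes kubectl is already configured for the cluster and that the CNI container is the first container in the pod template; cni_image_tag is a hypothetical helper name, not part of any AWS tooling.

```shell
# Hypothetical helper: print the image tag of the aws-node DaemonSet.
# Assumes the CNI container is the first container in the pod template.
cni_image_tag() {
  local image
  image="$(kubectl get daemonset aws-node -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  # Strip everything up to the last colon, leaving only the tag.
  echo "${image##*:}"
}
```

Calling cni_image_tag against a cluster like the one in the original report would print the tag portion of the image name.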

I'm on EKS 1.12.7 and CNI 1.3.3, and this error actually happened to most of my nodes for about 10 minutes, and resolved itself magically (seemingly). It was right after I re-deployed my ASGs through CloudFormation.
I'm using the latest AMI as of this date.

@tiffanyfay Do you have any insight on why this could happen?

Recording this here in case it helps others.

I had a MutatingWebhookConfiguration hanging around that was no longer relevant and there were no pods available to service it. This was stopping nodes from becoming Ready. The kubelet logs and describe node messages had the exact same error as recorded here.

In my case, running kubectl delete MutatingWebhookConfiguration <name> and then restarting one of the kubelets caused all nodes to become healthy/ready.
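
A rough sketch of how one might look for a webhook configuration in that state, assuming kubectl access to the cluster. It treats a webhook as stale when the service it points at has no ready endpoint addresses, which is only one possible definition. The function names are hypothetical.

```shell
# Hypothetical helper: succeed if the service has at least one ready endpoint.
service_has_endpoints() {
  local ns="$1" svc="$2" addrs
  addrs="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)"
  [ -n "$addrs" ]
}

# Hypothetical helper: report MutatingWebhookConfigurations whose backing
# service has no ready endpoints.
find_stale_webhooks() {
  local cfg ref
  for cfg in $(kubectl get mutatingwebhookconfigurations -o name); do
    kubectl get "$cfg" \
      -o jsonpath='{range .webhooks[*]}{.clientConfig.service.namespace}{"/"}{.clientConfig.service.name}{"\n"}{end}' \
      | while read -r ref; do
          # Webhooks that use a URL instead of a service print as "/".
          [ "$ref" = "/" ] && continue
          [ -z "$ref" ] && continue
          if ! service_has_endpoints "${ref%%/*}" "${ref#*/}"; then
            echo "$cfg -> $ref has no ready endpoints"
          fi
        done
  done
}
```

Anything find_stale_webhooks prints is only a candidate; check that the configuration really is obsolete before deleting it.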

I also had a similar issue today with EKS 1.12 and CNI plugin version 1.4.1.
I also got KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized on a few nodes for about 10-20 minutes, then it worked again.

I didn't find any more debugging info and the nodes were replaced by the cluster auto-scaler. Is there anything I should look out for if this happens again?

I have been seeing this problem on pods that start immediately after the kube node comes up. If I delete the pods and have them "try again" they get their IPs and there are no warnings. Could this be solved through a node readiness change?

Just had the same issue with 1.11.9. The CNI networking failed on one of two new nodes, so the failed node never joined the cluster. A reboot from the AWS Console got it working.

The first two warnings I see are:

Jul 26 08:18:06 ip-10-2-118-4.ap-southeast-2.compute.internal kubelet[4537]: W0726 08:18:06.517655    4537 cni.go:172] Unable to update cni config: No networks found in /etc/
Jul 26 08:18:06 ip-10-2-118-4.ap-southeast-2.compute.internal kubelet[4537]: W0726 08:18:06.521509    4537 cni.go:172] Unable to update cni config: No networks found in /etc/

Just had the same issue with 1.11.9. The cni networking failed on one of two new nodes so the failed node never joined the cluster. A reboot from the AWS Console got it working

This is the workaround we use as well.

Our environment where we ran into it:

k8s: Kubernetes v1.13.11-eks-5876d6
cni plugin: amazon-k8s-cni:v1.5.3

This has happened only a couple times over half a year (so on older versions too), so it's difficult for us to reproduce.

EKS: v1.14.7-eks-1861c5
CNI: amazon-k8s-cni:v1.5.3
AMI: amazon-eks-node-1.14-v20190927 (ami-0e21bc066a9dbabfa)

Same problem on multiple EKS clusters. New VMs cannot join the cluster.

Kubelet error on the nodes:

Oct 29 07:29:27 ip-10-1-21-123.eu-central-1.compute.internal kubelet[3727]: W1029 07:29:27.735403    3727 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 29 07:29:28 ip-10-1-21-123.eu-central-1.compute.internal kubelet[3727]: E1029 07:29:28.262822    3727 kubelet.go:2172] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Events:

runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
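
A small sketch for counting how often the kubelet logged this error, in case it helps when comparing nodes. It assumes kubelet runs as a systemd unit named kubelet (as the log prefix above suggests); count_cni_uninitialized is a hypothetical helper that reads log lines from stdin.

```shell
# Hypothetical helper: count log lines on stdin that report an
# uninitialized CNI config. Prints 0 when there are none.
count_cni_uninitialized() {
  grep -c 'cni config uninitialized' || true
}
```

On a node it could be fed from the journal, for example: journalctl -u kubelet --no-pager --since "1 hour ago" | count_cni_uninitialized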

@temal- Thanks for the report. If the CNI binary and config file are missing, ipamd must have failed to start correctly on the new node. There are a few possible causes: either the calls to the EC2 control plane got throttled and timed out, or there are no more ENIs or IPs available in the subnet. If you could get the log files from ipamd on a node that has this issue, it would be extremely helpful.

(A comprehensive log collector script: amazon-eks-ami/log-collector-script)
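
As a quick first check on an affected node, something like the sketch below can confirm whether the config and binary are actually missing. The default paths are assumptions taken from the kubelet log lines in this thread (/etc/cni/net.d) and the support script location (/opt/cni/bin); the binary name aws-cni and the function name check_cni_files are also assumptions.

```shell
# Hypothetical helper: report whether a CNI config and the plugin binary exist.
# Arguments (both optional): config directory, binary directory.
check_cni_files() {
  local conf_dir="${1:-/etc/cni/net.d}"
  local bin_dir="${2:-/opt/cni/bin}"
  local status=0

  # kubelet logs "No networks found" when this directory has no config files.
  if [ -d "$conf_dir" ] && [ -n "$(ls -A "$conf_dir" 2>/dev/null)" ]; then
    echo "config: found in $conf_dir"
  else
    echo "config: missing in $conf_dir"
    status=1
  fi

  # aws-cni is the plugin binary name assumed here.
  if [ -x "$bin_dir/aws-cni" ]; then
    echo "binary: found $bin_dir/aws-cni"
  else
    echo "binary: missing $bin_dir/aws-cni"
    status=1
  fi
  return "$status"
}
```

A non-zero return only says the files are absent; the ipamd logs are still needed to find out why.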

@mogren Thanks for the quick reply.
I think it was related to an ongoing linkerd installation. Sadly, I couldn't reproduce the error afterwards (which is strange, because it happened on two different clusters) and therefore wasn't able to run the collector script. If the issue appears again, I'll come back here with more information.

We are trying to create a new cluster using eksctl and are facing the same error. The cluster is created successfully, but the nodes do not become ready. Details are below.

  • EKS: version 1.14
  • CNI: amazon-k8s-cni:v1.5.3
  • AMI: amazon-eks-node-1.14-v20190927 (ami-02e124a380df41614)

  • Cluster created in an existing VPC and subnets that have sufficient IPs.

  • Kubelet error on the nodes:

W1101 03:31:48.212631    3705 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
E1101 03:31:48.430668    3705 kubelet.go:2172] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
  • Ran sudo bash eks-log-collector.sh on the node, but all log files in the ipamd directory are empty.

image

EKS : 1.13
CNI: amazon-k8s-cni:v1.5.3
AMI: ami-0619d38218e46ef86

Got the same issue while updating the EKS cluster.
Freshly created worker nodes can't reach the Ready state.
Rolling back from amazon-k8s-cni:v1.5.3 to amazon-k8s-cni:v1.5.1 resolved the issue.

UPD:
My main issue turned out to be messed-up SG rules between the worker_node group and the ControlPlane.
After updating the SG rules, everything looks fine with both CNI 1.5.1 and 1.5.3.
Don't forget to check and edit the ControlPlane SG inbound and outbound rules:
Inbound: port 443 from the worker_nodes SG
Outbound: ports 443 and 1025-65535 to the worker_nodes SG

It is really strange, though, that with CNI 1.5.1 new worker nodes became Ready even without all the needed rules in the ControlPlane SG.
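
For reference, a sketch of how those ControlPlane SG rules might be added with the AWS CLI. The security group IDs are placeholders to be passed in, the function name is hypothetical, and the rules are only the ones listed in this comment; check the EKS security group documentation for the current requirements before applying anything.

```shell
# Hypothetical helper: add the control plane SG rules described above.
# $1 = control plane security group ID, $2 = worker node security group ID.
add_control_plane_rules() {
  local cp_sg="$1" node_sg="$2"

  # Inbound: port 443 from the worker node SG.
  aws ec2 authorize-security-group-ingress \
    --group-id "$cp_sg" \
    --ip-permissions "IpProtocol=tcp,FromPort=443,ToPort=443,UserIdGroupPairs=[{GroupId=$node_sg}]"

  # Outbound: ports 443 and 1025-65535 to the worker node SG.
  aws ec2 authorize-security-group-egress \
    --group-id "$cp_sg" \
    --ip-permissions \
      "IpProtocol=tcp,FromPort=443,ToPort=443,UserIdGroupPairs=[{GroupId=$node_sg}]" \
      "IpProtocol=tcp,FromPort=1025,ToPort=65535,UserIdGroupPairs=[{GroupId=$node_sg}]"
}
```

If a rule already exists, the AWS CLI reports a duplicate-rule error for that call, so it is worth reviewing the existing rules first.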

  • Created a cluster and downgraded v1.5.3 to v1.5.1 ( kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/ffaf737145ab3262b7afd0ddbdf613a2174f30dd/config/v1.5/aws-k8s-cni.yaml ), but the issue was not resolved.
  • The /etc/cni/ directory does not exist on the node.

Had the same issue as @hardcorexcat, rollback to v1.5.1 resolved the issue too, and now after a while upgrading back to v1.5.3 works too with new nodes. Clusters are created with terraform-aws-eks.

Hi all,

Here is my case:

Environment

EKS: v1.14
CNI: v1.5.3
AMI: ami-082bb518441d3954c

I just created a fresh EKS cluster with 2 worker nodes that join the cluster but with a NotReady status. After downgrading the CNI version from v1.5.3 to v1.5.1, workers get Ready status but when checking the subnet's IP that should be assigned to the workers there is only one IP.

Regards,
Josemi.

Hi, I am hitting the same issue with an EKS cluster created by Terraform.

Just in case someone comes across this who is using a g4dn family instance on AWS. I was stuck on this for a while because the version of the CNI plugin I was using didn't support that family. After upgrading the CNI plugin it worked. https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html

For the past few days I've been experimenting with EKS cluster creation. I'm using terraform, actually a terraform module similar to the popular community module.
What I've observed:

Clusters below version 1.14 have no problems with worker nodes reaching the "Ready" state. I'm using the latest CNI version: amazon-k8s-cni:v1.5.5
BUT,
No matter what I try when creating 1.14 version clusters, the worker nodes are in the "NotReady" state even though I've applied the aws-auth-cm.yaml configmap and the latest CNI version. Upon closer look (kubectl describe node <node_name>) I see an error that the CNI is uninitialized, also when I look at the running pods (kubectl get pods -n kube-system) I can see the core-dns pods being in a "pending" state and the aws-node pods crashing every few seconds.
I've then taken some steps to see if I could fix it:
a) downgraded the CNI version to 1.5.3 - this resulted in nodes getting to "Ready" state but this didn't fix the problem, the core-dns pods were now in "ContainerCreating" status constantly and aws-node pods had the same behaviour. Upgrading the CNI back to 1.5.5 didn't change anything.
b) Next what I tried was to create a 1.13 cluster first with nodes using a 1.14 kubernetes AMI. The nodes didn't have any problems joining the cluster and were ready. I then upgraded the cluster version and this resulted in a working 1.14 cluster with the nodes joined and being ready. - HOWEVER, if I increased the number of nodes in an auto scaling group, the new nodes had the same old problems of not being ready no matter what I tried.

To sum up, I've decided to use a 1.13 cluster in which I see no problems with nodes using a 1.14 AMI in hopes of fixing this problem in the near future.

Epilogue: I'm using a full 1.13 version cluster because every once in a while a worker node would briefly become "NotReady" and then after a few seconds revert to Ready. Very strange behaviour.

I have experienced the same as @Erokos. With 1.13 works, with 1.14 nodes fail to get to ready.

I don't think the issue is related to AWS VPC CNI, because I tried replacing it with Calico and got the same problem: the CNI pod (aws-node or calico-node) cannot connect to 10.100.0.1, which is the kubernetes service ClusterIP.

Coming from AWS support:
It is possible that issue is caused by changes to security group requirements for worker nodes [1] introduced in EKS platform v3 [2]

Another possible cause is my old AWS provider. I use 1.60.0.

Hope this helps

[1] https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

Hi @ppaepam, is this still an issue?

I am also having this issue.

I tried adding a worker node via CloudFormation.

I fixed this issue by upgrading Kubernetes components. I had the same problem in my AWS EKS cluster, so I ran the commands below using the eksctl CLI tool.

eksctl utils update-kube-proxy --name Your_Cluster_Name --approve
eksctl utils update-aws-node --name Your_Cluster_Name --approve
eksctl utils update-coredns --name Your_Cluster_Name --approve

This issue contains a mix of CNI versions and EKS cluster versions. I think @ppaepam and @SarasaGunawardhana are both right, and if anyone has similar issues please open a new issue to track that specific case.

I experienced this issue after updating EKS to version 1.16 and @SarasaGunawardhana commands did the trick for me.

@mlachmish also struggling with it. Thx for the confirmation :)

Leaving this here as this issue was the first result on Google.

The problem for me was that my kube-proxy daemonset was using the --resource-container flag, which was removed on Kubernetes 1.16, resulting in this "cni config uninitialized" error and nodes getting stuck in the NotReady state.

I had to manually edit this daemonset and remove the flag ($ kubectl edit ds kube-proxy -n kube-system).

For reference, this is the daemonset command I'm using now, with kube-proxy 1.16.8:

      - command:
        - /bin/sh
        - -c
        - kube-proxy --oom-score-adj=-998 --master=https://MYCLUSTER.eks.amazonaws.com
          --kubeconfig=/var/lib/kube-proxy/kubeconfig --proxy-mode=iptables --v=2
          1>>/var/log/kube-proxy.log 2>&1
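
A small sketch for checking whether a cluster is affected before editing anything, assuming kubectl access. It reads only the container command from the DaemonSet spec, so old copies of the flag in annotations are ignored. kube_proxy_has_removed_flag is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if any kube-proxy container command still
# includes the --resource-container flag.
kube_proxy_has_removed_flag() {
  kubectl get daemonset kube-proxy -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[*].command}' \
    | grep -q -- '--resource-container'
}
```

For example: kube_proxy_has_removed_flag && echo "kube-proxy still uses --resource-container"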

Thank you @SarasaGunawardhana, this just worked for me.

Coming from AWS support:
It is possible that issue is caused by changes to security group requirements for worker nodes [1] introduced in EKS platform v3 [2]

Another possible cause is my old AWS provider. I use 1.60.0.

Hope this helps

[1] https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

Just to verify, I've recently created a 1.15 cluster with an additional security group for the EKS control plane and have had no problems. Previously, my EKS module assigned the default VPC security group to the EKS cluster control plane, which worked for version 1.13.
Thanks to all of you.

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

These logs still occur on some occasions.

This just happened to me when upgrading EKS from 1.14 to 1.15 and the CNI from 1.6.0 to 1.7.5.

No matter what we did, 1.7.5 would not put nodes into a Ready state. Our solution (for now) was to revert the daemonset back to 1.6.0.

End state: cluster upgraded to 1.15.11, but the AWS CNI is still at 1.6.0.

Hi All,

I am still facing the issue while trying to update from 1.14 to 1.15. I am doing the upgrade from the AWS Console.

The cluster version upgraded successfully, but for the nodes I am seeing the same error: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Any help on how to work around this would be really great. Thanks.

Hi @aksharj

Can you please try the suggestion by @max-rocket-internet? Please see this - https://github.com/aws/amazon-vpc-cni-k8s/issues/284#issuecomment-601987503

Thank you!

Leaving this here as this issue was the first result on Google.

The problem for me was that my kube-proxy daemonset was using the --resource-container flag, which was removed on Kubernetes 1.16, resulting in this "cni config uninitialized" error and nodes getting stuck in the NotReady state.

I had to manually edit this daemonset and remove the flag ($ kubectl edit ds kube-proxy -n kube-system).

For reference, this is the daemonset command I'm using now, with kube-proxy 1.16.8:

      - command:
        - /bin/sh
        - -c
        - kube-proxy --oom-score-adj=-998 --master=https://MYCLUSTER.eks.amazonaws.com
          --kubeconfig=/var/lib/kube-proxy/kubeconfig --proxy-mode=iptables --v=2
          1>>/var/log/kube-proxy.log 2>&1

I tried this method, but kube-proxy still could not start properly, so I referred to this tutorial: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html

The pod security policy admission controller is enabled on Amazon EKS clusters running Kubernetes version 1.13 or later. If you're upgrading your cluster to Kubernetes version 1.13 or later, ensure that the proper pod security policies are in place before you update to avoid any issues. You can check for the default policy with the following command:

Then I installed the default pod security policy (install psp), followed "What you need to do before upgrading to 1.16" in the tutorial, and updated my kube-proxy. Now everything is OK!
