Amazon-vpc-cni-k8s: Error: NetworkPluginNotReady. cni config uninitialized

Created on 4 Nov 2020  ·  19 Comments  ·  Source: aws/amazon-vpc-cni-k8s

What happened:

Error:

Ready            False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Yesterday evening I set ASG to zero.
This morning I set ASG to 4.

kubectl get nodes reports nodes as NotReady

kubectl describe node REDACTED
Name:               REDACTED
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=REDACTED
                    failure-domain.beta.kubernetes.io/zone=REDACTED
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=REDACTED
                    kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 04 Nov 2020 10:48:23 +0000
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:   REDACTED
  ExternalIP:   REDACTED
  Hostname:     REDACTED.compute.internal
  InternalDNS:  REDACTED.compute.internal
  ExternalDNS:  REDACTED.compute.amazonaws.com
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      8063660Ki
  pods:                        35
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           18242267924
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7305900Ki
  pods:                        35
System Info:
  Machine ID:                 REDACTED
  System UUID:                REDACTED
  Boot ID:                    REDACTED
  Kernel Version:             4.14.198-152.320.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.6
  Kubelet Version:            v1.15.11-eks-bf8eea
  Kube-Proxy Version:         v1.15.11-eks-bf8eea
ProviderID:                   aws:///REDACTED/i-REDACTED
Non-terminated Pods:          (0 in total)
  Namespace                   Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests  Limits
  --------                    --------  ------
  cpu                         0 (0%)    0 (0%)
  memory                      0 (0%)    0 (0%)
  ephemeral-storage           0 (0%)    0 (0%)
  hugepages-1Gi               0 (0%)    0 (0%)
  hugepages-2Mi               0 (0%)    0 (0%)
  attachable-volumes-aws-ebs  0         0
Events:
  Type    Reason                   Age                    From                                              Message
  ----    ------                   ----                   ----                                              -------
  Normal  Starting                 8m48s                  kubelet, REDACTED.compute.internal  Starting kubelet.
  Normal  NodeHasSufficientMemory  8m48s (x2 over 8m48s)  kubelet, REDACTED.compute.internal  Node REDACTED.compute.internal status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    8m48s (x2 over 8m48s)  kubelet, REDACTED.compute.internal  Node REDACTED.compute.internal status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     8m48s (x2 over 8m48s)  kubelet, REDACTED.compute.internal  Node REDACTED.compute.internal status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  8m48s                  kubelet, REDACTED.compute.internal  Updated Node Allocatable limit across pods

CNI was running: amazon-k8s-cni:v1.6.3
After seeing this error I upgraded it to: amazon-k8s-cni-init:v1.7.5 amazon-k8s-cni:v1.7.5

CNI Log attached. eks_i-REDACTED_2020-11-04_1111-UTC_0.6.2_REDACTED.zip

What you expected to happen: Nodes to join EKS cluster

How to reproduce it (as minimally and precisely as possible): That's difficult to answer.

Anything else we need to know?:

  • This is happening on amazon-eks-node-1.15-v20201007 ami-0af730da10ac8b0b7 and amazon-eks-node-1.15-v20200814 ami-04cc6ec46d6dbc4fa
  • Yesterday I installed Kubeflow on this cluster. Not that it matters, as I installed Kubeflow on another cluster as well and it's fine.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-065dce", GitCommit:"065dcecfcd2a91bd68a17ee0b5e895088430bd05", GitTreeState:"clean", BuildDate:"2020-07-16T01:44:47Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
  • CNI Version
amazon-k8s-cni:v1.6.3
amazon-k8s-cni-init:v1.7.5
amazon-k8s-cni:v1.7.5
  • OS (e.g: cat /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
  • Kernel (e.g. uname -a): Linux REDACTED.compute.internal 4.14.198-152.320.amzn2.x86_64 #1 SMP Wed Sep 23 23:57:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
bug

All 19 comments

Now I'm thinking that Kubeflow might have something to do with it https://github.com/kubeflow/kubeflow/issues/5247.
I logged this with Kubeflow https://github.com/kubeflow/kubeflow/issues/5381

Hi @tretos53

I see the IPAMD logs were not created, so most likely IPAMD hasn't even started. The 10-aws.conflist config file (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/scripts/entrypoint.sh#L114) is also missing from the logs, so it looks like even copying the config file failed. Can you please open a support case so that we can debug the issue further? Can you also provide the kubectl logs for aws-node so we can verify whether IPAMD started?

Thank you!

Hi,

None of the pods start so I can't get any logs. I only attached one node to make things simple but all the nodes fail to get Ready.

I will try to open a support call.

kubectl get pods -A
No resources found

kubectl get nodes
NAME                                      STATUS     ROLES    AGE   VERSION
ip-xx-x-x-xx.x-xxxx-x.compute.internal   NotReady   <none>   22h   v1.15.11-eks-bf8eea

I am also experiencing this issue without Kubeflow. Notably, this is happening on new NodeGroups in an existing cluster, and when trying to create an entirely new cluster via eksctl. While I found the same log message in our cluster logs (network plugin is not ready: cni config uninitialized), I also found another message which may provide more insight: network plugin is not ready: cni config uninitialized, CSINode is not yet initialized, missing node capacity for resources: ephemeral-storage. This is interesting because the new nodegroups were created using the same configuration as all our other nodegroups. I have also opened a support ticket with AWS.

Hi @jayanthvn,

Can I do anything to fix this? I only have billing support, no technical support.

Hi @tretos53

Can you please share the kubectl logs of aws-node (kubectl logs aws-node-9hrfc -n kube-system)? In the logs you shared I see that the 10-aws.conflist and ipamd.log files were not created, and kubelet seems to be complaining that 10-aws.conflist was not found. The kubectl logs should show whether IPAMD failed to start.

Nov 04 10:48:24 REDACTED.compute.internal kubelet[4073]: I1104 10:48:24.091283    4073 reconciler.go:150] Reconciler: start to sync state
Nov 04 10:48:28 REDACTED.compute.internal kubelet[4073]: W1104 10:48:28.723750    4073 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 04 10:48:28 REDACTED.compute.internal kubelet[4073]: E1104 10:48:28.967433    4073 kubelet.go:2179] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Nov 04 10:48:33 REDACTED.compute.internal kubelet[4073]: W1104 10:48:33.723969    4073 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 04 10:48:33 REDACTED.compute.internal kubelet[4073]: E1104 10:48:33.979992    4073 kubelet.go:2179] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Nov 04 10:48:38 REDACTED.compute.internal kubelet[4073]: W1104 10:48:38.724205    4073 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 04 10:48:38 REDACTED.compute.internal kubelet[4073]: E1104 10:48:38.990025    4073 kubelet.go:2179] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Nov 04 10:48:43 REDACTED.compute.internal kubelet[4073]: W1104 10:48:43.724379    4073 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
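
The kubelet excerpt above repeats two messages: the warning that no CNI config exists in /etc/cni/net.d, and the resulting NetworkPluginNotReady error. As a rough illustration, the Python sketch below counts both in a block of kubelet log text. It assumes the log has already been saved as plain text (for example from the node's journal); the function name and pattern keys are made up for this example.

```python
import re
from collections import Counter

# Patterns copied from the kubelet lines quoted in this thread.
# The keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "no_cni_config": re.compile(r"no networks found in /etc/cni/net\.d", re.IGNORECASE),
    "network_not_ready": re.compile(r"NetworkPluginNotReady"),
}


def summarize_kubelet_log(text: str) -> Counter:
    """Count how many log lines match each known CNI-related pattern."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A non-zero "no_cni_config" count suggests that aws-node never wrote 10-aws.conflist on that node, which matches the diagnosis given in this thread.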

Update: the source of my issue was that eksctl failed to assign any of the default permissions to the nodegroup's IAM role, which it previously had done. It does not sound like our issues are related anymore.
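
For readers who hit the same IAM problem, here is a minimal Python sketch of the check. It assumes you already have the names of the managed policies attached to the node role (for example from `aws iam list-attached-role-policies`); the function name is hypothetical, and the policy set is the usual trio of AWS managed policies for EKS worker nodes, not something confirmed by the commenter.

```python
# AWS managed policies normally attached to an EKS worker node role.
# Assumption: the cluster relies on these defaults rather than custom policies.
REQUIRED_NODE_POLICIES = {
    "AmazonEKSWorkerNodePolicy",
    "AmazonEKS_CNI_Policy",
    "AmazonEC2ContainerRegistryReadOnly",
}


def missing_node_policies(attached_policy_names):
    """Return the required policy names that are not attached, sorted."""
    return sorted(REQUIRED_NODE_POLICIES - set(attached_policy_names))
```

An empty result means the default policies are present; anything else lists what the node role is missing.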

Hi @jayanthvn

No pods are running as all nodes are NotReady. I can't get any logs. I can enable logs on EKS cluster. Will that help?

➜ kubectl get nodes
NAME                                    STATUS     ROLES    AGE     VERSION
ip-REDACTED.REDACTED.compute.internal   NotReady   <none>   11m     v1.15.11-eks-065dce
ip-REDACTED.REDACTED.compute.internal   NotReady   <none>   11m     v1.15.11-eks-065dce
ip-REDACTED.REDACTED.compute.internal   NotReady   <none>   4d22h   v1.15.11-eks-bf8eea
ip-REDACTED.REDACTED.compute.internal   NotReady   <none>   11m     v1.15.11-eks-065dce
ip-REDACTED.REDACTED.compute.internal   NotReady   <none>   11m     v1.15.11-eks-065dce

➜ kubectl get pods -A
No resources found

Hi @tretos53

Can you please email me ([email protected]) your cluster ARN? I will see if I can get any logs. Yes, enabling logs on the EKS cluster will definitely help.

Thank you.

In case this helps anyone, we had a similar problem launching new clusters when aws-node began using the new CNI v1.7.5 (some time late last week). We use our own pod security policies, and it seems 1.7.5 requires the NET_ADMIN capability. We didn't need this with CNI 1.6.3.

@gillbee You're right. Starting with v1.7.* we removed privileged: true and updated the securityContext to grant only the NET_ADMIN capability (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/config/v1.7/aws-k8s-cni.yaml#L177-L180), which wasn't the case with v1.6.* (https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.6.3/config/v1.6/aws-k8s-cni.yaml#L132-L133). This lets the aws-node pod run with fewer privileges than before.
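
If you maintain your own pod security policies, a quick way to see whether one would admit the NET_ADMIN capability is sketched below in Python. It assumes the policy has been loaded as a dict (for example from `kubectl get psp <name> -o json`). The function name is hypothetical and the check is deliberately simplified: it looks only at the capability fields and ignores every other PSP rule.

```python
def psp_allows_net_admin(psp: dict) -> bool:
    """Simplified check: may a pod under this PodSecurityPolicy add NET_ADMIN?

    A capability can be added if it is listed in allowedCapabilities or
    defaultAddCapabilities, or if allowedCapabilities contains "*".
    """
    spec = psp.get("spec") or {}
    allowed = spec.get("allowedCapabilities") or []
    default_add = spec.get("defaultAddCapabilities") or []
    return "*" in allowed or "NET_ADMIN" in allowed or "NET_ADMIN" in default_add
```

A policy that returned False here would be a candidate explanation for aws-node v1.7.x pods being rejected where v1.6.x pods were admitted.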

Hi @tretos53

I was able to check your cluster from the ARN you provided and, as you mentioned, none of the pods [aws-node, kube-proxy, core-dns] are running. This doesn't look like a CNI issue. I am following up with the internal team and will update you once I get some info.

Thanks.

Hi @tretos53

The DaemonSet failed to create its pods on your cluster:

Events:
  Type     Reason        Age                  From                  Message
  ----     ------        ----                 ----                  -------
  Warning  FailedCreate  3m (x1353 over 15d)  daemonset-controller  Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: no endpoints available for service "kfserving-webhook-server-service"

Hi,
OK, this is a Kubeflow daemon set; that doesn't explain why none of the nodes can join the cluster.
If anything, that is probably the reason why this daemon set can't be created.

All nodes are in the NotReady state with the error below:

Ready            False   Thu, 19 Nov 2020 16:38:40 +0000   Mon, 09 Nov 2020 08:58:49 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Hi @tretos53

The error is from the aws-node DS. Looks like there is a mutating webhook which is not available. Can you please try deleting the webhook? Nodes have joined the cluster but they won't be ready until CNI is ready.

kubectl describe ds/aws-node -n kube-system
Name:           aws-node
Selector:       k8s-app=aws-node
Node-Selector:  <none>
Labels:         k8s-app=aws-node
Annotations:    deprecated.daemonset.template.generation: 2
                kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"k8s-app":"aws-node"},"name":"aws-node","namespace":"kub...
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=aws-node
  Service Account:  aws-node
  Init Containers:
   aws-vpc-cni-init:
    Image:      602401143452.dkr.ecr.eu-west-2.amazonaws.com/amazon-k8s-cni-init:v1.7.5
    Port:       <none>
    Host Port:  <none>
    Environment:
      DISABLE_TCP_EARLY_DEMUX:  false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
  Containers:
   aws-node:
    Image:      602401143452.dkr.ecr.eu-west-2.amazonaws.com/amazon-k8s-cni:v1.7.5
    Port:       61678/TCP
    Host Port:  61678/TCP
    Requests:
      cpu:      10m
    Liveness:   exec [/app/grpc-health-probe -addr=:50051] delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [/app/grpc-health-probe -addr=:50051] delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADDITIONAL_ENI_TAGS:                 {}
      AWS_VPC_CNI_NODE_PORT_SUPPORT:       true
      AWS_VPC_ENI_MTU:                     9001
      AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER:  false
      AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG:  false
      AWS_VPC_K8S_CNI_EXTERNALSNAT:        false
      AWS_VPC_K8S_CNI_LOGLEVEL:            DEBUG
      AWS_VPC_K8S_CNI_LOG_FILE:            /host/var/log/aws-routed-eni/ipamd.log
      AWS_VPC_K8S_CNI_RANDOMIZESNAT:       prng
      AWS_VPC_K8S_CNI_VETHPREFIX:          eni
      AWS_VPC_K8S_PLUGIN_LOG_FILE:         /var/log/aws-routed-eni/plugin.log
      AWS_VPC_K8S_PLUGIN_LOG_LEVEL:        DEBUG
      DISABLE_INTROSPECTION:               false
      DISABLE_METRICS:                     false
      ENABLE_POD_ENI:                      false
      MY_NODE_NAME:                         (v1:spec.nodeName)
      WARM_ENI_TARGET:                     1
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /host/var/log/aws-routed-eni from log-dir (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/aws-node from run-dir (rw)
      /var/run/dockershim.sock from dockershim (rw)
  Volumes:
   cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
   cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
   dockershim:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/dockershim.sock
    HostPathType:
   xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:
   log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/aws-routed-eni
    HostPathType:  DirectoryOrCreate
   run-dir:
    Type:               HostPath (bare host directory volume)
    Path:               /var/run/aws-node
    HostPathType:       DirectoryOrCreate
  Priority Class Name:  system-node-critical
Events:
  Type     Reason        Age                  From                  Message
  ----     ------        ----                 ----                  -------
  Warning  FailedCreate  3m (x1353 over 15d)  daemonset-controller  Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: no endpoints available for service "kfserving-webhook-server-service"
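
The FailedCreate event names the webhook and the Service behind it, but not the MutatingWebhookConfiguration object that would have to be inspected or deleted. The Python sketch below, offered as an illustration only, searches the output of `kubectl get mutatingwebhookconfigurations -o json` (loaded as a dict) for webhooks that point at a given Service. The function name is hypothetical.

```python
def webhooks_for_service(webhook_list: dict, service: str, namespace: str) -> list:
    """Find webhooks whose clientConfig targets the given Service.

    Returns (configuration name, webhook name, failurePolicy) tuples.
    """
    matches = []
    for config in webhook_list.get("items") or []:
        for hook in config.get("webhooks") or []:
            svc = (hook.get("clientConfig") or {}).get("service") or {}
            if svc.get("name") == service and svc.get("namespace") == namespace:
                matches.append(
                    (config["metadata"]["name"], hook.get("name"), hook.get("failurePolicy"))
                )
    return matches
```

Called with the Service name and namespace from the event ("kfserving-webhook-server-service" in "kubeflow"), it would return the configuration to remove with `kubectl delete mutatingwebhookconfiguration <name>`. A failurePolicy of Fail means pod creation is blocked whenever the webhook backend is unreachable, which is the deadlock seen here: the webhook's own pods cannot start because no node is Ready.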

Thank you. I'll check.

Adam

Hi there,

We have the same issue with our brand new private EKS cluster (v1.18).
A node does not reach the Ready state due to
runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

The aws-node pod is in the Running state, but it is restarted every 2-3 minutes with the following events:
Successfully assigned kube-system/aws-node-85279 to ip-10-98-77-41.ec2.internal
Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
Created container aws-vpc-cni-init
Started container aws-vpc-cni-init
Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
Created container aws-node
Started container aws-node
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:30.590Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:40.602Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:50.591Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:30:00.597Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}

Container logs for amazon-k8s-cni:
{"level":"info","ts":"2020-11-21T10:29:25.981Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2020-11-21T10:29:25.998Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2020-11-21T10:29:26.000Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}

Kubelet logs show that:
Nov 21 10:34:38 ip-10-98-77-41.ec2.internal kubelet[3820]: W1121 10:34:38.585425 3820 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Nov 21 10:34:40 ip-10-98-77-41.ec2.internal kubelet[3820]: E1121 10:34:40.142766 3820 kubelet.go:2195] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Here is the log collection:
eks_i-0b47c308f2650abb6_2020-11-21_1032-UTC_0.6.2.tar.gz

I've found that if I manually create the file _/etc/cni/net.d/10-aws.conflist_
with the following config:
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "Debug"
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "snat": true
    }
  ]
}
The node immediately goes UP.
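
For anyone who wants to reproduce this workaround, the Python sketch below renders the same config as JSON so it can be validated before being copied to the node. All values are taken from the config quoted in this comment; the function name and its parameters are made up for the example. Note that this only stands in for the file that aws-node is supposed to write itself, so it does not address whatever is preventing IPAMD from starting.

```python
import json


def render_aws_conflist(veth_prefix: str = "eni", mtu: str = "9001") -> str:
    """Render a 10-aws.conflist document matching the one quoted above."""
    config = {
        "cniVersion": "0.3.1",
        "name": "aws-cni",
        "plugins": [
            {
                "name": "aws-cni",
                "type": "aws-cni",
                "vethPrefix": veth_prefix,
                "mtu": mtu,  # kept as a string, as in the quoted config
                "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
                "pluginLogLevel": "Debug",
            },
            {
                "type": "portmap",
                "capabilities": {"portMappings": True},
                "snat": True,
            },
        ],
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Print the document; redirect it to a file and copy it to the node by hand.
    print(render_aws_conflist())
```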

Why is this file not created automatically?

  • I don't see any error in the node cloud-init-output.log
  • NodeGroup role has AmazonEKS_CNI_Policy policy.

At first I thought it was related to the custom CNI settings, but I have now created a new cluster with just three subnets and
have done nothing related to custom CNI networking (no changes to the _aws-node_ DaemonSet).

I have found the reason for my failure.
For anyone who faces the same issue:

I used _eksctl_ and, within the input file, I had iam.withOIDC=true.
After I recreated the cluster without this setting, everything started to work correctly.

According to https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-cni-walkthrough.html
it seems that AWS_ROLE_ARN must be added to the aws-node daemonset.
However, I have not had a chance to check this, since it is not required in my case at this point.

I deleted Kubeflow. That fixed it.
Will probably break again when I deploy Kubeflow again...
