Amazon-vpc-cni-k8s: Cannot scale new nodes with CNI version 1.5.3

Created on 25 Sep 2019 · 17Comments · Source: aws/amazon-vpc-cni-k8s

When trying to scale new nodes after creating an EKS cluster, the new nodes will not leave the NotReady state. The aws-node pod is in a CrashLoopBackOff state.

This seems to be from the file 10-aws.conflist not being placed on the node. It looks like the install script had a line removed that added the file, and it was then moved to main.go. The problem is that it never reaches the code to copy the file

bash-4.2# cat /host/var/log/aws-routed-eni/ipamd.log.2019-09-25-20
2019-09-25T20:01:32.800Z [INFO] Starting L-IPAMD v1.5.3  ...
2019-09-25T20:02:02.803Z [INFO] Testing communication with server
2019-09-25T20:02:32.803Z [INFO] Failed to communicate with K8S Server. Please check instance security groups or http proxy setting
2019-09-25T20:02:32.803Z [ERROR]        Failed to create client: error communicating with apiserver: Get https://172.20.0.1:443/version?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout
2019-09-25T20:02:33.901Z [INFO] Starting L-IPAMD v1.5.3  ...
2019-09-25T20:03:03.903Z [INFO] Testing communication with server

I apologize but I wasn't able to dig in far enough to know why it needs the 10-aws.conflist in /etc/cni/net.d first, but that does seem to be the issue.

bug prioritP1

Source

cgkades

👍6

Most helpful comment

To expand on what @cgkades is saying...

We've found is that k8sapi.CreateKubeClient() fails with Failed to create client: error communicating with apiserver: Get https://172.20.0.1:443/version?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout (https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.5.3/main.go#L47-L51).

As such, it never gets to copy over the /etc/cni/10-aws.conflist. If we manually create the 10-aws.conflist all is right and aws-node starts up.

This behavior isn't in 1.5.1 and appears to be introduced in commit: 7fd7c93b5b3be2495b2b864d3a992840d4942a52

matthsmi on 25 Sep 2019

👍3

All 17 comments

To expand on what @cgkades is saying...

As such, it never gets to copy over the /etc/cni/10-aws.conflist. If we manually create the 10-aws.conflist all is right and aws-node starts up.

This behavior isn't in 1.5.1 and appears to be introduced in commit: 7fd7c93b5b3be2495b2b864d3a992840d4942a52

matthsmi on 25 Sep 2019

👍3

@matthsmi @cgkades Thanks for opening a new ticket. (#283 is not related to this case)

The change to not copy the config until ipamd is running happened in commit f9acdeb and was done to avoid scheduling pods to that node until we had ENIs and IPs available. If ipamd can not talk to the kubernetes API server, it will not copy the plugin binary or the config file, since that is how you update the node to be in the Ready state.

The main issue is that your nodes cannot reach your API server. If this happens again, could you please run the log-collector-script and send it through support, together with cluster ARN?

mogren on 26 Sep 2019

@mogren this happens with every cluster we launch, i'll work with @matthsmi to see if we can get the logs you need

cgkades on 26 Sep 2019

Having the same issue as @cgkades .
I've informed AWS Support and attached the results of the log-collector-script as @mogren requested.

marcel4303 on 8 Oct 2019

Thanks for the update, @marcel4303. I'm glad that we're not the only ones experiencing this, and thanks for opening up a ticket with AWS support, too.

We've also got an internal high priority ticket open with AWS. In the mean time, we've been using aws cni 1.5.1. We'll be sure to update this issue with relevant details as we hear back internally.

matthsmi on 8 Oct 2019

👍1

Looking at the reports I have gotten so far, the main issue seems to be the DNS call where the CNI tries to reach the API server. The first two lines in the ipamd logs..

2019-09-25T20:01:32.800Z [INFO] Starting L-IPAMD v1.5.3  ...
2019-09-25T20:02:02.803Z [INFO] Testing communication with server

...usually says it took around 30s to do kubeClient := k8sclient.GetKubeClient() on nodes that have issues. That looks a lot like a 30s timeout. That code is from the operator-framework and it tries to read the KUBERNETES_SERVICE_HOST environment variable, then if that fails, it does a net.LookupHost("kubernetes.default.svc"). For some reason, on these nodes, that takes 30s to time out.

Is there some custom DNS setup on these nodes? What does /etc/resolv.conf look like?

mogren on 8 Oct 2019

Your welcome.
I've tried also to downgrade the cni version to 1.5.1. This results in a ready state node(s) however the pods then encounter issues and remain in the state of ContainerCreating.

Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f9b30e13de653234be1dec6d91a4fa7fcd9b9aaddfec191f6d8b839d2722b0c4" network for pod "coffee-7dbb5795f6-ckvrc": NetworkPlugin cni failed to set up pod "coffee-7dbb5795f6-ckvrc_cafe" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused", failed to clean up sandbox container "f9b30e13de653234be1dec6d91a4fa7fcd9b9aaddfec191f6d8b839d2722b0c4" network for pod "coffee-7dbb5795f6-ckvrc": NetworkPlugin cni failed to teardown pod "coffee-7dbb5795f6-ckvrc_cafe" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"]
  Normal   SandboxChanged          5s (x93 over 20m)  kubelet, ip-172-31-30-47.eu-central-1.compute.internal  Pod sandbox changed, it will be killed and re-created.

marcel4303 on 8 Oct 2019

FYI, also had this issue in CNI 1.5.4.
Downgrading to 1.5.1 before scaling nodes resolved this for us.

blakelead on 10 Oct 2019

Does this happen on worker nodes using 1.13.7 kubelet? The reason I ask is that there was an upstream bug:

If a pod's restart policy is "Never", and kubelet failed to create the PodSandbox (for whatever reason) once, the pod will be stuck at "ContainerCreating" forever.

1.13.8 and later has this fix https://github.com/kubernetes/kubernetes/issues/79398

mogren on 10 Oct 2019

i seem to be also hit by this. using kops i tear up a cluster, e.g. with kope.io/k8s-1.15-debian-stretch-amd64-hvm-ebs-2019-09-26 on kubernetesVersion: 1.16.4
the masters get up fine but with nodes its hit and miss. i also tried imageName: "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.6.0-rc5"

yannick on 4 Jan 2020

👀1

@yannick Are the pods crash looping unable to talk to the API server, or some other issue?

mogren on 21 Jan 2020

I've made some more progress with this, but today using the v1.6.0 of the aws-node CNI.

To re-state what I am seeing. When I initially spin up a cluster, and associated instances in an ASG to the eks cluster, the nodes join the cluster, aws-node starts, and they become Ready.

When I try to expand the ASG, and add new instances, these new instance are able to join the cluster, but aws-node does not start up. You can see them in the cluster via kubectl get nodes, but they are in the NotReady state.

When aws-k8s-agent tries to start up, it reaches this point: https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.6/pkg/k8sapi/discovery.go#L89

The kubeClient endpoint is: https://172.20.0.1:443, which is not yet available to the system, since no kube-proxy is running at that time. No kube-proxy will start since no CNI files are in place yet, since the aws-k8s-agent hasn't bootstrapped. If aws-k8s-agent doesn't bootstrap in a timely fashion, the container entrypoint.sh will fail out, and not copy any CNI files.

In my testing, I shuffled things around in the entrypoint.sh

diff --git a/scripts/entrypoint.sh b/scripts/entrypoint.sh
index d7dc7f50..636b02d3 100755
--- a/scripts/entrypoint.sh
+++ b/scripts/entrypoint.sh
@@ -58,16 +58,6 @@ echo -n "starting IPAM daemon in background ... "
 ./aws-k8s-agent > "$AGENT_LOG_PATH" 2>&1 &
 echo "ok."

-echo -n "checking for IPAM connectivity ... "
-
-if ! wait_for_ipam; then
-    echo " failed."
-    echo "timed out waiting for IPAM daemon to start."
-    exit 1
-fi
-
-echo "ok."
-
 echo -n "copying CNI plugin binaries and config files ... "

 cp portmap "$HOST_CNI_BIN_PATH"
@@ -84,6 +74,16 @@ if [[ -f "$HOST_CNI_CONFDIR_PATH/aws.conf" ]]; then
     rm "$HOST_CNI_CONFDIR_PATH/aws.conf"
 fi

+echo -n "checking for IPAM connectivity ... "
+
+if ! wait_for_ipam; then
+    echo " failed."
+    echo "timed out waiting for IPAM daemon to start."
+    exit 1
+fi
+
+echo "ok."
+
 # bring the aws-k8s-agent process back into the foreground
 echo "foregrounding IPAM daemon ... "
 fg %1 >/dev/null 2>&1 || { echo "failed (process terminated)" && cat "$AGENT_LOG_PATH" && exit 1; }

I lowered the wait_for_ipam until after the CNI files get copied. At this time kube-proxy starts up, aws-k8s-agent gets its act together, and things start working.

I am not sure the adverse affects of this from the IPAM perspective, but my initial testing is looking good, and I've been able to scale in and out an ASG with no issue with the modified entrypoint.sh.

I'm going to continue testing, but this is where I'm at.

Do you have any thoughts about this @mogren ?

thank you!

matthsmi on 29 Feb 2020

The kubeClient endpoint is: https://172.20.0.1:443, which is not yet available to the system, since no kube-proxy is running at that time. No kube-proxy will start since no CNI files are in place yet, since the aws-k8s-agent hasn't bootstrapped. If aws-k8s-agent doesn't bootstrap in a timely fashion, the container entrypoint.sh will fail out, and not copy any CNI files.

AFAIK, the above statement is not correct. kube-proxy does not have anything to do with the aws-node Pod's ability to contact the Kubernetes API server. kube-proxy enables routing for a Kubernetes Service object's backend Endpoints (other Pods).

The timeout waiting for IPAMd (aws-k8s-agent) to come up is 30 seconds. My suspicion is that there is a problem with DNS resolution, not kube-proxy, that is causing the IPAMd startup to take longer than 30 seconds. On startup, IPAMd attempts to communicate with the Kubernetes API server using the k8sclient.GetKubeClient() function from the [email protected] release (as @mogren notes above).

Is there some custom DNS setup on these nodes? What is the value of your KUBERNETES_SERVICE_HOST env var? What does your /etc/resolv.conf look like?

jaypipes on 2 Mar 2020

👍1

Hi @jaypipes ,

Thanks for the input. It seems I missed the comment from @mogren above.

That said, I'm still confident that this isn't a DNS resolution issue.

I had put some additional debug lines inside the k8s-agent, and had it print out the kubeconfig that gets provided from k8sclient.GetCubeConfig() (which should also be what is used for k8sclient.GetKubeClient()).

Here's the Host that aws-node is trying to use:

kubeConfig=&rest.Config{Host:"https://172.20.0.1:443 ", APIPath:"", ContentConfig:rest.ContentConfig{AcceptContentTypes:"",.... stuff removed}

There shouldn't be any DNS resolution happening there.

This also matches up to the ClusterIP that the aws-cni is trying to speak to:

% k get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   73m
%

That said here is what the aws-node is attempting to use for KUBERNETES_SERVICE_* and for DNS.

# docker inspect d356efd553b9 | jq '.[0].Config.Env'                           
[
  "AWS_VPC_K8S_CNI_LOGLEVEL=DEBUG",
 [...]
  "KUBERNETES_SERVICE_HOST=172.20.0.1",
  "KUBERNETES_SERVICE_PORT=443",
[...]
  "KUBERNETES_SERVICE_PORT_HTTPS=443",
[...]
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]
#

# cat /etc/resolv.conf 
nameserver 169.254.169.253
search us-west-2.compute.internal
#

If aws-node is attempting to communicate to the API Server via ClusterIP, then kube-proxy should be working.

matthsmi on 2 Mar 2020

The kubeClient endpoint is: https://172.20.0.1:443, which is not yet available to the system, since no kube-proxy is running at that time. No kube-proxy will start since no CNI files are in place yet, since the aws-k8s-agent hasn't bootstrapped.

I believe that adding tolerations to our kube-proxy daemonset that allow it to start on nodes with NotReady status fixed this error for us.

tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"

acegrader33 on 3 Mar 2020

👍1

sorry for the delayed response @matthsmi! You are of course correct. We suspect that this is the same issue as #872 and #865, where we believe the root cause of the problem is, as you allude to, the aws-node Daemonset failing to start when kube-proxy has not (yet) fully come online.

We are going to add a configurable timeout to the entrypoint.sh script and bump the default timeout to a larger than 30 second default. We hope this will address the issues.

jaypipes on 18 Mar 2020

This all seems to be related to issues with kube-proxy, closing.

mogren on 8 Jul 2020

Was this page helpful?

0 / 5 - 0 ratings