Amazon-vpc-cni-k8s: Pod connectivity problem with ENIConfig/pod-specific subnet

Created on 2 Nov 2018 · 25 Comments · Source: aws/amazon-vpc-cni-k8s

Maybe related to https://github.com/aws/amazon-vpc-cni-k8s/issues/212?
We have a VPC with a primary CIDR block in 10.0.0.0/8 space and a secondary CIDR block in 100.64.0.0/10 space.
There is a single EKS worker node running CentOS 7, with its primary IP address on a 10.x.x.x subnet and an ENIConfig annotation on the node for a 100.64.x.x subnet.
Pods running on the 100.64.x.x subnet can communicate with pods on the same node that use the primary IP (hostNetwork: true), but cannot communicate off-node (e.g. to the control plane, either directly or via the kubernetes service ClusterIP address).
I can kubectl exec into pods running on the 100.64.x.x subnet and all relevant route tables, NACLs and security groups are correctly configured.

$ kubectl --kubeconfig kubeconfig run -i --rm --tty debug --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr AE:94:60:6F:AA:2E  
          inet addr:100.64.x.x  Bcast:100.64.x.x  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:508 (508.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # wget http://10.x.x.x:61678/v1/networkutils-env-settings
Connecting to 10.x.x.x:61678 (10.x.x.x:61678)
networkutils-env-set 100% |*************************************************************************************************************|   105  0:00:00 ETA
/ # wget https://10.y.y.y/
Connecting to 10.y.y.y (10.y.y.y:443)
^C

10.x.x.x is the node's primary IP, 10.y.y.y is one of the EKS control plane ENIs.

This prevents critical components like kube-dns from starting:

$ kubectl --kubeconfig kubeconfig logs --namespace kube-system kube-dns-d87b74b4f-f5gff kubedns
...
I1102 20:51:07.938774       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E1102 20:51:07.940074       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://172.20.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.20.0.1:443: i/o timeout
...

ewbankkit

Most helpful comment

@lutierigb Thanks for digging into this.
I added code to explicitly add the primary IP on secondary ENIs in https://github.com/aws/amazon-vpc-cni-k8s/pull/271 and have verified that this works on CentOS 7 (and on Amazon Linux 2, so no regression).

ewbankkit on 17 Dec 2018

All 25 comments

@liwenwu-amazon I can provide the aws-cni-support.tar.gz via a support ticket.

ewbankkit on 2 Nov 2018

I think you might be running into issue #35. Can you try setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true and see whether that solves your problem? Thanks.
https://docs.aws.amazon.com/eks/latest/userguide/external-snat.html

liwenwu-amazon on 2 Nov 2018

I am running with AWS_VPC_K8S_CNI_EXTERNALSNAT=true:

$ kubectl --kubeconfig kubeconfig describe pod --namespace kube-system aws-node-6v94g
Name:           aws-node-6v94g
Namespace:      kube-system
Node:           ip-10-x-x-x.us-west-2.compute.internal/10.x.x.x
Start Time:     Fri, 02 Nov 2018 15:16:01 -0400
Labels:         controller-revision-hash=3496945906
                k8s-app=aws-node
                pod-template-generation=1
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             10.x.x.x
Controlled By:  DaemonSet/aws-node
Containers:
  aws-node:
    Container ID:   docker://10b1f2669628dd5cb0695160005e40576f765d46fedf1e1e19eab6d813be2a07
...
    Environment:
      AWS_VPC_K8S_CNI_LOGLEVEL:            DEBUG
      MY_NODE_NAME:                         (v1:spec.nodeName)
      WATCH_NAMESPACE:                     kube-system (v1:metadata.namespace)
      AWS_VPC_K8S_CNI_EXTERNALSNAT:        true
      AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG:  true
...
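
When comparing settings across several aws-node pods, it can help to pull a single variable out of the describe output rather than reading the whole listing. The following is a minimal sketch in POSIX shell; the helper name cni_env_value is hypothetical, and it assumes the "NAME: value" layout of the Environment section shown above.

```shell
# Hypothetical helper: print the value of one environment variable from
# `kubectl describe pod` output supplied on stdin.
# $1 is the variable name, e.g. AWS_VPC_K8S_CNI_EXTERNALSNAT.
cni_env_value() {
  # awk splits on whitespace, so the leading indentation is ignored and
  # the first field is "NAME:" while the second is the value.
  awk -v name="$1:" '$1 == name { print $2 }'
}

# Intended use (requires cluster access, so it is left commented out):
# kubectl describe pod --namespace kube-system aws-node-6v94g \
#   | cni_env_value AWS_VPC_K8S_CNI_EXTERNALSNAT
```

Values that are rendered as field references, such as MY_NODE_NAME above, would print only their first whitespace-separated token.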

ewbankkit on 3 Nov 2018

@ewbankkit Do the Pod's subnet and the Node's subnet use the same route table?

liwenwu-amazon on 4 Nov 2018

No, I have them in separate route tables. No real reason why I did it that way; I just wanted to keep everything as self-contained as possible.

ewbankkit on 4 Nov 2018

Do the Pods on the 2 different nodes (which can NOT ping each other) use the same subnet? Also, have you checked the security groups for the Pods on these 2 nodes? Are they allowed to communicate with each other?

liwenwu-amazon on 4 Nov 2018

I tried putting the pod subnets in the same route table as the node subnets and still could not connect to the control plane from the pods. Yes, SGs are wide open.
I cannot ping pod to pod if they are on different nodes (but in the same subnet).

ewbankkit on 4 Nov 2018

@ewbankkit Will it work if the Pod's ENIConfig uses the same security group as the node?

liwenwu-amazon on 5 Nov 2018

I believe I'm running into this issue as well.

I have a small primary CIDR block in the 10.0.0.0/8 range and a (disjoint) larger secondary CIDR block, also in the 10.0.0.0/8 range. The nodes are coming up in the primary CIDR block and have ENIConfigs set to use the secondary block, but with the same security groups as the node. All relevant subnets are using the same route table. I am running with AWS_VPC_K8S_CNI_EXTERNALSNAT=false; I believe that is required given some other constraints I have.

Pods can communicate with other pods on the same node, but cannot communicate with pods on other nodes or with the control plane using the cluster IP. In my case they can communicate with an off-node proxy server (i.e. curl --proxy works, but curl https:// does not).

My thought is that some of the things I did to try to work around #212 are allowing the off-node communication to work, but the problem is not fully addressed, so the cluster IPs don't work between nodes. I'm not positive about that though, and could be completely off base.

lnr0626 on 5 Nov 2018

@lnr0626, can you try pinging from one pod to another pod on a different node? Does it work?

liwenwu-amazon on 5 Nov 2018

@liwenwu-amazon Same problem if I run the pods in the same SG as the worker nodes.
Interestingly in the flow logs for the Control Plane ENIs I can see ACCEPTs on port 443 from one of the pod IPs but nothing the other way.

ewbankkit on 5 Nov 2018

@ewbankkit A few more questions:

Is this an EKS cluster?
Are you using the EKS-optimized AMI?
Do you have ec2-net-utils installed?

liwenwu-amazon on 5 Nov 2018

@ewbankkit For debugging purposes, does pod-to-pod ping work at all if the Pod's ENIConfig uses exactly the same SG/subnet as the node?

liwenwu-amazon on 5 Nov 2018

@liwenwu-amazon pinging a pod on a different node does not work.

lnr0626 on 5 Nov 2018

@lnr0626 Can you provide the following debugging information?
Assuming you are pinging from Pod-a on Node-a to Pod-b on Node-b, and both Pod-a and Pod-b are using secondary IPs from the eth1 interface (where eth0 is the node's primary ENI):

Collect the output of kubectl describe node <Node-a>
Collect the output of kubectl describe node <Node-b>
What are Node-a's instance ID and region?
What are Node-b's instance ID and region?
Assuming Node-a and Node-b are using the same ENIConfig,
- collect the output of kubectl describe eniconfig <pod's eniconfig>
Let's collect a tcpdump while you ping from Pod-a to Pod-b:
- install tcpdump on node-a
- start capturing traffic on eth1 of node-a (assuming eth1 is the ENI for Pods)
- tcpdump -i eth1 -w node_a_eth1.pcap
- install tcpdump on node-b
- start capturing traffic on eth1 of node-b (assuming eth1 is the ENI for Pods)
- tcpdump -i eth1 -w node_b-eth1.pcap
kubectl exec -ti <pod-a> sh
Ping pod-b's IP for 5 mins
Collect node-a and node-b snapshots by
- running /opt/cni/bin/aws-cni-support.sh on both nodes
- collecting /var/log/aws-routed-eni/aws-cni-support.tar.gz
You can send these outputs to me ([email protected]) or attach them to this issue.

thanks

liwenwu-amazon on 5 Nov 2018

@liwenwu-amazon

Yes, an EKS cluster
The AMI is CentOS 7 100% based on the scripts at https://github.com/awslabs/amazon-eks-ami; I have verified that exactly the same issue is occurring with an Ubuntu 16.04 based AMI
No, hadn't heard of ec2-net-utils, I'll install it

ewbankkit on 5 Nov 2018

@ewbankkit Please do NOT install ec2-net-utils.
Can you run the same test and provide the debug info I asked @lnr0626 for?

thanks

liwenwu-amazon on 5 Nov 2018

After much debugging back and forth, it looks like the root cause is the inability to add a default route during IP address assignment:

[ERROR] Failed to increase pool size: failed to setup eni eni-0000000000000000 network: eni network setup: unable to add route 0.0.0.0/0 via 100.64.0.1 table 2: network is unreachable

Using the retry functionality from https://github.com/aws/amazon-vpc-cni-k8s/pull/223 doesn't resolve it.
At the Linux command line the equivalent is:

# ip route add default via 100.64.0.1 dev eth1 table 2
RTNETLINK answers: Network is unreachable

although the eth1 link shows as up.
My feeling is that this is caused by some configuration in our base AMI, as I have had success with v24 of the Amazon EKS Worker AMI based on Amazon Linux 2.

ewbankkit on 9 Nov 2018

We are running CentOS 7 as well and getting similar issues. Any help would be greatly appreciated; we are running out of options for IP space without this particular feature available. Unfortunately, moving to Amazon Linux 2 really isn't an option for us at this point. It does seem suspect that this doesn't work for either CentOS 7 or Ubuntu. @liwenwu-amazon, do you have any ideas, or could you reach out to someone inside Amazon who maintains the Amazon Linux 2 image to see if they have any ideas?

sdavids13 on 13 Nov 2018

@sdavids13
I have installed CentOS 7 on a t2.medium instance. I am able to manually do the following with the secondary ENI:

[root@ip-172-31-35-103 centos]# ip route add 172.31.100.1 dev eth1 table 2
[root@ip-172-31-35-103 centos]# ip route add default via 172.31.100.1 table 2

Sure, I will check with our Amazon Linux engineers on why ip route add default via 172.31.100.1 table 2 is not working with some CentOS 7 AMIs.

liwenwu-amazon on 13 Nov 2018

Hi,

I have hit the same issue. I am working in a VPC with 2 CIDR blocks, with the EKS worker nodes running on the second CIDR block.

The worker nodes use latest Amazon EKS-Optimized Amazon Linux AMI (v25).

Pods on the same node can communicate with no problem, but no communication works between pods on different nodes.

Setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true in the aws-node daemonset solves the problem, but as I need to run on a public subnet with Internet access, that's a problem for me.

Is there any other solution?

Thanks!!!

moshe0076 on 16 Nov 2018

I'm also seeing the same issue where pods on different nodes cannot communicate.

Using the latest EKS-Optimized Amazon Linux AMI (v25) as well.

dovreshef on 18 Nov 2018

There is a PR for this

https://github.com/aws/amazon-vpc-cni-k8s/pull/234

Thanks!!!

moshe0076 on 19 Nov 2018

@sdavids13
I have installed CentOS 7 on a t2.medium instance. I am able to manually do the following with the secondary ENI:
[root@ip-172-31-35-103 centos]# ip route add 172.31.100.1 dev eth1 table 2
[root@ip-172-31-35-103 centos]# ip route add default via 172.31.100.1 table 2
Sure, I will check with our Amazon Linux engineers on why ip route add default via 172.31.100.1 table 2 is not working with some CentOS 7 AMIs.

@liwenwu-amazon You left out "dev eth1" in the default route command, which is why you were able to add it. The route was probably added via eth0.

I was working on a similar issue, and it seems to come down to how the kernel handles this. If you were to add an IP address to eth1, the kernel wouldn't complain, but the AWS CNI plugin doesn't configure any IP addresses on the secondary interfaces.

CentOS 7 with kernel 3.10 will fail to add the default route via eth1 if eth1 doesn't have an IP address in that range.
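
One way to picture the "in that range" condition: the gateway has to fall inside a subnet in which the interface already holds an address. The following is a minimal sketch in plain POSIX shell. The helper names ip_to_int and ip_in_cidr are hypothetical, and the addresses and prefix lengths (100.64.0.10/19, 10.0.0.5/24) are made-up illustrations; only the gateway 100.64.0.1 comes from the error message earlier in the thread.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 lies inside the network of CIDR $2 (addr/prefix).
ip_in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Assumed example: the gateway is covered by an address on the pod subnet...
ip_in_cidr 100.64.0.1 100.64.0.10/19 && echo "gateway is on-link"
# ...but not by an address on the node's primary subnet.
ip_in_cidr 100.64.0.1 10.0.0.5/24 || echo "gateway is not on-link"
```

With no address at all on eth1, there is no subnet for the gateway to fall into, which is consistent with the "Network is unreachable" answer reported above.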

I updated to kernel 4.x and had no issues, but I don't think kernel 4.x is "officially" supported for CentOS 7.
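
The manual sequence implied by this comment can be sketched as a dry run. This is only an illustration under stated assumptions, not the CNI plugin's implementation: the function name eni_route_plan is hypothetical, the ENI address 100.64.0.10/19 is made up, and the gateway 100.64.0.1 and table 2 are taken from the error message earlier in the thread. The function only prints commands; it does not change any routes.

```shell
# Hypothetical dry-run helper: print the commands that would give a
# secondary ENI an address and then set up its per-ENI route table.
# Arguments: device, ENI IP with prefix length, gateway, route table number.
eni_route_plan() (
  dev=$1 addr=$2 gw=$3 table=$4
  # 1. Put an address from the pod subnet on the interface first.
  echo "ip addr add $addr dev $dev"
  # 2. Add a route to the gateway itself on that device.
  echo "ip route add $gw dev $dev table $table"
  # 3. Add the default route via the gateway, naming the device explicitly.
  echo "ip route add default via $gw dev $dev table $table"
)

# Assumed example values; review the output before running anything as root.
eni_route_plan eth1 100.64.0.10/19 100.64.0.1 2
```

Naming the device in step 3 matters for reproducing the failure: as noted above, leaving out "dev eth1" lets the kernel pick another interface and hides the problem.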