Maybe related to https://github.com/aws/amazon-vpc-cni-k8s/issues/212?
We have a VPC with a primary CIDR block in 10.0.0.0/8 space and a secondary CIDR block in 100.64.0.0/10 space.
There is a single EKS worker node running CentOS 7, with its primary IP address on a 10.x.x.x subnet and an ENIConfig annotation on the node for a 100.64.x.x subnet.
Pods running on the 100.64.x.x subnet can communicate with pods on the same node running on the primary IP (hostNetwork: true) but cannot communicate off-node (e.g. to the control plane, either directly or using the kubernetes service ClusterIP address).
I can kubectl exec into pods running on the 100.64.x.x subnet and all relevant route tables, NACLs and security groups are correctly configured.
$ kubectl --kubeconfig kubeconfig run -i --rm --tty debug --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0 Link encap:Ethernet HWaddr AE:94:60:6F:AA:2E
inet addr:100.64.x.x Bcast:100.64.x.x Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:508 (508.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
/ # wget http://10.x.x.x:61678/v1/networkutils-env-settings
Connecting to 10.x.x.x:61678 (10.x.x.x:61678)
networkutils-env-set 100% |*************************************************************************************************************| 105 0:00:00 ETA
/ # wget https://10.y.y.y/
Connecting to 10.y.y.y (10.y.y.y:443)
^C
10.x.x.x is the node's primary IP, 10.y.y.y is one of the EKS control plane ENIs.
This prevents critical components like kube-dns from starting:
$ kubectl --kubeconfig kubeconfig logs --namespace kube-system kube-dns-d87b74b4f-f5gff kubedns
...
I1102 20:51:07.938774 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E1102 20:51:07.940074 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://172.20.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.20.0.1:443: i/o timeout
...
@liwenwu-amazon I can provide the aws-cni-support.tar.gz via a support ticket.
I think you might be running into issue #35. Can you try setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true and see if that solves your problem? Thanks.
https://docs.aws.amazon.com/eks/latest/userguide/external-snat.html
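For reference, one way to apply that setting is with kubectl set env. The sketch below only prints the command rather than running it. It assumes the CNI runs as the aws-node DaemonSet in the kube-system namespace, which matches the describe output further down this thread; the function name externalsnat_cmd is made up for illustration.

```shell
# Print (do not run) the kubectl command that sets AWS_VPC_K8S_CNI_EXTERNALSNAT
# on the aws-node DaemonSet. Assumes the DaemonSet lives in kube-system.
#   $1  desired value: "true" or "false"
externalsnat_cmd() {
  printf 'kubectl set env daemonset aws-node --namespace kube-system AWS_VPC_K8S_CNI_EXTERNALSNAT=%s\n' "$1"
}

# Usage: review the printed command, then run it against your cluster.
#   externalsnat_cmd true
```

Changing the DaemonSet environment restarts the aws-node pods, so it is probably best tried on a test cluster first.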
I am running with AWS_VPC_K8S_CNI_EXTERNALSNAT=true:
$ kubectl --kubeconfig kubeconfig describe pod --namespace kube-system aws-node-6v94g
Name: aws-node-6v94g
Namespace: kube-system
Node: ip-10-x-x-x.us-west-2.compute.internal/10.x.x.x
Start Time: Fri, 02 Nov 2018 15:16:01 -0400
Labels: controller-revision-hash=3496945906
k8s-app=aws-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.x.x.x
Controlled By: DaemonSet/aws-node
Containers:
aws-node:
Container ID: docker://10b1f2669628dd5cb0695160005e40576f765d46fedf1e1e19eab6d813be2a07
...
Environment:
AWS_VPC_K8S_CNI_LOGLEVEL: DEBUG
MY_NODE_NAME: (v1:spec.nodeName)
WATCH_NAMESPACE: kube-system (v1:metadata.namespace)
AWS_VPC_K8S_CNI_EXTERNALSNAT: true
AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG: true
...
@ewbankkit Do the Pod's subnet and the Node's subnet use same route table?
No, I have them in separate route tables - No real reason why I did it that way, just wanted to keep everything as self-contained as possible.
Do the Pods on the 2 different nodes (which can NOT ping each other) use the same subnet? Also, have you checked the security groups for the Pods on these 2 nodes? Are they allowed to communicate with each other?
I tried putting the pod subnets in the same route table as the node subnets and still could not connect to the control plane from the pods. Yes, SGs are wide open.
I cannot ping pod to pod if they are on different nodes (but in the same subnet).
@ewbankkit will it work if the Pod's ENIConfig uses the same security group as the node?
I believe I'm running into this issue as well.
I have a small primary cidr block in the 10.0.0.0/8 range and a (disjoint) larger secondary cidr also in the 10.0.0.0/8 block. The nodes are coming up in the primary cidr block, and have eniconfigs set to use the secondary block, but with the same security groups as the node. All relevant subnets are using the same route table. I am running with AWS_VPC_K8S_CNI_EXTERNALSNAT=false, however I believe that is required given some other constraints I have.
Pods can communicate with other pods on the same node, but cannot communicate with pods on other nodes or with the control plane using the cluster IP. In my case they can communicate with an off-node proxy server (i.e. curl --proxy
My thought is that some of the things I did to try to work around #212 are allowing the off-node communication to work, but the problem is not fully addressed, so the cluster IPs don't work between nodes. I'm not positive about that though, and could be completely off base.
@lnr0626 , can you try pinging from one pod to another pod on different Node? Does it work?
@liwenwu-amazon Same problem if I run the pods in the same SG as the worker nodes.
Interestingly in the flow logs for the Control Plane ENIs I can see ACCEPTs on port 443 from one of the pod IPs but nothing the other way.
@ewbankkit A few more questions:
@ewbankkit for debugging purposes, does pod-to-pod ping even work if the Pod's ENIConfig uses the exact same SG/subnet as the node?
@liwenwu-amazon pinging a pod on a different node does not work.
@lnr0626 can you provide following debugging information
Assuming you are pinging from Pod-a on Node-a to Pod-b on Node-b and both Pod-a and Pod-b are using secondary IPs from eth1 interface (where eth0 is node's primary ENI)
kubectl describe node <Node-a>
kubectl describe node <Node-b>
kubectl describe eniconfig <pod's eniconfig>
eth1 of node-a (assuming eth1 is the ENI for Pods): tcpdump -i eth1 -w node_a_eth1.pcap
eth1 of node-b (assuming eth1 is the ENI for Pods): tcpdump -i eth1 -w node_b-eth1.pcap
kubectl exec -ti <pod-a> sh
ping pod-b's IP for 5 mins
/opt/cni/bin/aws-cni-support.sh on both node
/var/log/aws-routed-eni/aws-cni-support.tar.gz
thanks
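The collection steps above can be sketched as a small helper that prints them as concrete commands. Nothing is executed. The function name print_debug_steps and its arguments are hypothetical, and running ping with -w 300 via kubectl exec is an assumption standing in for "ping pod-b's IP for 5 mins" (it relies on the pod image shipping a ping that accepts -w).

```shell
# Print the requested debug steps as commands; nothing is run.
# Arguments (placeholders supplied by the caller):
#   $1 node A name   $2 node B name   $3 ENIConfig name
#   $4 pod A name    $5 pod B IP
print_debug_steps() {
  cat <<EOF
kubectl describe node $1
kubectl describe node $2
kubectl describe eniconfig $3
# on node A (assumes eth1 is the ENI for Pods):
tcpdump -i eth1 -w node_a_eth1.pcap
# on node B (assumes eth1 is the ENI for Pods):
tcpdump -i eth1 -w node_b-eth1.pcap
# ping pod B from pod A for 5 minutes (assumes ping supports -w):
kubectl exec -ti $4 -- ping -w 300 $5
# on both nodes; output lands in /var/log/aws-routed-eni/aws-cni-support.tar.gz:
/opt/cni/bin/aws-cni-support.sh
EOF
}

# Usage:
#   print_debug_steps <node-a> <node-b> <eniconfig> <pod-a> <pod-b-ip>
```

The tcpdump captures need to be started on each node before the ping begins and stopped afterwards.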
@liwenwu-amazon
ec2-net-utils, I'll install it
@ewbankkit Please do NOT install ec2-net-utils.
Can you run the same test and provide the debug info I asked @lnr0626 for?
thanks
After much debugging back and forth, it looks like the root cause is the inability to add a default route during IP address assignment:
[ERROR] Failed to increase pool size: failed to setup eni eni-0000000000000000 network: eni network setup: unable to add route 0.0.0.0/0 via 100.64.0.1 table 2: network is unreachable
Using the retry functionality from https://github.com/aws/amazon-vpc-cni-k8s/pull/223 doesn't resolve it.
At the Linux command line the equivalent is:
# ip route add default via 100.64.0.1 dev eth1 table 2
RTNETLINK answers: Network is unreachable
although the eth1 link shows as up.
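When a route add fails like this while the link is up, one thing that may be worth checking is whether the interface has an IPv4 address at all. A minimal sketch, assuming the standard iproute2 output format where address lines start with "inet"; the helper name has_ipv4_addr is made up for illustration.

```shell
# Succeeds (exit status 0) if the text on stdin contains an IPv4 "inet" line,
# as printed by `ip -4 addr show dev <iface>`. IPv6 "inet6" lines do not match.
has_ipv4_addr() {
  grep -q '^[[:space:]]*inet '
}

# Usage on a node (eth1 is the interface from the error above):
#   ip -4 addr show dev eth1 | has_ipv4_addr || echo "eth1 has no IPv4 address"
```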
My feeling is that this is caused by a configuration on our base AMI as I have success with v24 of the Amazon EKS Worker AMI based on Amazon Linux 2.
We are running CentOS 7 as well and getting similar issues. Any help would be greatly appreciated; we are running out of options for IP space without this particular feature available. Unfortunately, moving to Amazon Linux 2 really isn't an option for us at this point. It does seem suspect that this doesn't work for either CentOS 7 or Ubuntu. @liwenwu-amazon do you have any ideas, or could you reach out to someone inside Amazon who maintains the Amazon Linux 2 image to see if they have any ideas?
@sdavids13
I have installed CentOS 7 on a t2.medium instance. I am able to manually do the following with the secondary ENI:
[root@ip-172-31-35-103 centos]# ip route add 172.31.100.1 dev eth1 table 2
[root@ip-172-31-35-103 centos]# ip route add default via 172.31.100.1 table 2
Sure, I will check with our Amazon Linux engineers on why ip route add default via 172.31.100.1 table 2 is not working with some CentOS 7 AMIs.
Hi,
I have hit the same issue. Working in a VPC with 2 CIDR blocks and EKS worker nodes running on the second CIDR block.
The worker nodes use latest Amazon EKS-Optimized Amazon Linux AMI (v25).
Pods on the same node can communicate with no problem, but no communication works between pods on different nodes.
Setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true in the aws-node daemonset solves the problem, but as I need to run on a public subnet with Internet access, that's a problem for me.
Is there any other solution?
Thanks!!!
I'm also seeing the same issue where pods on different nodes cannot communicate.
I'm using the latest EKS-Optimized Amazon Linux AMI (v25) as well.
@liwenwu-amazon you forgot the "dev eth1" in the default route command; that's why you were able to add it. It was probably added via eth0.
I was working on a similar issue and it seems to come down to how the kernel handles that. If you were to add an IP address to eth1 it wouldn't complain, but the AWS CNI plugin doesn't configure any IP addresses on the secondary interfaces.
CentOS 7 with kernel 3.10 will fail to add the default route via eth1 if eth1 doesn't have an IP address in that range.
I updated to kernel 4.x and had no issues, but I don't think 4 is "officially" supported for CentOS 7.
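To make the observation above concrete, here is a rough sketch of the manual sequence it implies on a 3.10 kernel: give the secondary interface an address in the gateway's range first, then add the default route with an explicit dev. The helper only prints the commands. Its name, and the example address 100.64.0.5/24 in the usage line, are hypothetical; the real address would be one that belongs to that ENI. This is a hand-run approximation for debugging, not what the CNI plugin itself does.

```shell
# Print (do not run) the two commands for the manual workaround.
#   $1 interface (e.g. eth1)
#   $2 address with prefix length, in the gateway's subnet
#   $3 gateway address
#   $4 routing table number
print_route_workaround() {
  printf 'ip addr add %s dev %s\n' "$2" "$1"
  printf 'ip route add default via %s dev %s table %s\n' "$3" "$1" "$4"
}

# Usage (the address is a made-up placeholder):
#   print_route_workaround eth1 100.64.0.5/24 100.64.0.1 2
```

Addresses added by hand may conflict with what the CNI manages, so this is only suitable for confirming the diagnosis on a test node.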
@lutierigb Thanks for digging in to this.
I added code to explicitly add the primary IP on secondary ENIs in https://github.com/aws/amazon-vpc-cni-k8s/pull/271 and have verified that this works on CentOS 7 (and Amazon Linux 2 so no regression).