Occasionally, some pods are allocated IP addresses that are reachable from within the k8s VPC but not through the VPC peering connection. The CNI logs for both the plugin and ipamd looked normal.
Take a look at AWS_VPC_K8S_CNI_EXTERNALSNAT: if your pods have internal IPs, setting it to true may resolve your issue.
We're on a public subnet right now and don't have another NAT device.
We've seen some of this as well, even with AWS_VPC_K8S_CNI_EXTERNALSNAT=true set. We found that this setting leads to asymmetric TCP routing, which the VPC network does not seem to handle correctly: requests come in through one NIC but responses leave through another, and those responses get lost.
@Chili-Man Did you find a solution for this? I think I have a similar issue with my cluster in a VPC with a Virtual Gateway attached. I'm using EKS with the latest AWS CNI and AWS_VPC_K8S_CNI_EXTERNALSNAT=true set, but it doesn't seem to change anything.
We've seen this too, on release 1.5.
We notice it from time to time in the automated cluster we use for integration tests. Creating the same Kubernetes deployment and pods works one time and fails the next, so it's difficult to reproduce despite the repeatable nature of the test.
Experiencing the same issue on amazon-k8s-cni:v1.4.0.
We can reach the node, but not the pod, from another region connected through VPC peering.
Has anyone tested with v1.6.0-rc1 and AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS set?
We currently are, with the settings below, and have not seen this issue crop up yet:
- name: AWS_VPC_K8S_CNI_EXTERNALSNAT
  value: "false"
- name: AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS
  value: 10.XX.XX.XX/16
- name: AWS_VPC_K8S_CNI_LOGLEVEL
  value: DEBUG
Solved by adding AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS support.