Steps to reproduce the issue:
My environment is set up slightly differently; the test cases are described in detail in the attached file.
eks-nlb-issue-aws-support.txt
We are facing the same issue, did you find a solution?
I'm facing the same issue!
CNI v1.6.0
We have a different ingress controller (Traefik) and are seeing the same issue. In a 3-node ASG, when a pod on node A makes a request to an NLB and that request is served by the same backend node that runs the requesting pod, the connection times out. We have two listeners, and this happens on both the TLS and TCP listeners.
My suspicion is that this is because the NLB preserves the source IP of the request when connecting to the backend and the CNI iptables rules are getting tripped up with that connection.
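For anyone who wants to check whether they are hitting the same timeout, here is a minimal probe sketch to run from inside a pod. The function name `probe` and the host/port in the usage comment are placeholders of mine, not anything from this thread; it only uses the standard `socket` module and reports whether a TCP connection could be opened within the timeout.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False


# Hypothetical usage (replace with your own NLB DNS name and listener port):
#   print(probe("my-nlb.example.internal", 443))
```

Running it repeatedly from the same pod should show intermittent failures if only some requests land on the pod's own node, which matches the behavior described above.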
I am seeing this issue too.
But when I call from outside the cluster nodes, it works fine.
I guess that if I am on the same EKS cluster, I would use the k8s Service and not the NLB, so I am fine for now, but I am interested to see how this is happening.
@ravicm @jonathan-mothership
This is expected behavior with an NLB when the source and target nodes are the same. Internal NLBs do not support loopback or hairpinning. Because an NLB in instance mode preserves the source IP of the packet, the response (say, the SYN-ACK) goes directly to the local client pod. The SYN has the client pod IP as its source IP (SIP) and the NLB IP as its destination IP (DIP), whereas the SYN-ACK has the server pod IP (not the NLB IP) as its SIP and the client pod IP as its DIP when both pods are on the same node, so the client pod terminates/resets the TCP session.
Refer to the “Connections time out for requests from a target to its load balancer” section in the guide below, which also lists possible options. An NLB in IP mode should help in this scenario.
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html
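To make the address mismatch concrete, here is a toy model of the SYN / SYN-ACK exchange described above. It is only a sketch under simplifying assumptions: it ignores NodePort and kube-proxy translation, all IPs are made-up examples, and `preserve_source_ip=False` stands in for a setup where the backend sees the NLB's address as the source (as with IP-mode targets without client IP preservation). The names `Packet`, `reply_seen_by_client`, and `client_accepts` are mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def reply_seen_by_client(client_ip: str, nlb_ip: str, server_ip: str,
                         preserve_source_ip: bool) -> Packet:
    """Return the SYN-ACK as it arrives at the client pod (toy model)."""
    syn = Packet(src=client_ip, dst=nlb_ip)
    # The NLB forwards the SYN to the backend, keeping or replacing the source.
    seen_src = syn.src if preserve_source_ip else nlb_ip
    forwarded = Packet(src=seen_src, dst=server_ip)
    # The server answers whatever source address it saw.
    reply = Packet(src=server_ip, dst=forwarded.src)
    if reply.dst == nlb_ip:
        # The reply goes back through the NLB, which restores its own address.
        reply = Packet(src=nlb_ip, dst=client_ip)
    return reply


def client_accepts(syn_dst: str, reply: Packet) -> bool:
    # The client only matches a SYN-ACK coming from the address it sent the SYN to.
    return reply.src == syn_dst


client, nlb, server = "10.0.1.10", "10.0.0.5", "10.0.1.20"  # example IPs
hairpin = reply_seen_by_client(client, nlb, server, preserve_source_ip=True)
via_nlb = reply_seen_by_client(client, nlb, server, preserve_source_ip=False)
print(client_accepts(nlb, hairpin))  # False: SYN-ACK comes from the server pod IP
print(client_accepts(nlb, via_nlb))  # True: SYN-ACK comes back from the NLB IP
```

With source IP preservation and both pods on one node, the reply bypasses the NLB, so its source no longer matches the address the client connected to.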
Hi,
Since this is expected behavior, I will be closing this issue for now.
Thank you!