What happened: We are running a 1.17 EKS cluster with CNI version 1.7.3 in our environment. All our liveness and readiness probes fail to connect and the pods keep restarting. The ports are open both on the worker node SG and in the inbound rules of the pod SG.
We have DISABLE_TCP_EARLY_DEMUX set to true.
We can, however, log in to the node and telnet to the pod on the port it listens on for the probes.
Our probes are simple TCP calls to a port on the pod.
Pods communicate with the other apps via the SG just fine. We have had to disable the probes for now so that the application can come up.
Are we missing anything?
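For anyone reproducing the telnet check described above, here is a minimal sketch of a manual TCP connect test run from the worker node. It only approximates what a TCP probe does (open a connection to the pod IP and port, succeed if the connection is established); the function name `tcp_probe` and its arguments are hypothetical, and it assumes `bash` and `timeout` are available on the node.

```shell
# Hypothetical helper: try to open a TCP connection to HOST:PORT within SECS seconds.
# Returns 0 if the connection is established, non-zero otherwise.
tcp_probe() {
  host="$1"
  port="$2"
  secs="${3:-3}"
  # bash's /dev/tcp pseudo-device opens a TCP socket; timeout bounds the attempt.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Example usage on the node, with the pod IP and port taken from the failing probe: `tcp_probe 100.126.17.194 80 3 && echo reachable || echo unreachable`.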
Environment:
kubectl version: 1.17
cat /etc/os-release: Amazon Linux Release 2 (karoo)
uname -a: Linux 4.14.198-152.320.amzn2.x86_64
Hi @hetpats
Can you please provide us with the output of the command below, run on the worker node:
cat /proc/sys/net/ipv4/tcp_early_demux
Thank you!
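A small sketch of how the requested value can be read and interpreted on a worker node. In the Linux kernel, `tcp_early_demux` defaults to 1 (enabled). The assumption here is that when DISABLE_TCP_EARLY_DEMUX is set to true on the CNI init container, the node is expected to report 0. The helper name `describe_early_demux` is hypothetical.

```shell
# Hypothetical helper: map a tcp_early_demux value to a readable state.
# 1 = early demux enabled (kernel default), 0 = disabled.
describe_early_demux() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "enabled" ;;
    *) echo "unknown" ;;
  esac
}

# Read the live value only if the sysctl file exists (i.e. on a Linux node).
if [ -r /proc/sys/net/ipv4/tcp_early_demux ]; then
  describe_early_demux "$(cat /proc/sys/net/ipv4/tcp_early_demux)"
fi
```

If this prints "enabled" on a node running pods with security groups, the setting has probably not taken effect on that node.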
email sent with details
Same for us running CNI 1.7.5. All liveness and readiness probes are failing, even with Allow ANY in the SGs.
This happens even with basic probes such as:
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 3
Warning Unhealthy <invalid> (x3 over <invalid>) kubelet, ip-100-126-42-46.eu-west-1.compute.internal Liveness probe failed: Get http://100.126.17.194:80/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
@atimush do you use http-proxy in your setup as well? Let me know if you have a support ticket, I can get the cluster details and take a look at your issue or you can email me ([email protected] or [email protected]) the details as well.
@SaranBalaji90 Hey Saran, we do not use any http-proxy.
We have an ongoing support case. I found this issue and thought it would be relevant to mention that we are hitting the same problem, to help others who have started exploring the new feature and might otherwise spend hours debugging (like me) 👍
@atimush can you check if DISABLE_TCP_EARLY_DEMUX is set to true under the initContainers env (described here: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/README.md)?
Hi! Same issue with EKS 1.18 and 1.7.5 CNI version
@SkySonR can you verify that DISABLE_TCP_EARLY_DEMUX is set to true in your initContainer? It looks like the documentation is missing this step, which is causing confusion. We are updating the documentation now. Sorry for the inconvenience.
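For reference, a hedged sketch of how the init container setting could be checked and applied with `kubectl`. The daemonset name `aws-node`, the namespace `kube-system`, and the init container name `aws-vpc-cni-init` are assumptions based on the standard CNI manifest; verify them against your own cluster before patching. The function names are hypothetical, and nothing is executed until you call them.

```shell
# Build a strategic-merge patch that sets DISABLE_TCP_EARLY_DEMUX on the init container.
# $1 = init container name (assumed: aws-vpc-cni-init), $2 = value ("true" or "false").
build_patch() {
  printf '{"spec":{"template":{"spec":{"initContainers":[{"name":"%s","env":[{"name":"DISABLE_TCP_EARLY_DEMUX","value":"%s"}]}]}}}}' "$1" "$2"
}

# Print the env entries of all init containers in the aws-node daemonset.
show_init_env() {
  kubectl get daemonset aws-node -n kube-system \
    -o jsonpath='{.spec.template.spec.initContainers[*].env}'
}

# Apply the patch (requires cluster access and permission to edit the daemonset).
apply_patch() {
  kubectl patch daemonset aws-node -n kube-system \
    -p "$(build_patch aws-vpc-cni-init true)"
}
```

Run `show_init_env` first to see the current value, then `apply_patch` if the variable is missing or set to false. Patching the daemonset rolls out new aws-node pods.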
@atimush @SkySonR we have updated our docs. Let me know if this doesn't solve the problem; I will be happy to assist you here.
CNI README - https://github.com/aws/amazon-vpc-cni-k8s/pull/1273/files
AWS Docs - https://github.com/awsdocs/amazon-eks-user-guide/pull/233
Blog post - https://aws.amazon.com/blogs/containers/introducing-security-groups-for-pods/
@SaranBalaji90 Thank you. That fixed the issue.
The documentation has been updated, so I am closing the issue. Kindly let us know if you have any more questions.