Amazon-vpc-cni-k8s: Liveness and Readiness probes fail using the CNI per pod SG

Created on 27 Oct 2020 · 11 Comments · Source: aws/amazon-vpc-cni-k8s

What happened: We are running an EKS 1.17 cluster with CNI version 1.7.3. All of our liveness and readiness probes fail to connect and the pods keep restarting, even though the probe ports are open on both the worker node SG and the inbound rules of the pod SG.
We have DISABLE_TCP_EARLY_DEMUX set to true.
We can, however, log in to the node and telnet to the pod on the port it listens on for the probes.
Our probes are simple TCP calls to a port on the pod.
Pods communicate with the other apps via SG just fine; we have had to disable the probes for now so the application can come up.
Are we missing anything?
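
For anyone trying to reproduce the manual telnet check from the node, a minimal sketch of a TCP connect test follows. It only approximates what a tcpSocket probe does (open a connection and close it); the function name `tcp_probe` and the one-second default timeout are illustrative assumptions, not anything taken from the kubelet or the CNI.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Hypothetical helper for manual debugging; it is not the kubelet's code.
    """
    try:
        # create_connection completes the TCP handshake, then we close at once.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run on the worker node, a call such as `tcp_probe("<pod-ip>", <probe-port>)` should return True if the pod is reachable from the node. Note that in this report the manual check succeeded while the kubelet probes still failed, so a True result here does not rule out the problem.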

Environment:

Kubernetes version (use kubectl version):1.17
CNI Version 1.7.3
OS (e.g: cat /etc/os-release):Amazon Linux Release 2 (karoo)
Kernel (e.g. uname -a):Linux 4.14.198-152.320.amzn2.x86_64

needs investigation question

hetpats

All 11 comments

Hi @hetpats

Can you please provide the following information?

On the instance, the output of:

cat /proc/sys/net/ipv4/tcp_early_demux

Can you also email me ([email protected]) your cluster ARN?

Thank you!

jayanthvn on 27 Oct 2020
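
As a rough aid for reading the value requested above, here is a small sketch that interprets the contents of /proc/sys/net/ipv4/tcp_early_demux. It assumes that when DISABLE_TCP_EARLY_DEMUX takes effect the sysctl reads 0, and that a value of 1 suggests the setting was not applied on that node. The helper names are hypothetical.

```python
from pathlib import Path

# Standard Linux procfs path for the net.ipv4.tcp_early_demux sysctl.
TCP_EARLY_DEMUX_PATH = Path("/proc/sys/net/ipv4/tcp_early_demux")


def early_demux_disabled(raw_value: str) -> bool:
    """Interpret the file contents: "0" means TCP early demux is disabled."""
    return raw_value.strip() == "0"


def read_early_demux_disabled(path: Path = TCP_EARLY_DEMUX_PATH) -> bool:
    """Read the sysctl file on the node and report whether early demux is off."""
    return early_demux_disabled(path.read_text())
```

On a worker node, `read_early_demux_disabled()` returning False would be a hint to re-check the init container environment discussed later in the thread.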

email sent with details

hetpats on 27 Oct 2020

Same for us running CNI 1.7.5. All liveness and readiness probes fail, even with Allow ANY in the SGs.
This happens even with basic probes such as:

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 3

Warning Unhealthy <invalid> (x3 over <invalid>) kubelet, ip-100-126-42-46.eu-west-1.compute.internal Liveness probe failed: Get http://100.126.17.194:80/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

atimush on 27 Oct 2020

@atimush do you use http-proxy in your setup as well? Let me know if you have a support ticket, I can get the cluster details and take a look at your issue or you can email me ([email protected] or [email protected]) the details as well.

SaranBalaji90 on 27 Oct 2020

@SaranBalaji90 Hey Saran, we do not use any http-proxy.
We have an ongoing support case. I found this issue and thought it would be relevant to mention that we see the same problem, to help others who have started exploring the new feature and might otherwise spend hours debugging (like me) 👍

atimush on 28 Oct 2020

@atimush can you check whether DISABLE_TCP_EARLY_DEMUX is set to true under the initContainers env (described here: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/README.md)?

SaranBalaji90 on 28 Oct 2020

Hi! Same issue with EKS 1.18 and 1.7.5 CNI version

SkySonR on 30 Oct 2020

@SkySonR can you verify whether DISABLE_TCP_EARLY_DEMUX is set to true in your initContainer? It looks like the documentation is missing this step, which is causing confusion. We are updating the documentation now. Sorry for the inconvenience.

SaranBalaji90 on 30 Oct 2020
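
To illustrate what "set to true in your initContainer" might look like in practice, the sketch below builds a patch body that adds the variable to the init container's env. It assumes the CNI DaemonSet is `aws-node` in the `kube-system` namespace and that its init container is named `aws-vpc-cni-init`; verify both against your own manifest. The function name is hypothetical.

```python
import json


def early_demux_patch(init_container_name: str = "aws-vpc-cni-init") -> dict:
    """Build a patch body that sets DISABLE_TCP_EARLY_DEMUX=true.

    The init container name is an assumption; check your aws-node manifest.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "initContainers": [
                        {
                            # The patch is matched to the container by name.
                            "name": init_container_name,
                            "env": [
                                {
                                    "name": "DISABLE_TCP_EARLY_DEMUX",
                                    # Env values must be strings, not booleans.
                                    "value": "true",
                                }
                            ],
                        }
                    ]
                }
            }
        }
    }


# Print the JSON so it can be passed to a patch command.
print(json.dumps(early_demux_patch()))
```

The printed JSON could then be supplied to something like `kubectl patch daemonset aws-node -n kube-system -p '<json>'`. The updated README and AWS docs linked in the next comment remain the authoritative instructions.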

@atimush @SkySonR we have updated our docs; let me know if this doesn't solve the problem. I will be happy to assist you here.
CNI README - https://github.com/aws/amazon-vpc-cni-k8s/pull/1273/files
AWS docs - https://github.com/awsdocs/amazon-eks-user-guide/pull/233
Blog post - https://aws.amazon.com/blogs/containers/introducing-security-groups-for-pods/

SaranBalaji90 on 30 Oct 2020

@SaranBalaji90 Thank you. That fixed the issue.

atimush on 2 Nov 2020

The documentation has been updated, so I am closing the issue. Kindly let us know if you have any more questions.

jayanthvn on 11 Nov 2020
