Amazon-vpc-cni-k8s: IP of pod stuck in Terminating assigned to a new pod

Created on 15 Jul 2020 · 5 Comments · Source: aws/amazon-vpc-cni-k8s

I'm experiencing an issue where the same IP is assigned to multiple pods.
My CNI release version is:

kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2
amazon-k8s-cni:v1.6.3

The output of this kubectl command clearly shows that the same IP is assigned to 2 pods:

kubectl get pods --all-namespaces -o wide | awk '{print $7}' |  sort | uniq -c | grep "   2"
   2 10.6.18.49

These are the pods with IP 10.6.18.49:

kubectl get pods --all-namespaces -o wide | grep 10.6.18.49
data-team-deploy    ddops-cbje-14433-78fr0-q8nmj                2/2     Running             0          27h     10.6.18.49    ip-10-6-27-66.us-west-2.compute.internal    <none>           <none>
my-team             ymlpipeline-c7ae4-c511t-w7vc7               4/5     Terminating         0          6d17h   10.6.18.49    ip-10-6-27-66.us-west-2.compute.internal    <none>           <none>
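The `grep "   2"` filter above only catches IPs that appear exactly twice. A slightly more robust variant, as a minimal sketch: the function name `find_duplicate_pod_ips` is made up for illustration, and it assumes the IP is still the 7th column of `-o wide` output (newer kubectl versions can print RESTARTS as `1 (5h ago)`, which shifts the columns). Pods using host networking legitimately share their node's IP and will also be reported.

```shell
# Reads `kubectl get pods --all-namespaces -o wide` output on stdin and
# prints every pod IP that appears more than once, followed by the
# namespace/name(status) of each pod reporting it.
find_duplicate_pod_ips() {
  awk '
    NR == 1 { next }                      # skip the header row
    $7 == "<none>" || $7 == "" { next }   # pod has no IP assigned yet
    {
      count[$7]++
      pods[$7] = pods[$7] " " $1 "/" $2 "(" $4 ")"
    }
    END {
      for (ip in count)
        if (count[ip] > 1)
          print ip ":" pods[ip]
    }
  '
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces -o wide | find_duplicate_pod_ips
```

With the two pods listed above, this should print a single line for 10.6.18.49 naming both the Running and the Terminating pod.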

This is already the second time we have hit this issue. We noticed that it starts happening when some pods get stuck in Terminating status.
For some reason the AWS CNI re-allocates the IP as if the resource had been freed, but as we can see that is not the case.

We are using EKS:
```
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-f459c0", GitCommit:"f459c0672169dd35e77af56c24556530a05e9ab1", GitTreeState:"clean", BuildDate:"2020-03-18T04:24:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
```

The only workaround we have found so far is to remove the pods stuck in Terminating status.
I have found no interesting logs to share yet but, as I said, the common denominator for this issue is a pod stuck in Terminating status.
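As a rough sketch of that workaround, the helper below only prints the delete commands for review instead of running them. The function name is hypothetical, and using `--grace-period=0 --force` is an assumption about how the stuck pods get removed; a forced delete drops the API object without confirming that the containers have actually stopped on the node.

```shell
# Reads `kubectl get pods --all-namespaces -o wide` output on stdin and
# prints (does not execute) a force-delete command for every pod whose
# STATUS column is Terminating.
print_force_delete_commands() {
  awk 'NR > 1 && $4 == "Terminating" {
    printf "kubectl delete pod %s --namespace %s --grace-period=0 --force\n", $2, $1
  }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces -o wide | print_force_delete_commands
```

Printing the commands rather than executing them leaves room to check each pod first, since a pod can also show Terminating briefly during a normal shutdown.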

Other k8s clusters we run on v1.6.1 do not present this issue.

bug, priority/P1

All 5 comments

Briefly looking at the code, I think it would be possible to add a safeguard check in this function:
https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/ipamd/ipamd.go#L956
It could use the k8s API (or, even better, the kubelet API) to query for that IP. If the IP is in use, log it and, even better, export a Prometheus metric that we can alert on.
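To illustrate the idea from outside the CNI, here is a minimal sketch of an "is this IP still reported by a pod?" check. It is not code from ipamd: the function name `ip_still_in_use` is hypothetical, and it assumes the API server accepts `status.podIP` as a field selector for pods. Such a check is also inherently racy, since a pod can appear or disappear between the query and the assignment.

```shell
# Hypothetical helper: returns 0 if any pod known to the API server still
# reports the given IP, and logs the conflicting pods to stderr.
# Returns 1 if no pod reports that IP.
ip_still_in_use() {
  ip="$1"
  # One "namespace/name" line per pod that reports this IP.
  holders=$(kubectl get pods --all-namespaces \
    --field-selector "status.podIP=${ip}" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}')
  if [ -n "$holders" ]; then
    echo "WARNING: IP ${ip} is still reported by:" >&2
    echo "$holders" >&2
    return 0
  fi
  return 1
}

# Usage:
#   ip_still_in_use 10.6.18.49 && echo "do not hand this IP out yet"
```

A real safeguard inside the CNI would need to log and count these conflicts rather than shell out to kubectl; the sketch only shows the shape of the query.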

I finally got more access to the server and was able to dig a little further.
The pod definition for ymlpipeline-c7ae4-c511t-w7vc7 has multiple containers, and one of these containers is failing because of a wrong entrypoint definition (from kubectl describe):

  restapi:
    Container ID:   docker://c81d98a91eda8f6527fc726d857be5a8d290c881f7999a4348eb36afe788a359
    Image:          <removed>
    Image ID:       <removed>
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       ContainerCannotRun
      Message:      OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"./run.sh\": stat ./run.sh: no such file or directory": unknown
      Exit Code:    127
      Started:      Tue, 07 Jul 2020 15:43:38 -0700
      Finished:     Tue, 07 Jul 2020 15:43:38 -0700
    Ready:          False

In the local kubelet logs I found the following recurring error:

Jul 11 15:37:29 ip-10-6-27-66.us-west-2.compute.internal kubelet[4795]: E0711 15:37:29.190364    4795 status_manager.go:331] Status update on pod my-team/ymlpipeline-c7ae4-c511t-w7vc7 aborted: terminated container restapi attempted illegal transition to non-terminated state

That led me to this bug report, which unfortunately has never been fixed:
https://github.com/kubernetes/kubernetes/issues/76382
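To find every pod on a node that is hitting this kubelet error, the log lines can be filtered as in the sketch below. The function name is made up, the pattern is taken from the log line quoted above, and the usage line assumes the kubelet runs as a systemd unit named `kubelet`.

```shell
# Reads kubelet log lines on stdin and prints one "namespace/pod container"
# line for each distinct pod and container that hit the
# "attempted illegal transition" status update error.
pods_with_illegal_transition() {
  sed -n 's/.*Status update on pod \([^ ]*\) aborted: terminated container \([^ ]*\) attempted illegal transition.*/\1 \2/p' |
    sort -u
}

# Usage on the affected node:
#   journalctl -u kubelet | pods_with_illegal_transition
```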

The issue appears to be related to the kubelet PLEG (Pod Lifecycle Event Generator) loop. Indeed, after I restarted the local kubelet (to enable anonymous access so I could query the local kubelet API), the duplicated container disappeared from the k8s API query results.

TL;DR: I will make sure to double-check our pod definition, but I believe the CNI should still offer a safeguard around this issue, because it triggered failures in other systems that rely on unique pod IPs (firewall, kube2iam).

I'm concerned that if we check with the API server whether the IP is still in use, there can be other race conditions as well.

What we need to ensure is that if the kubelet's call to the CNI to delete the networking succeeds, the pod gets its IP removed. The CNI should clean up all IP rules and routes, so the pod will not have any networking anyway. We will follow up to see how this affects kube-proxy as well.

Hi @ltagliamonte-dd

We have seen a similar issue with 1.6.3, and it was fixed in 1.6.4 (https://github.com/aws/amazon-vpc-cni-k8s/pull/1118). Please try 1.6.4 or 1.7.X and see if the issue happens again.

Thank you.
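A quick way to check whether a cluster is already on a release with that fix is to compare version strings, as in this sketch. The function name `version_at_least` is made up for illustration, and it relies on `sort -V` (version sort), which is assumed to be available on the machine running it.

```shell
# Returns 0 if version $1 is greater than or equal to version $2.
# A leading "v" (as in "v1.6.3") is stripped from both arguments.
version_at_least() {
  have="${1#v}"
  want="${2#v}"
  # After a version sort, the smaller of the two comes first; $1 is new
  # enough exactly when that smaller version is the wanted one.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Usage, taking the tag from the aws-node image shown earlier in the thread:
#   version_at_least v1.6.3 1.6.4 || echo "upgrade the CNI to 1.6.4 or later"
```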

Thank you @jayanthvn and @mogren for looking into this.
I will schedule an update on our side, but it will take some time. Have you tried to reproduce it using the k8s issue report https://github.com/kubernetes/kubernetes/issues/76382?
