What happened:
Not sure if any of you have experienced the issue with ipamd leaking IP addresses and eventually failing to get new ones allocated on K8s nodes. But we have noticed this as a recurring issue in one of our small clusters.
The issue was first identified through a failed pod of a DaemonSet blocking the DaemonSet to fully upgrade. Pod on the problematic node gets stuck in the stage of ContainerCreating (output slightly altered for security purposes):
# kubectl get pods -n infra -l release=xyz -o wide --sort-by=status.phase
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
xyz-agent-7f4c7 0/1 ContainerCreating 0 15h <none> ip-10-4-185-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-4s4x9 1/1 Running 0 4d8h 10.4.220.xxx ip-10-4-219-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-5tk7n 1/1 Running 0 2d12h 10.4.193.xxx ip-10-4-215-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-5dxfr 1/1 Running 0 12h 10.4.138.xxx ip-10-4-146-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-8cljp 1/1 Running 0 2d12h 10.4.171.xxx ip-10-4-167-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-dvbhd 1/1 Running 0 3d6h 10.4.193.xxx ip-10-4-208-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-nn627 1/1 Running 0 3d6h 10.4.144.xxx ip-10-4-151-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-s9qz8 1/1 Running 0 2d12h 10.4.178.xxx ip-10-4-178-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-sxbxk 1/1 Running 0 3d7h 10.4.211.xxx ip-10-4-213-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-v4ck9 1/1 Running 0 4d13h 10.4.169.xxx ip-10-4-186-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-vc6km 1/1 Running 0 3d6h 10.4.143.xxx ip-10-4-135-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-vnbcv 1/1 Running 0 3d6h 10.4.190.xxx ip-10-4-185-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-z2rxf 1/1 Running 0 3d6h 10.4.152.xxx ip-10-4-130-xxx.us-west-2.compute.internal <none> <none>
xyz-agent-zlnvb 1/1 Running 0 3d6h 10.4.218.xxx ip-10-4-220-xxx.us-west-2.compute.internal <none> <none>
If we look at the events around the pod, the look something like:
# kubectl get events -n infra --field-selector=involvedObject.name=xyz-agent-7f4c7 -o json | kubectl describe -f -
Name: xyz-agent-7f4c7.1631a103c1176ed8
Namespace: infra
Labels: <none>
Annotations: <none>
API Version: v1
Count: 31243
Event Time: <nil>
First Timestamp: 2020-09-04T16:19:40Z
Involved Object:
API Version: v1
Kind: Pod
Name: xyz-agent-7f4c7
Namespace: infra
Resource Version: 51945182
UID: 39f8c74a-1942-4676-842b-0123456789ab
Kind: Event
Last Timestamp: 2020-09-05T07:49:39Z
Message: Pod sandbox changed, it will be killed and re-created.
Metadata:
Creation Timestamp: 2020-09-05T03:09:39Z
Resource Version: 2357175
Self Link: /api/v1/namespaces/infra/events/xyz-agent-7f4c7.1631a103c1176ed8
UID: 61982647-e14d-4364-8cca-0123456789ab
Reason: SandboxChanged
Reporting Component:
Reporting Instance:
Source:
Component: kubelet
Host: ip-10-4-185-xxx.us-west-2.compute.internal
Type: Normal
Events: <none>
Name: xyz-agent-7f4c7.1631a107e92b8447
Namespace: infra
Labels: <none>
Annotations: <none>
API Version: v1
Count: 32031
Event Time: <nil>
First Timestamp: 2020-09-04T16:19:58Z
Involved Object:
API Version: v1
Kind: Pod
Name: xyz-agent-7f4c7
Namespace: infra
Resource Version: 51945182
UID: 39f8c74a-1942-4676-842b-0123456789ab
Kind: Event
Last Timestamp: 2020-09-05T08:14:41Z
Message: (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "899ddd98194db655b4fd62c4bb75ee6212587f18e7493889771ca2f691d5e7b2" network for pod "xyz-agent-7f4c7": networkPlugin cni failed to set up pod "xyz-agent-7f4c7_infra" network: add cmd: failed to assign an IP address to container
Metadata:
Creation Timestamp: 2020-09-04T16:19:58Z
Resource Version: 2357767
Self Link: /api/v1/namespaces/infra/events/xyz-agent-7f4c7.1631a107e92b8447
UID: 0dc1f7b8-eabb-47ca-b0b7-0123456789ab
Reason: FailedCreatePodSandBox
Reporting Component:
Reporting Instance:
Source:
Component: kubelet
Host: ip-10-4-185-xxx.us-west-2.compute.internal
Type: Warning
Events: <none>
The ENI IP assignment issue results in failure in pod sandbox creation as can be found in the messages above.
On the same node there were actually 2 more pods stuck in the ContainerCreating state:
# kubectl get pods -A --field-selector=spec.nodeName=ip-10-4-185-xxx.us-west-2.compute.internal --no-headers --sort-by=status.phase | awk '{ print $4 }' | uniq -c
3 ContainerCreating
18 Running
The issue persists against new pods from the same DaemonSet after we terminated the old ones.
After hopping onto the node, we could find the following error messages in the ipam*.log file:
root@ip-10-4-185-xxx:/var/log/aws-routed-eni# zcat *.gz | jq -c 'select(.level == "error") | {level, caller, msg}' | uniq -c
90117 {"level":"error","caller":"datastore/data_store.go:288","msg":"DataStore has no available IP addresses"}
Also, the node has been fully assigned with 8 ENIs, each allocated with 30 IP addresses. This could be verified by the following procedure:
root@ip-10-4-185-xxx:~# eni-to-yaml | yq -y 'map(.["local-ipv4s"] | length) | {totalENIs: length, totalIPs: add}'
totalENIs: 8
totalIPs: 240
(... where eni-to-yaml is a bash function from the following snippet):
# Create this function first:
function eni-to-yaml() {
local eni_mac_url=http://169.254.169.254/latest/meta-data/network/interfaces/macs/
curl -s "${eni_mac_url}" | eni_mac_url="${eni_mac_url}" xargs -n 1 -I'{{}}' sh -c \
'printf -- "- interface-id: %s\n local-ipv4s:\n" "$(curl -s "${eni_mac_url}"{{}}"interface-id")"; curl -s "${eni_mac_url}"{{}}"local-ipv4s" | sort | awk '\''{ print " - " $0 }'\'
}
While on a good node, the number of IPs are just 21 (20 + 1 for primary IP):
root@ip-10-4-167-xxx:~# eni-to-yaml | yq -Y 'map(.["local-ipv4s"] | length) | {totalENIs: length, totalIPs: add}'
totalENIs: 1
totalIPs: 21
Environment:
kubectl version): v1.17.9AWS CNI v1.6.1cat /etc/os-release): Debian Stretchuname -a): 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/LinuxWould appreciate if someone could look into this and provide insights on how this happened, as well as recommendations on how to alleviate/address the issue. Thanks a lot!
@jasonzlai Hi, thanks for the detailed issue! If you have the logs for the node with the issue, we would love to take a look at it. Either just email it to me, or send through a support ticket.
In short, "DataStore has no available IP addresses" means that even though the ENIs get attached to the instance, and the secondary IPs gets added to them, they never get added to the "data store", which is the internal cache that ipamd is using to give IPs to pods. Why this happens should be seen in the ipamd.log. The prime suspect is that the setting up the newly created ENI is failing for some reason, this function.
A question to start with, was there ever a "Force detach" on this instance. I was wondering if the instance metadata was stale or if doing curl on the metadata endpoint fails?
We have refactored a lot of this in the latest v1.7.2-rc1 release candidate, and we are running through all our test cases right now.
Hi @jasonzlai , Can you please collect the CNI logs which @mogren is requesting. You can email to me [email protected] if it is fine or please attach it to the support ticket and share us the ticket.
CNI log collection tool [https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html]
Thanks.
@jasonzlai Are you still seeing this issue?
Hi @jasonzlai
Please reopen the issue if it still exists and please share the CNI logs which help us debug further.
Thank you!
Hey @mogren and @jayanthvn! Sorry I forgot to follow up here. Actually we found out the issue was caused by Cilium installed in the same cluster. After Cilium got uninstalled, the issue no longer stands. Thanks a lot again for looking into this!