Recently applied 1.7.0 manifests in one of our clusters and we are seeing new pods are having interfaces created with no ip address associated with them on the host.
$ kubectl get pod -o wide | grep debug
shell-node-debug 1/1 Running 0 6s 172.30.164.149 ip-172-30-165-219.eu-central-1.compute.internal <none>
<none>
$ tail /var/log/aws-routed-eni/plugin.log
{"level":"info","ts":"2020-08-22T11:41:26.166Z","caller":"routed-eni-cni-plugin/cni.go:117","msg":"Received CNI add request: ContainerID(7864f0bbebd4eb32871d4a42414e7e216f4614203d59c3ec1d7fbbe532975064) Netns(/proc/17287/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=shell-node-debug;K8S_POD_INFRA_CONTAINER_ID=7864f0bbebd4eb32871d4a42414e7e216f4614203d59c3ec1d7fbbe532975064) Path(/opt/cni/bin/) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2020-08-22T11:41:26.166Z","caller":"routed-eni-cni-plugin/cni.go:117","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2020-08-22T11:41:26.168Z","caller":"routed-eni-cni-plugin/cni.go:117","msg":"Received add network response for container 7864f0bbebd4eb32871d4a42414e7e216f4614203d59c3ec1d7fbbe532975064 interface eth0: Success:true IPv4Addr:\"172.30.164.149\" UseExternalSNAT:true VPCcidrs:\"172.30.160.0/20\" "}
{"level":"debug","ts":"2020-08-22T11:41:26.168Z","caller":"routed-eni-cni-plugin/cni.go:194","msg":"SetupNS: hostVethName=eni7c64954e155, contVethName=eth0, netnsPath=/proc/17287/ns/net, table=0, mtu=9001"}
{"level":"debug","ts":"2020-08-22T11:41:26.194Z","caller":"driver/driver.go:184","msg":"setupVeth network: disabled IPv6 RA and ICMP redirects on eni7c64954e155"}
{"level":"debug","ts":"2020-08-22T11:41:26.194Z","caller":"driver/driver.go:178","msg":"Setup host route outgoing hostVeth, LinkIndex 12"}
{"level":"debug","ts":"2020-08-22T11:41:26.194Z","caller":"driver/driver.go:178","msg":"Successfully set host route to be 172.30.164.149/0"}
{"level":"info","ts":"2020-08-22T11:41:26.194Z","caller":"driver/driver.go:178","msg":"Added toContainer rule for 172.30.164.149/32"}
md5-401ed2ca61f8fc26ee0cefc19dadff81
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.30.164.1 0.0.0.0 UG 100 0 0 ens5
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.30.164.0 0.0.0.0 255.255.252.0 U 0 0 0 ens5
172.30.164.1 0.0.0.0 255.255.255.255 UH 100 0 0 ens5
172.30.166.133 0.0.0.0 255.255.255.255 UH 0 0 0 enic9a8a93f307
172.30.166.147 0.0.0.0 255.255.255.255 UH 0 0 0 enic91275e76d3
172.30.167.107 0.0.0.0 255.255.255.255 UH 0 0 0 eniae804caef6a
172.30.167.144 0.0.0.0 255.255.255.255 UH 0 0 0 eni9ae85d9b9d4
md5-781a99a59da01678acfe47b2249d2076
$ ifconfig eni7c64954e155
eni7c64954e155: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
ether 6a:e9:93:af:0b:2e txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 19 bytes 5187 (5.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
md5-b4aa18197592f03df77a7cd44eaebef3
- name: AWS_VPC_K8S_CNI_LOGLEVEL
value: DEBUG
- name: AWS_VPC_K8S_CNI_LOG_FILE
value: stdout
- name: AWS_VPC_K8S_CNI_EXTERNALSNAT
value: "true"
- name: AWS_VPC_CNI_NODE_PORT_SUPPORT
value: "true"
- name: AWS_VPC_ENI_MTU
value: "9001"
- name: AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER
value: "true"
- name: AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG
value: "false"
- name: MY_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: WARM_IP_TARGET
value: "10"
Kubernetes: 1.17.9
AWS CNI: 1.7.0
Thank you for reporting the issue @kgtw. Is it possible to run CNI log collection tool and send us the files?
I'm wondering if there was a delete request from kubelet for this pod or whether two pods (running and terminating?) got same IP and deleting one of them caused the IP to be cleaned up. Logs from ipamd, plugin will help us to dive deep here.
Hi @SaranBalaji90, I've re-applied the manifests and manually deployed a test pod shell-node-debug that hadn't previously existed on to a newly created node, this is the relevant logs for that action.
Unfortunately we cannot provide the entire debug information from that script due to company policies, but here are the relevant logs for the timestamp period when the debug pod was created.
plugin.log
{"level":"info","ts":"2020-08-22T16:47:47.359Z","caller":"routed-eni-cni-plugin/cni.go:117","msg":"Received CNI add request: ContainerID(a0366eae458ff21b8a0efe8190f673744fe83e65e524d5fa8d6c93b2881239f0) Netns(/proc/23799/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=shell-node-debug;K8S_POD_INFRA_CONTAINER_ID=a0366eae458ff21b8a0efe8190f673744fe83e65e524d5fa8d6c93b2881239f0) Path(/opt/cni/bin/) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2020-08-22T16:47:47.359Z","caller":"routed-eni-cni-plugin/cni.go:117","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2020-08-22T16:47:47.361Z","caller":"routed-eni-cni-plugin/cni.go:117","msg":"Received add network response for container a0366eae458ff21b8a0efe8190f673744fe83e65e524d5fa8d6c93b2881239f0 interface eth0: Success:true IPv4Addr:\"172.30.167.51\" UseExternalSNAT:true VPCcidrs:\"172.30.160.0/20\" "}
{"level":"debug","ts":"2020-08-22T16:47:47.361Z","caller":"routed-eni-cni-plugin/cni.go:194","msg":"SetupNS: hostVethName=eni51add10f095, contVethName=eth0, netnsPath=/proc/23799/ns/net, table=0, mtu=9001"}
{"level":"debug","ts":"2020-08-22T16:47:47.417Z","caller":"driver/driver.go:184","msg":"setupVeth network: disabled IPv6 RA and ICMP redirects on eni51add10f095"}
{"level":"debug","ts":"2020-08-22T16:47:47.417Z","caller":"driver/driver.go:178","msg":"Setup host route outgoing hostVeth, LinkIndex 12"}
{"level":"debug","ts":"2020-08-22T16:47:47.417Z","caller":"driver/driver.go:178","msg":"Successfully set host route to be 172.30.167.51/0"}
{"level":"info","ts":"2020-08-22T16:47:47.417Z","caller":"driver/driver.go:178","msg":"Added toContainer rule for 172.30.167.51/32"}
ipamd.log
{"level":"info","ts":"2020-08-22T16:47:47.360Z","caller":"rpc/rpc.pb.go:486","msg":"Received AddNetwork for NS /proc/23799/ns/net, Sandbox a0366eae458ff21b8a0efe8190f673744fe83e65e524d5fa8d6c93b2881239f0, ifname eth0"}
{"level":"debug","ts":"2020-08-22T16:47:47.360Z","caller":"rpc/rpc.pb.go:486","msg":"AddNetworkRequest: ClientVersion:\"v1.7.0\" K8S_POD_NAME:\"shell-node-debug\" K8S_POD_NAMESPACE:\"kube-system\" K8S_POD_INFRA_CONTAINER_ID:\"a0366eae458ff21b8a0efe8190f673744fe83e65e524d5fa8d6c93b2881239f0\" ContainerID:\"a0366eae458ff21b8a0efe8190f673744fe83e65e524d5fa8d6c93b2881239f0\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/proc/23799/ns/net\" "}
{"level":"debug","ts":"2020-08-22T16:47:47.360Z","caller":"ipamd/rpc_handler.go:142","msg":"AssignIPv4Address: IP address pool stats: total: 15, assigned 5"}
{"level":"info","ts":"2020-08-22T16:47:47.360Z","caller":"datastore/data_store.go:499","msg":"AssignPodIPv4Address: Assign IP 172.30.167.51 to sandbox aws-cni/a0366eae458ff21b8a0efe8190f673744fe83e65e524d5fa8d6c93b2881239f0/eth0"}
{"level":"debug","ts":"2020-08-22T16:47:47.360Z","caller":"rpc/rpc.pb.go:486","msg":"VPC CIDR 172.30.160.0/20"}
{"level":"info","ts":"2020-08-22T16:47:47.361Z","caller":"rpc/rpc.pb.go:486","msg":"Send AddNetworkReply: IPv4Addr 172.30.167.51, DeviceNumber: 0, err: <nil>"}
{"level":"debug","ts":"2020-08-22T16:47:48.929Z","caller":"ipamd/ipamd.go:889","msg":"Current warm IP stats: target: 10, total: 15, assigned: 6, available: 9, short: 1, over 0"}
{"level":"debug","ts":"2020-08-22T16:47:48.929Z","caller":"ipamd/ipamd.go:510","msg":"Starting to increase IP pool size"}
{"level":"debug","ts":"2020-08-22T16:47:48.929Z","caller":"ipamd/ipamd.go:628","msg":"Current warm IP stats: target: 10, total: 15, assigned: 6, available: 9, short: 1, over 0"}
{"level":"debug","ts":"2020-08-22T16:47:48.929Z","caller":"ipamd/ipamd.go:729","msg":"Current warm IP stats: target: 10, total: 15, assigned: 6, available: 9, short: 1, over 0"}
{"level":"debug","ts":"2020-08-22T16:47:48.929Z","caller":"ipamd/ipamd.go:735","msg":"Found ENI eni-0fc3792c7cd8ef3cc that has less than the maximum number of IP addresses allocated: cur=15, max=29"}
{"level":"info","ts":"2020-08-22T16:47:48.929Z","caller":"ipamd/ipamd.go:739","msg":"Trying to allocate 14 IP addresses on ENI eni-0fc3792c7cd8ef3cc"}
{"level":"info","ts":"2020-08-22T16:47:49.404Z","caller":"ipamd/ipamd.go:739","msg":"Allocated 14 private IP addresses"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.124 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.165.248 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.214 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.246 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.18 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.165.210 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.13 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.104 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.169 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.70 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.6 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.229 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.2 to datastore"}
{"level":"info","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:793","msg":"Added ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.165.64 to datastore"}
{"level":"debug","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:755","msg":"IP Address Pool stats: total: 29, assigned: 6"}
{"level":"debug","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:645","msg":"Successfully increased IP pool, total: 29, used: 6"}
{"level":"debug","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:668","msg":"IP pool stats: total = 29, used = 6, c.maxIPsPerENI = 29"}
{"level":"debug","ts":"2020-08-22T16:47:49.507Z","caller":"ipamd/ipamd.go:919","msg":"Current warm IP stats: target: 10, total: 29, assigned: 6, available: 23, short: 0, over 13"}
{"level":"debug","ts":"2020-08-22T16:47:49.508Z","caller":"datastore/data_store.go:675","msg":"ENI eni-0fc3792c7cd8ef3cc cannot be deleted because it is primary"}
{"level":"debug","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:889","msg":"Current warm IP stats: target: 10, total: 29, assigned: 6, available: 23, short: 0, over 13"}
{"level":"debug","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:907","msg":"Current warm IP stats: target: 10, total: 29, assigned: 6, available: 23, short: 0, over 13"}
{"level":"debug","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:512","msg":"Starting to decrease IP pool"}
{"level":"debug","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:567","msg":"Current warm IP stats: target: 10, total: 29, assigned: 6, available: 23, short: 0, over 13"}
{"level":"debug","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:616","msg":"Current warm IP stats: target: 10, total: 29, assigned: 6, available: 23, short: 0, over 13"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.108 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.124 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.246 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.169 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.164.252 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.164.184 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.81 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.103 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.167.32 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.2 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.165.64 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.165.69 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:585","msg":"Deleted ENI(eni-0fc3792c7cd8ef3cc)'s IP 172.30.166.70 from datastore"}
{"level":"info","ts":"2020-08-22T16:47:54.508Z","caller":"ipamd/ipamd.go:596","msg":"Trying to unassign the following IPs [172.30.167.108 172.30.166.124 172.30.167.246 172.30.167.169 172.30.164.252 172.30.164.184 172.30.167.81 172.30.166.103 172.30.167.32 172.30.166.2 172.30.165.64 172.30.165.69 172.30.166.70] from ENI eni-0fc3792c7cd8ef3cc"}
{"level":"debug","ts":"2020-08-22T16:47:54.870Z","caller":"ipamd/ipamd.go:534","msg":"Successfully decreased IP pool by removing IPs [172.30.167.108 172.30.166.124 172.30.167.246 172.30.167.169 172.30.164.252 172.30.164.184 172.30.167.81 172.30.166.103 172.30.167.32 172.30.166.2 172.30.165.64 172.30.165.69 172.30.166.70] from ENI eni-0fc3792c7cd8ef3cc"}
{"level":"debug","ts":"2020-08-22T16:47:54.870Z","caller":"ipamd/ipamd.go:512","msg":"Successfully decreased IP pool"}
{"level":"debug","ts":"2020-08-22T16:47:54.870Z","caller":"ipamd/ipamd.go:540","msg":"IP pool stats: total = 16, used = 6, c.maxIPsPerENI = 29"}
{"level":"debug","ts":"2020-08-22T16:47:54.870Z","caller":"ipamd/ipamd.go:919","msg":"Current warm IP stats: target: 10, total: 16, assigned: 6, available: 10, short: 0, over 0"}
{"level":"debug","ts":"2020-08-22T16:47:54.870Z","caller":"datastore/data_store.go:675","msg":"ENI eni-0fc3792c7cd8ef3cc cannot be deleted because it is primary"}
kubelet
$ journalctl -fu kubelet --since "2020-08-22 16:42:34" --until "2020-08-22 16:43:34"
-- Logs begin at Sat 2020-08-22 16:41:38 UTC. --
Aug 22 16:42:34 ip-172-30-166-186 kubelet[2379]: I0822 16:42:34.007286 2379 kube_docker_client.go:345] Stop pulling image "REDACTED/jaegertracing/jaeger-agent:1.9.0": "Status: Downloaded newer image for REDACTED/jaegertracing/jaeger-agent:1.9.0"
Aug 22 16:42:34 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:42:34.229Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:42:34 ip-172-30-166-186 kubelet[2379]: I0822 16:42:34.876949 2379 kubelet.go:1952] SyncLoop (PLEG): "node-problem-detector-l8dsn_node-problem-detector(25d93733-77ca-4292-a14f-3a1c8a700b85)", event: &pleg.PodLifecycleEvent{ID:"25d93733-77ca-4292-a14f-3a1c8a700b85", Type:"ContainerStarted", Data:"ed4ec97abc79740a8a4f228f2a91df917f1802576803049c643d1b5442b7d423"}
Aug 22 16:42:34 ip-172-30-166-186 kubelet[2379]: I0822 16:42:34.883229 2379 kubelet.go:1952] SyncLoop (PLEG): "kured-2455s_kured(e8901744-5eba-4c50-a823-0a2a161c8cbb)", event: &pleg.PodLifecycleEvent{ID:"e8901744-5eba-4c50-a823-0a2a161c8cbb", Type:"ContainerStarted", Data:"e7453255bfae231d6bfb290f603436206f51e2736bcbef13f74b4bc47687686c"}
Aug 22 16:42:34 ip-172-30-166-186 kubelet[2379]: I0822 16:42:34.886649 2379 kubelet.go:1952] SyncLoop (PLEG): "jaeger-agent-jl5r4_tracing(6296834f-a475-453d-ac0f-4394bce4d54f)", event: &pleg.PodLifecycleEvent{ID:"6296834f-a475-453d-ac0f-4394bce4d54f", Type:"ContainerStarted", Data:"863be82ef98ae7c97ded728b14f6925bbd6c3c6c4312c4431f35656d95ad9aae"}
Aug 22 16:42:34 ip-172-30-166-186 kubelet[2379]: I0822 16:42:34.891773 2379 kubelet.go:1952] SyncLoop (PLEG): "cluster-overprovisioning-56ccddc5bc-m8wn4_kube-system(914518fb-d282-480d-b021-e61a002b4e9c)", event: &pleg.PodLifecycleEvent{ID:"914518fb-d282-480d-b021-e61a002b4e9c", Type:"ContainerStarted", Data:"eb12dd0745269c029ce538503cf7f57c3281f75fb9fa650a0ca847d00069acfe"}
Aug 22 16:42:35 ip-172-30-166-186 kubelet[2379]: W0822 16:42:35.004286 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/6296834f-a475-453d-ac0f-4394bce4d54f/volumes/kubernetes.io~secret/jaeger-agent-token-fbr6b and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:42:35 ip-172-30-166-186 kubelet[2379]: I0822 16:42:35.741063 2379 kube_docker_client.go:345] Stop pulling image "REDACTED/fluentbit-s3:1.4.2": "Status: Downloaded newer image for REDACTED/fluentbit-s3:1.4.2"
Aug 22 16:42:36 ip-172-30-166-186 kubelet[2379]: W0822 16:42:36.006111 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/6296834f-a475-453d-ac0f-4394bce4d54f/volumes/kubernetes.io~secret/jaeger-agent-token-fbr6b and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:42:36 ip-172-30-166-186 kubelet[2379]: I0822 16:42:36.650462 2379 kube_docker_client.go:345] Stop pulling image "REDACTED/pause:3.2": "Status: Downloaded newer image for REDACTED/pause:3.2"
Aug 22 16:42:37 ip-172-30-166-186 kubelet[2379]: I0822 16:42:37.922919 2379 kubelet.go:1952] SyncLoop (PLEG): "cluster-overprovisioning-56ccddc5bc-m8wn4_kube-system(914518fb-d282-480d-b021-e61a002b4e9c)", event: &pleg.PodLifecycleEvent{ID:"914518fb-d282-480d-b021-e61a002b4e9c", Type:"ContainerStarted", Data:"3aa25d0d3119c7b7073a5ed23eea0d5bbad8d5c1dc615510d4b0ad0508a56d14"}
Aug 22 16:42:37 ip-172-30-166-186 kubelet[2379]: I0822 16:42:37.930020 2379 kubelet.go:1952] SyncLoop (PLEG): "fluentbit-fdpwt_kube-logs(3ad66c50-3ce5-419c-8685-5b8949991344)", event: &pleg.PodLifecycleEvent{ID:"3ad66c50-3ce5-419c-8685-5b8949991344", Type:"ContainerStarted", Data:"e10fb3610a6049bce22481f787e083eae6e3e36acb86f3f4e4a86e870d463831"}
Aug 22 16:42:38 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:42:38.109Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:42:38 ip-172-30-166-186 kubelet[2379]: W0822 16:42:38.110013 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/3ad66c50-3ce5-419c-8685-5b8949991344/volumes/kubernetes.io~secret/fluentbit-token-d25cp and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:42:38 ip-172-30-166-186 kubelet[2379]: W0822 16:42:38.110305 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/3ad66c50-3ce5-419c-8685-5b8949991344/volumes/kubernetes.io~configmap/fluentbit-config and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:42:38 ip-172-30-166-186 kubelet[2379]: I0822 16:42:38.689641 2379 setters.go:73] Using node IP: "172.30.166.186"
Aug 22 16:42:39 ip-172-30-166-186 kubelet[2379]: W0822 16:42:39.113297 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/3ad66c50-3ce5-419c-8685-5b8949991344/volumes/kubernetes.io~secret/fluentbit-token-d25cp and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:42:39 ip-172-30-166-186 kubelet[2379]: W0822 16:42:39.114164 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/3ad66c50-3ce5-419c-8685-5b8949991344/volumes/kubernetes.io~configmap/fluentbit-config and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:42:43 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:42:43.119Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:42:48 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:42:48.129Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:42:48 ip-172-30-166-186 kubelet[2379]: I0822 16:42:48.694827 2379 setters.go:73] Using node IP: "172.30.166.186"
Aug 22 16:42:53 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:42:53.138Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:42:58 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:42:58.181Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:42:58 ip-172-30-166-186 kubelet[2379]: I0822 16:42:58.698566 2379 setters.go:73] Using node IP: "172.30.166.186"
Aug 22 16:43:00 ip-172-30-166-186 kubelet[2379]: I0822 16:43:00.202296 2379 kubelet.go:1952] SyncLoop (PLEG): "spire-agent-dhtdv_spire(28662c40-f3f9-460b-96ff-d401c95fa729)", event: &pleg.PodLifecycleEvent{ID:"28662c40-f3f9-460b-96ff-d401c95fa729", Type:"ContainerDied", Data:"c6ae074a1a293c03a8bcf4583fc7ab96547793035ab755fcf544e873bc32b71d"}
Aug 22 16:43:00 ip-172-30-166-186 kubelet[2379]: I0822 16:43:00.262341 2379 kube_docker_client.go:345] Stop pulling image "REDACTED/spiffe-io/wait-for-it@sha256:d9bdc931e4404237d2fb0ba84db5ece88b236c40eeca570d786ee54fd243f4ae": "Status: Image is up to date for REDACTED/spiffe-io/wait-for-it@sha256:d9bdc931e4404237d2fb0ba84db5ece88b236c40eeca570d786ee54fd243f4ae"
Aug 22 16:43:00 ip-172-30-166-186 kubelet[2379]: W0822 16:43:00.359464 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~configmap/spire-bundle and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:00 ip-172-30-166-186 kubelet[2379]: W0822 16:43:00.359479 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~configmap/spire-agent-config and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:00 ip-172-30-166-186 kubelet[2379]: W0822 16:43:00.359588 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~secret/spire-agent-token-tgj5p and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:00 ip-172-30-166-186 kubelet[2379]: W0822 16:43:00.359654 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~projected/spire-agent and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:01 ip-172-30-166-186 kubelet[2379]: I0822 16:43:01.206631 2379 kubelet.go:1952] SyncLoop (PLEG): "spire-agent-dhtdv_spire(28662c40-f3f9-460b-96ff-d401c95fa729)", event: &pleg.PodLifecycleEvent{ID:"28662c40-f3f9-460b-96ff-d401c95fa729", Type:"ContainerStarted", Data:"8fc986cb87ee3eb07ddfb9a958c6e78b51c752c091ccba31f100cb38e24c04a5"}
Aug 22 16:43:01 ip-172-30-166-186 kubelet[2379]: W0822 16:43:01.361978 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~projected/spire-agent and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:01 ip-172-30-166-186 kubelet[2379]: W0822 16:43:01.362010 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~configmap/spire-bundle and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:01 ip-172-30-166-186 kubelet[2379]: W0822 16:43:01.362025 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~configmap/spire-agent-config and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:01 ip-172-30-166-186 kubelet[2379]: W0822 16:43:01.362070 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~secret/spire-agent-token-tgj5p and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:03 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:43:03.190Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:43:05 ip-172-30-166-186 kubelet[2379]: I0822 16:43:05.263316 2379 kubelet.go:1952] SyncLoop (PLEG): "kured-2455s_kured(e8901744-5eba-4c50-a823-0a2a161c8cbb)", event: &pleg.PodLifecycleEvent{ID:"e8901744-5eba-4c50-a823-0a2a161c8cbb", Type:"ContainerDied", Data:"e7453255bfae231d6bfb290f603436206f51e2736bcbef13f74b4bc47687686c"}
Aug 22 16:43:06 ip-172-30-166-186 kubelet[2379]: I0822 16:43:06.279378 2379 kubelet.go:1952] SyncLoop (PLEG): "kured-2455s_kured(e8901744-5eba-4c50-a823-0a2a161c8cbb)", event: &pleg.PodLifecycleEvent{ID:"e8901744-5eba-4c50-a823-0a2a161c8cbb", Type:"ContainerStarted", Data:"bff65f4e8d7125e9c57a4f5baefa525c4f3878826f485f813d675d1259217c4d"}
Aug 22 16:43:08 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:43:08.200Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:43:08 ip-172-30-166-186 kubelet[2379]: I0822 16:43:08.710533 2379 setters.go:73] Using node IP: "172.30.166.186"
Aug 22 16:43:13 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:43:13.210Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:43:14 ip-172-30-166-186 kubelet[2379]: I0822 16:43:14.648917 2379 prober.go:116] Liveness probe for "fluentbit-fdpwt_kube-logs(3ad66c50-3ce5-419c-8685-5b8949991344):fluentbit" failed (failure): Get http://172.30.165.255:2020/: dial tcp 172.30.165.255:2020: connect: no route to host
Aug 22 16:43:16 ip-172-30-166-186 kubelet[2379]: W0822 16:43:16.297252 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/ed16d36e-c7bd-4b8a-b698-0a94c68b7477/volumes/kubernetes.io~secret/node-exporter-token-k788f and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:18 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:43:18.220Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:43:18 ip-172-30-166-186 kubelet[2379]: I0822 16:43:18.715397 2379 setters.go:73] Using node IP: "172.30.166.186"
Aug 22 16:43:22 ip-172-30-166-186 kubelet[2379]: I0822 16:43:22.872870 2379 prober.go:116] Liveness probe for "fluentbit-fdpwt_kube-logs(3ad66c50-3ce5-419c-8685-5b8949991344):fluentbit" failed (failure): Get http://172.30.165.255:2020/: dial tcp 172.30.165.255:2020: connect: no route to host
Aug 22 16:43:23 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:43:23.230Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:43:28 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:43:28.240Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:43:28 ip-172-30-166-186 kubelet[2379]: I0822 16:43:28.719027 2379 setters.go:73] Using node IP: "172.30.166.186"
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: I0822 16:43:32.561275 2379 kubelet.go:1952] SyncLoop (PLEG): "spire-agent-dhtdv_spire(28662c40-f3f9-460b-96ff-d401c95fa729)", event: &pleg.PodLifecycleEvent{ID:"28662c40-f3f9-460b-96ff-d401c95fa729", Type:"ContainerDied", Data:"8fc986cb87ee3eb07ddfb9a958c6e78b51c752c091ccba31f100cb38e24c04a5"}
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: E0822 16:43:32.571248 2379 remote_runtime.go:261] RemoveContainer "c6ae074a1a293c03a8bcf4583fc7ab96547793035ab755fcf544e873bc32b71d" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "c6ae074a1a293c03a8bcf4583fc7ab96547793035ab755fcf544e873bc32b71d": Error response from daemon: removal of container c6ae074a1a293c03a8bcf4583fc7ab96547793035ab755fcf544e873bc32b71d is already in progress
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: E0822 16:43:32.571360 2379 kuberuntime_container.go:671] failed to remove pod init container "init": rpc error: code = Unknown desc = failed to remove container "c6ae074a1a293c03a8bcf4583fc7ab96547793035ab755fcf544e873bc32b71d": Error response from daemon: removal of container c6ae074a1a293c03a8bcf4583fc7ab96547793035ab755fcf544e873bc32b71d is already in progress; Skipping pod "spire-agent-dhtdv_spire(28662c40-f3f9-460b-96ff-d401c95fa729)"
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: E0822 16:43:32.571571 2379 pod_workers.go:191] Error syncing pod 28662c40-f3f9-460b-96ff-d401c95fa729 ("spire-agent-dhtdv_spire(28662c40-f3f9-460b-96ff-d401c95fa729)"), skipping: failed to "StartContainer" for "init" with CrashLoopBackOff: "back-off 10s restarting failed container=init pod=spire-agent-dhtdv_spire(28662c40-f3f9-460b-96ff-d401c95fa729)"
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: W0822 16:43:32.733670 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~projected/spire-agent and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: W0822 16:43:32.733782 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~configmap/spire-agent-config and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: W0822 16:43:32.733892 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~configmap/spire-bundle and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:32 ip-172-30-166-186 kubelet[2379]: W0822 16:43:32.733953 2379 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/28662c40-f3f9-460b-96ff-d401c95fa729/volumes/kubernetes.io~secret/spire-agent-token-tgj5p and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Aug 22 16:43:33 ip-172-30-166-186 kubelet[2379]: {"level":"info","ts":"2020-08-22T16:43:33.249Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.0 ..."}
Aug 22 16:43:33 ip-172-30-166-186 kubelet[2379]: I0822 16:43:33.880905 2379 prober.go:116] Liveness probe for "fluentbit-fdpwt_kube-logs(3ad66c50-3ce5-419c-8685-5b8949991344):fluentbit" failed (failure): Get http://172.30.165.255:2020/: dial tcp 172.30.165.255:2020: connect: no route to host
Aug 22 16:43:33 ip-172-30-166-186 kubelet[2379]: I0822 16:43:33.880963 2379 kubelet.go:1985] SyncLoop (container unhealthy): "fluentbit-fdpwt_kube-logs(3ad66c50-3ce5-419c-8685-5b8949991344)"
Aug 22 16:43:33 ip-172-30-166-186 kubelet[2379]: I0822 16:43:33.881401 2379 kuberuntime_manager.go:630] Container "fluentbit" ({"docker" "e10fb3610a6049bce22481f787e083eae6e3e36acb86f3f4e4a86e870d463831"}) of pod fluentbit-fdpwt_kube-logs(3ad66c50-3ce5-419c-8685-5b8949991344): Container fluentbit failed liveness probe, will be restarted
Aug 22 16:43:33 ip-172-30-166-186 kubelet[2379]: I0822 16:43:33.881443 2379 kuberuntime_container.go:600] Killing container "docker://e10fb3610a6049bce22481f787e083eae6e3e36acb86f3f4e4a86e870d463831" with 20 second grace period
Docker Version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.39
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:45:36 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.6
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 481bc77
Built: Sat May 4 01:59:36 2019
OS/Arch: linux/amd64
Experimental: false
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.30.164.1 0.0.0.0 UG 100 0 0 ens5
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.30.164.0 0.0.0.0 255.255.252.0 U 0 0 0 ens5
172.30.164.1 0.0.0.0 255.255.255.255 UH 100 0 0 ens5
$ ifconfig eni51add10f095
eni51add10f095: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
ether 2a:5c:2d:80:46:8a txqueuelen 0 (Ethernet)
RX packets 2 bytes 148 (148.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20 bytes 5460 (5.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
You can see from the kubelet log that the daemonsets which had pods scheduled on this new node are also in a failing state. Let me know if you need anything further.
Hi @kgtw, we set up some new Ubuntu clusters and tested with v1.7.0 right now. Using the same config as you pasted above, pods came up fine so we could not reproduce the issue. The ipamd logs above also do not show any errors, and eni51add10f095 for example does get the IP 172.30.167.51 from the CNI and the plugin.log shows that it gets set up under /proc/23799/ns/net on ens5. This happened around "2020-08-22T16:47:54.870Z", but the kubelet logs end 4 minutes before that, so it's hard to tell if the pod started up correctly. If the pod failed to start after getting the IP, kubelet will call the CNI with a Delete command, and the IP and routes will be cleaned up again.
If you run ip rule and ip route, does that show the pod IP?
It would help us a lot to debug this if you could provide us with the complete logs from the /opt/cni/bin/aws-cni-support.sh script. (Email to [email protected], or through a support ticket to AWS)
If you apply the v1.6.3 config to the node, does it recover?
Hi @mogren, thanks for the speedy reply :)
Re-apply the 1.6.3 manifests does appear to recover the cluster, although I did terminate most of the nodes to speed it up.
I have attached the debug information in a support ticket, ID 7303365231.
Logs were taken from a newly created node, from which once it registered as ready within k8s I manually deployed a debug pod named shell-node-debug at 2020-08-22T21:42:21.635Z.
This pod has the ip address 172.30.163.152 / eni8e97e305193 associated to it. I did manage to to see the rule but no route for it.
$ ip route
default via 172.30.160.1 dev ens5 proto dhcp src 172.30.163.15 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.30.160.0/22 dev ens5 proto kernel scope link src 172.30.163.15
172.30.160.1 dev ens5 proto dhcp scope link src 172.30.163.15 metric 100
$ ip rule
0: from all lookup local
512: from all to 172.30.163.154 lookup main
512: from all to 172.30.163.121 lookup main
512: from all to 172.30.160.79 lookup main
512: from all to 172.30.161.176 lookup main
512: from all to 172.30.160.103 lookup main
512: from all to 172.30.160.120 lookup main
512: from all to 172.30.161.59 lookup main
512: from all to 172.30.163.152 lookup main
1024: from all fwmark 0x80/0x80 lookup main
32766: from all lookup main
32767: from all lookup default
Hope this helps.
@kgtw I'm guessing you upgraded by applying the yaml file right?
@kgtw Thanks for providing the logs. It seems like the routes get added correctly, but later they get removed by Calico. We are still investigating why this happens.
@SaranBalaji90 - Yes, although we have copied them so we can add additional labels/tolerations etc. I have double checked what we have and all looks correct.
@mogren Yes, I am seeing this as well now. Adding the debug pod shell-node-debug to a newly created node.
172.30.170.74 / eni467ad56930c
ipamd.log
{"level":"info","ts":"2020-08-23T09:16:25.912Z","caller":"rpc/rpc.pb.go:486","msg":"Received AddNetwork for NS /proc/12312/ns/net, Sandbox 2c35bc93533a5a984f5e71d8ec408cdca96e1f13beff1f7e293b3842ba8c74db, ifname eth0"}
{"level":"debug","ts":"2020-08-23T09:16:25.912Z","caller":"rpc/rpc.pb.go:486","msg":"AddNetworkRequest: ClientVersion:\"v1.7.0\" K8S_POD_NAME:\"shell-node-debug\" K8S_POD_NAMESPACE:\"kube-system\" K8S_POD_INFRA_CONTAINER_ID:\"2c35bc93533a5a984f5e71d8ec408cdca96e1f13beff1f7e293b3842ba8c74db\" ContainerID:\"2c35bc93533a5a984f5e71d8ec408cdca96e1f13beff1f7e293b3842ba8c74db\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/proc/12312/ns/net\" "}
{"level":"debug","ts":"2020-08-23T09:16:25.912Z","caller":"ipamd/rpc_handler.go:142","msg":"AssignIPv4Address: IP address pool stats: total: 14, assigned 6"}
{"level":"info","ts":"2020-08-23T09:16:25.912Z","caller":"datastore/data_store.go:499","msg":"AssignPodIPv4Address: Assign IP 172.30.170.74 to sandbox aws-cni/2c35bc93533a5a984f5e71d8ec408cdca96e1f13beff1f7e293b3842ba8c74db/eth0"}
{"level":"debug","ts":"2020-08-23T09:16:25.912Z","caller":"rpc/rpc.pb.go:486","msg":"VPC CIDR 172.30.160.0/20"}
{"level":"info","ts":"2020-08-23T09:16:25.912Z","caller":"rpc/rpc.pb.go:486","msg":"Send AddNetworkReply: IPv4Addr 172.30.170.74, DeviceNumber: 0, err: <nil>"}
plugin.log
{"level":"info","ts":"2020-08-23T09:16:25.912Z","caller":"routed-eni-cni-plugin/cni.go:117","msg":"Received add network response for container 2c35bc93533a5a984f5e71d8ec408cdca96e1f13beff1f7e293b3842ba8c74db interface eth0: Success:true IPv4Addr:\"172.30.170.74\" UseExternalSNAT:true VPCcidrs:\"172.30.160.0/20\" "}
{"level":"debug","ts":"2020-08-23T09:16:25.912Z","caller":"routed-eni-cni-plugin/cni.go:194","msg":"SetupNS: hostVethName=eni467ad56930c, contVethName=eth0, netnsPath=/proc/12312/ns/net, table=0, mtu=9001"}
{"level":"debug","ts":"2020-08-23T09:16:25.988Z","caller":"driver/driver.go:184","msg":"setupVeth network: disabled IPv6 RA and ICMP redirects on eni467ad56930c"}
{"level":"debug","ts":"2020-08-23T09:16:25.989Z","caller":"driver/driver.go:178","msg":"Setup host route outgoing hostVeth, LinkIndex 13"}
{"level":"debug","ts":"2020-08-23T09:16:25.989Z","caller":"driver/driver.go:178","msg":"Successfully set host route to be 172.30.170.74/0"}
{"level":"info","ts":"2020-08-23T09:16:25.990Z","caller":"driver/driver.go:178","msg":"Added toContainer rule for 172.30.170.74/32"}
calico-node
2020-08-23 09:16:25.988 [INFO][58] int_dataplane.go 778: Linux interface addrs changed. addrs=set.mapSet{} ifaceName="eni467ad56930c"
2020-08-23 09:16:25.988 [INFO][58] int_dataplane.go 743: Linux interface state changed. ifIndex=13 ifaceName="eni467ad56930c" state="up"
2020-08-23 09:16:25.989 [INFO][58] int_dataplane.go 1067: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"eni467ad56930c", Addrs:set.mapSet{}}
2020-08-23 09:16:25.989 [INFO][58] hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"eni467ad56930c", Addrs:set.mapSet{}}
2020-08-23 09:16:25.989 [INFO][58] int_dataplane.go 1162: Applying dataplane updates
2020-08-23 09:16:25.989 [INFO][58] route_table.go 556: Syncing routes: found unexpected route; ignoring due to grace period. dest=172.30.171.35/32 ifaceName="eni325bbee54ae" ipVersion=0x4
2020-08-23 09:16:25.989 [INFO][58] route_table.go 364: Interface in cleanup grace period, will retry after. ifaceName="eni325bbee54ae" ipVersion=0x4
2020-08-23 09:16:25.989 [INFO][58] route_table.go 556: Syncing routes: found unexpected route; ignoring due to grace period. dest=172.30.169.5/32 ifaceName="enif7638c71565" ipVersion=0x4
2020-08-23 09:16:25.989 [INFO][58] route_table.go 364: Interface in cleanup grace period, will retry after. ifaceName="enif7638c71565" ipVersion=0x4
2020-08-23 09:16:25.990 [INFO][58] int_dataplane.go 1176: Finished applying updates to dataplane. msecToApply=0.7474839999999999
2020-08-23 09:16:25.990 [INFO][58] int_dataplane.go 1049: Received interface update msg=&intdataplane.ifaceUpdate{Name:"eni467ad56930c", State:"up", Index:13}
2020-08-23 09:16:25.990 [INFO][58] int_dataplane.go 1162: Applying dataplane updates
2020-08-23 09:16:25.990 [INFO][58] endpoint_mgr.go 342: Workload interface came up, marking for reconfiguration. ifaceName="eni467ad56930c"
2020-08-23 09:16:25.990 [INFO][58] endpoint_mgr.go 1043: Applying /proc/sys configuration to interface. ifaceName="eni467ad56930c"
2020-08-23 09:16:25.990 [INFO][58] route_table.go 556: Syncing routes: found unexpected route; ignoring due to grace period. dest=172.30.170.74/32 ifaceName="eni467ad56930c" ipVersion=0x4
2020-08-23 09:16:25.990 [INFO][58] route_table.go 364: Interface in cleanup grace period, will retry after. ifaceName="eni467ad56930c" ipVersion=0x4
2020-08-23 09:16:25.991 [INFO][58] route_table.go 556: Syncing routes: found unexpected route; ignoring due to grace period. dest=172.30.171.35/32 ifaceName="eni325bbee54ae" ipVersion=0x4
2020-08-23 09:16:25.991 [INFO][58] route_table.go 364: Interface in cleanup grace period, will retry after. ifaceName="eni325bbee54ae" ipVersion=0x4
2020-08-23 09:16:25.991 [INFO][58] route_table.go 556: Syncing routes: found unexpected route; ignoring due to grace period. dest=172.30.169.5/32 ifaceName="enif7638c71565" ipVersion=0x4
2020-08-23 09:16:25.991 [INFO][58] route_table.go 364: Interface in cleanup grace period, will retry after. ifaceName="enif7638c71565" ipVersion=0x4
2020-08-23 09:16:25.991 [INFO][58] int_dataplane.go 1176: Finished applying updates to dataplane. msecToApply=1.098275
.
.
.
2020-08-23 09:16:36.695 [INFO][58] table.go 810: Invalidating dataplane cache ipVersion=0x4 reason="post update" table="filter"
2020-08-23 09:16:36.695 [INFO][58] route_table.go 561: Syncing routes: removing old route. dest=172.30.170.74/32 ifaceName="eni467ad56930c" ipVersion=0x4 routeProblems=[]string{"unexpected route"}
2020-08-23 09:16:36.696 [INFO][58] conntrack.go 78: Removing conntrack flows ip=172.30.170.74
2020-08-23 09:16:36.697 [INFO][58] table.go 497: Loading current iptables state and checking it is correct. ipVersion=0x4 table="filter"
2020-08-23 09:16:36.706 [INFO][58] int_dataplane.go 1176: Finished applying updates to dataplane. msecToApply=10.334442
2020-08-23 09:16:36.695 [INFO][58] route_table.go 561: Syncing routes: removing old route. dest=172.30.170.74/32 ifaceName="eni467ad56930c" ipVersion=0x4 routeProblems=[]string{"unexpected route"}
md5-b26384653e0681834ed3cd00a9242a23
2020-08-23 10:31:11.112 [DEBUG][64] felix/update_filter.go 103: Route update route={Ifindex: 37 Dst: 172.30.171.53/32 Src: <nil> Gw: <nil> Flags: [] Table: 254}
2020-08-23 10:31:11.112 [DEBUG][64] felix/update_filter.go 105: Ignoring non-local route. route={Ifindex: 37 Dst: 172.30.171.53/32 Src: <nil> Gw: <nil> Flags: [] Table: 254}
md5-0b84d9a79e845a2b485a6944f3279d11
2020-08-23 10:31:11.215 [INFO][64] felix/route_table.go 856: Syncing routes: found unexpected route; ignoring due to grace period. dest=172.30.171.53/32 ifaceName="eniac706274002" ifaceRegex="^eni.*" ipVersion=0x4
@kgtw Thanks for reporting and debugging this issue. Can we update the title of this ticket to something like "Calico deletes routes when using CNI v1.7.0" to make it easier to find for other people that run into this? We will try to get to the root cause of why Calico can't see the changes with v1.7.0 ASAP.
@kgtw - Thanks again for reporting the issue. #1166 is merged to master so closing this issue now.
Thanks for the speedy fix @jayanthvn - I'll let you know if we still have issues with this once a new release has been made.
Yes please. Thanks :)
PR has all the details, but in short, Calico assumes that the ENI name is generated by using this exact formula:
h := sha1.New()
h.Write([]byte(fmt.Sprintf("%s.%s", namespace, podname)))
@kgtw v1.7.1 is out, please let us know if this fixes the issue for you.
@mogren - Yes, this did fix the issue. Thanks!
@kgtw - Thanks for verifying this :)
Most helpful comment
@mogren - Yes, this did fix the issue. Thanks!