When pods get an IP address associated with the secondary, tertiary or quaternary ENI of an instance, cluster-internal communication does not work, e.g. pods cannot communicate with kube-dns, the Kubernetes service in the default namespace, or any other ClusterIP service. Pods on the primary ENI have no problems talking to those internal services. All pods on all ENIs can talk to the "internet". This happens both with the kops default jessie/stretch AMIs and with the latest Amazon Linux 2 AMI.
Expected behavior: all pods on all ENIs can talk to the cluster-internal services.
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: cluster.team.example.com
  name: c5-eu-west-1
spec:
  kubelet:
    maxPods: 29
  additionalSecurityGroups:
  - sg-0332feaa999999999
  image: amazon.com/amzn2-ami-hvm-2.0.20190115-x86_64-gp2
  machineType: c5.large
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: c5-eu-west-1
    team.example.com/ec2-class: c5
    team.example.com/instance-class: c5class
  role: Node
  taints:
  - dedicated=compute:NoSchedule
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: c5class
  namespace: monitoring
  labels:
    k8s-app: c5class
spec:
  replicas: 100
  selector:
    matchLabels:
      k8s-app: c5class
  template:
    metadata:
      labels:
        k8s-app: c5class
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: c5class
        image: "jcsorvasi/alpine-bash-curl-jq:2"
        command: ["/bin/bash"]
        args: [
          "-c",
          "while true; do curl -s -k https://kubernetes.default | jq -c .; sleep 5; done"
        ]
        env:
        - name: INCREMENT_ME_TO_DEPLOY
          value: "1"
      tolerations:
      - operator: Exists
      nodeSelector:
        team.example.com/instance-class: c5class
#!/usr/bin/env bash
# For every test pod, report which ENI its IP belongs to and whether it can reach kubernetes.default.
csv_header="instance_class,instance_type,instance_id,k8s_nodename,pod_name,pod_ip,eni_id,eni_primary,eni_device_number,curl_exit_code,is_working"
echo "$csv_header"
#classes="c4class c5class i3class m4class m5class r4class r5class"
classes="c5class"
# An optional first argument overrides the list of instance classes.
[[ -n "$1" ]] && classes="$1"
for class in $classes; do
  nodes=$(kubectl get nodes --no-headers -l "team.example.com/instance-class=$class" -o custom-columns=NAME:metadata.name | tr "\n" " ")
  for node in $nodes; do
    node_info_tuple=$(kubectl get node "$node" -o json | jq -r ". | \"\(.metadata.labels[\"beta.kubernetes.io/instance-type\"]),\(.spec.externalID)\"")
    pods=$(kubectl get pod --field-selector="spec.nodeName=$node" -o custom-columns=NAME:metadata.name | grep "$class" | tr "\n" " ")
    # Fetch the ENI/IP pool information from the CNI introspection endpoint on the node (via dsh).
    eni_info=$(dsh "$node" "curl -s localhost:61678/v1/enis")
    for pod in $pods; do
      pod_ip=$(kubectl get pod "$pod" -o jsonpath='{.status.podIP}')
      # Find the ENI whose address pool contains the pod IP.
      eni_tuple=$(echo "$eni_info" | jq -r ".ENIIPPools | to_entries[].value | select(.IPv4Addresses | keys[] == \"$pod_ip\") | \"\(.ID),\(.IsPrimary),\(.DeviceNumber)\"")
      # curl writes nothing to stdout, so only its exit code is captured.
      curl_exit_code=$(kubectl exec -it "$pod" -- bash -c "curl -o /dev/null -s -k https://kubernetes.default" 2>/dev/null; echo $?)
      is_working="false"
      [[ $curl_exit_code -eq 0 ]] && is_working="true"
      echo "$class,$node_info_tuple,$node,$pod,$pod_ip,$eni_tuple,$curl_exit_code,$is_working"
    done
  done
done
The result of the above experiment is a CSV file:
instance_class,instance_type,instance_id,k8s_nodename,pod_name,pod_ip,eni_id,eni_primary,eni_device_number,curl_exit_code,is_working
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-2w2fp,10.105.20.63,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-4mffp,10.105.20.127,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-4mmjn,10.105.21.171,eni-05e5adb0626972cdd,true,0,0,true
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-56z4w,10.105.23.207,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-64ptl,10.105.23.66,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-7ds2h,10.105.20.76,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-8hx79,10.105.23.139,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-9449w,10.105.16.138,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-9jgqv,10.105.22.31,eni-05e5adb0626972cdd,true,0,0,true
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-gbl8q,10.105.20.179,eni-05e5adb0626972cdd,true,0,0,true
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-hqkrp,10.105.17.156,eni-05e5adb0626972cdd,true,0,0,true
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-ldm4q,10.105.18.6,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-n8df8,10.105.19.14,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-njnfc,10.105.20.18,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-nr4dz,10.105.22.133,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-p6w4p,10.105.17.44,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-qsfd6,10.105.22.181,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-rphrw,10.105.22.185,eni-05e5adb0626972cdd,true,0,0,true
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-rz2bk,10.105.19.28,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-sbpnx,10.105.20.233,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-sdph8,10.105.16.178,eni-0e06652ea4e14866f,false,3,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-slsdr,10.105.21.126,eni-05e5adb0626972cdd,true,0,0,true
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-xx8l9,10.105.22.3,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0ce2badef326ca01b,ip-10-105-18-196.eu-west-1.compute.internal,c5class-5658859ddc-zfrrf,10.105.20.241,eni-00d8277f0328e853b,false,2,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-26kwg,10.105.55.84,eni-0c6564c7e0caaf635,false,2,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-2chjc,10.105.48.164,eni-0c6564c7e0caaf635,false,2,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-5ntzt,10.105.52.94,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-5qbmc,10.105.53.200,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-6wnrc,10.105.53.96,eni-0c6564c7e0caaf635,false,2,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-8h2hl,10.105.54.187,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-9gg5v,10.105.51.101,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-bjp54,10.105.55.201,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-d9z2p,10.105.49.138,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-dv7t2,10.105.52.192,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-fcbft,10.105.51.208,eni-0c6564c7e0caaf635,false,2,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-hg68b,10.105.52.78,eni-0c6564c7e0caaf635,false,2,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-jjlxh,10.105.54.185,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-k645j,10.105.53.148,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-kks67,10.105.50.231,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-lzl8t,10.105.53.243,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-pglsq,10.105.55.124,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-qwpbn,10.105.55.24,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-r6n24,10.105.51.37,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-sb6q9,10.105.49.120,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-stxnq,10.105.55.2,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-vxsk5,10.105.53.114,eni-0915ab8359b5037ac,true,0,0,true
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-wpvt8,10.105.48.55,eni-07baa21c687bc69de,false,3,6,false
c5class,c5.large,i-0e42a1c4f002a7fb0,ip-10-105-54-169.eu-west-1.compute.internal,c5class-5658859ddc-z96cp,10.105.55.187,eni-0c6564c7e0caaf635,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-677lx,10.105.86.11,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-7hhbh,10.105.85.185,eni-010c2993f7b0b78cf,true,0,0,true
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-7wmjl,10.105.86.215,eni-010c2993f7b0b78cf,true,0,0,true
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-cwcpq,10.105.84.138,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-czjn4,10.105.87.167,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-fxbfc,10.105.87.45,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-h7nhl,10.105.84.84,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-hgf8z,10.105.84.80,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-j29c8,10.105.80.82,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-k5v7s,10.105.85.91,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-knjgm,10.105.87.63,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-krf52,10.105.85.6,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-l2dq4,10.105.86.166,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-mphz6,10.105.84.61,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-nggb7,10.105.87.174,eni-010c2993f7b0b78cf,true,0,0,true
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-nns8v,10.105.86.95,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-plcq2,10.105.82.88,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-qfn5q,10.105.82.197,eni-010c2993f7b0b78cf,true,0,0,true
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-s2w6t,10.105.82.55,eni-010c2993f7b0b78cf,true,0,0,true
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-wd6wv,10.105.83.73,eni-070920a243e88ec75,false,3,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-wrdd2,10.105.81.39,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-xzj88,10.105.82.66,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-zgdrv,10.105.80.229,eni-085ff55486913475e,false,2,6,false
c5class,c5.large,i-0c323533c0368bdf9,ip-10-105-80-231.eu-west-1.compute.internal,c5class-5658859ddc-zpmj2,10.105.86.94,eni-010c2993f7b0b78cf,true,0,0,true
The last column (is_working) indicates whether the pod can communicate with a cluster-internal service, and column 8 (eni_primary) indicates the type of ENI (primary or non-primary).
It is visible in this example that all the non-working pods are associated to a non-primary ENI, while the working ones are on the primary.
We can provide results for the other instance classes as well, that show that pods on the non-primary ENIs fail.
This happens both with the kops default Debian-based AMIs and with the Amazon Linux 2 AMI.
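The per-ENI pattern in the CSV can be tallied mechanically. The following is a minimal sketch, assuming the CSV has been saved to a file with the header row shown above; the function name `summarize_by_eni` and the file name `results.csv` are hypothetical.

```shell
#!/usr/bin/env bash
# Count pods per (eni_primary, is_working) combination.
# Column 8 is eni_primary and column 11 is is_working, as in the CSV header.
summarize_by_eni() {
  awk -F, 'NR > 1 { count[$8 "," $11]++ }
           END { for (k in count) print k "," count[k] }' "$1" | sort
}
```

Calling `summarize_by_eni results.csv` prints one line per combination in the form `eni_primary,is_working,count`. If the problem is tied to the ENI type, only the combinations `false,false` and `true,true` should appear.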
This sounds a lot like #263, is Calico being used for network policies?
One thing to try is to disable the source/destination check to prove if the issue is related to packets exiting a different adapter to the one they came in on.
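A minimal sketch of how the source/destination check could be disabled on specific ENIs with the AWS CLI, assuming the CLI is installed and has permission to modify network interfaces. The wrapper function `disable_src_dst_check` and the `DRY_RUN` variable are hypothetical helpers for illustration; the underlying call is `aws ec2 modify-network-interface-attribute` with `--no-source-dest-check`.

```shell
#!/usr/bin/env bash
# Disable the EC2 source/destination check on each ENI passed as an argument.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
disable_src_dst_check() {
  local eni
  for eni in "$@"; do
    local cmd=(aws ec2 modify-network-interface-attribute
               --network-interface-id "$eni" --no-source-dest-check)
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

For example, `DRY_RUN=0 disable_src_dst_check eni-00d8277f0328e853b eni-0e06652ea4e14866f` would apply the change to the two non-primary ENIs of the first node in the CSV output. The check can be re-enabled afterwards with `--source-dest-check`.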
AWS support wanted me to add the following information to the ticket:
- Did you face the issue as soon as you updated to 1.3.0, or was it uncovered later?
The issue was uncovered on a cluster that is running aws cni 1.3.0 when we wanted to add a new instance type (c5) to the cluster.
- Was this an intermittent or continuous issue?
Continuous issue. It is also reproducible each time on new instances.
- Was it also an issue with the 1.2.1 plugin?
- Were you able to downgrade the plugin and see if you still faced the issue?
I have now tested with version 1.2.1 (downgraded the cluster to it), and the issue also exists in 1.2.1.
- Can you also confirm whether you are facing this after upgrading to the latest CNI, 1.3.2?
The issue also exists in 1.3.2, though to test it I had to build my own images, as I don't see any publicly available images for 1.3.2.
Out of curiosity, I have also tried the current master branch of the aws cni plugin, and there it seems to work: m4/m5/r5/c4/c5/i3 instances don't have the communication problem and behave like the r4 instances.
@nickdgriffin No, we don't use Calico.
As for the "_to disable the source/destination check_", do you mean https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#change_source_dest_check or do you refer to the srcdst app (which we have removed)? I can try disabling the check on the ENIs of the c5 instances. Why would this make a difference there but not matter on the working r4 instances (the check is enabled there)?
As for further tests, I have compiled and tested with the current master branch. A quick test showed that the previously broken instances work with master. I need to verify this.
If this is the case, it would be nice to understand which commit fixed this and why it is needed for the cX, mX and r5 instances, given that the r4 instances work with previous versions of the plugin. Thanks.
I do mean that, yes. The issue I am referring to comes about when a pod is allocated an IP on a secondary adapter, so you can check if that is the case across your various tests - it might be that your r4 test only had pods being allocated IPs from the primary interface, which is why it worked.
If you still have problems with pods that are allocated IPs from the primary adapter then it cannot be the issue I am referring to.
For the tests we saturated the instance with as many pods as the total number of IPs the instance supported. To make sure that we have pods on all ENIs.
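How many pods are needed to saturate a node can be estimated from the instance's ENI limits. The sketch below uses the commonly documented formula for the AWS VPC CNI, ENIs × (IPv4 addresses per ENI − 1) + 2. The function name `max_pods` is hypothetical, and the c5.large limits used here (3 ENIs with 10 IPv4 addresses each) are an assumption taken from the EC2 instance limits rather than from this thread.

```shell
#!/usr/bin/env bash
# Estimate the maximum number of pods on a node using the AWS VPC CNI.
# One address per ENI is the ENI's own primary IP; the +2 accounts for
# pods that run in the host network namespace.
max_pods() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

max_pods 3 10   # c5.large, prints 29
```

The result for c5.large matches the `maxPods: 29` set in the InstanceGroup above.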
I have now rerun the tests we did to verify internal cluster communication of the pods assigned to the non-primary ENIs on r5/c4/c5/m4/m5/i3, this time with the aws cni built from master (commit 4a51ed41abf). No errors now. All the pods on the non-primary ENIs can talk to kubernetes.default (as well as resolve it). This clearly did not work with versions 1.2.1, 1.3.0 and 1.3.2.
I wonder if commits 6be0029bf2ee and/or 96a86f58e64 are fixing the issue we have seen? I also wonder if we are the only ones seeing this? When can we expect a new release of the aws cni?
Any news on this?
Hi, sorry for the late reply.
We will investigate this issue, and if the current master works, that's great. I've created a 1.4 Milestone and we will start working on that soon.
@mogren We saw there is a 1.4.0-rc1 candidate out there. We have tested it and it seems that it solves this issue. Wondering why this issue is not in the milestone for 1.4 and/or why it was not addressed with any update here?
Hey @recollir, thanks a lot for testing this. I didn't want to include this issue since I was not sure the changes solved your problems. I'll try to get this release out as soon as possible.
v1.4.0 has been released! Has this issue been resolved with v1.4.0? @recollir
Currently testing it.
I have now tested the 1.4.0 version with c5, r4, r5, m5 and i3 instances - for all those: large, 2xlarge and 4xlarge.
It seems that the issue is now resolved. We still don't know why though. It would be nice to get an explanation of this. @Jeffwan
Hi @recollir,
The PR you commented on a while ago, https://github.com/aws/amazon-vpc-cni-k8s/pull/346, is the most probable cause for this that I can think of. https://github.com/aws/amazon-vpc-cni-k8s/pull/305 might affect the Debian images, but should not cause issues for the AL ones.
Aside from that, since you're not using Calico, not that many changes were made to the CNI in regard to setting up routes on secondary ENIs before 4a51ed4.
To figure out what happened to those ENIs I'd have to take another look at the logs for the v1.3.0 that failed.
Looking at the iptables files you provided, there are a lot of rules that are set up by some kubernetes firewall script or something, and there are a lot of old rules and stuff in the cluster from a few days earlier. Are you sure nothing has changed with that script? Also, did you use any other tool like ec2config-cli that can modify the routes or iptables?