CoreOS stable: 2023.5.0
k8s-cni: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.3.2
If one updates the instance manually or via the auto-updater, the default routes get mixed up: the instance ends up with multiple default routes across multiple ENI devices, which results in a dead instance.
~ : ip route
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.102.154 metric 1024
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.122.33 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.102.154
172.20.96.1 dev eth0 proto dhcp scope link src 172.20.102.154 metric 1024
172.20.96.1 dev eth1 proto dhcp scope link src 172.20.122.33 metric 1024
172.20.101.170 dev enida03fe4f0f6 scope link
172.20.103.147 dev eni15c7619fccc scope link
172.20.105.172 dev eni87b415c6969 scope link
172.20.107.244 dev eni8faa09e7e82 scope link
172.20.110.225 dev enia46a6f60ead scope link
172.20.111.229 dev eni23617a1478e scope link
172.20.112.252 dev eni379acfe3525 scope link
172.20.114.88 dev eniccbdfc1d2eb scope link
172.20.115.51 dev eni3e86f225b23 scope link
172.20.117.217 dev enid4905cd279e scope link
172.20.120.35 dev eni46027850f6e scope link
172.20.120.180 dev enia6771da0b33 scope link
172.20.124.73 dev eni0c963f563be scope link
networkctl status
● State: routable
Address: 172.20.102.154 on eth0
172.17.0.1 on docker0
172.20.122.33 on eth1
fe80::475:55ff:fe4b:ef1e on eth0
fe80::28d4:3eff:feca:b186 on eni3e86f225b23
fe80::4b1:2fff:fe3b:5ef2 on eth1
fe80::1829:f7ff:fe1f:f4be on enia6771da0b33
fe80::1041:55ff:fe7b:44e on eni15c7619fccc
fe80::24f5:bbff:fef0:2ea9 on eni0c963f563be
fe80::b8d4:c7ff:feb1:3a3b on enida03fe4f0f6
fe80::43:c4ff:feba:597 on eni87b415c6969
fe80::c044:60ff:fe4a:39b0 on eni8faa09e7e82
fe80::9cc3:50ff:fe1d:f663 on enid4905cd279e
fe80::ec2a:7ff:fe42:65bd on eni46027850f6e
fe80::a091:40ff:fe23:f528 on eni379acfe3525
fe80::b42a:8ff:fe47:df15 on eniccbdfc1d2eb
fe80::6c9e:f7ff:fe08:bbd6 on enia46a6f60ead
fe80::24d0:c6ff:fe5e:5e8c on eni23617a1478e
Gateway: 172.20.96.1 on eth1
172.20.96.1 on eth0
172.20.96.1 on eth1
DNS: 172.20.0.2
Search Domains: eu-west-1.compute.internal
The question is: is this a CoreOS bug or a CNI bug?
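For anyone wanting to check a node for this symptom, here is a minimal Python sketch that scans `ip route` output for default routes sharing the same metric on different devices. The function names are made up for illustration, and the sample text is just the first lines of the output above; it assumes the plain-text `ip route` format shown in this report.

```python
def default_routes(ip_route_output):
    """Return a (device, metric) pair for each default route in `ip route` text."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        dev = fields[fields.index("dev") + 1] if "dev" in fields else None
        # Assumption: a route printed without an explicit metric is treated as 0.
        metric = int(fields[fields.index("metric") + 1]) if "metric" in fields else 0
        routes.append((dev, metric))
    return routes


def conflicting_defaults(ip_route_output):
    """Group default-route devices by metric; keep metrics shared by several devices."""
    by_metric = {}
    for dev, metric in default_routes(ip_route_output):
        by_metric.setdefault(metric, []).append(dev)
    return {m: devs for m, devs in by_metric.items() if len(devs) > 1}


SAMPLE = """\
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.102.154 metric 1024
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.122.33 metric 1024
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.102.154
"""

print(conflicting_defaults(SAMPLE))  # {1024: ['eth0', 'eth1']}
```

An empty result would mean no two default routes compete at the same metric.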
Thanks for reporting, I'll try to take a look at why this happens.
We saw the same behavior on CoreOS because it defaults to using DHCP on all interfaces, so when the machine comes up it gets a DHCP response with a default route for each ENI and programs the route tables accordingly.
We instructed networkd to always prefer the eth0 route by using Ignition (via userdata) to write a networkd drop-in:
[Match]
Name=eth0
[Network]
DHCP=ipv4
[DHCP]
RouteMetric=512
There's some more information available on this CoreOS issue: https://github.com/coreos/bugs/issues/992
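As a rough sketch of the Ignition step, the Python snippet below builds a config that carries the networkd unit quoted above. The unit file name `10-eth0.network` and the spec version `2.2.0` are assumptions for illustration, and the `networkd.units` layout follows my reading of the Ignition v2 config spec; please check it against the Ignition version your image ships before relying on it.

```python
import json

# Contents of the networkd unit quoted in this thread.
ETH0_NETWORK = """\
[Match]
Name=eth0

[Network]
DHCP=ipv4

[DHCP]
RouteMetric=512
"""


def ignition_config(unit_name="10-eth0.network", version="2.2.0"):
    """Build an Ignition-style config dict containing the eth0 networkd unit.

    Both defaults are hypothetical: pick a unit name and spec version that
    match your own setup.
    """
    return {
        "ignition": {"version": version},
        "networkd": {
            "units": [{"name": unit_name, "contents": ETH0_NETWORK}],
        },
    }


# Print the JSON that would be passed to the instance as userdata.
print(json.dumps(ignition_config(), indent=2))
```

The lower metric (512 versus the default 1024 seen in the `ip route` output) is what makes the eth0 default route win.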
Hi Experts
I am using CoreOS-stable-2135.5.0-hvm (ami-049ed451bb483d4be) and found that this issue still exists. Is there a corresponding solution or a planned bug fix?
@MMichael-S I'm not sure about fixes, etc., but as far as I know preferring the eth0 route still works (see: https://coreos.com/os/docs/latest/network-config-with-networkd.html)
@sethp-nr Thank you for your reply.
I found the workaround in https://github.com/coreos/bugs/issues/992, which is to decrease the RouteMetric of eth0.
But I think the workaround will be a burden for customers. :(
As people have already mentioned in this issue, the problem comes from the default setup in CoreOS and is tracked in https://github.com/coreos/bugs/issues/992. The fix for now is to configure CoreOS to prefer eth0 for DHCP.