Amazon-vpc-cni-k8s: multiple default routes on reboot

Created on 13 Mar 2019 · 6 Comments · Source: aws/amazon-vpc-cni-k8s

CoreOS stable: 2023.5.0
k8s-cni: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.3.2

If the instance is updated, either manually or by the auto-updater, the default routes get mixed up on reboot: the instance ends up with multiple default routes across multiple ENI devices, which results in a dead instance.

~ : ip route
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.102.154 metric 1024 
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.122.33 metric 1024 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.102.154 
172.20.96.1 dev eth0 proto dhcp scope link src 172.20.102.154 metric 1024 
172.20.96.1 dev eth1 proto dhcp scope link src 172.20.122.33 metric 1024 
172.20.101.170 dev enida03fe4f0f6 scope link 
172.20.103.147 dev eni15c7619fccc scope link 
172.20.105.172 dev eni87b415c6969 scope link 
172.20.107.244 dev eni8faa09e7e82 scope link 
172.20.110.225 dev enia46a6f60ead scope link 
172.20.111.229 dev eni23617a1478e scope link 
172.20.112.252 dev eni379acfe3525 scope link 
172.20.114.88 dev eniccbdfc1d2eb scope link 
172.20.115.51 dev eni3e86f225b23 scope link 
172.20.117.217 dev enid4905cd279e scope link 
172.20.120.35 dev eni46027850f6e scope link 
172.20.120.180 dev enia6771da0b33 scope link 
172.20.124.73 dev eni0c963f563be scope link 

networkctl status
●        State: routable
       Address: 172.20.102.154 on eth0
                172.17.0.1 on docker0
                172.20.122.33 on eth1
                fe80::475:55ff:fe4b:ef1e on eth0
                fe80::28d4:3eff:feca:b186 on eni3e86f225b23
                fe80::4b1:2fff:fe3b:5ef2 on eth1
                fe80::1829:f7ff:fe1f:f4be on enia6771da0b33
                fe80::1041:55ff:fe7b:44e on eni15c7619fccc
                fe80::24f5:bbff:fef0:2ea9 on eni0c963f563be
                fe80::b8d4:c7ff:feb1:3a3b on enida03fe4f0f6
                fe80::43:c4ff:feba:597 on eni87b415c6969
                fe80::c044:60ff:fe4a:39b0 on eni8faa09e7e82
                fe80::9cc3:50ff:fe1d:f663 on enid4905cd279e
                fe80::ec2a:7ff:fe42:65bd on eni46027850f6e
                fe80::a091:40ff:fe23:f528 on eni379acfe3525
                fe80::b42a:8ff:fe47:df15 on eniccbdfc1d2eb
                fe80::6c9e:f7ff:fe08:bbd6 on enia46a6f60ead
                fe80::24d0:c6ff:fe5e:5e8c on eni23617a1478e
       Gateway: 172.20.96.1 on eth1
                172.20.96.1 on eth0
                172.20.96.1 on eth1
           DNS: 172.20.0.2
Search Domains: eu-west-1.compute.internal

The question is: is this a CoreOS bug or a CNI bug?

bug

Deshke

Most helpful comment

We saw the same behavior on CoreOS because it defaults to using DHCP on all interfaces, so when the machine comes up it gets a DHCP response with a default route for each ENI and programs the route tables accordingly.
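To check whether a node is in this state, here is a minimal Python sketch that parses `ip route` text and reports default routes that share the same metric on different devices. The sample lines are copied from the report above; the function names are made up for illustration, and treating a missing `metric` field as 0 is an assumption about how to read the output.

```python
from collections import defaultdict

# Sample lines copied from the `ip route` output in the original report.
SAMPLE = """\
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.102.154 metric 1024
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.122.33 metric 1024
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.102.154
"""


def default_routes(ip_route_output):
    """Return a (device, metric) tuple for every default route in the text."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        dev = fields[fields.index("dev") + 1] if "dev" in fields else None
        # Assumption: a route printed without a metric is treated as metric 0.
        metric = int(fields[fields.index("metric") + 1]) if "metric" in fields else 0
        routes.append((dev, metric))
    return routes


def conflicting_defaults(ip_route_output):
    """Map each metric to its devices, keeping only metrics used by 2+ devices."""
    by_metric = defaultdict(list)
    for dev, metric in default_routes(ip_route_output):
        by_metric[metric].append(dev)
    return {m: devs for m, devs in by_metric.items() if len(devs) > 1}


if __name__ == "__main__":
    print(conflicting_defaults(SAMPLE))
```

On a live node the same functions could be fed the text captured from `ip route`. An empty result means no two default routes compete at the same metric.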

We instructed networkd to always prefer the eth0 route by using Ignition (via userdata) to write a networkd drop-in:

[Match]
Name=eth0

[Network]
DHCP=ipv4

[DHCP]
RouteMetric=512
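For anyone wiring this into userdata, a rough sketch of what the Ignition config might look like, built with Python so it can be dumped as JSON. The unit contents are the ones shown above. The file name `10-eth0.network` and the spec version `2.2.0` are assumptions, not something we stated, so check them against the Ignition docs for your CoreOS release.

```python
import json

# Unit contents exactly as shown in the comment above.
ETH0_NETWORK = """\
[Match]
Name=eth0

[Network]
DHCP=ipv4

[DHCP]
RouteMetric=512
"""


def ignition_config(unit_name="10-eth0.network", spec_version="2.2.0"):
    """Build an Ignition 2.x style config with one networkd unit.

    unit_name and spec_version are illustrative defaults (assumptions).
    """
    return {
        "ignition": {"version": spec_version},
        "networkd": {
            "units": [
                {"name": unit_name, "contents": ETH0_NETWORK},
            ]
        },
    }


if __name__ == "__main__":
    # Print the JSON document that would be passed as instance userdata.
    print(json.dumps(ignition_config(), indent=2))
```

The idea is that eth0 gets metric 512 while the other ENIs keep the default 1024 seen in the `ip route` output, so the kernel picks the eth0 default route first.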

There's some more information available on this CoreOS issue: https://github.com/coreos/bugs/issues/992

sethp-nr on 13 Mar 2019

All 6 comments

Thanks for reporting, I'll try to take a look at why this happens.

mogren on 13 Mar 2019

Hi Experts,
I am using CoreOS-stable-2135.5.0-hvm (ami-049ed451bb483d4be) and found that this issue still exists. Is there a solution or a plan for a bug fix?

MMichael-S on 5 Jul 2019

@MMichael-S I'm not sure about fixes, etc., but as far as I know preferring the eth0 route still works (see: https://coreos.com/os/docs/latest/network-config-with-networkd.html)

sethp-nr on 8 Jul 2019

@sethp-nr Thank you for your reply.
I found the workaround in https://github.com/coreos/bugs/issues/992, which is to decrease the RouteMetric of eth0.
But I think the workaround will bother customers. :(

MMichael-S on 8 Jul 2019

As people have already mentioned in this issue, the problem comes from the default setup in CoreOS and is tracked in https://github.com/coreos/bugs/issues/992. The fix for now is to configure CoreOS to prefer the eth0 route obtained via DHCP.

mogren on 11 Dec 2019
