Cross-node pod traffic should work, and node-to-pod traffic should work across nodes.
When running flannel with systemd 242+, there seems to be a race condition between flannel programming the MAC address of the flannel.1 interface and systemd programming the MAC address on the virtual interface. This results in all cross-node traffic being dropped at layer 2 on the destination node due to an incorrect destination VTEP MAC.
With systemd 242, the default policy is MACAddressPolicy=persistent, set in /usr/lib/systemd/network/99-default.link:
[Link]
NamePolicy=keep kernel database onboard slot path
MACAddressPolicy=persistent
When flannel brings up the interface, it programs the MAC address, and systemd then reprograms it.
In the trace below you can see that the flannel.1 interface ends up with MAC address d6:02:e3:df:ea:7a:
clear@clr-02 ~ $ ip addr show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether d6:02:e3:df:ea:7a brd ff:ff:ff:ff:ff:ff
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
But the ARP tables on remote nodes are set up with a different MAC address: the interface has d6:02:e3:df:ea:7a, while the remote neighbor entry points at 5e:89:db:49:c6:a4.
clear@clr-01 ~ $ ip neigh
10.244.1.0 dev flannel.1 lladdr 5e:89:db:49:c6:a4 PERMANENT
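The comparison above can be scripted. The following is only a rough sketch: the helper names (`mac_from_link`, `mac_from_neigh`, `compare_vtep_mac`) are made up for illustration and are not part of flannel or iproute2. It assumes the `ip` output looks like the traces in this issue.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from flannel or iproute2).

# Extract the MAC that follows "link/ether" in `ip link show` / `ip addr show` output.
mac_from_link() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "link/ether") { print $(i + 1); exit } }'
}

# Extract the MAC that follows "lladdr" in an `ip neigh` line.
mac_from_neigh() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Compare the interface MAC with what a remote node has in its neighbor table.
# Prints a one-line verdict and returns 1 on mismatch.
compare_vtep_mac() {
    iface_mac=$1
    neigh_mac=$2
    if [ "$iface_mac" = "$neigh_mac" ]; then
        echo "OK: $iface_mac"
    else
        echo "MISMATCH: interface has $iface_mac, remote neighbor entry has $neigh_mac"
        return 1
    fi
}

# Example usage (addresses taken from the traces in this issue):
#   on the destination node:  ip -o link show flannel.1 | mac_from_link
#   on a remote node:         ip neigh show 10.244.1.0 dev flannel.1 | mac_from_neigh
#   then:                     compare_vtep_mac <first result> <second result>
```

A mismatch here would be consistent with the race described in this issue, though it does not by itself prove that systemd is the cause.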
Looking at the netlink traces, you can see the MAC address being changed twice: the first time by flannel, and the second time to a different address by systemd, based on its default policy.
clear@clr-02 ~ $ sudo ip monitor all
[NETCONF]inet flannel.1 forwarding on rp_filter off mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off
[NETCONF]inet6 flannel.1 forwarding off proxy_neigh off ignore_routes_with_linkdown off
[LINK]4: flannel.1: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
link/ether 5e:89:db:49:c6:a4 brd ff:ff:ff:ff:ff:ff
[LINK]4: flannel.1: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
link/ether d6:02:e3:df:ea:7a brd ff:ff:ff:ff:ff:ff
[ADDR]4: flannel.1 inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
Setting MACAddressPolicy=none on the flannel* interface on each system hides the issue, but requires node-level changes.
Flannel and any flannel-based network plugins (such as Canal) stop working with systemd 242.
This will impact other distributions when they upgrade to systemd 242 and beyond.
Here's a quick documentation of the workaround (at least this worked in my lab):
~~~
cat<<'EOF'>/etc/systemd/network/10-flannel.link
[Match]
OriginalName=flannel*
[Link]
MACAddressPolicy=none
EOF
~~~
After this, I rebooted my controllers and workers and flannel's overlay worked.
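To check whether the new link file was actually picked up, something like the sketch below might help. It assumes that udev records the applied .link file in the `ID_NET_LINK_FILE` property for the interface, which I have not verified for vxlan devices on every systemd version. The function name `link_file_applied` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the KEY=value lines on stdin (as printed by
# `udevadm info --query=property`) name the expected .link file.
# Assumption: udev exposes the applied file as ID_NET_LINK_FILE.
link_file_applied() {
    expected=$1
    # -F: treat the path literally, -x: match the whole line, -q: no output
    grep -qxF "ID_NET_LINK_FILE=$expected"
}

# Example usage on a node:
#   udevadm info --query=property /sys/class/net/flannel.1 \
#       | link_file_applied /etc/systemd/network/10-flannel.link \
#       && echo "10-flannel.link is in effect"
```

If the property still points at 99-default.link, the override was probably not matched, for example because of a typo in the `[Match]` section.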
Looking at the cross-references here, I think more people are stepping on this. Perhaps it would be worth carrying the link unit in this repo, so that it's easier for people to notice and install it.
Just got bit by this issue, spent several hours trying to understand why a single node can't communicate with others. At least until flanneld is killed and then it suddenly works.
Tnx for reporting this issue in detail @mcastelino!
Yeah, many will be bitten and pull hair over this...