The test that caught the bug does the following:
The problem is that BPF-masq doesn't check whether a packet (podIP -> outside) is a reply, and SNATs it regardless. One possible fix is to consult conntrack (CT) to see whether the packet is a reply, and skip SNAT if it is.
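To make the proposed fix concrete, here is a rough sketch of the decision in Go. It is a toy model, not the actual datapath code: `Tuple`, `ConnTable`, `IsReply`, and `ShouldSNAT` are hypothetical names, and the map stands in for the real CT map. The idea is only that an egress packet whose reversed tuple matches a tracked connection is a reply and should not be masqueraded.

```go
package main

// Tuple is a simplified flow key (hypothetical; not the real CT key layout).
type Tuple struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Proto            uint8
}

// Reverse returns the tuple as seen in the opposite direction.
func (t Tuple) Reverse() Tuple {
	return Tuple{
		SrcIP:   t.DstIP,
		DstIP:   t.SrcIP,
		SrcPort: t.DstPort,
		DstPort: t.SrcPort,
		Proto:   t.Proto,
	}
}

// ConnTable records the original-direction tuple of each tracked connection.
type ConnTable map[Tuple]struct{}

// Track stores a connection by its original-direction tuple.
func (ct ConnTable) Track(orig Tuple) {
	ct[orig] = struct{}{}
}

// IsReply reports whether pkt flows in the reply direction of a tracked
// connection, i.e. its reversed tuple is a known original-direction tuple.
func (ct ConnTable) IsReply(pkt Tuple) bool {
	_, ok := ct[pkt.Reverse()]
	return ok
}

// ShouldSNAT masquerades an egress packet only if it is not a reply.
func ShouldSNAT(ct ConnTable, pkt Tuple) bool {
	return !ct.IsReply(pkt)
}
```

With this model, a connection opened from outside to the pod is tracked once, and the pod's response (the reversed tuple) is left untouched, while a connection the pod opens itself is still SNATed.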
@brb Any idea why this only occurs on net-next kernels?
@christarazi This is because we don't enable the BPF-masq feature on other CI builds (except for 4.19) due to the kernel version constraint.
Another thought: fixing this by adding the CT lookup would introduce a perf penalty for every packet from a local endpoint to the outside. To make the test work with BPF-based masquerading, we could instead add the outside IP addr to the ip-masq-agent config.