Cilium: systemd 245 breaks cilium pod to out-of-node traffic

Created on 19 Mar 2020 · 25 Comments · Source: cilium/cilium

Bug report

General Information
Updating systemd 244.2-2 on Arch to systemd 245.2-1 or 245-3 breaks pod to out-of-node IPv4 traffic. Reverting to 244.2-2 and rebooting fixes the problem. (IPv6 keeps working on all versions.)

I did a sysctl -a diff with 244 vs 245 with cilium running (ready):

< net.ipv4.conf.all.promote_secondaries = 1
> net.ipv4.conf.all.promote_secondaries = 0
< net.ipv4.conf.cilium_host.accept_source_route = 1
> net.ipv4.conf.cilium_host.accept_source_route = 0
< net.ipv4.conf.cilium_host.promote_secondaries = 0
> net.ipv4.conf.cilium_host.promote_secondaries = 1
< net.ipv4.conf.cilium_host.rp_filter = 0
> net.ipv4.conf.cilium_host.rp_filter = 2
< net.ipv4.conf.cilium_net.accept_source_route = 1
> net.ipv4.conf.cilium_net.accept_source_route = 0
< net.ipv4.conf.cilium_net.promote_secondaries = 0
> net.ipv4.conf.cilium_net.promote_secondaries = 1
< net.ipv4.conf.default.accept_source_route = 1
> net.ipv4.conf.default.accept_source_route = 0
< net.ipv4.conf.default.promote_secondaries = 0
> net.ipv4.conf.default.promote_secondaries = 1
< net.ipv4.conf.default.rp_filter = 0
> net.ipv4.conf.default.rp_filter = 2
< net.ipv4.conf.ens192.accept_source_route = 1
> net.ipv4.conf.ens192.accept_source_route = 0
< net.ipv4.conf.ens192.promote_secondaries = 0
> net.ipv4.conf.ens192.promote_secondaries = 1
< net.ipv4.conf.ens192.rp_filter = 0
> net.ipv4.conf.ens192.rp_filter = 2
< net.ipv4.conf.lo.accept_source_route = 1
> net.ipv4.conf.lo.accept_source_route = 0
< net.ipv4.conf.lo.promote_secondaries = 0
> net.ipv4.conf.lo.promote_secondaries = 1
< net.ipv4.conf.lo.rp_filter = 0
> net.ipv4.conf.lo.rp_filter = 2
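A comparison like the one above can be reproduced by saving `sysctl -a` output under each systemd version and diffing only the `net.ipv4.conf.*` keys. The following is a minimal sketch; the function name `sysctl_conf_diff` and the snapshot file names are placeholders, not part of the original report.

```shell
#!/bin/sh
# Print only the net.ipv4.conf.* keys whose values differ between two
# saved `sysctl -a` snapshots. Lines starting with "<" come from the
# first file, lines starting with ">" from the second.
sysctl_conf_diff() {
    before=$1
    after=$2
    tmp_before=$(mktemp) || return 1
    tmp_after=$(mktemp) || { rm -f "$tmp_before"; return 1; }
    # Keep only the per-interface IPv4 settings and sort them so that
    # both snapshots line up key by key.
    grep '^net\.ipv4\.conf\.' "$before" | sort > "$tmp_before"
    grep '^net\.ipv4\.conf\.' "$after" | sort > "$tmp_after"
    # Drop diff's hunk headers and separators, keep the changed lines.
    diff "$tmp_before" "$tmp_after" | grep '^[<>]'
    rm -f "$tmp_before" "$tmp_after"
}
```

For example, run `sysctl -a > sysctl-244.txt` before the upgrade and `sysctl -a > sysctl-245.txt` after it, then call `sysctl_conf_diff sysctl-244.txt sysctl-245.txt`.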
  • Cilium version (run cilium version)
    1.7.1
  • Kernel version (run uname -a)
    Linux k8s22 5.5.10-arch1-1 #1 SMP PREEMPT Wed, 18 Mar 2020 08:40:35 +0000 x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, Mesos, ...)
    Kubernetes 1.17.4
  • Upload a system dump (run curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip && python cilium-sysdump.zip and then attach the generated zip file)

cilium-sysdump-20200319-221054.zip

help-wanted kind/bug priority/high

All 25 comments

The breaking change is in /usr/lib/sysctl.d/50-default.conf
https://github.com/systemd/systemd/commit/5d4fc0e665a3639f92ac880896c56f9533441307#diff-7816eed8ca6324f23a690cc5f58e6bf7

A minimal fix for 245 is:

echo 'net.ipv4.conf.lxc*.rp_filter = 0' | sudo tee -a /etc/sysctl.d/90-override.conf && sudo systemctl start systemd-sysctl

There was a systemd bug that we hit in the past. Although it is completely unrelated, it might help in figuring out the underlying issue: https://github.com/cilium/cilium/pull/8351

@nberlee Thanks for reporting!

The systemd behavior is very annoying. At the very least, we should warn users when systemd >= 245 is detected.
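A version check of the kind suggested here could look roughly like the sketch below. It parses the first line of `systemctl --version` output; the helper names `systemd_major` and `systemd_at_least` are hypothetical and this is not how Cilium implements its warning. Note that the version alone is only a heuristic, since (as discussed later in this thread) some distributions patch the rp_filter defaults out of their systemd package.

```shell
#!/bin/sh
# Read `systemctl --version` output on stdin and print the major
# version number found on its first line (e.g. "245").
systemd_major() {
    sed -n '1s/^systemd \([0-9][0-9]*\).*/\1/p'
}

# Succeed if the systemd major version read from stdin is at least $1.
systemd_at_least() {
    want=$1
    major=$(systemd_major)
    [ -n "$major" ] && [ "$major" -ge "$want" ]
}
```

Usage: `systemctl --version | systemd_at_least 245 && echo "systemd >= 245: check the rp_filter defaults" >&2`.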

Add log statement if user is running with systemd version

I have an Ubuntu 20.04 development machine that I use, where I run microk8s.

I've installed Cilium v1.8.1:

$ cilium version
Client: 1.6.8 f534e98df 2020-03-25T13:32:35+01:00 go version go1.12.17 linux/amd64
Daemon: 1.8.1 5ce2bc7b3 2020-07-02T20:04:47+02:00 go version go1.14.4 linux/amd64

It's running systemd 245:

$ systemctl --version
systemd 245 (245.4-4ubuntu3.2)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

Here's the config (I don't recall whether I modified these, but they seem consistent with the case where connectivity was not working for you):

$ grep rp_filter /etc/sysctl.d/*
/etc/sysctl.d/10-network-security.conf:net.ipv4.conf.default.rp_filter=2
/etc/sysctl.d/10-network-security.conf:net.ipv4.conf.all.rp_filter=2
/etc/sysctl.d/99-sysctl.conf:#net.ipv4.conf.default.rp_filter=1
/etc/sysctl.d/99-sysctl.conf:#net.ipv4.conf.all.rp_filter=1

Interestingly it seems to be set to 0, although I've rebooted this machine recently:

$ sysctl net.ipv4.conf.all.rp_filter
net.ipv4.conf.all.rp_filter = 0

I note that parts of Cilium now disable rp_filter:

$ git grep rp_filter pkg/datapath
pkg/datapath/connector/add.go:  return sysctl.Disable(fmt.Sprintf("net.ipv4.conf.%s.rp_filter", ifName))
pkg/datapath/loader/base.go:            {"net.ipv4.conf.all.rp_filter", "0", false},

Deploying the single-node-connectivity YAML I see that external connectivity works:
https://github.com/cilium/cilium/blob/master/examples/kubernetes/connectivity-check/connectivity-check-single-node.yaml

$ k get po | grep external
pod-to-external-1111-75df4847d7-tjxpq                    1/1     Running   1          23m
pod-to-external-fqdn-allow-google-cnp-77d7586f58-9l5z6   1/1     Running   1          23m

So I think this issue is resolved as of the latest Cilium releases?

I will try 1.8.2 tomorrow evening, but I just tested it on 1.7.6 and it is still broken.

Also 1.7.6 seems to have the same interface specific line:

$ grep -ri rp_filter
pkg/endpoint/connector/add.go:  return sysctl.Disable(fmt.Sprintf("net.ipv4.conf.%s.rp_filter", ifName))
pkg/datapath/loader/base.go:        {"net.ipv4.conf.all.rp_filter", "0", false},

my systemd version right now:

$ systemctl --version
systemd 245 (245.6-8-arch)
+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

status of rp_filter

$ sysctl -a | grep \\.rp_filter
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.cilium_host.rp_filter = 2
net.ipv4.conf.cilium_net.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.ens192.rp_filter = 2
net.ipv4.conf.lo.rp_filter = 2
net.ipv4.conf.lxc4d36398ebb25.rp_filter = 2
net.ipv4.conf.lxc6aaba34c27a9.rp_filter = 2
net.ipv4.conf.lxc_health.rp_filter = 0
net.ipv4.conf.lxca210cbb192a4.rp_filter = 2
net.ipv4.conf.lxca3dfd01c8fff.rp_filter = 2
net.ipv4.conf.lxcebe3792a5b6a.rp_filter = 2
net.ipv4.conf.lxcece74f373c8f.rp_filter = 2
net.ipv4.conf.lxcfe2f3a81b538.rp_filter = 2

Pinging an outside destination IP:

$ kubectl run  --restart=Never -it --rm --image=alpine  test2 ash
If you don't see a command prompt, try pressing enter.
/ # ping 9.9.9.9
PING 9.9.9.9 (9.9.9.9): 56 data bytes
^C
--- 9.9.9.9 ping statistics ---
19 packets transmitted, 0 packets received, 100% packet loss
/ # exit
pod "test2" deleted

(it works fine with my 90-override.conf described in the second post of this issue)
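The rp_filter listing above shows why clearing only `net.ipv4.conf.all.rp_filter` is not enough: per the kernel's ip-sysctl documentation, source validation on an interface uses the maximum of the `all` value and the per-interface value. The sketch below computes that effective value from `sysctl -a` style output; `effective_rp_filter` is a hypothetical helper name.

```shell
#!/bin/sh
# Read "key = value" lines (as printed by `sysctl -a`) on stdin and
# print the effective rp_filter mode for interface $1, i.e. the
# maximum of conf/all/rp_filter and conf/<iface>/rp_filter.
effective_rp_filter() {
    iface=$1
    awk -v iface="$iface" '
        $1 == "net.ipv4.conf.all.rp_filter"               { all = $3 }
        $1 == ("net.ipv4.conf." iface ".rp_filter")       { dev = $3 }
        END {
            # Missing keys are treated as 0.
            m = (all + 0 > dev + 0) ? all + 0 : dev + 0
            print m
        }
    '
}
```

Usage: `sysctl -a 2>/dev/null | effective_rp_filter lxc4d36398ebb25`. With the values listed above, this yields 2 for the lxc interfaces that have `rp_filter = 2` and 0 for `lxc_health`.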

Maybe Ubuntu 20.04 has a different /usr/lib/sysctl.d/50-default.conf?

$ grep rp_filter /usr/lib/sysctl.d/50-default.conf
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.*.rp_filter = 2
-net.ipv4.conf.all.rp_filter

Maybe Ubuntu 20.04 has a different /usr/lib/sysctl.d/50-default.conf

Good spotting. Ubuntu seems to carry a patch to systemd to explicitly remove those lines. I downloaded the tar from the ubuntu packages site and:

$ grep -R rp_filter systemd_245.4-4ubuntu3.2/debian/patches/*
systemd_245.4-4ubuntu3.2/debian/patches/debian/UBUNTU-drop-kernel.-settings-from-sysctl-defaults-shipped.patch:-net.ipv4.conf.default.rp_filter = 2
systemd_245.4-4ubuntu3.2/debian/patches/debian/UBUNTU-drop-kernel.-settings-from-sysctl-defaults-shipped.patch:-net.ipv4.conf.*.rp_filter = 2
systemd_245.4-4ubuntu3.2/debian/patches/debian/UBUNTU-drop-kernel.-settings-from-sysctl-defaults-shipped.patch:--net.ipv4.conf.all.rp_filter
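Since the distribution patch decides whether the wildcard line is present, checking the shipped file directly is more reliable than checking the systemd version. A minimal sketch, assuming the file follows the `key = value` layout shown above; `has_wildcard_rp_filter` is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if the sysctl.d file $1 contains an active (uncommented)
# "net.ipv4.conf.*.rp_filter" line that sets mode 1 or 2.
has_wildcard_rp_filter() {
    grep -Eq '^[[:space:]]*net\.ipv4\.conf\.\*\.rp_filter[[:space:]]*=[[:space:]]*[12][[:space:]]*$' "$1"
}
```

Usage: `has_wildcard_rp_filter /usr/lib/sysctl.d/50-default.conf && echo "wildcard rp_filter default is active"`.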

Furthermore at least in my environment networkd doesn't seem to be enabled:

$ networkctl | grep lxc2cd6411
WARNING: systemd-networkd is not running, output will be incomplete.

357 lxc2cd6411832fb ether    n/a         unmanaged

I have the same WARNING: systemd-networkd is not running, output will be incomplete. [...] unmanaged on Arch Linux. It seems that rp_filter is being set by the systemd-sysctl service.

@joestringer What does systemctl status systemd-sysctl return on your machine?

@brb it's Active (exited), but per my last post I think the Ubuntu version of systemd-sysctl won't apply rp_filter by default. That seems like it explains the difference in behaviour to me.

per my last post I think the Ubuntu version of systemd-sysctl won't apply rp_filter by default

@joestringer Ah, damn, missed that comment. Yeah, that explains the difference.

Yep, I hit the same problem during my first Cilium installation (and found this bug right after).
I just overrode the rp_filter settings in /etc/sysctl.d/90-override.conf.
Gentoo, systemd 245.5

Can confirm this issue is still happening on Cilium 1.8.2 with the latest Flatcar 2605 release channel.

I've hit the same problem on Ubuntu 20.04.

For future googlers on hetzner systems: Check /etc/sysctl.d/99-hetzner.conf, they set net.ipv4.conf.all.rp_filter=1 there.

I am not sure if this is related, but after attempting a 1.8.2 -> 1.8.3 upgrade on Ubuntu 20.04 / 5.4.0-1021-aws, agent pods end up crashing with the following logs. After rolling back to 1.8.2, pods are healthy and do not show those errors. 🤷

{"error":"Failed to sysctl -w net.ipv4.conf.eth0.rp_filter=2: could not open the sysctl file /proc/sys/net/ipv4/conf/eth0/rp_filter: open /proc/sys/net/ipv4/conf/eth0/rp_filter: no such file or directory","level":"error","msg":"Error while initializing daemon","subsys":"daemon"}
{"error":"Failed to sysctl -w net.ipv4.conf.eth0.rp_filter=2: could not open the sysctl file /proc/sys/net/ipv4/conf/eth0/rp_filter: open /proc/sys/net/ipv4/conf/eth0/rp_filter: no such file or directory","level":"fatal","msg":"Error while creating daemon","subsys":"daemon"}
{"error":"Operation cannot be fulfilled on ciliumnodes.cilium.io \"ip-10-6-11-13.eu-west-1.compute.internal\": the object has been modified; please apply your changes to the latest version and try again","level":"warning","msg":"Unable to update CiliumNode custom resource","subsys":"ipam"}
{"level":"info","msg":"regenerating all endpoints","reason":"one or more identities created or deleted","subsys":"endpoint-manager"}
{"level":"info","msg":"regenerating all endpoints","reason":"one or more identities created or deleted","subsys":"endpoint-manager"}

@mvisonneau Is there an eth0 interface on those nodes? If not, then I think this is a separate bug (regression) in v1.8.3 on Ubuntu in EKS environments. If there is an eth0 then it may still be this issue.

👋 @joestringer, indeed my instances are based on the AWS Nitro System, which gets me network interfaces with the en[0-9]+ format.


interfaces list

~$ netstat -i
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
cilium_h  9001        0      0      0 0             0      0      0      0 BMORU
cilium_n  9001        0      0      0 0             0      0      0      0 BMORU
docker0   1500        0      0      0 0             0      0      0      0 BMU
ens5      9001 77873091      0      0 0      114673441      0      0      0 BMRU
ens6      9001 605028990      0     74 0      495146454      0      0      0 BMRU
ens7      9001 64204300      0      0 0      59610683      0      0      0 BMRU
lo       65536 135179116      0      0 0      135179116      0      0      0 LRU
lxc84712  9001  5109046      0      0 0       5763573      0      0      0 BMRU
lxc053a0  9001  1566665      0      0 0       1104132      0      0      0 BMRU
lxc17cb3  9001   280988      0      0 0        251064      0      0      0 BMRU
lxc225b5  9001  6816453      0      0 0       5947230      0      0      0 BMRU
lxc25c93  9001  3330495      0      0 0       3939436      0      0      0 BMRU
lxc2ae26  9001   116270      0      0 0        141742      0      0      0 BMRU
lxc3b24a  9001 205922198      0      0 0      263466611      0      0      0 BMRU
lxc3d661  9001  4308719      0      0 0       6791656      0      0      0 BMRU
lxc45541  9001  2294528      0      0 0       2473290      0      0      0 BMRU
lxc49be6  9001  3975092      0      0 0       2414246      0      0      0 BMRU
lxc5f6ec  9001   148957      0      0 0        148039      0      0      0 BMRU
lxc676a3  9001  1057937      0      0 0        645921      0      0      0 BMRU
lxc67811  9001 236109688      0      0 0      206980719      0      0      0 BMRU
lxc6b001  9001   169636      0      0 0        168709      0      0      0 BMRU
lxc75d86  9001 19918743      0      0 0      16686086      0      0      0 BMRU
lxc7d310  9001 70644814      0      0 0      47580110      0      0      0 BMRU
lxc_heal  9001  1628264      0      0 0       1932101      0      0      0 BMRU
lxcb9ade  9001  2623227      0      0 0       3048972      0      0      0 BMRU
lxcc56bb  9001 15626813      0      0 0      21041815      0      0      0 BMRU
lxccd212  9001   234663      0      0 0        305566      0      0      0 BMRU
lxcd656e  9001   145880      0      0 0        145195      0      0      0 BMRU
lxcd7296  9001  5129261      0      0 0       4645878      0      0      0 BMRU
lxcdecd9  9001  2764864      0      0 0       3648134      0      0      0 BMRU
lxce1a74  9001    32706      0      0 0         32298      0      0      0 BMRU
lxcef306  9001 21973644      0      0 0      21831440      0      0      0 BMRU
lxcf535c  9001 24108700      0      0 0      27435746      0      0      0 BMRU
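The interface list above has no eth0, which matches the "no such file or directory" error in the agent logs. One defensive pattern is to check that the per-interface sysctl file exists before writing to it. This is only an illustrative sketch, not the fix Cilium applied; the helper name `set_rp_filter_if_present` and its optional root-directory argument are assumptions made for the example.

```shell
#!/bin/sh
# Write rp_filter value $2 for interface $1, but skip interfaces that
# do not exist instead of failing. $3 optionally overrides the sysctl
# root directory (defaults to /proc/sys).
set_rp_filter_if_present() {
    iface=$1
    value=$2
    root=${3:-/proc/sys}
    path="$root/net/ipv4/conf/$iface/rp_filter"
    if [ ! -e "$path" ]; then
        echo "skipping $iface: $path does not exist" >&2
        return 0
    fi
    printf '%s\n' "$value" > "$path"
}
```

Usage (as root): `set_rp_filter_if_present ens5 2`. A call for a missing interface such as eth0 logs a message and returns success.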

@mvisonneau OK great, would you mind filing a separate bug for that to help track fixing the regression? The output from your last couple of comments on this thread would be a great start for such a bug.

On Ubuntu hosted by Hetzner I no longer have any config added by Hetzner itself, but I had the same issue with systemd. So I added the sysctl configuration below and got Cilium 1.8.3 working.

ubuntu version 20.04.1
kernel 5.8.10-050810-generic
docker 19.3.13
kubernetes 1.19.2
systemd 245 (245.4-4ubuntu3.2)

net.ipv4.conf.lxc*.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0

A good workaround for this is to enable endpoint routes with --enable-endpoint-routes, which enforces symmetric routing. https://github.com/cilium/cilium/pull/13346 is going to fix endpoint routes in combination with tunneling.

Setting enable-endpoint-routes: "true" with the default systemd 245 (now 246) rp_filter settings works great using native routing.

Followup items:

  • Is there context from upstream systemd release around this change?
  • Open issue upstream to discuss impact there

@errordeveloper mentioned he can share some systemd configuration he used to mitigate this issue.

If we want to enable endpoint routes mode by default, we will also need to resolve #13121.

I have encountered this on OpenShift, which uses CoreOS.
I can confirm that the following two solutions worked well.

Either write /etc/sysctl.d/99-override_cilium_rp_filter.conf with the following contents:

net.ipv4.conf.lxc*.rp_filter = 0
net.ipv4.conf.cilium_*.rp_filter = 0

Or use enable-endpoint-routes: "true". However, if you are using tunnelling mode, you will need either Cilium 1.8.5 or 1.9.0, both not yet released but due soon (see https://github.com/cilium/cilium/pull/13346).
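The first option above (the sysctl.d override) can be sketched as a small function that writes the drop-in file. The file name and contents are taken from this comment; the function name `write_rp_filter_override` and the optional directory argument are assumptions added so the sketch can be tried outside /etc.

```shell
#!/bin/sh
# Write the rp_filter override drop-in into directory $1
# (defaults to /etc/sysctl.d) and print the path of the file written.
write_rp_filter_override() {
    dir=${1:-/etc/sysctl.d}
    file="$dir/99-override_cilium_rp_filter.conf"
    # Overwrite rather than append, so repeated runs stay idempotent.
    printf '%s\n' \
        'net.ipv4.conf.lxc*.rp_filter = 0' \
        'net.ipv4.conf.cilium_*.rp_filter = 0' > "$file" || return 1
    echo "$file"
}
```

Run it as root, then re-apply the settings through systemd-sysctl. Earlier in this thread that was done with `sudo systemctl start systemd-sysctl`; if the unit is already active, `restart` may be needed instead.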

@joestringer https://github.com/cilium/cilium/issues/10645#issuecomment-601451909

Is there context from upstream systemd release around this change?

https://github.com/systemd/systemd/commit/5d4fc0e665a3639f92ac880896c56f9533441307#diff-7816eed8ca6324f23a690cc5f58e6bf7 which solved issue https://github.com/systemd/systemd/issues/6282

Using Flatcar 2605 and above, the minimal fix works: echo 'net.ipv4.conf.lxc*.rp_filter = 0' | sudo tee -a /etc/sysctl.d/90-override.conf && sudo systemctl start systemd-sysctl. But if the node is rebooted and the pod gets a new IP address on the same node, it stops working.

@jaysiyani that sounds like you need to find the correct way to persist configuration on Flatcar, I can have one very concrete example in the docs that you might want to try.
