Kind: installing calico requires a change to net.ipv4.conf.all.rp_filter

Created on 30 Sep 2019 · 11 Comments · Source: kubernetes-sigs/kind

What happened:
When deploying calico against kind I used the following kind config:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: True
nodes:
- role: control-plane
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: "10.96.0.1/12"
    podSubnet: "192.168.0.0/16"

I then applied the latest Calico manifest from: https://docs.projectcalico.org/latest/getting-started/kubernetes/installation/calico

This results in a crashlooping calico-node pod on each host with the following presented in the log:

2019-09-30 18:38:28.452 [FATAL][42] int_dataplane.go 1037: Kernel's RPF check is set to 'loose'.  This would allow endpoints to spoof their IP address.  Calico requires net.ipv4.conf.all.rp_filter to be set to 0 or 1. If you require loose RPF and you are not concerned about spoofing, this check can be disabled by setting the IgnoreLooseRPF configuration parameter to 'true'.

This can be worked around by running the following:

kind get nodes --name=kind   | xargs -n1 -I {} docker exec {} sysctl -w net.ipv4.conf.all.rp_filter=0

Adjust the --name argument to match your cluster's name, or leave it off for the "default" kind cluster.
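The one-liner above can also be written as a small reusable function. This is a sketch: the function name is made up here, and the xargs pipeline is rewritten as an equivalent for-loop over the node container names that `kind get nodes` prints.

```shell
# Hypothetical helper: set rp_filter=0 inside every node of a kind cluster.
# Takes the cluster name as an argument, defaulting to "kind".
fix_rp_filter() {
  local cluster="${1:-kind}" node
  # `kind get nodes` lists node container names; `docker exec` flips the
  # sysctl inside each node container.
  for node in $(kind get nodes --name "$cluster"); do
    docker exec "$node" sysctl -w net.ipv4.conf.all.rp_filter=0
  done
}
```

Invoked as `fix_rp_filter mycluster`, it applies the workaround to each node in turn.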

I then looked into when this value was being set.

In the standard bring up this is the configured value:

docker exec -ti kind-control-plane  sysctl -a | grep all.rp_filter
net.ipv4.conf.all.rp_filter = 2

which appears to be set by:

/etc/sysctl.d/10-network-security.conf:net.ipv4.conf.default.rp_filter=2
/etc/sysctl.d/10-network-security.conf:net.ipv4.conf.all.rp_filter=2

The default was in turn changed to 2 as part of this issue:
https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1814262

Digging further, I found that the ubuntu:19.04 base image we use has it set to 1:

11:27 $ docker run -it ubuntu:19.04 sysctl -a | grep all.rp_filter
net.ipv4.conf.all.rp_filter = 1

and so does a freshly built base image:

11:32 $ docker run -it  --tmpfs /tmp --tmpfs /run  --privileged --entrypoint /bin/bash mauilion/base 
root@914dc973cf59:/# sysctl -a | grep rp_filter
net.ipv4.conf.all.rp_filter = 1
root@914dc973cf59:/# exit

although the security file is already present at this point:

root@2f587629b72c:/# cat /etc/sysctl.d/10-network-security.conf 

# Turn on Source Address Verification in all interfaces to
# prevent some spoofing attacks.
net.ipv4.conf.default.rp_filter=2
net.ipv4.conf.all.rp_filter=2

I think what's happening is that the sysctl file is honored when the "real" node image starts up, and that is what causes the problem for Calico.

What you expected to happen:
That rp_filter would be set to 0 or 1, as it is by default in ubuntu:19.04.

How to reproduce it (as minimally and precisely as possible):
This is true of most of the recent base images.

Anything else we need to know?:
In my opinion it's safe to set all.rp_filter to a value of 1 explicitly.
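One way to do that explicitly is a sysctl drop-in inside the node image: files under /etc/sysctl.d are applied in lexical filename order, so a drop-in sorting after 10-network-security.conf overrides it. A sketch (the filename is made up, and a temp directory stands in for /etc/sysctl.d for illustration):

```shell
# Hypothetical drop-in overriding 10-network-security.conf.
# On a real node this would live in /etc/sysctl.d; using a temp dir here.
SYSCTL_DIR="$(mktemp -d)"
cat > "$SYSCTL_DIR/99-rp-filter.conf" <<'EOF'
# Strict reverse-path filtering, as Calico expects (0 or 1).
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
EOF
cat "$SYSCTL_DIR/99-rp-filter.conf"
```

On a real node, `sysctl --system` (or a reboot) would then apply the files in order, with the 99- prefixed values winning.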

Environment:

  • kind version: (use kind version): 0.5.1
  • Kubernetes version: (use kubectl version):
  • Docker version: (use docker info):
  • OS (e.g. from /etc/os-release):
Label: kind/bug


All 11 comments

nice catch

There's a write-up on this by Alex as well; I like his solution too!

https://twitter.com/alexbrand/status/1178768251024760833?s=20

kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true
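The same setting can also be baked into the Calico manifest before applying it, instead of patching the live DaemonSet. A sketch of the env entry (the DaemonSet and container names are assumed to match the upstream Calico manifest):

```yaml
# Added under spec.template.spec.containers[] for the "calico-node"
# container in the calico-node DaemonSet (kube-system namespace):
env:
  - name: FELIX_IGNORELOOSERPF
    value: "true"
```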

/assign

thanks @mauilion !

@mauilion

kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true

It seems to be messing up DNS (I was following your TGIK 075).
While everything works fine with the kind default CNI, Calico gives me issues when I use "kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true":

$ k exec -it nginxd-667bdf4c99-qsbrv -- bash
root@nginxd-667bdf4c99-qsbrv:/# curl google.com
curl: (6) Could not resolve host: google.com
root@nginxd-667bdf4c99-qsbrv:/# nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10#53

** server can't find google.com: SERVFAIL

root@nginxd-667bdf4c99-qsbrv:/# exit

new images set rp_filter so you won't have to do this anymore

@BenTheElder New images created since this bug was fixed include v1.16.1 and v1.16.2. Is it worth patching the v1.15.3 (or older) images, or is that out of scope for kind?

I'll push new images with https://github.com/kubernetes-sigs/kind/milestone/8 which is primarily blocked on rounding out some stability fixes. I'm back on that now.

@BenTheElder Thank you! I wasn't sure if older images would get fixes like this.

There's a write-up on this by Alex as well; I like his solution too!

https://twitter.com/alexbrand/status/1178768251024760833?s=20

kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true

Thanks a lot, man!
This solved my issue; I'd been going crazy for quite a few hours.
After I ran:

kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true

All my calico-node pods started working.

Again thanks a lot!

If you update to kind v0.6.1 and use one of the images listed in the v0.6 release notes, the rp_filter settings should already be correct.
