K3s: Raspbian 10 fresh install has broken routing (iptables/nf_tables detection)

Created on 28 Mar 2020 · 10 Comments · Source: k3s-io/k3s

Version:
k3s version v1.17.4+k3s1 (3eee8ac3) on a raspberry pi 4 running Raspbian 10.

K3s arguments:
curl -sfL https://get.k3s.io | sh -

Describe the bug

On a fresh install, no traffic is routed inside the cluster, even for core services. The eventual resolution was to uninstall the default iptables v1.8.2 (nf_tables) and explicitly install nftables.

Pods cannot reach each other or CoreDNS, core services cannot reach each other or the API server, and service ports are not opened on the physical host.

The host is not listening on port 80: the Traefik load balancer reports that it is, but sudo netstat -tlp | grep 80 disagrees, and external hosts cannot access created ingresses.

To Reproduce
1) Install k3s on Raspbian 10.
2) Run shell in a dnsutils container: kubectl run -it --rm --restart=Never dnsutils --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 sh
3) Inside that container, run wget -O- github.com, or wget -O- kubernetes.default and observe "invalid name" errors. Try pinging any IP you please - the DNS server, external IPs - and observe failures.

Expected behavior
Traffic inside the cluster should be routed.

Actual behavior
No traffic is routed inside the cluster. Services (even kube-system) can't reach each other, nothing can reach the API server, etc.

First symptom I noticed was that ingresses failed to open port 80, and services couldn't reach their pods.

Additional context / logs

Fresh uninstall/reinstall on a raspbian host with IP 192.168.1.41:

$ sudo kubectl get all -n kube-system
NAME                                          READY   STATUS      RESTARTS   AGE
pod/metrics-server-6d684c7b5-w7swt            1/1     Running     0          25m
pod/coredns-6c6bb68b64-tqs4d                  1/1     Running     0          25m
pod/helm-install-traefik-pvsvx                0/1     Completed   0          25m
pod/svclb-traefik-pr5cn                       2/2     Running     0          22m
pod/traefik-7b8b884c8-826lt                   1/1     Running     0          22m
pod/local-path-provisioner-58fb86bdfd-cdjwd   1/1     Running     2          25m

NAME                         TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
service/kube-dns             ClusterIP      10.43.0.10      <none>         53/UDP,53/TCP,9153/TCP       25m
service/metrics-server       ClusterIP      10.43.106.186   <none>         443/TCP                      25m
service/traefik-prometheus   ClusterIP      10.43.118.104   <none>         9100/TCP                     22m
service/traefik              LoadBalancer   10.43.138.141   192.168.1.41   80:30192/TCP,443:31737/TCP   22m

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/svclb-traefik   1         1         1       1            1           <none>          22m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server           1/1     1            1           25m
deployment.apps/coredns                  1/1     1            1           25m
deployment.apps/traefik                  1/1     1            1           22m
deployment.apps/local-path-provisioner   1/1     1            1           25m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-6d684c7b5            1         1         1       25m
replicaset.apps/coredns-6c6bb68b64                  1         1         1       25m
replicaset.apps/traefik-7b8b884c8                   1         1         1       22m
replicaset.apps/local-path-provisioner-58fb86bdfd   1         1         1       25m

NAME                             COMPLETIONS   DURATION   AGE
job.batch/helm-install-traefik   1/1           2m24s      25m
$ sudo kubectl describe service traefik -n kube-system
Name:                     traefik
Namespace:                kube-system
Labels:                   app=traefik
                          chart=traefik-1.81.0
                          heritage=Helm
                          release=traefik
Annotations:              <none>
Selector:                 app=traefik,release=traefik
Type:                     LoadBalancer
IP:                       10.43.138.141
LoadBalancer Ingress:     192.168.1.41
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30192/TCP
Endpoints:                10.42.0.6:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  31737/TCP
Endpoints:                10.42.0.6:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>






$ sudo kubectl describe ingress nginx
Name:             nginx
Namespace:        default
Address:          192.168.1.41
Default backend:  default-http-backend:80 (<none>)
Rules:
  Host                Path  Backends
  ----                ----  --------
  nginx.cluster.vert
                      /   nginx:80 (<none>)
Annotations:
Events:  <none>






$ sudo netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:8125          0.0.0.0:*               LISTEN      461/netdata
tcp        0      0 0.0.0.0:19999           0.0.0.0:*               LISTEN      461/netdata
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      2754/k3s
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      2754/k3s
tcp        0      0 127.0.0.1:6444          0.0.0.0:*               LISTEN      2754/k3s
tcp        0      0 127.0.0.1:10256         0.0.0.0:*               LISTEN      2754/k3s
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      523/sshd
tcp        0      0 127.0.0.1:10010         0.0.0.0:*               LISTEN      2821/containerd
tcp6       0      0 ::1:8125                :::*                    LISTEN      461/netdata
tcp6       0      0 :::19999                :::*                    LISTEN      461/netdata
tcp6       0      0 :::10250                :::*                    LISTEN      2754/k3s
tcp6       0      0 :::10251                :::*                    LISTEN      2754/k3s
tcp6       0      0 :::6443                 :::*                    LISTEN      2754/k3s
tcp6       0      0 :::10252                :::*                    LISTEN      2754/k3s
tcp6       0      0 :::30192                :::*                    LISTEN      2754/k3s
tcp6       0      0 :::22                   :::*                    LISTEN      523/sshd
tcp6       0      0 :::31737                :::*                    LISTEN      2754/k3s

This all started with a power loss/reboot of my working pi cluster, after an apt update.

See my eventual resolution below. It seems that rules were being applied to both nf_tables and iptables-legacy, and the two conflicted.

UPDATE: changed focus now that I know routing is completely borked.
UPDATE 2: rewrite title/description after I discovered/resolved the problem. Left open because I believe it will affect other Raspbian 10 users and could probably use a PR to improve iptables vs nftables behavior.

All 10 comments

More debugging information:

I tried launching a dnsutils pod inside the cluster. It can't resolve anything. Even kubernetes.default times out trying to reach internal DNS.

Coredns logs are filled with entries like this:

E0328 22:36:15.894784       1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.Endpoints: Get https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"

Same for v1.Namespace and v1.Service. The logs start with several lines of:
[ERROR] plugin/errors: 2 1889223156.156930755. HINFO: read udp 10.42.0.4:58686->192.168.1.40:53: i/o timeout
where 192.168.1.40:53 is my local DNS server, accessible from the host machine.

And I guess because kube-dns isn't ready, it doesn't have any endpoints, even after 8 hours. :|

Logs for metrics-server are filled with:
E0328 22:56:23.510224 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:cluster1: unable to fetch metrics from Kubelet cluster1 (cluster1): Get https://cluster1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: i/o timeout
That URL is perfectly accessible from outside the cluster (though unauthorized of course).

So I'm left with some kind of problem with routing. Traffic originating inside the cluster seems to be getting nowhere. Services can't talk to each other, or to the outside world. The only IP my dnsutils pod can ping is that of the host machine, 192.168.1.41.

I uninstalled, rebooted, and reinstalled once again, and the problem persists. So clearly there is some possible system state that causes this on a fresh install. I just don't know what. :(

@ohthehugemanatee Have you checked whether you have a firewall running?

I have another issue where the install isn't working, and I've found the local Linux firewall (firewalld or UFW) is the culprit.

@jfmatth can you give some more detail? I would expect the k3s installer to validate that...

Anyway no, there's no firewall running. Just a default raspbian install. In fact if I was only trying to solve my own problem I would wipe/reinstall raspbian... but now I'm walking my way through routing in k3s in case this problem hits someone else...

Sorry @ohthehugemanatee, I'm afraid I don't. There is issue #1543 that some of us are seeing, and I noticed that the same install at home didn't behave the same on Linode. The main difference was the firewall.

You can see my notes there, but basically, any firewall running before install seems to keep both .13 and .14 from working inside the cluster.

Maybe on Raspbian check the iptables --list just to be sure?

It's true! After uninstalling, iptables -L and iptables-legacy -L both showed residual rules hanging around. BUT remember, I had rebooted between uninstall and reinstall... In any case I did an uninstall, then iptables -F; iptables-legacy -F, then a reinstall. Same problem.
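For anyone who wants to confirm this split-backend state without eyeballing both listings, a rough check could look like this sketch (the function name is mine, and it takes the rule dumps as strings so it can be exercised without root):

```shell
# split_rule_check: flag the failure mode above, where kube-related rules
# end up in BOTH the nf_tables and the legacy backend. Pass in the output
# of `iptables-save` and `iptables-legacy-save` (or any rule listing).
split_rule_check() {
  nft_n=$(printf '%s\n' "$1" | grep -c 'KUBE-' || true)
  legacy_n=$(printf '%s\n' "$2" | grep -c 'KUBE-' || true)
  if [ "$nft_n" -gt 0 ] && [ "$legacy_n" -gt 0 ]; then
    echo "conflict: rules in both backends"
  else
    echo "ok"
  fi
}
```

On a live host something like split_rule_check "$(sudo iptables-save)" "$(sudo iptables-legacy-save)" would report "conflict" in the state described here.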

Metrics server logs are flooded with entries like this:

E0330 19:53:51.599412       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:cluster1: unable to fetch metrics from Kubelet cluster1 (cluster1): Get https://cluster1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: i/o timeout

On the host machine I can wget that address and get an immediate response (and certificate failure since I don't have the cluster's CA in my chain)

I notice that I have iptables 1.8.2 nf_tables... and that _used_ to be a problem.. but that's solved, right? I'm looking at kubernetes/kubernetes#82966 . The fix got into kubernetes 1.17, and I'm running k3s v1.17.4+k3s1 (3eee8ac3).

Got it! W00tarz!

So here's the problem, for future frustrated folk:

Raspbian 10 comes with an iptables wrapper around nf_tables in the kernel. So the command iptables exists, but only as a symlink to iptables-nft. It returns the version string iptables v1.8.2 (nf_tables), which seems like it should be handled correctly by check-config.sh. Still, I found firewall entries in both iptables -L (i.e. nf_tables) and iptables-legacy -L.
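As a quick check, the backend can be read off the version string (a minimal sketch; the helper name detect_backend is mine, illustrating the detection that check-config.sh would need to get right):

```shell
# detect_backend: classify which iptables backend a version string reports.
# Hypothetical helper for illustration only.
detect_backend() {
  case "$1" in
    *nf_tables*) echo "nf_tables" ;;
    *legacy*)    echo "legacy" ;;
    *)           echo "unknown" ;;
  esac
}
```

For example, detect_backend "$(iptables --version)" prints "nf_tables" on a stock Raspbian 10 install.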

The fix was to remove the iptables wrapper and explicitly install nftables:
sudo apt remove iptables -y && sudo apt install nftables

I then reinstalled with a reboot for good measure. And hey presto, everything works!

I'm leaving this issue open and re-titling/describing, because I believe this should be common to all recently updated Raspbian 10 installs, and it probably indicates something to be improved in the installer.

I encountered similar issues. It seems the kubernetes.default / 10.43.0.1 route is broken after a reboot / further deployment. My temporary workaround:

pi@raspberrypi:~$ export eth0_IP=`xxxxx` # set IP of your node
pi@raspberrypi:~$ sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
pi@raspberrypi:~$ sudo iptables -t nat -I PREROUTING -d 10.43.0.1/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination $eth0_IP:6443
pi@raspberrypi:~$ sudo iptables -t nat -I OUTPUT -d 10.43.0.1/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination $eth0_IP:6443
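Note that nat rules added this way do not survive a reboot. One way to make the workaround above re-applicable at boot (from an @reboot cron entry or a oneshot systemd unit) is a small helper like this sketch; the function name and the IPT override (for dry runs) are my own additions:

```shell
# reapply_dnat: re-insert the two DNAT rules from the workaround above.
# First argument is the node's LAN IP; set IPT=echo to dry-run without root.
reapply_dnat() {
  node_ip="$1"
  ipt="${IPT:-iptables}"
  for chain in PREROUTING OUTPUT; do
    "$ipt" -t nat -I "$chain" -d 10.43.0.1/32 -p tcp -m tcp --dport 443 \
      -j DNAT --to-destination "$node_ip:6443"
  done
}
```

E.g. run reapply_dnat 192.168.1.41 as root at boot; with IPT=echo it just prints the two iptables invocations.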

Some people report a similar issue after installing Docker:
https://github.com/rancher/k3s/issues/703

In my own case, it turned out to be related to IPv6 (seemingly unrelated to iptables). After I disabled IPv6 via sysctl, everything works properly. Possibly I will create a separate issue.
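For reference, disabling IPv6 via sysctl as described above is typically done with entries like these (a config sketch; whether disabling IPv6 is advisable depends on your network):

```
# /etc/sysctl.d/99-disable-ipv6.conf  (apply with: sudo sysctl --system)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```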

@ohthehugemanatee you are amazing - I burned up a couple of days trying to figure out why this wasn't working on my Pi Bramble. Your solution worked a charm.

Just did a clean install of Raspbian on two RPi4s. The update-alternatives method didn't work for me, and I'm reluctant to disable IPv6. @dictcp's temp fix on the master node (where the failing containers were) worked for now. Not sure whether it'll flush after a reboot, in which case I'll need to reapply it from a cron script on boot.

Details:

# OS distrib: 
nate-mbp17:~ ls ~/Downloads/2020-05-27-raspios-buster-lite-armhf.zip
/Users/xnutsive/Downloads/2020-05-27-raspios-buster-lite-armhf.zip

# Steps I did: 
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install vim fish tmux git

# Installation
# Used k3s-ansible to setup. 

# Problem 

nate-mbp17:~ kubectl get pods -A -o wide
NAMESPACE     NAME                                     READY   STATUS             RESTARTS   AGE     IP           NODE   NOMINATED NODE   READINESS GATES
kube-system   helm-install-traefik-nspr7               0/1     Completed          0          8m57s   10.42.0.3    rpi2   <none>           <none>
kube-system   svclb-traefik-vh9mz                      2/2     Running            2          7m41s   10.42.1.4    rpi3   <none>           <none>
kube-system   coredns-8655855d6-q79dn                  0/1     Running            1          8m56s   10.42.0.7    rpi2   <none>           <none>
kube-system   traefik-758cd5fc85-9jx77                 1/1     Running            1          7m41s   10.42.1.5    rpi3   <none>           <none>
kube-system   svclb-traefik-wrflb                      2/2     Running            2          7m41s   10.42.0.10   rpi2   <none>           <none>
kube-system   metrics-server-7566d596c8-fb9t5          0/1     CrashLoopBackOff   4          8m56s   10.42.0.9    rpi2   <none>           <none>
kube-system   local-path-provisioner-6d59f47c7-5stjg   0/1     CrashLoopBackOff   5          8m56s   10.42.0.8    rpi2   <none>           <none>

# After applying the iptables DNAT prerouting / output hack
nate-mbp17:~ kubectl get pods -A -o wide
NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE   IP           NODE   NOMINATED NODE   READINESS GATES
kube-system   helm-install-traefik-nspr7               0/1     Completed   0          22m   10.42.0.3    rpi2   <none>           <none>
kube-system   svclb-traefik-wrflb                      2/2     Running     4          20m   10.42.0.13   rpi2   <none>           <none>
kube-system   svclb-traefik-vh9mz                      2/2     Running     4          20m   10.42.1.6    rpi3   <none>           <none>
kube-system   traefik-758cd5fc85-9jx77                 1/1     Running     2          20m   10.42.1.7    rpi3   <none>           <none>
kube-system   coredns-8655855d6-q79dn                  1/1     Running     2          22m   10.42.0.12   rpi2   <none>           <none>
kube-system   local-path-provisioner-6d59f47c7-5stjg   1/1     Running     11         22m   10.42.0.14   rpi2   <none>           <none>
kube-system   metrics-server-7566d596c8-fb9t5          1/1     Running     11         22m   10.42.0.11   rpi2   <none>           <none>
