Version:
k3s version v1.0.0 (18bd921c)
Describe the bug
After installing k3s everything worked perfectly until I rebooted the whole cluster; now all nodes are in NotReady state and I can't find the reason why it's happening.
To Reproduce
After getting 2 Raspberry Pi 4 boards with 4 GB of RAM and 32 GB SD cards, I added cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory to /boot/cmdline.txt and installed k3s.
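For reference, a minimal sketch of that edit (assuming the stock Raspbian /boot/cmdline.txt): the file must stay a single line, so the parameters are appended to the existing line rather than added as a new one.
sudo sed -i '1 s/$/ cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory/' /boot/cmdline.txt
sudo reboot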
Expected behavior
I expected the nodes to come back healthy (Ready) after the reboot.
Actual behavior
All nodes stay NotReady after the reboot.
Additional context
k3s check-config:
Verifying binaries in /var/lib/rancher/k3s/data/93417efda3f1bfb0977d22d68559e0cccf71afecdf7dfc6b2df045c00421d7fa/bin:
- sha256sum: good
- links: good
System:
- /usr/sbin/iptables v1.8.2 (nf_tables): should be older than v1.8.0 or in legacy mode (fail)
- swap: disabled
- routes: ok
Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
info: reading kernel config from /proc/config.gz ...
Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: missing
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: missing
- CONFIG_CFS_BANDWIDTH: missing
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
- "overlay":
- CONFIG_VXLAN: enabled (as module)
Optional (for encrypted networks):
- CONFIG_CRYPTO: enabled
- CONFIG_CRYPTO_AEAD: enabled (as module)
- CONFIG_CRYPTO_GCM: enabled (as module)
- CONFIG_CRYPTO_SEQIV: enabled (as module)
- CONFIG_CRYPTO_GHASH: enabled (as module)
- CONFIG_XFRM: enabled
- CONFIG_XFRM_USER: enabled
- CONFIG_XFRM_ALGO: enabled
- CONFIG_INET_ESP: enabled (as module)
- CONFIG_INET_XFRM_MODE_TRANSPORT: enabled (as module)
- Storage Drivers:
- "overlay":
- CONFIG_OVERLAY_FS: enabled (as module)
STATUS: 1 (fail)
sudo kubectl get nodes -o wide:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master NotReady master 23h v1.16.3-k3s.2 192.168.0.201 <none> Raspbian GNU/Linux 10 (buster) 4.19.75-v7l+ containerd://1.3.0-k3s.4
worker1 NotReady node 19h v1.16.3-k3s.2 192.168.0.202 <none> Raspbian GNU/Linux 10 (buster) 4.19.75-v7l+ containerd://1.3.0-k3s.4
sudo kubectl describe node master:
Name: master
Roles: master
Labels: beta.kubernetes.io/arch=arm
beta.kubernetes.io/instance-type=k3s
beta.kubernetes.io/os=linux
k3s.io/hostname=master
k3s.io/internal-ip=192.168.0.201
kubernetes.io/arch=arm
kubernetes.io/hostname=master
kubernetes.io/os=linux
kubernetes.io/role=master
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"be:1c:4b:13:c6:4b"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.0.201
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 11 Dec 2019 18:40:42 +0000
Taints: node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 11 Dec 2019 23:04:18 +0000 Wed, 11 Dec 2019 23:04:18 +0000 FlannelIsUp Flannel is running on this node
MemoryPressure Unknown Wed, 11 Dec 2019 23:08:28 +0000 Thu, 12 Dec 2019 17:33:57 +0000 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Wed, 11 Dec 2019 23:08:28 +0000 Thu, 12 Dec 2019 17:33:57 +0000 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Wed, 11 Dec 2019 23:08:28 +0000 Thu, 12 Dec 2019 17:33:57 +0000 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Wed, 11 Dec 2019 23:08:28 +0000 Thu, 12 Dec 2019 17:33:57 +0000 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.168.0.201
Hostname: master
Capacity:
cpu: 4
ephemeral-storage: 29567140Ki
memory: 3999784Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 28762913770
memory: 3999784Ki
pods: 110
System Info:
Machine ID: 7b2fdbf071984c60abe0ba09b3b020e9
System UUID: 7b2fdbf071984c60abe0ba09b3b020e9
Boot ID: fbd64ddb-699a-4062-8356-f8127a5aeea3
Kernel Version: 4.19.75-v7l+
OS Image: Raspbian GNU/Linux 10 (buster)
Operating System: linux
Architecture: arm
Container Runtime Version: containerd://1.3.0-k3s.4
Kubelet Version: v1.16.3-k3s.2
Kube-Proxy Version: v1.16.3-k3s.2
PodCIDR: 10.42.0.0/24
PodCIDRs: 10.42.0.0/24
ProviderID: k3s://master
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system svclb-traefik-rmrjg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 23h
kube-system metrics-server-6d684c7b5-kkx98 0 (0%) 0 (0%) 0 (0%) 0 (0%) 23h
kube-system local-path-provisioner-58fb86bdfd-52llb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 23h
kube-system coredns-d798c9dd-vnkkr 100m (2%) 0 (0%) 70Mi (1%) 170Mi (4%) 23h
kube-system traefik-65bccdc4bd-fpnf8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 23h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (2%) 0 (0%)
memory 70Mi (1%) 170Mi (4%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 23h kubelet, master Starting kubelet.
Warning InvalidDiskCapacity 23h kubelet, master invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 23h kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 23h kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 23h kubelet, master Node master status is now: NodeHasSufficientPID
Normal Starting 23h kube-proxy, master Starting kube-proxy.
Normal NodeAllocatableEnforced 23h kubelet, master Updated Node Allocatable limit across pods
Normal NodeReady 23h kubelet, master Node master status is now: NodeReady
Normal Starting 19h kubelet, master Starting kubelet.
Warning InvalidDiskCapacity 19h kubelet, master invalid capacity 0 on image filesystem
Normal Starting 19h kube-proxy, master Starting kube-proxy.
Warning Rebooted 19h kubelet, master Node master has been rebooted, boot id: fbd64ddb-699a-4062-8356-f8127a5aeea3
Normal NodeNotSchedulable 19h kubelet, master Node master status is now: NodeNotSchedulable
Normal NodeHasSufficientMemory 19h (x2 over 19h) kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 19h (x2 over 19h) kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 19h (x2 over 19h) kubelet, master Node master status is now: NodeHasSufficientPID
Normal NodeNotReady 19h kubelet, master Node master status is now: NodeNotReady
Normal NodeAllocatableEnforced 19h kubelet, master Updated Node Allocatable limit across pods
Normal NodeReady 19h kubelet, master Node master status is now: NodeReady
Normal NodeSchedulable 19h kubelet, master Node master status is now: NodeSchedulable
systemctl status k3s
● k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-12-12 18:13:46 GMT; 18min ago
Docs: https://k3s.io
Process: 494 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 498 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 502 (k3s-server)
Tasks: 144
Memory: 371.6M
CGroup: /system.slice/k3s.service
├─502 /usr/local/bin/k3s server
└─691 containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/a
Dec 12 18:32:14 master k3s[502]: time="2019-12-12T18:32:14.816288667Z" level=error msg="Failed to connect to proxy" error="dial tcp: address 2a02:a31a:a23e:c980:4918:a950:8309:2fa:6
Dec 12 18:32:14 master k3s[502]: time="2019-12-12T18:32:14.816401314Z" level=error msg="Remotedialer proxy error" error="dial tcp: address 2a02:a31a:a23e:c980:4918:a950:8309:2fa:644
Dec 12 18:32:19 master k3s[502]: time="2019-12-12T18:32:19.816706836Z" level=info msg="Connecting to proxy" url="wss://2a02:a31a:a23e:c980:4918:a950:8309:2fa:6443/v1-k3s/connect"
Dec 12 18:32:19 master k3s[502]: time="2019-12-12T18:32:19.816926205Z" level=error msg="Failed to connect to proxy" error="dial tcp: address 2a02:a31a:a23e:c980:4918:a950:8309:2fa:6
Dec 12 18:32:19 master k3s[502]: time="2019-12-12T18:32:19.817039316Z" level=error msg="Remotedialer proxy error" error="dial tcp: address 2a02:a31a:a23e:c980:4918:a950:8309:2fa:644
Dec 12 18:32:24 master k3s[502]: time="2019-12-12T18:32:24.817324245Z" level=info msg="Connecting to proxy" url="wss://2a02:a31a:a23e:c980:4918:a950:8309:2fa:6443/v1-k3s/connect"
Dec 12 18:32:24 master k3s[502]: time="2019-12-12T18:32:24.817549929Z" level=error msg="Failed to connect to proxy" error="dial tcp: address 2a02:a31a:a23e:c980:4918:a950:8309:2fa:6
Dec 12 18:32:24 master k3s[502]: time="2019-12-12T18:32:24.817674928Z" level=error msg="Remotedialer proxy error" error="dial tcp: address 2a02:a31a:a23e:c980:4918:a950:8309:2fa:644
Dec 12 18:32:25 master k3s[502]: E1212 18:32:25.992906 502 resource_quota_controller.go:407] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the ser
Dec 12 18:32:27 master k3s[502]: W1212 18:32:27.196677 502 garbagecollector.go:640] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to
I think the IPv6 address should be wrapped in [...], which it is not here: url="wss://2a02:a31a:a23e:c980:4918:a950:8309:2fa:6443/v1-k3s/connect"
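To illustrate the bracketed form (not the actual k3s code, just a probe of the supervisor port):
# the URL in the log has no brackets, so the host can't be separated from the port:
#   wss://2a02:a31a:a23e:c980:4918:a950:8309:2fa:6443/v1-k3s/connect
# with brackets the port is unambiguous; -g stops curl from treating [ ] as globs, -k skips cert checks
curl -gk https://[2a02:a31a:a23e:c980:4918:a950:8309:2fa]:6443/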
After uninstalling the server & agent and installing them once again, everything was Ready, but after a reboot it stayed Ready for a few minutes and then went NotReady :|
I finally got some logs from the server; it looks like either a cert or DNS issue related to IPv6:
Dec 16 23:28:24 master k3s[521]: time="2019-12-16T23:28:24.898927242+01:00" level=info msg="Connecting to proxy" url="wss://2a02:a31a:a23e:c980:2286:b08b:e636:c655:6443/v1-k3s/connect"
Dec 16 23:28:24 master k3s[521]: time="2019-12-16T23:28:24.899175927+01:00" level=error msg="Failed to connect to proxy" error="dial tcp: address 2a02:a31a:a23e:c980:2286:b08b:e636:c655:6443: too many colons in address"
Dec 16 23:28:24 master k3s[521]: time="2019-12-16T23:28:24.899294946+01:00" level=error msg="Remotedialer proxy error" error="dial tcp: address 2a02:a31a:a23e:c980:2286:b08b:e636:c655:6443: too many colons in address"
Dec 16 23:28:25 master k3s[521]: http: TLS handshake error from 192.168.0.202:43392: remote error: tls: bad certificate
Dec 16 23:28:25 master k3s[521]: I1216 23:28:25.656211 521 node_lifecycle_controller.go:1208] Initializing eviction metric for zone:
Dec 16 23:28:27 master k3s[521]: http: TLS handshake error from 192.168.0.202:43400: remote error: tls: bad certificate
Dec 16 23:28:29 master k3s[521]: http: TLS handshake error from 192.168.0.202:43408: remote error: tls: bad certificate
Dec 16 23:28:29 master k3s[521]: time="2019-12-16T23:28:29.899611646+01:00" level=info msg="Connecting to proxy" url="wss://2a02:a31a:a23e:c980:2286:b08b:e636:c655:6443/v1-k3s/connect"
Dec 16 23:28:29 master k3s[521]: time="2019-12-16T23:28:29.899863035+01:00" level=e
Surprisingly, it constantly logs errors about failing to connect to the proxy due to too many colons, but after shutting it down with systemctl stop k3s and running sudo k3s server manually it works; only the bad certificate errors remain. Of course, after a reboot it goes back to the IPv6 issues again :(
Turning off IPv6 at the Raspberry Pi level helped, although it seems like a bug.
Add net.ipv6.conf.all.disable_ipv6 = 1 to /etc/sysctl.conf on every worker & master.
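A minimal sketch of applying that on a node (sysctl -p just reloads /etc/sysctl.conf; a reboot works too):
echo "net.ipv6.conf.all.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p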
although seems like a bug
It's actually a bug.
Seems like IPv6 addresses have to be wrapped in [ ... ].
The error message regarding the IPv6 issue looks like it should be addressed by #1198, but there may be a bigger issue: the systemd service file is launching k3s before IPv4 networking is available.
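If that ordering theory is right, a hedged sketch of a drop-in that delays k3s until the network is up; this assumes network-online.target is actually wired up on Raspbian (e.g. a *-wait-online service is enabled), which may not be the case by default:
sudo mkdir -p /etc/systemd/system/k3s.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/k3s.service.d/wait-online.conf
[Unit]
Wants=network-online.target
After=network-online.target
EOF
sudo systemctl daemon-reload
sudo systemctl restart k3s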
Also, iptables should be in legacy mode. :)
I switched to legacy mode during my investigation without any difference. And yes, it only stopped working on boot; starting k3s manually after boot was fine.
I've faced the same issue before in #811. The solution for me was to turn off ipv6 in my entire internal network. Not ideal, I know, but it worked.
In my case that wasn't possible, so I disabled IPv6 on every Raspberry Pi.
Same for me when setting up my cluster in mid/late December 2019.
If someone is searching for how to disable IPv6, this worked for me:
# act as root
sudo su -
# /boot/cmdline.txt must stay a single line, so append to the existing line
# instead of adding a new one (skip any parameters that are already there)
sed -i '1 s/$/ ipv6.disable=1 cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory/' /boot/cmdline.txt
# stop the ipv6 module from loading
cat <<EOF > /etc/modprobe.d/ipv6.conf
# Don't load ipv6 by default
alias net-pf-10 off
alias ipv6 off
options ipv6 disable_ipv6=1
EOF
# Comment out IPv6 hosts
nano /etc/hosts
reboot
Hopefully this helps someone. Maybe there are better ways; I'm no sysadmin -.-
In my case, adding net.ipv6.conf.all.disable_ipv6 = 1 to /etc/sysctl.conf (on every node & master) and then rebooting worked perfectly.
Working on Raspbian, I was able to fix this by switching iptables to legacy mode.
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
After a reboot, everything is passing nicely!
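For completeness, a sketch of the full switch on Raspbian Buster; switching ip6tables as well is my assumption, but both alternatives exist there:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo reboot
# afterwards, iptables --version should report "(legacy)" and k3s check-config should pass
sudo iptables --version
sudo k3s check-config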
Using legacy iptables is required regardless of whether you use IPv6 or IPv4.