Flannel: "link has incompatible addresses" after restarting Flannel k8s pod

Created on 3 Nov 2018 · 12 comments · Source: coreos/flannel

The flannel pod starts successfully if the flannel.1 link doesn't exist. But running kubectl delete pod kube-flannel-... leaves the flannel.1 link behind, and the subsequently created pod fails to start with the following error:

I1103 17:54:41.197308       1 main.go:475] Determining IP address of default interface
I1103 17:54:41.198443       1 main.go:488] Using interface with name eth0 and address 192.168.1.244
I1103 17:54:41.198533       1 main.go:505] Defaulting external address to interface address (192.168.1.244)
I1103 17:54:41.698456       1 kube.go:131] Waiting 10m0s for node controller to sync
I1103 17:54:41.698812       1 kube.go:294] Starting kube subnet manager
I1103 17:54:42.699051       1 kube.go:138] Node controller sync successful
I1103 17:54:42.699241       1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - k8s-worker2
I1103 17:54:42.699302       1 main.go:238] Installing signal handlers
I1103 17:54:42.796179       1 main.go:353] Found network config - Backend type: vxlan
I1103 17:54:42.796884       1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1103 17:54:42.799114       1 main.go:280] Error registering network: failed to configure interface flannel.1: failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:715, MTU:1450, TxQLen:0, Name:"flannel.1", HardwareAddr:net.HardwareAddr{0x96, 0x11, 0x68, 0xa1, 0x57, 0x2b}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0x13c1a0e4), Promisc:0, Xdp:(*netlink.LinkXdp)(0x13d673a0), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0}, VxlanId:1, VtepDevIndex:2, SrcAddr:net.IP{0xc0, 0xa8, 0x1, 0xf4}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:true, NoAge:false, GBP:false, Age:300, Limit:0, Port:8472, PortLow:0, PortHigh:0}
I1103 17:54:42.799393       1 main.go:333] Stopping shutdownHandler...

This is on a Raspberry Pi 3 B+, so the ARM architecture may be a factor.
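Judging from the error text, flannel refuses to reuse a leftover flannel.1 link whose addresses don't exactly match the one it wants to assign. A simplified sketch of a check of that shape (illustrative only, not flannel's actual code; the addresses are made up):

```shell
#!/bin/sh
# Illustrative sketch only, NOT flannel's actual code: a leftover link is
# treated as compatible only if it has no addresses at all, or exactly the
# one address flannel wants to assign.
is_compatible() {
  want=$1; shift
  [ "$#" -eq 0 ] && return 0        # no addresses yet: flannel can add one
  [ "$#" -eq 1 ] && [ "$1" = "$want" ]
}

# A stale address left over from a previous run fails the check:
is_compatible 10.244.1.0/32 10.244.2.0/32 && echo compatible || echo incompatible
# prints: incompatible
```

Under this reading, any stale or extra address on the leftover link is enough to trigger "link has incompatible addresses" and abort startup.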

Expected Behavior

Recreating the flannel pod should succeed even if the flannel.1 link already exists.

Current Behavior

The flannel pod goes into CrashLoopBackOff after it is recreated. To work around this, SSH onto the worker and run sudo ip link delete flannel.1; the recreated pod then starts successfully.
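The workaround above can be wrapped in a small helper that prints the exact commands to run for a given node and pod (the node and pod names below are examples taken from this thread; substitute your own):

```shell
#!/bin/sh
# Prints the manual recovery commands for one node. The node and pod names
# passed in are examples; substitute the affected node and its crashing
# flannel pod.
fix_node() {
  node=$1
  pod=$2
  echo "ssh $node sudo ip link delete flannel.1"
  echo "kubectl -n kube-system delete pod $pod"
}

fix_node k8s-worker2 kube-flannel-ds-arm-6lgx4
```

Deleting the pod afterwards is what prompts the DaemonSet to recreate it, now that the stale link is gone.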

Possible Solution

One option might be for flannel to clear (or delete and recreate) an existing flannel.1 link whose addresses don't match, instead of exiting with an error.

Steps to Reproduce (for bugs)

  1. Deploy k8s with flannel
  2. kubectl delete pod kube-flannel-...
  3. See that the recreated pod does not start and logs the error message above

Context

About once a week my flannel pods enter this state, possibly due to a crash or restart of the pod, and I have to manually SSH in to delete the flannel link on each affected node.

Related issue: https://github.com/coreos/flannel/issues/883

Your Environment

  • Flannel version: v0.10.0
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.2.24
  • Kubernetes version (if used): 1.12.1
  • Operating System and version: HypriotOS 1.9.0 on Raspberry Pi 3 B+
  • Link to your project (optional): n/a

Most helpful comment

@markus-seidl @mkuchenbecker It is normal to have one instance on each node; kube-flannel runs as a DaemonSet, so there is one pod per node.

kyle@noobuntu:~/Development/raspberry_patch$ kubectl get pods -o wide -n kube-system | grep flannel
kube-flannel-ds-arm-6lgx4       0/1     CrashLoopBackOff   5          4m49s   192.168.1.101   alpha   <none>           <none>
kube-flannel-ds-arm-nrrk7       0/1     CrashLoopBackOff   5          4m43s   192.168.1.102   beta    <none>           <none>
kube-flannel-ds-arm-nv8zx       0/1     CrashLoopBackOff   5          4m30s   192.168.1.104   delta   <none>           <none>
kube-flannel-ds-arm-rfkft       0/1     CrashLoopBackOff   5          4m57s   192.168.1.103   gamma   <none>           <none>

Manually deleting the link on the node and then deleting the pod, as others have suggested, seems to be the resolution.

All 12 comments

Flannel version: v0.10.0
Backend used (e.g. vxlan or udp): vxlan
Etcd version: 3.2.24
Kubernetes version (if used): 1.13.1
Operating System and version: Debian 9.6 on Raspberry Pi 3 B

Same setup as @b3nw. The only difference is I'm using HypriotOS 1.9.0 on a Raspberry Pi 3 B+.

Got this same thing on HypriotOS 1.10.0-rc2 on a Raspberry Pi 3 B+.

> Got this same thing on HypriotOS 1.10.0-rc2 on a Raspberry Pi 3 B+.

@mr-sour My config is the same as yours, with the same results. I'm glad at least that sudo ip link delete flannel.1 on the failing node allows the pod to recreate successfully after deleting the failing pod.

Seeing the same issue with Flannel v0.11.0.

uname -a: Linux pirate1 4.19.58-v7+ #1245 SMP Fri Jul 12 17:25:51 BST 2019 armv7l GNU/Linux

Same issue from applying

https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

on a small bramble (two RPi 4s). I noticed that there are two pods: after deleting the flannel.1 link, the first one starts without problems, but the second one enters a crash loop (with the same error).

Does anyone else have two "kube-flannel-ds-arm-xxxx" pods? Maybe that's the problem?

Still an issue with Hypriot v1.11.1 + K8s 1.16.1 + Flannel 0.11.0

@markus-seidl I can confirm, I have two.

pi@raspi-0:~ $ k get pods --all-namespaces
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-5644d7b6d9-gtj6h          1/1     Running   0          10m
kube-system   coredns-5644d7b6d9-lx59g          1/1     Running   0          10m
kube-system   etcd-raspi-0                      1/1     Running   0          10m
kube-system   kube-apiserver-raspi-0            1/1     Running   0          10m
kube-system   kube-controller-manager-raspi-0   1/1     Running   0          10m
kube-system   kube-flannel-ds-arm-5brn7         1/1     Running   0          10m
kube-system   kube-flannel-ds-arm-c4hbv         0/1     Error     14         5m41s
kube-system   kube-proxy-htk7f                  1/1     Running   0          10m
kube-system   kube-proxy-kcql2                  1/1     Running   0          5m41s
kube-system   kube-scheduler-raspi-0            1/1     Running   0          10m

@markus-seidl @mkuchenbecker It is normal to have one instance on each node; kube-flannel runs as a DaemonSet, so there is one pod per node.

kyle@noobuntu:~/Development/raspberry_patch$ kubectl get pods -o wide -n kube-system | grep flannel
kube-flannel-ds-arm-6lgx4       0/1     CrashLoopBackOff   5          4m49s   192.168.1.101   alpha   <none>           <none>
kube-flannel-ds-arm-nrrk7       0/1     CrashLoopBackOff   5          4m43s   192.168.1.102   beta    <none>           <none>
kube-flannel-ds-arm-nv8zx       0/1     CrashLoopBackOff   5          4m30s   192.168.1.104   delta   <none>           <none>
kube-flannel-ds-arm-rfkft       0/1     CrashLoopBackOff   5          4m57s   192.168.1.103   gamma   <none>           <none>

Manually deleting the link on the node and then deleting the pod, as others have suggested, seems to be the resolution.

Same issue on k8s v1.16.11, Flannel v0.11.0.
I have two clusters: the first on VMware and the second on Nutanix AHV.
Both clusters were installed the same way with Kubespray.
On the VMware cluster I can delete flannel pods without any problems, but on Nutanix I see CrashLoopBackOff on some nodes; the workaround is to delete flannel.1 (just a temporary patch).

I0802 12:34:12.140814 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0802 12:34:12.141386 1 main.go:289] Error registering network: failed to configure interface flannel.1: failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:25, MTU:1450, TxQLen:0, Name:"flannel.1", HardwareAddr:net.HardwareAddr{0xe6, 0x96, 0xa6, 0x84, 0x4b, 0x66}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0xc4201c50f4), Promisc:0, Xdp:(*netlink.LinkXdp)(0xc42042a360), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0}, VxlanId:1, VtepDevIndex:2, SrcAddr:net.IP{0xa, 0x35, 0xa2, 0x65}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:true, NoAge:false, GBP:false, Age:300, Limit:0, Port:8472, PortLow:0, PortHigh:0}
I0802 12:34:12.141422 1 main.go:366] Stopping shutdownHandler...

I got the same issue. Any progress on resolving this problem?

I had the same issue when the power went down and I then tried to bring the nodes back. The trick of removing the link actually helps.
