Using kubeadm and Kubernetes 1.8.3 to provision a 3-node cluster on HypriotOS (ARM), I can initialize the master fine, but when joining nodes into the cluster, kube-flannel crashes on each node with:
Error registering network: failed to configure interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &{{%!s(int=5) %!s(int=1450) %!s(int=0) flannel.1 76:dd:ec:c0:e6:86 up|broadcast|multicast %!s(uint32=69699) %!s(int=0) %!s(int=0) <nil> %!s(*netlink.LinkStatistics=&{272198 273028 25973338 108272828 0 0 0 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0}) %!s(int=0) %!s(*netlink.LinkXdp=<nil>) ether <nil> unknown} %!s(int=1) %!s(int=3) 192.168.0.14 <nil> %!s(int=0) %!s(int=0) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(int=300) %!s(int=0) %!s(int=8472) %!s(int=0) %!s(int=0)}
When I run ip addr, I can see that flannel.1 has been given two different IP addresses, a /24 apart from each other:
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 76:dd:ec:c0:e6:86 brd ff:ff:ff:ff:ff:ff
inet 10.244.2.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::74dd:ecff:fec0:e686/64 scope link
valid_lft forever preferred_lft forever
I haven't manually added these addresses; they appear automatically when flannel starts. kube-flannel enters CrashLoopBackOff and never recovers from the error. Is this a bug, or something I need to update on my host OS? How do I remove the second address, if that is a suitable temporary workaround?
The flannel manifest I have applied is https://raw.githubusercontent.com/coreos/flannel/v0.9.0/Documentation/kube-flannel.yml (with sed s/amd64/arm/g applied).
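For reference, a minimal sketch of removing just the second address as a stopgap, assuming (a guess, not confirmed by the error) that 10.244.1.0/32 is the stale entry and 10.244.2.0/32 is the node's current lease:
$ ip -4 addr show dev flannel.1                             # confirm both addresses are present
$ sudo ip addr del 10.244.1.0/32 dev flannel.1              # drop only the assumed-stale address
$ kubectl -n kube-system delete pod kube-flannel-ds-tgs85   # let the DaemonSet restart flannel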
The entire log leading up to the error is:
$ kubectl -n kube-system logs kube-flannel-ds-tgs85
I1120 04:39:40.815542 1 main.go:470] Determining IP address of default interface
I1120 04:39:40.817983 1 main.go:483] Using interface with name wlan0 and address 192.168.0.14
I1120 04:39:40.818240 1 main.go:500] Defaulting external address to interface address (192.168.0.14)
I1120 04:39:41.137713 1 kube.go:130] Waiting 10m0s for node controller to sync
I1120 04:39:41.137945 1 kube.go:283] Starting kube subnet manager
I1120 04:39:42.138167 1 kube.go:137] Node controller sync successful
I1120 04:39:42.138306 1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - jasmine
I1120 04:39:42.138370 1 main.go:238] Installing signal handlers
I1120 04:39:42.138873 1 main.go:348] Found network config - Backend type: vxlan
I1120 04:39:42.139127 1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1120 04:39:42.141224 1 main.go:280] Error registering network: failed to configure interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &{{%!s(int=5) %!s(int=1450) %!s(int=0) flannel.1 76:dd:ec:c0:e6:86 up|broadcast|multicast %!s(uint32=69699) %!s(int=0) %!s(int=0) <nil> %!s(*netlink.LinkStatistics=&{272198 273028 25973338 108272828 0 0 0 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0}) %!s(int=0) %!s(*netlink.LinkXdp=<nil>) ether <nil> unknown} %!s(int=1) %!s(int=3) 192.168.0.14 <nil> %!s(int=0) %!s(int=0) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(bool=false) %!s(int=300) %!s(int=0) %!s(int=8472) %!s(int=0) %!s(int=0)}
I1120 04:39:42.141346 1 main.go:328] Stopping shutdownHandler...
kube-flannel should start properly, not only on the master but on all additional nodes that join the cluster, and flannel.1 should automatically be allocated a single IPv4 address rather than two. Instead, kube-flannel starts correctly on the master, but on additional nodes that join the cluster flannel.1 ends up with two IPv4 addresses, which causes kube-flannel to enter a crash loop.
Steps to reproduce (see the command sketch after this list):
1. Initialize a master node on HypriotOS with kubeadm init.
2. Join a secondary node on HypriotOS with kubeadm join.
3. Observe that kube-flannel runs fine on the master but crashes on the worker node.
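A sketch of those steps as commands, assuming the stock kubeadm flow; the manifest URL and sed substitution are from this report, while the pod CIDR, token, and hash placeholders are illustrative:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16          # on the master
$ curl -sSL https://raw.githubusercontent.com/coreos/flannel/v0.9.0/Documentation/kube-flannel.yml | sed s/amd64/arm/g | kubectl apply -f -
$ sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>   # on each worker
$ kubectl -n kube-system get pods -o wide -w                  # the worker's kube-flannel pod enters CrashLoopBackOff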
This is preventing me from running Kubernetes right now. I previously had a working k8s 1.7 cluster with Flannel 0.7; the issue has only started occurring since upgrading Kubernetes and Flannel.
Thoughts: is it possible that flannel allocates a single IP successfully, but believes the allocation failed and retries, resulting in the two addresses?
Update: I fixed this by just running ip link delete flannel.1 on the affected nodes. I'm not sure how it started, but flannel was then able to recreate the interface and work.
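Spelled out, that workaround looks roughly like this (the pod name is illustrative; on its next start flannel recreates flannel.1 with a single address from the current lease):
$ sudo ip link delete flannel.1                             # remove the interface and both addresses
$ kubectl -n kube-system delete pod kube-flannel-ds-tgs85   # or wait for the crash-looping pod's next restart
$ ip -4 addr show dev flannel.1                             # after restart: a single /32 address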
@d11wtq Interesting, I've not seen this before but certainly let me know if you see this again.
We see the same issue with Flannel 0.7.1.
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:50:56:a2:ab:59 brd ff:ff:ff:ff:ff:ff
inet 10.10.14.187/23 brd 10.10.15.255 scope global eth0
valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
link/ether 02:42:ad:b4:c5:7f brd ff:ff:ff:ff:ff:ff
inet 172.30.79.1/24 scope global docker0
valid_lft forever preferred_lft forever
46: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
link/ether 62:d6:6c:70:2a:94 brd ff:ff:ff:ff:ff:ff
inet 172.30.73.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet 172.30.79.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
[cloud@psd011 ~]$ etcdctl get /k8s/network/subnets/172.30.73.0-24
Error: 100: Key not found (/k8s/network/subnets/172.30.73.0-24) [1959]
[cloud@psd011 ~]$ etcdctl get /k8s/network/subnets/172.30.79.0-24
{"PublicIP":"10.10.14.187","BackendType":"vxlan","BackendData":{"VtepMAC":"76:ef:58:64:0d:91"}}
Your Environment
Flannel version: 0.7.1
Backend used (e.g. vxlan or udp): vxlan
Etcd version: gcr.io/google_containers/etcd-arm:3.0.17 (?)
Kubernetes version (if used): 1.9.3
Operating System and version: CentOS 7.3
The same issue occurs with Flannel v0.10.0.
14: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether a6:62:08:a6:c2:f0 brd ff:ff:ff:ff:ff:ff
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet 169.254.46.120/16 brd 169.254.255.255 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::a462:8ff:fea6:c2f0/64 scope link
valid_lft forever preferred_lft forever
Flannel goes into CrashLoopBackOff. Deleting the flannel.1 link does solve the issue.
OS: Raspbian Stretch Lite
Kubernetes version: 1.10.2
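Note that the second address above is in 169.254.0.0/16, i.e. an IPv4 link-local (IPv4LL) address, which suggests a DHCP client on the host assigned it to flannel.1. On Raspbian, one possible preventive fix, assuming dhcpcd is the client in use (an assumption, not confirmed here), is to exclude flannel's interfaces in /etc/dhcpcd.conf and restart the service:
# /etc/dhcpcd.conf -- assumed fix: keep dhcpcd away from CNI-managed interfaces
denyinterfaces flannel.1 cni0
$ sudo systemctl restart dhcpcd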
I get the same problem.
Same under K8s 1.12.0, flannel v0.10.0; however, sudo ip link delete flannel.1 did allow it to come up without error. (Hypriot 1.9.0 on ARM - Raspberry Pi 3 B+)
Check your etcd data and the local env file; flanneld will pick up the previous IP on startup.
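For the etcd-backed setup above, a sketch of that check; /run/flannel/subnet.env is flannel's default SubnetFile location, and the /k8s/network prefix matches the earlier etcdctl commands:
$ cat /run/flannel/subnet.env                        # the subnet flannel last wrote locally
$ etcdctl ls /k8s/network/subnets                    # leases currently registered in etcd
$ etcdctl get /k8s/network/subnets/172.30.79.0-24    # compare the lease's PublicIP with this node's address
If the local file names a subnet that no longer has a lease in etcd, that would explain flannel re-adding the old address alongside the new one.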
+1, same setup as @wroney688: k8s 1.12.0.
In my case, all the flannel pods initially come up successfully, but after ~3 days one flannel pod gets stuck in CrashLoopBackOff with the following error (the other 4 workers are fine):
kubectl -n kube-system logs kube-flannel-ds-arm-fcm6r
I1019 03:16:15.097562 1 main.go:475] Determining IP address of default interface
I1019 03:16:15.099403 1 main.go:488] Using interface with name eth0 and address 192.168.1.246
I1019 03:16:15.099582 1 main.go:505] Defaulting external address to interface address (192.168.1.246)
I1019 03:16:15.734392 1 kube.go:131] Waiting 10m0s for node controller to sync
I1019 03:16:15.794863 1 kube.go:294] Starting kube subnet manager
I1019 03:16:16.995022 1 kube.go:138] Node controller sync successful
I1019 03:16:16.995104 1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - k8s-worker4
I1019 03:16:16.995131 1 main.go:238] Installing signal handlers
I1019 03:16:16.995337 1 main.go:353] Found network config - Backend type: vxlan
I1019 03:16:16.995492 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1019 03:16:16.996653 1 main.go:280] Error registering network: failed to configure interface flannel.1: failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:16, MTU:1450, TxQLen:0, Name:"flannel.1", HardwareAddr:net.HardwareAddr{0x96, 0x73, 0x59, 0x4a, 0xa2, 0xd6}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0x13f340e4), Promisc:0, Xdp:(*netlink.LinkXdp)(0x14027100), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0}, VxlanId:1, VtepDevIndex:2, SrcAddr:net.IP{0xc0, 0xa8, 0x1, 0xf6}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:true, NoAge:false, GBP:false, Age:300, Limit:0, Port:8472, PortLow:0, PortHigh:0}
I1019 03:16:16.996768 1 main.go:333] Stopping shutdownHandler...
Here's the interfaces on the worker with the failing flannel pod:
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether b8:27:eb:fa:0d:e4 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.246/24 brd 192.168.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a775:2ad:bcca:1d44/64 scope link
valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
link/ether b8:27:eb:af:58:b1 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:fc:53:bc:ad brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
16: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 96:73:59:4a:a2:d6 brd ff:ff:ff:ff:ff:ff
inet 10.244.2.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet 169.254.47.220/16 brd 169.254.255.255 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::697a:866d:c96e:5849/64 scope link
valid_lft forever preferred_lft forever
As with others, running sudo ip link delete flannel.1 resolves it, at least temporarily.
Can we re-open this issue? Are there any logs I can grab next time to help debug?
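In case it recurs, a sketch of state worth capturing before deleting the interface (standard iproute2/kubectl/journalctl commands; the dhcpcd check assumes it is the DHCP client on the node):
$ ip -d link show flannel.1                          # VXLAN parameters flannel validates on startup
$ ip -4 addr show dev flannel.1                      # full address list, including any 169.254.x.x entry
$ kubectl -n kube-system logs kube-flannel-ds-arm-fcm6r --previous   # log from the crashed container
$ journalctl -u dhcpcd --since "3 days ago" | grep -i flannel        # did the DHCP client touch the interface?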