Hi there,
I'm running Flannel v0.5.5 in vxlan mode as part of a Kubernetes cluster. I keep seeing the following messages repeating:
<Timestamp> <host> flanneld[873]: I0407 18:36:51.705743 00873 vxlan.go:345] L3 miss: <Service's IP>
<Timestamp> <host> flanneld[873]: I0407 18:36:51.705865 00873 vxlan.go:349] Route for <Service's IP> not found
It only happens to one node in the cluster at a time. It seems flannel isn't aware that a route has changed or been removed, and it keeps looking for it.
Everything was working fine for a while; then I updated one service and the messages started repeating.
This happens to me also.
It seems Flannel changes its subnet right before this happens, which invalidates each service's IP; I assume the same applies to the pods' IPs. Does anyone have insight into what could cause this?
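A quick way to compare the subnet flannel thinks it leases on a node with what is registered in etcd (a sketch, assuming the default subnet file location and etcd prefix):
$ cat /run/flannel/subnet.env              # FLANNEL_SUBNET / FLANNEL_MTU written by flanneld on this node
$ etcdctl ls /coreos.com/network/subnets   # subnet leases currently registered in etcd (v2 API)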
Got the same problem.
Seems like I've run into this as well.
Got the same problem. Cross-node networking didn't work because Docker loads flannel_docker_opts.env, but in my case flannel starts after Docker, so restarting Docker fixed the problem.
$ systemctl cat docker
# /usr/lib64/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=docker.socket early-docker.target network.target
Requires=docker.socket early-docker.target
[Service]
EnvironmentFile=-/run/flannel_docker_opts.env
ExecStart=/usr/lib/coreos/dockerd daemon --host=fd:// $DOCKER_OPTS $DOCKER_OPT_BIP $DOCKER_OPT_MTU $DOCKER_OPT_IPMASQ
[Install]
WantedBy=multi-user.target
Add the config below to avoid the problem in the future (via cloud-config or similar):
# /etc/systemd/system/docker.service.d/40-flannel.conf
[Unit]
Requires=flanneld.service
After=flanneld.service
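Applying the drop-in looks roughly like this (a sketch, assuming a CoreOS-style setup):
$ sudo mkdir -p /etc/systemd/system/docker.service.d
# create 40-flannel.conf with the [Unit] snippet above, then:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker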
Now the logs look like this:
May 27 16:23:33 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:33.936393 00001 device.go:187] calling NeighSet: 10.244.13.3, 12:6b:73:a2:7f:98
May 27 16:23:33 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:33.936735 00001 vxlan.go:356] AddL3 succeeded
May 27 16:23:35 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:35.717548 00001 vxlan.go:340] Ignoring not a miss: 12:6b:73:a2:7f:98, 10.244.13.5
May 27 16:23:36 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:36.719527 00001 vxlan.go:340] Ignoring not a miss: 12:6b:73:a2:7f:98, 10.244.13.5
May 27 16:23:37 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:37.721512 00001 vxlan.go:340] Ignoring not a miss: 12:6b:73:a2:7f:98, 10.244.13.5
May 27 16:23:38 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:38.724654 00001 vxlan.go:345] L3 miss: 10.244.13.5
May 27 16:23:38 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:38.724697 00001 device.go:187] calling NeighSet: 10.244.13.5, 12:6b:73:a2:7f:98
May 27 16:23:38 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:38.724827 00001 vxlan.go:356] AddL3 succeeded
I'm a little surprised to see service IPs in the flannel logs; shouldn't kube-proxy be rewriting them?
I have since upgraded to Kubernetes 1.2.x and switched to the gce backend, and I haven't seen this since, but I was seeing it with Kubernetes 1.1.x and the vxlan backend. It might go away with the Kubernetes upgrade, but I'd be surprised; this seems more Flannel-centric. I'd assume it could still be seen with vxlan and Kubernetes 1.2.x.
L3 miss: <-Service's IP->
Route for <-Service's IP-> not found
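A rough way to sanity-check whether kube-proxy is rewriting the service IP before it ever reaches flannel (a sketch, assuming kube-proxy is running in iptables mode; the userspace proxier uses different chains; <Service-IP> is a placeholder):
$ iptables -t nat -S | grep <Service-IP>   # should show kube-proxy's NAT rules for that service
$ iptables -t nat -L KUBE-SERVICES -n      # chain installed by iptables-mode kube-proxy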
I have the same issue. I solved my problem like this:
$ etcdctl set /coreos.com/network/subnets/<-Service's IP->-24 '{"PublicIP":"<-Service's IP->","BackendType":"vxlan","BackendData":{"VtepMAC":"<-Service's MAC->"}}'
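For what it's worth, flannel normally manages these subnet keys as leases with a TTL, so it's worth checking what is actually registered (a sketch, assuming the default /coreos.com/network prefix and the etcd v2 API):
$ etcdctl ls /coreos.com/network/subnets                        # list all current subnet leases
$ etcdctl -o extended get /coreos.com/network/subnets/<Subnet>  # extended output includes the remaining TTL, if your etcdctl supports it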
@mattwang123 Yeah, reassigning this "fixed" it for me too, but it's kind of a "duct tape" fix, if you will. I would assume there's a reason this happens that could either be prevented or have things adjust accordingly (maybe in flannel, maybe in Kubernetes; I'm not sure where this responsibility would lie).
Maybe related to #283
Sounds a good bit similar
Hi, I have a similar issue with CoreOS 1010.5.0, flannel v0.5.5 in vxlan mode, and Kubernetes 1.2.4.
My setup is 1 schedulable master and 2 nodes on VirtualBox, installed with the "bare metal" guide.
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:248] Subnet removed: <Pod-Subnet>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 device.go:176] calling NeighDel: <k8s-master-IP>, <Mac:address>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:345] L3 miss: <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:349] Route for <Pod-IP> not found
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:345] L3 miss: <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:349] Route for <Pod-IP> not found
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:345] L3 miss: <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:349] Route for <Pod-IP> not found
... looping ...
My cluster was running fine. The systemd docker service requires flannel, and neither showed any error logs.
The issue appeared after some time: subnets started getting deleted on all nodes, and pods lost all connectivity outside their host node.
The machines had been rebooted a lot, but the issue did NOT appear right after a reboot or a systemd restart, which would have pointed to a race.
It could be linked to heavy CPU and memory load. I experienced this with several Kubernetes and CoreOS versions, and although I cannot give any evidence, it seems to me that it always happened on an overloaded cluster. Maybe network load should be considered too?
After restarting flannel everything seems fine from the pods, but I still see _a LOT_ of Ignoring not a miss: <Mac:address>, <Some-Pod-IP> messages in the flanneld logs.
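In case it's useful, the restart sequence is roughly the following (a sketch, assuming the CoreOS unit names; flannel first, so Docker picks up the refreshed subnet options):
$ sudo systemctl restart flanneld
$ sudo systemctl restart docker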
@iT2afL0rd
You may try upgrading flannel from 0.5.5 to 0.5.6; after the upgrade, the issue did not happen again for me.
I'm curious because there hasn't been an official flannel 0.5.6 release. Are you talking about builds from master?
@steveeJ Sorry, it's just the latest bug-fix release.
@samnag Subnets getting deleted is very disturbing. I think the only thing that can delete subnets is the TTL expiring before the node has had a chance to renew the lease. But IIRC there was a prior report of something similar associated with a lot of etcd activity. More testing needs to be done to isolate the problem.
As far as the Ignoring not a miss messages go, they should be benign. I'd be interested to know if anyone has seen actual problems associated with them being reported. I'm not even sure why the kernel sends up these events; flannel could probably just remove this log line.
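If someone wants to catch a lease expiring in the act, one option is to watch the subnet keys in etcd (a sketch, assuming the default prefix and the etcd v2 etcdctl):
$ etcdctl watch --recursive --forever /coreos.com/network/subnets   # prints set/expire/delete events on the leases as they happen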
@eyakubovich is there any way to reduce the log level to hide the Ignoring not a miss messages?
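Not a flannel flag, but as a stopgap the messages can be filtered out when reading the journal (assuming a flanneld systemd unit):
$ journalctl -u flanneld -f | grep -v "Ignoring not a miss"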
Could you repro on v0.7.0?
I'm encountering this problem too.
I'm facing this problem too, but the IP in question is for an endpoint rather than a service:
flannel-wrapper[21821]: I0531 02:51:48.031256 21821 network.go:243] L3 miss but route for 10.2.77.2 not found
The issue can be reproduced on v0.6.2
@tomdee can we reopen this issue? I used kargo to deploy a Kubernetes environment with flannel, and the flannel subnet changed for some unknown reason, which is a serious issue.
Can you repro it on v0.7.1? v0.6.2 is quite old...
Still reproducible on v0.7.0. I see two kinds of messages:
2017/07/13, 23:59:57.000 I0714 03:59:57.610107 01590 network.go:225] L3 miss: 10.4.30.10
2017/07/13, 23:59:57.000 I0714 03:59:57.353027 1539 network.go:243] L3 miss but route for 10.6.52.22 not found
The same for me, on v0.7.0.
I also hit this, on 0.7.0-1.el7.
flannel: 0.7.1
coreos: 1409.7.0 stable
k8s: 1.7.3 from hyperkube
I had the same problem when I removed the Calico CNI configs and left only flannel routing. Check your Docker and flannel network configurations; they should be on the same subnet.
~ $ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet **10.64.209.1** netmask 255.255.255.0 broadcast 0.0.0.0
ether 02:42:f6:36:29:1c txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet **10.64.209.0** netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::3c3c:eeff:fedd:2753 prefixlen 64 scopeid 0x20<link>
ether 3e:3c:ee:dd:27:53 txqueuelen 0 (Ethernet)
RX packets 1156 bytes 848161 (828.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1602 bytes 71988 (70.3 KiB)
TX errors 0 dropped 47 overruns 0 carrier 0 collisions 0
If that's not the case, check the flannel and Docker configs. Removing these config entries and reloading the daemons solved the problem in my case:
/etc/systemd/system/docker.service.d/40-flannel.conf
[Unit]
Requires=flanneld.service
After=flanneld.service
#---REMOVE LINES BELOW
[Service]
EnvironmentFile=/etc/kubernetes/cni/docker_opts_cni.env
#---ALSO REMOVE THIS:
/etc/kubernetes/cni/net.d/10-flannel.conf
{
"name": "podnet",
"type": "flannel",
"delegate": {
"isDefaultGateway": true
}
}
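After removing those entries, reloading and restarting the daemons looks roughly like this (assuming the CoreOS unit names):
$ sudo systemctl daemon-reload
$ sudo systemctl restart flanneld docker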