Hi there,
I'm running Flannel v0.5.5 in vxlan mode as part of a Kubernetes cluster. I keep seeing the following messages repeating:
<Timestamp> <host> flanneld[873]: I0407 18:36:51.705743 00873 vxlan.go:345] L3 miss: <Service's IP>
<Timestamp> <host> flanneld[873]: I0407 18:36:51.705865 00873 vxlan.go:349] Route for <Service's IP> not found
It only happens to one node in the cluster at a time. It seems flannel isn't aware that a route has changed or been removed, and it keeps looking for it.
Everything was working fine for a while; then I updated one service and the messages started repeating.
This happens to me also.
It seems Flannel changes its subnet right before this happens, which invalidates each service's IP; I assume the same applies to the pods' IPs. Does anyone have insight into what could cause this?
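A quick way to compare the subnet flannel thinks it leases on a node with what is registered in etcd (a sketch, assuming the default subnet file location and etcd prefix):
$ cat /run/flannel/subnet.env              # FLANNEL_SUBNET / FLANNEL_MTU written by flanneld on this node
$ etcdctl ls /coreos.com/network/subnets   # subnet leases currently registered in etcd (v2 API)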
Got the same problem.
Seems like I've run into this as well.
Got the same problem. Cross-node networking didn't work because Docker loads flannel_docker_opts.env, but in my case flannel starts after Docker, so restarting Docker fixed the problem.
$ systemctl cat docker
# /usr/lib64/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=docker.socket early-docker.target network.target
Requires=docker.socket early-docker.target
[Service]
EnvironmentFile=-/run/flannel_docker_opts.env
ExecStart=/usr/lib/coreos/dockerd daemon --host=fd:// $DOCKER_OPTS $DOCKER_OPT_BIP $DOCKER_OPT_MTU $DOCKER_OPT_IPMASQ
[Install]
WantedBy=multi-user.target
Add the config below to avoid the problem in the future (via cloud-config or similar):
# /etc/systemd/system/docker.service.d/40-flannel.conf
[Unit]
Requires=flanneld.service
After=flanneld.service
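Applying the drop-in looks roughly like this (a sketch, assuming a CoreOS-style setup):
$ sudo mkdir -p /etc/systemd/system/docker.service.d
# create 40-flannel.conf with the [Unit] snippet above, then:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker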
Now the logs look like this:
May 27 16:23:33 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:33.936393 00001 device.go:187] calling NeighSet: 10.244.13.3, 12:6b:73:a2:7f:98
May 27 16:23:33 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:33.936735 00001 vxlan.go:356] AddL3 succeeded
May 27 16:23:35 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:35.717548 00001 vxlan.go:340] Ignoring not a miss: 12:6b:73:a2:7f:98, 10.244.13.5
May 27 16:23:36 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:36.719527 00001 vxlan.go:340] Ignoring not a miss: 12:6b:73:a2:7f:98, 10.244.13.5
May 27 16:23:37 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:37.721512 00001 vxlan.go:340] Ignoring not a miss: 12:6b:73:a2:7f:98, 10.244.13.5
May 27 16:23:38 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:38.724654 00001 vxlan.go:345] L3 miss: 10.244.13.5
May 27 16:23:38 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:38.724697 00001 device.go:187] calling NeighSet: 10.244.13.5, 12:6b:73:a2:7f:98
May 27 16:23:38 k8s-master-1 sdnotify-proxy[31142]: I0527 16:23:38.724827 00001 vxlan.go:356] AddL3 succeeded
I'm a little surprised to see service IPs in the flannel logs; shouldn't kube-proxy be rewriting them?
I have since upgraded to Kubernetes 1.2.x and switched to the gce backend, and I haven't seen this since, but I was seeing it with Kubernetes 1.1.x and the vxlan backend. It might go away with the Kubernetes upgrade, but I'd be surprised; this seems more Flannel-centric. I'd assume it could still be seen with vxlan and Kubernetes 1.2.x.
L3 miss: <-Service's IP->
Route for <-Service's IP-> not found
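A rough way to sanity-check whether kube-proxy is rewriting the service IP before it ever reaches flannel (a sketch, assuming kube-proxy is running in iptables mode; the userspace proxier uses different chains; <Service-IP> is a placeholder):
$ iptables -t nat -S | grep <Service-IP>   # should show kube-proxy's NAT rules for that service
$ iptables -t nat -L KUBE-SERVICES -n      # chain installed by iptables-mode kube-proxy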
I have the same issue. I solved my problem like this:
$ etcdctl set /coreos.com/network/subnets/<-Service's IP->-24 '{"PublicIP":"<-Service's IP->","BackendType":"vxlan","BackendData":{"VtepMAC":"<-Service's MAC->"}}'
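For what it's worth, flannel normally manages these subnet keys as leases with a TTL, so it's worth checking what is actually registered (a sketch, assuming the default /coreos.com/network prefix and the etcd v2 API):
$ etcdctl ls /coreos.com/network/subnets                        # list all current subnet leases
$ etcdctl -o extended get /coreos.com/network/subnets/<Subnet>  # extended output includes the remaining TTL, if your etcdctl supports it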
@mattwang123 Yeah, reassigning this "fixed" it for me too, but it's kind of a "duct tape" fix, if you will. I would assume there's a reason this happens that could either be prevented or have things adjust accordingly (maybe in flannel, maybe in Kubernetes; I'm not sure where this responsibility would lie).
Maybe related to #283
Sounds a good bit similar
Hi, I have a similar issue with CoreOS 1010.5.0, flannel v0.5.5 in vxlan mode, and Kubernetes 1.2.4.
My setup is 1 schedulable master and 2 nodes on VirtualBox, installed with the "bare metal" guide.
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:340] Ignoring not a miss: <Mac:address>, <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:248] Subnet removed: <Pod-Subnet>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 device.go:176] calling NeighDel: <k8s-master-IP>, <Mac:address>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:345] L3 miss: <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:349] Route for <Pod-IP> not found
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:345] L3 miss: <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:349] Route for <Pod-IP> not found
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:345] L3 miss: <Pod-IP>
<Timestamp> <host> sdnotify-proxy[770]: <Timestamp> 00001 vxlan.go:349] Route for <Pod-IP> not found
... looping ...
My cluster was running fine. The systemd docker service requires flannel, and neither showed any error logs.
The issue appeared after some time: subnets started getting deleted on all nodes, and pods lost all connectivity outside their host node.
The machines had been rebooted a lot, but the issue did NOT appear right after a reboot or a systemd restart, which would have pointed to a race.
It could be linked to heavy CPU and memory load. I experienced this with several Kubernetes and CoreOS versions, and although I cannot give any evidence, it seems to me that it always happened on an overloaded cluster. Maybe network load should be considered too?
After restarting flannel everything seems fine from the pods, but I still see _a LOT_ of Ignoring not a miss: <Mac:address>, <Some-Pod-IP> messages in the flanneld logs.
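In case it's useful, the restart sequence is roughly the following (a sketch, assuming the CoreOS unit names; flannel first, so Docker picks up the refreshed subnet options):
$ sudo systemctl restart flanneld
$ sudo systemctl restart docker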
@iT2afL0rd
You may try upgrading flannel from 0.5.5 to 0.5.6; after the upgrade, the issue did not happen again for me.
I'm curious because there hasn't been an official flannel 0.5.6 release. Are you talking about builds from master?
@steveeJ Sorry, it's just the latest bug-fix release.
@samnag Subnets getting deleted is very disturbing. I think the only thing that can delete subnets is the TTL expiring before the node has had a chance to renew the lease. But IIRC there was a prior report of something similar associated with a lot of etcd activity. More testing needs to be done to isolate the problem.
As far as the Ignoring not a miss messages go, they should be benign. I'd be interested to know if anyone has seen actual problems associated with them being reported. I'm not even sure why the kernel sends up these events; flannel could probably just remove this log line.
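If someone wants to catch a lease expiring in the act, one option is to watch the subnet keys in etcd (a sketch, assuming the default prefix and the etcd v2 etcdctl):
$ etcdctl watch --recursive --forever /coreos.com/network/subnets   # prints set/expire/delete events on the leases as they happen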
@eyakubovich is there any way to reduce the log level to hide the Ignoring not a miss messages?
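Not a flannel flag, but as a stopgap the messages can be filtered out when reading the journal (assuming a flanneld systemd unit):
$ journalctl -u flanneld -f | grep -v "Ignoring not a miss"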
Could you repro on v0.7.0?
I'm encountering this problem too.
I'm facing this problem too, but the IP in question is for an endpoint rather than a service:
flannel-wrapper[21821]: I0531 02:51:48.031256 21821 network.go:243] L3 miss but route for 10.2.77.2 not found
The issue can be reproduced on v0.6.2
@tomdee can we reopen this issue? I used kargo to deploy a Kubernetes environment with flannel, and the flannel subnet changed for some unknown reason, which is a serious issue.
Can you repro it on v0.7.1? v0.6.2 is quite old...
Still reproducible on v0.7.0. I see two kinds of messages:
2017/07/13, 23:59:57.000 I0714 03:59:57.610107 01590 network.go:225] L3 miss: 10.4.30.10
2017/07/13, 23:59:57.000 I0714 03:59:57.353027 1539 network.go:243] L3 miss but route for 10.6.52.22 not found
The same for me, on v0.7.0.
I also hit this, on 0.7.0-1.el7.
flannel: 0.7.1
coreos: 1409.7.0 stable
k8s: 1.7.3 from hyperkube
I had the same problem when I removed the Calico CNI configs and left only flannel routing. Check your Docker and flannel network configurations; they should be on the same subnet.
~ $ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet **10.64.209.1** netmask 255.255.255.0 broadcast 0.0.0.0
ether 02:42:f6:36:29:1c txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet **10.64.209.0** netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::3c3c:eeff:fedd:2753 prefixlen 64 scopeid 0x20<link>
ether 3e:3c:ee:dd:27:53 txqueuelen 0 (Ethernet)
RX packets 1156 bytes 848161 (828.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1602 bytes 71988 (70.3 KiB)
TX errors 0 dropped 47 overruns 0 carrier 0 collisions 0
If that's not the case, check the flannel and Docker configs. Removing these config entries and reloading the daemons solved the problem in my case:
/etc/systemd/system/docker.service.d/40-flannel.conf
[Unit]
Requires=flanneld.service
After=flanneld.service
#---REMOVE LINES BELOW
[Service]
EnvironmentFile=/etc/kubernetes/cni/docker_opts_cni.env
#---ALSO REMOVE THIS:
/etc/kubernetes/cni/net.d/10-flannel.conf
{
"name": "podnet",
"type": "flannel",
"delegate": {
"isDefaultGateway": true
}
}
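After removing those entries, reloading and restarting the daemons looks roughly like this (assuming the CoreOS unit names):
$ sudo systemctl daemon-reload
$ sudo systemctl restart flanneld docker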