Flannel: Super slow access to service IP from host (& host-networked pods) with Flannel CNI

Created on 18 Jan 2020 · 12 comments · Source: coreos/flannel

Ref: https://github.com/kubernetes/kubernetes/issues/87233#issue-550046098

The k/k maintainers believe this is a Flannel issue, so re-posting it here.

Most helpful comment

I think I've been hitting this issue yesterday/today

Some tests I was doing, from one host (not in a container)

I've just swapped to the host-gw backend and everything's working normally

flannel: 0.11.0
kubernetes: 1.17.2, installed using kubeadm
on a baremetal switched network.

All 12 comments

We are seeing multiple reports that flannel + kube 1.17 don't play well:

@tomdee can you look at these?

I think I've been hitting this issue yesterday/today

Some tests I was doing, from one host (not in a container)

I've just swapped to the host-gw backend and everything's working normally

flannel: 0.11.0
kubernetes: 1.17.2, installed using kubeadm
on a baremetal switched network.
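
For anyone reproducing this workaround: switching flannel's backend comes down to editing net-conf.json in the flannel ConfigMap and restarting the flannel pods so flanneld re-reads it. A minimal sketch, assuming the stock kube-flannel manifest of that era (ConfigMap kube-flannel-cfg and pod label app=flannel in kube-system; adjust the names to your install):

kubectl -n kube-system edit cm kube-flannel-cfg
#   in net-conf.json change  "Backend": { "Type": "vxlan" }  to  "Backend": { "Type": "host-gw" }
kubectl -n kube-system delete pod -l app=flannel   # recreate the DaemonSet pods with the new backend

Keep in mind that host-gw requires the nodes to share a layer-2 segment, which matches the bare-metal switched network described above.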

Something we noticed is that the number of conntrack insert_failed was dramatically higher while running kube 1.17.
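
The insert_failed counter can be watched per CPU with conntrack-tools, or read straight from procfs if the tool is not installed:

conntrack -S                      # per-CPU conntrack statistics, including insert_failed
cat /proc/net/stat/nf_conntrack   # the same counters as raw hex values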

We experienced the same issue today. Fixed this by using the solution of @mikebryant. Is there any permanent solution on the way?

@tomdee, as you are the last remaining maintainer, who should I ping/tag to get this looked at?

Just FYI, this is not related only to 1.17. Because of these issues, I tried downgrading from 1.17.3 to 1.16.8 with the same result.
First of all, the route from the service CIDR to the cni0 interface gateway is missing, so I had to add it manually for service names to even resolve:

ip route add 10.96.0.0/12 via 10.244.3.1

And even after adding the route, traceroute is still super slow:

traceroute <service>.<namespace>.svc.cluster.local
traceroute to <service>.<namespace>.svc.cluster.local (10.106.49.44), 30 hops max, 38 byte packets
 1  10.244.3.1 (10.244.3.1)  3097.057 ms !H  3097.946 ms !H  3119.540 ms !H
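
Before adding the route, it is easy to confirm that the host has no route covering the service CIDR and would send this traffic via its default gateway instead of cni0; for example, with the service IP from the traceroute above:

ip route get 10.106.49.44                  # which route/interface the host would actually use
ip route show | grep -E '10\.(96|244)\.'   # any existing routes toward the service or pod CIDRs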

Just curious, how many folks experiencing this issue are using hyperkube?

I'm having this issue with the vxlan backend on flannel 0.11 and 0.12 as well.
Affected Kubernetes versions: 1.16.x, 1.17.x and 1.18.x.

Finally, setting up a static route on my nodes to the service network through the cni0 interface helped me instantly:
ip route add 10.96.0.0/12 dev cni0

OS: CentOS 7
Install method: kubeadm
Underlying platform: VirtualBox 6
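
One caveat with this workaround: the route is lost on reboot, because cni0 is (re)created by the CNI plugin after the node comes up. A rough sketch of re-applying it automatically, e.g. from /etc/rc.local or a small systemd oneshot unit of your own:

until ip link show cni0 >/dev/null 2>&1; do sleep 5; done   # wait for the bridge to exist
ip route replace 10.96.0.0/12 dev cni0                      # "replace" is idempotent, unlike "add"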

Fixed this problem on Kubernetes v1.17.2 by using the solution from @mengmann.

Exactly the same issue here

Not sure if it's the same issue, but we noticed an additional delay of 1 second when upgrading from Kubernetes 1.15.3 to 1.18.1. We traced the problem to the --random-fully flag introduced by this PR. See the issue here
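
--random-fully is an option on iptables MASQUERADE rules that fully randomizes SNAT source-port allocation. Whether the NAT rules on a node were installed with it can be checked directly:

iptables -t nat -S POSTROUTING | grep -i masquerade   # look for --random-fully on the MASQUERADE rules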

I'm currently running Kubernetes 1.17.3 (some nodes 1.17.4). Fortunately there are not many apps on my newly built cluster yet, so I migrated them this week and switched the network fabric to Calico, following this article. Now everything works perfectly. 😄
