After installing a Kubernetes cluster with kubeadm on machines running vanilla Ubuntu 16.04 (following the steps in this doc: https://kubernetes.io/docs/getting-started-guides/kubeadm/) and setting up the Flannel network add-on, we are unable to communicate with pods deployed on a different host (e.g. from the master node to a pod on a worker node, or between pods on different worker nodes).
Our deployment steps:
MASTER:
kubeadm init --pod-network-cidr=10.244.0.0/16 --api-advertise-addresses=192.168.56.11
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
WORKER(s):
kubeadm join --token=525be4.d44dce5cc2c64fb3 192.168.56.11
Deployment (on master):
kubectl run nginx --image=nginx --replicas=2
# kubectl get nodes
NAME              STATUS         AGE
ubuntu-server-1   Ready,master   56s
ubuntu-server-2   Ready          13s
# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
kube-flannel-ds-tsqng   2/2     Running   0          23s
kube-flannel-ds-vv4dd   2/2     Running   0          48s
nginx-701339712-b9hkx   1/1     Running   0          15s
nginx-701339712-wlqj8   1/1     Running   0          15s
# kubectl describe pod nginx-701339712-b9hkx
Name:         nginx-701339712-b9hkx
Namespace:    default
Node:         ubuntu-server-2/10.0.2.15
Start Time:   Fri, 27 Jan 2017 23:21:25 +0100
Labels:       pod-template-hash=701339712
              run=nginx
Status:       Running
IP:           10.244.1.5
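For completeness, cross-node pod-to-pod reachability can also be probed from a throwaway pod (a minimal sketch; the busybox image is an assumption, any image with wget works):
kubectl run -i --tty busybox --image=busybox -- sh
# inside the pod, fetch from the nginx pod on the other node:
wget -qO- 10.244.1.5
# hangs and times out while cross-host networking is broken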
Ping from the master does not work:
root@ubuntu-server-1:~# ping 10.244.1.5
PING 10.244.1.5 (10.244.1.5) 56(84) bytes of data.
^C
--- 10.244.1.5 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7021ms
Ping from the worker where the pod is running works:
root@ubuntu-server-2:~# ping 10.244.1.5
PING 10.244.1.5 (10.244.1.5) 56(84) bytes of data.
64 bytes from 10.244.1.5: icmp_seq=1 ttl=64 time=0.156 ms
(master) List of interfaces & stats. Containers are correctly connected to the cni0 bridge (non-zero counters), but flannel.1 carries no traffic; its only non-zero TX counter is TX-DRP (14 dropped packets).
root@ubuntu-server-1:/home/rasto# netstat -i
Kernel Interface table
Iface         MTU    Met  RX-OK   RX-ERR  RX-DRP  RX-OVR  TX-OK   TX-ERR  TX-DRP  TX-OVR  Flg
cni0          1450   0    1201    0       0       0       1231    0       0       0       BMRU
docker0       1500   0    0       0       0       0       0       0       0       0       BMU
enp0s3        1500   0    74      0       0       0       84      0       0       0       BMRU
enp0s8        1500   0    5154    0       0       0       5575    0       0       0       BMRU
flannel.1     1450   0    0       0       0       0       0       0       14      0       BMRU
lo            65536  0    49230   0       0       0       49230   0       0       0       LRU
veth2bec76c0  1450   0    1020    0       0       0       1061    0       0       0       BMRU
(master) Routing table looks good:
# route
Kernel IP routing table
Destination   Gateway   Genmask        Flags  Metric  Ref  Use  Iface
default       10.0.2.2  0.0.0.0        UG     0       0    0    enp0s3
10.0.2.0      *         255.255.255.0  U      0       0    0    enp0s3
10.244.0.0    *         255.255.255.0  U      0       0    0    cni0
10.244.0.0    *         255.255.0.0    U      0       0    0    flannel.1
172.17.0.0    *         255.255.0.0    U      0       0    0    docker0
192.168.56.0  *         255.255.255.0  U      0       0    0    enp0s8
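The VXLAN side of flannel.1 can also be inspected directly with standard iproute2 tools; relevant here because flanneld may have bound the tunnel to the wrong (NAT) interface:
ip -d link show flannel.1        # shows the vxlan id and the local address/device the tunnel is bound to
bridge fdb show dev flannel.1    # forwarding entries; one entry per remote node is expected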
(worker) the flannel.1 interface is missing entirely:
# netstat -i
Kernel Interface table
Iface         MTU    Met  RX-OK   RX-ERR  RX-DRP  RX-OVR  TX-OK   TX-ERR  TX-DRP  TX-OVR  Flg
cni0          1450   0    10      0       0       0       12      0       0       0       BMRU
docker0       1500   0    0       0       0       0       0       0       0       0       BMU
enp0s3        1500   0    122     0       0       0       128     0       0       0       BMRU
enp0s8        1500   0    7369    0       0       0       6237    0       0       0       BMRU
lo            65536  0    168     0       0       0       168     0       0       0       LRU
veth3f949122  1450   0    3       0       0       0       21      0       0       0       BMRU
vethff0406c4  1450   0    7       0       0       0       24      0       0       0       BMRU
(I'm not sure whether this is an issue with kubeadm, Flannel, or both.)
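To narrow it down, the flannel pod on the worker can be checked directly (pod name taken from the listing above; the container name kube-flannel matches the stock manifest; add -n kube-system if the DaemonSet runs there):
kubectl logs kube-flannel-ds-tsqng -c kube-flannel
# and on the worker itself:
ip link show flannel.1    # "Device does not exist" would confirm the interface is missing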
BTW, to get Flannel working on the worker node, I had to manually create the file /run/flannel/subnet.env there; otherwise I was getting this error:
Error syncing pod, skipping: failed to "SetupNetwork" for "nginx-701339712-wkf7z_default" with SetupNetworkError: "Failed to setup network for pod \"nginx-701339712-wkf7z_default(a5fc14ec-e7fa-11e6-8f28-080027fb0a94)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"
# kubectl describe pods
FirstSeen  LastSeen  Count  From                       SubObjectPath  Type     Reason      Message
---------  --------  -----  ----                       -------------  ----     ------      -------
1m         1m        1      {default-scheduler }                      Normal   Scheduled   Successfully assigned nginx-701339712-wkf7z to ubuntu-server-2
1m         1s        42     {kubelet ubuntu-server-2}                 Warning  FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "nginx-701339712-wkf7z_default" with SetupNetworkError: "Failed to setup network for pod \"nginx-701339712-wkf7z_default(a5fc14ec-e7fa-11e6-8f28-080027fb0a94)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"
The content I added there:
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
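Note that flanneld normally writes /run/flannel/subnet.env itself at startup, so having to create it by hand suggests flanneld never came up cleanly on that node. A quick sanity check on the worker:
ps aux | grep flanneld          # is the flannel daemon actually running?
cat /run/flannel/subnet.env     # does it exist, and does FLANNEL_SUBNET match the node's allocated subnet?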
Looks like you are using Vagrant. For this to work, you have to set a static route to the service network (kube-dns) via the second interface on the nodes, i.e. the one carrying the address you used in kubeadm init --api-advertise-addresses=192.168.56.11:
ip route add 10.96.0.0/12 dev enp0s8 in your case.
Also mentioned in #113
Also have a look at the limitations and the Vagrant section, and make sure that the /etc/hosts file gets fixed.
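One caveat: a route added with ip route add does not survive a reboot. On Ubuntu 16.04 with ifupdown, a post-up hook in /etc/network/interfaces is one way to persist it (a sketch assuming a static stanza for enp0s8; adapt to your existing configuration):
auto enp0s8
iface enp0s8 inet static
    address 192.168.56.11
    netmask 255.255.255.0
    # re-add the service-network route whenever the interface comes up
    post-up ip route add 10.96.0.0/12 dev enp0s8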
Hi, thanks for the suggestions!
I'm not using Vagrant, but I am using VirtualBox VMs. /etc/hosts is fixed:
root@ubuntu-server-1:~# hostname -i
192.168.56.11
root@ubuntu-server-2:~# hostname -i
192.168.56.12
The static route (ip route add 10.96.0.0/12 dev enp0s8) resolved the issue with /run/flannel/subnet.env. Flannel now seems to be working correctly on the worker, and I can see that the flannel.1 interface has been created there.
However, pinging the container on the worker from the master still does not work.
On the master, the traffic seems to go through the flannel.1 interface (non-zero RX-OK/TX-OK counters):
root@ubuntu-server-1:~# netstat -i
Kernel Interface table
Iface         MTU    Met  RX-OK   RX-ERR  RX-DRP  RX-OVR  TX-OK   TX-ERR  TX-DRP  TX-OVR  Flg
cni0          1450   0    1576    0       0       0       1608    0       0       0       BMRU
docker0       1500   0    0       0       0       0       0       0       0       0       BMU
enp0s3        1500   0    204062  0       0       0       98753   0       0       0       BMRU
enp0s8        1500   0    2689    0       0       0       2681    0       0       0       BMRU
flannel.1     1450   0    17      0       0       0       17      0       8       0       BMRU
lo            65536  0    49083   0       0       0       49083   0       0       0       LRU
veth8f4d5722  1450   0    1576    0       0       0       1618    0       0       0       BMRU
But on the worker, no traffic goes through the flannel.1 interface (RX-OK and TX-OK are both zero):
root@ubuntu-server-2:~# netstat -i
Kernel Interface table
Iface         MTU    Met  RX-OK   RX-ERR  RX-DRP  RX-OVR  TX-OK   TX-ERR  TX-DRP  TX-OVR  Flg
cni0          1450   0    6       0       0       0       8       0       0       0       BMRU
docker0       1500   0    0       0       0       0       0       0       0       0       BMU
enp0s3        1500   0    120120  0       0       0       57754   0       0       0       BMRU
enp0s8        1500   0    2460    0       0       0       1979    0       0       0       BMRU
flannel.1     1450   0    0       0       0       0       0       0       8       0       BMRU
lo            65536  0    164     0       0       0       164     0       0       0       LRU
veth041fb4a1  1450   0    3       0       0       0       20      0       0       0       BMRU
veth5133c37c  1450   0    3       0       0       0       20      0       0       0       BMRU
Note that there are 8 dropped packets (TX-DRP) on the flannel.1 interface.
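One way to confirm where the tunnel traffic actually goes: flannel's VXLAN backend uses UDP port 8472 by default, so it can be watched on each candidate interface while the ping runs. Packets showing up on the NAT interface enp0s3 instead of enp0s8 would mean flanneld picked the wrong NIC:
tcpdump -ni enp0s8 udp port 8472
tcpdump -ni enp0s3 udp port 8472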
This seems to be the issue: https://github.com/coreos/flannel/issues/535
The solution is to specify the interface flanneld should use (e.g. "--iface=enp0s8" in my case) in kube-flannel.yml:
root@ubuntu-server-1:~# diff kube-flannel.yml kube-flannel-mod.yml
52c52
< command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
---
> command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr", "--iface=enp0s8" ]
Plus the static route mentioned above (e.g. ip route add 10.96.0.0/12 dev enp0s8 in my case).
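For clarity, the resulting container spec in the modified manifest looks like this (excerpt, simplified; the image tag is illustrative), and since DaemonSet pods in this version are not updated in place, the flannel pods have to be recreated:
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.7.0-amd64   # tag illustrative
  command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr", "--iface=enp0s8" ]

kubectl delete -f kube-flannel.yml
kubectl apply -f kube-flannel-mod.yml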
I had the same problem too. It was resolved after adding --iface=eth1 (in my case the interface is eth1) in kube-flannel.yml, together with the static route.
Thank you.