I cannot access the kube-api from inside a pod on a minion node (default namespace). Accessing the service IPs of other deployments in the same namespace works flawlessly, and each node (host) can reach the kube-api directly. There seem to be no other issues (e.g., I've installed traefik and it works fine).
Expected behavior: a pod in the default namespace can access the kube-api (10.96.0.1:443) from any node.
Current behavior: from inside a pod on a minion node I cannot access the kube-api service; curl -k https://10.96.0.1 fails.
NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION   CONTAINER-RUNTIME
master   Ready    master   8d    v1.11.2   192.168.56.10   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-7-amd64    docker://18.6.0
node1    Ready    <none>   8d    v1.11.2   192.168.56.11   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-7-amd64    docker://18.6.0
node2    Ready    <none>   5d    v1.11.2   192.168.56.12   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-7-amd64    docker://18.6.0
node3    Ready    <none>   2d    v1.11.2   192.168.56.13   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-7-amd64    docker://18.6.0
dev@master:~$ kubectl get all --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default pod/deploy-heketi-559446b649-lqjlc 1/1 Running 0 24m 192.168.2.9 node2 <none>
default pod/glusterfs-6lbl7 1/1 Running 0 13h 192.168.56.11 node1 <none>
default pod/glusterfs-6p6ql 1/1 Running 0 13h 192.168.56.13 node3 <none>
default pod/glusterfs-xn7t2 1/1 Running 0 13h 192.168.56.12 node2 <none>
default pod/stilton-7bdb8cfb8f-8zwb8 1/1 Running 1 1d 192.168.3.7 node3 <none>
default pod/stilton-7bdb8cfb8f-kf4tq 1/1 Running 1 1d 192.168.2.8 node2 <none>
default pod/stilton-7bdb8cfb8f-p9kkq 1/1 Running 1 1d 192.168.1.21 node1 <none>
kube-system pod/calico-node-6vvkk 2/2 Running 2 1d 192.168.56.11 node1 <none>
kube-system pod/calico-node-h76bw 2/2 Running 3 1d 192.168.56.13 node3 <none>
kube-system pod/calico-node-ln6h6 2/2 Running 5 1d 192.168.56.10 master <none>
kube-system pod/calico-node-q64nd 2/2 Running 2 1d 192.168.56.12 node2 <none>
kube-system pod/coredns-78fcdf6894-bxkb7 1/1 Running 34 8d 192.168.0.16 master <none>
kube-system pod/coredns-78fcdf6894-tbbr2 1/1 Running 34 8d 192.168.0.17 master <none>
kube-system pod/etcd-master 1/1 Running 5 8d 192.168.56.10 master <none>
kube-system pod/kube-apiserver-master 1/1 Running 3 1d 192.168.56.10 master <none>
kube-system pod/kube-controller-manager-master 1/1 Running 6 8d 192.168.56.10 master <none>
kube-system pod/kube-proxy-2g45v 1/1 Running 0 7h 192.168.56.10 master <none>
kube-system pod/kube-proxy-6t5l4 1/1 Running 0 7h 192.168.56.12 node2 <none>
kube-system pod/kube-proxy-qh64w 1/1 Running 0 7h 192.168.56.13 node3 <none>
kube-system pod/kube-proxy-r2mxc 1/1 Running 0 7h 192.168.56.11 node1 <none>
kube-system pod/kube-scheduler-master 1/1 Running 5 8d 192.168.56.10 master <none>
kube-system pod/traefik-ingress-controller-fzg9n 1/1 Running 6 5d 192.168.56.11 node1 <none>
kube-system pod/traefik-ingress-controller-m4rzv 1/1 Running 2 2d 192.168.56.13 node3 <none>
kube-system pod/traefik-ingress-controller-n4fcc 1/1 Running 5 5d 192.168.56.10 master <none>
kube-system pod/traefik-ingress-controller-w2xhp 1/1 Running 2 5d 192.168.56.12 node2 <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/deploy-heketi ClusterIP 10.98.112.221 <none> 8080/TCP 2d deploy-heketi=pod
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 8d <none>
default service/stilton ClusterIP 10.99.36.220 <none> 80/TCP 1d app=cheese,task=stilton
kube-system service/calico-typha ClusterIP 10.108.55.44 <none> 5473/TCP 1d k8s-app=calico-typha
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 8d k8s-app=kube-dns
kube-system service/traefik-ingress-service ClusterIP 10.97.117.183 <none> 80/TCP,8080/TCP 5d k8s-app=traefik-ingress-lb
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
default daemonset.apps/glusterfs 3 3 3 3 3 storagenode=glusterfs 2d glusterfs gluster/gluster-centos:latest glusterfs=pod,glusterfs-node=pod
kube-system daemonset.apps/calico-node 4 4 4 4 4 <none> 1d calico-node,install-cni quay.io/calico/node:v3.2.0,quay.io/calico/cni:v3.2.0 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 4 4 4 4 4 beta.kubernetes.io/arch=amd64 8d kube-proxy k8s.gcr.io/kube-proxy-amd64:v1.11.2 k8s-app=kube-proxy
kube-system daemonset.apps/traefik-ingress-controller 3 3 3 3 3 <none> 5d traefik-ingress-lb traefik k8s-app=traefik-ingress-lb,name=traefik-ingress-lb
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
default deployment.apps/deploy-heketi 1 1 1 1 2d deploy-heketi heketi/heketi:dev deploy-heketi=pod,glusterfs=heketi-pod
default deployment.apps/stilton 3 3 3 3 1d cheese errm/cheese:stilton app=cheese,task=stilton
kube-system deployment.apps/calico-typha 0 0 0 0 1d calico-typha quay.io/calico/typha:v3.2.0 k8s-app=calico-typha
kube-system deployment.apps/coredns 2 2 2 2 8d coredns k8s.gcr.io/coredns:1.1.3 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
default replicaset.apps/deploy-heketi-559446b649 1 1 1 2d deploy-heketi heketi/heketi:dev deploy-heketi=pod,glusterfs=heketi-pod,pod-template-hash=1150026205
default replicaset.apps/stilton-7bdb8cfb8f 3 3 3 1d cheese errm/cheese:stilton app=cheese,pod-template-hash=3686479649,task=stilton
kube-system replicaset.apps/calico-typha-5744654d66 0 0 0 1d calico-typha quay.io/calico/typha:v3.2.0 k8s-app=calico-typha,pod-template-hash=1300210822
kube-system replicaset.apps/coredns-78fcdf6894 2 2 2 8d coredns k8s.gcr.io/coredns:1.1.3 k8s-app=kube-dns,pod-template-hash=3497892450
dev@master:~$ kubectl -n kube-system exec -it calico-node-ln6h6 -- wget https://10.96.0.1
Defaulting container name to calico-node.
Use 'kubectl describe pod/calico-node-ln6h6 -n kube-system' to see all of the containers in this pod.
Connecting to 10.96.0.1 (10.96.0.1:443)
ssl_client: 10.96.0.1: certificate verification failed: unable to get local issuer certificate
wget: error getting response: Connection reset by peer
command terminated with exit code 1
dev@master:~$ kubectl -n kube-system exec -it calico-node-6vvkk -- wget https://10.96.0.1
Defaulting container name to calico-node.
Use 'kubectl describe pod/calico-node-6vvkk -n kube-system' to see all of the containers in this pod.
Connecting to 10.96.0.1 (10.96.0.1:443)
ssl_client: 10.96.0.1: certificate verification failed: unable to get local issuer certificate
wget: error getting response: Connection reset by peer
command terminated with exit code 1
dev@master:~$ kubectl -n default exec -it stilton-7bdb8cfb8f-p9kkq -- wget https://10.96.0.1
Connecting to 10.96.0.1 (10.96.0.1:443)
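For comparison, accessing another service from the same pod works. Here I access the deploy-heketi service in the default namespace (ClusterIP 10.98.112.221, port 8080/TCP); the 404 response shows that the service is reachable: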
dev@master:~/k8-test$ kubectl -n default exec -it stilton-7bdb8cfb8f-p9kkq -- wget http://10.98.112.221:8080
Connecting to 10.98.112.221:8080 (10.98.112.221:8080)
wget: server returned error: HTTP/1.1 404 Not Found
command terminated with exit code 1
I ran tcpdump inside the pod while trying to access the kube-api (curl -k https://10.96.0.1):
/ # tcpdump -A -s0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:32:55.301880 IP stilton-7bdb8cfb8f-8zwb8.47154 > kubernetes.default.svc.cluster.local.443: Flags [S], seq 561230321, win 28000, options [mss 1400,sackOK,TS val 21993499 ecr 0,nop,wscale 7], length 0
E..<).@[email protected].....
`...2..!s........m`.>.....x...
.O..........
09:32:55.302305 IP stilton-7bdb8cfb8f-8zwb8.39671 > kube-dns.kube-system.svc.cluster.local.53: 13814+ PTR? 1.0.96.10.in-addr.arpa. (40)
E..D@.@.@.,.....
`.
...5.0.Z5............1.0.96.10.in-addr.arpa.....
09:32:55.302937 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.39671: 13814* 1/0/0 PTR kubernetes.default.svc.cluster.local. (112)
E....]@.>...
`.
.....5...x..5............1.0.96.10.in-addr.arpa......1.0.96.10.in-addr.arpa..........&
kubernetes.default.svc.cluster.local.
09:32:55.305327 IP stilton-7bdb8cfb8f-8zwb8.38232 > kube-dns.kube-system.svc.cluster.local.53: 43411+ PTR? 10.0.96.10.in-addr.arpa. (41)
E..E@.@.@.,.....
`.
.X.5.1.[.............10.0.96.10.in-addr.arpa.....
09:32:55.306427 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.38232: 43411* 1/0/0 PTR kube-dns.kube-system.svc.cluster.local. (116)
E...9.@.>.4q
`.
.....5.X.|.
.............10.0.96.10.in-addr.arpa......10.0.96.10.in-addr.arpa..........(.kube-dns.kube-system.svc.cluster.local.
09:32:56.313780 IP stilton-7bdb8cfb8f-8zwb8.47154 > kubernetes.default.svc.cluster.local.443: Flags [S], seq 561230321, win 28000, options [mss 1400,sackOK,TS val 21993752 ecr 0,nop,wscale 7], length 0
E..<).@[email protected].....
`...2..!s........m`.>.....x...
.O..........
09:32:58.329749 IP stilton-7bdb8cfb8f-8zwb8.47154 > kubernetes.default.svc.cluster.local.443: Flags [S], seq 561230321, win 28000, options [mss 1400,sackOK,TS val 21994256 ecr 0,nop,wscale 7], length 0
E..<).@[email protected].....
`...2..!s........m`.>.....x...
.O..........
09:33:00.313723 ARP, Request who-has stilton-7bdb8cfb8f-8zwb8 tell 10.0.2.15, length 28
..............
.............
09:33:00.313737 ARP, Reply stilton-7bdb8cfb8f-8zwb8 is-at 86:29:d7:dc:5b:34 (oui Unknown), length 28
.........)..[4..........
...
09:33:00.313840 IP stilton-7bdb8cfb8f-8zwb8.46737 > kube-dns.kube-system.svc.cluster.local.53: 22146+ PTR? 15.2.0.10.in-addr.arpa. (40)
E..DC.@.@.(.....
`.
...5.0.ZV............15.2.0.10.in-addr.arpa.....
09:33:00.315971 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.46737: 22146 NXDomain 0/0/0 (40)
E..D>&@.>.0j
`.
.....5...0..V............15.2.0.10.in-addr.arpa.....
09:33:02.361706 IP stilton-7bdb8cfb8f-8zwb8.47154 > kubernetes.default.svc.cluster.local.443: Flags [S], seq 561230321, win 28000, options [mss 1400,sackOK,TS val 21995264 ecr 0,nop,wscale 7], length 0
E..<).@[email protected].....
`...2..!s........m`.>.....x...
.O..........
09:33:10.553717 IP stilton-7bdb8cfb8f-8zwb8.47154 > kubernetes.default.svc.cluster.local.443: Flags [S], seq 561230321, win 28000, options [mss 1400,sackOK,TS val 21997312 ecr 0,nop,wscale 7], length 0
E..<).@[email protected].....
`...2..!s........m`.>.....x...
.O..........
09:33:26.681676 IP stilton-7bdb8cfb8f-8zwb8.47154 > kubernetes.default.svc.cluster.local.443: Flags [S], seq 561230321, win 28000, options [mss 1400,sackOK,TS val 22001344 ecr 0,nop,wscale 7], length 0
E..<).@[email protected].....
`...2..!s........m`.>.....x...
.O..........
09:33:31.801700 ARP, Request who-has 169.254.1.1 tell stilton-7bdb8cfb8f-8zwb8, length 28
.........)..[4..............
09:33:31.801758 ARP, Reply 169.254.1.1 is-at ee:ee:ee:ee:ee:ee (oui Unknown), length 28
...................)..[4....
09:33:31.801997 IP stilton-7bdb8cfb8f-8zwb8.41898 > kube-dns.kube-system.svc.cluster.local.53: 24431+ PTR? 1.1.254.169.in-addr.arpa. (42)
E..FS.@.@.......
`.
...5.2.\_o...........1.1.254.169.in-addr.arpa.....
09:33:31.804662 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.41898: 24431 ServFail 0/0/0 (42)
E..F.w@.>.g.
`.
.....5...2% _o...........1.1.254.169.in-addr.arpa.....
09:33:31.804720 IP stilton-7bdb8cfb8f-8zwb8.41898 > kube-dns.kube-system.svc.cluster.local.53: 24431+ PTR? 1.1.254.169.in-addr.arpa. (42)
E..FS.@.@.......
`.
...5.2.\_o...........1.1.254.169.in-addr.arpa.....
09:33:31.807011 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.41898: 24431 ServFail 0/0/0 (42)
E..F.x@.>.g.
`.
.....5...2% _o...........1.1.254.169.in-addr.arpa.....
09:33:31.807065 IP stilton-7bdb8cfb8f-8zwb8.41898 > kube-dns.kube-system.svc.cluster.local.53: 24431+ PTR? 1.1.254.169.in-addr.arpa. (42)
E..FS.@.@.......
`.
...5.2.\_o...........1.1.254.169.in-addr.arpa.....
09:34:00.729661 IP stilton-7bdb8cfb8f-8zwb8.47154 > kubernetes.default.svc.cluster.local.443: Flags [S], seq 561230321, win 28000, options [mss 1400,sackOK,TS val 22009856 ecr 0,nop,wscale 7], length 0
E..<).@[email protected].....
`...2..!s........m`.>.....x...
.O..........
09:34:41.689607 IP6 fe80::ecee:eeff:feee:eeee > ip6-allrouters: ICMP6, router solicitation, length 16
`.....:.................................................
09:34:41.689724 IP stilton-7bdb8cfb8f-8zwb8.58792 > kube-dns.kube-system.svc.cluster.local.53: 50696+ PTR? 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (90)
E..v.U@.@.......
`.
...5.b...............2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa.....
09:34:41.690999 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.58792: 50696 1/0/0 PTR ip6-allrouters. (190)
E...u\@.>...
`.
.....5....F/.............2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa......2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa............ip6-allrouters.
09:34:41.691075 IP stilton-7bdb8cfb8f-8zwb8.41135 > kube-dns.kube-system.svc.cluster.local.53: 52957+ PTR? e.e.e.e.e.e.e.f.f.f.e.e.e.e.c.e.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa. (90)
E..v.V@.@.......
`.
...5.b...............e.e.e.e.e.e.e.f.f.f.e.e.e.e.c.e.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.....
09:34:41.693783 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.41135: 52957 ServFail 0/0/0 (90)
E..v..@.>.R.
`.
.....5...b...............e.e.e.e.e.e.e.f.f.f.e.e.e.e.c.e.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.....
09:34:41.693823 IP stilton-7bdb8cfb8f-8zwb8.41135 > kube-dns.kube-system.svc.cluster.local.53: 52957+ PTR? e.e.e.e.e.e.e.f.f.f.e.e.e.e.c.e.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa. (90)
E..v.W@.@.......
`.
...5.b...............e.e.e.e.e.e.e.f.f.f.e.e.e.e.c.e.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.....
09:34:41.694915 IP kube-dns.kube-system.svc.cluster.local.53 > stilton-7bdb8cfb8f-8zwb8.41135: 52957 ServFail 0/0/0 (90)
E..v..@.>.R.
`.
.....5...b...............e.e.e.e.e.e.e.f.f.f.e.e.e.e.c.e.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.....
09:34:46.809697 ARP, Request who-has 169.254.1.1 tell stilton-7bdb8cfb8f-8zwb8, length 28
.........)..[4..............
09:34:46.809724 ARP, Request who-has stilton-7bdb8cfb8f-8zwb8 tell 10.0.2.15, length 28
..............
.............
09:34:46.809733 ARP, Reply stilton-7bdb8cfb8f-8zwb8 is-at 86:29:d7:dc:5b:34 (oui Unknown), length 28
.........)..[4..........
...
09:34:46.809745 ARP, Reply 169.254.1.1 is-at ee:ee:ee:ee:ee:ee (oui Unknown), length 28
...................)..[4....
^C
33 packets captured
48 packets received by filter
15 packets dropped by kernel
What confuses me here is this line: 09:33:00.313723 ARP, Request who-has stilton-7bdb8cfb8f-8zwb8 tell 10.0.2.15. The address 10.0.2.15 is the IP address of the NAT interface.
On each node I have bound kubelet's IP address to the host-only adapter.
calicoctl also shows the host-only addresses:
root@master:~# calicoctl get nodes -o wide
NAME     ASN         IPV4               IPV6
master   (unknown)   192.168.56.10/24
node1    (unknown)   192.168.56.11/24
node2    (unknown)   192.168.56.12/24
node3    (unknown)   192.168.56.13/24
The iptables rules on a node also look fine to me:
root@node1:~# iptables-save | grep "10.96.0.1/"
-A KUBE-SERVICES ! -s 192.168.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: stilton
  labels:
    app: cheese
    cheese: stilton
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cheese
      task: stilton
  template:
    metadata:
      labels:
        app: cheese
        task: stilton
        version: v0.0.1
    spec:
      containers:
      - name: cheese
        image: errm/cheese:stilton
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: stilton
spec:
  ports:
  - name: http
    targetPort: 80
    port: 80
  selector:
    app: cheese
    task: stilton
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cheeses
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.frontend.rule.type: PathPrefixStrip
spec:
  rules:
  - host: master.test.io
    http:
      paths:
      - path: /stilton
        backend:
          serviceName: stilton
          servicePort: http
Does it have something to do with the namespaces or permissions?
Or is it some other kind of networking error?
I created a small test Kubernetes setup with 4 VirtualBox VMs running Debian 9.5.
Each VM has two network interfaces (enp0s3 and enp0s8).
enp0s3 is a NAT interface.
enp0s8 is a host-only adapter. Each VM has a static IP on this adapter (192.168.56.10-14): .10 is master, .11 is node1, ...
Install Kubernetes (https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/)
Install Calico (https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less)
(I also added traefik, which is working fine so far.)
I want to enable glusterfs on my nodes with heketi. To install the topology, the heketi pod needs access to the kube-api.
Calico version:
Client Version: v3.2.0
Build date: 2018-08-10T18:04:05+0000
Git commit: c158322b
Cluster Version: v3.2.0
Cluster Type: k8s,bgp,kdd
Orchestrator version (e.g. kubernetes, mesos, rkt):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Operating System and version:
Distributor ID: Debian
Description: Debian GNU/Linux 9.5 (stretch)
Release: 9.5
Codename: stretch
I appreciate your help on this!
In the meantime, I changed my setup to use only one (bridged) adapter in the VMs. However, the issue still persists.
The connection works if the pod's IP equals one of the host IPs, or if the pod has a different IP but is scheduled on the master.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
- default pod/deploy-heketi-559446b649-t92b4 1/1 Running 0 2h 192.168.3.11 node3 <none>
+ default pod/glusterfs-79z8k 1/1 Running 0 2h 192.168.4.4 node1 <none>
+ default pod/glusterfs-gmv7r 1/1 Running 0 2h 192.168.4.5 node2 <none>
+ default pod/glusterfs-ptfvd 1/1 Running 0 2h 192.168.4.6 node3 <none>
- default pod/stilton-7bdb8cfb8f-2lndt 1/1 Running 0 1h 192.168.1.25 node1 <none>
- default pod/stilton-7bdb8cfb8f-gf5p2 1/1 Running 0 1h 192.168.3.13 node3 <none>
- default pod/stilton-7bdb8cfb8f-p9vmh 1/1 Running 0 1h 192.168.2.13 node2 <none>
+ kube-system pod/calico-node-bh9ff 2/2 Running 0 2h 192.168.4.5 node2 <none>
+ kube-system pod/calico-node-c4x9v 2/2 Running 0 2h 192.168.4.4 node1 <none>
+ kube-system pod/calico-node-d959r 2/2 Running 0 2h 192.168.4.6 node3 <none>
+ kube-system pod/calico-node-xd7sp 2/2 Running 2 2h 192.168.4.3 master <none>
+ kube-system pod/coredns-78fcdf6894-clcd5 1/1 Running 3 5h 192.168.0.24 master <none>
+ kube-system pod/coredns-78fcdf6894-ppfcb 1/1 Running 3 5h 192.168.0.25 master <none>
+ kube-system pod/etcd-master 1/1 Running 3 5h 192.168.4.3 master <none>
+ kube-system pod/kube-apiserver-master 1/1 Running 3 5h 192.168.4.3 master <none>
+ kube-system pod/kube-controller-manager-master 1/1 Running 3 5h 192.168.4.3 master <none>
+ kube-system pod/kube-proxy-cd48f 1/1 Running 3 5h 192.168.4.4 node1 <none>
+ kube-system pod/kube-proxy-mf82f 1/1 Running 3 5h 192.168.4.3 master <none>
+ kube-system pod/kube-proxy-rpktr 1/1 Running 3 5h 192.168.4.6 node3 <none>
+ kube-system pod/kube-proxy-tft99 1/1 Running 3 5h 192.168.4.5 node2 <none>
+ kube-system pod/kube-scheduler-master 1/1 Running 3 5h 192.168.4.3 master <none>
+ kube-system pod/traefik-ingress-controller-6bwtn 1/1 Running 1 2h 192.168.4.3 master <none>
+ kube-system pod/traefik-ingress-controller-xwl86 1/1 Running 0 2h 192.168.4.4 node1 <none>
+ kube-system pod/traefik-ingress-controller-xzfsk 1/1 Running 0 2h 192.168.4.5 node2 <none>
+ kube-system pod/traefik-ingress-controller-zbccv 1/1 Running 0 2h 192.168.4.6 node3 <none>
Legend: + means the container/pod has access to the kube-api; - means access fails.
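As a cross-check of the observation above, here is a minimal Python sketch that encodes the rule (pod IP equals a host IP, or pod runs on the master) and compares it with a few rows of the table. The function name expected_api_access and the selection of rows are only illustrative; the IPs, nodes, and +/- results are taken from the table.

```python
# Host IPs of the four VMs in the bridged setup (from the table above).
HOST_IPS = {"192.168.4.3", "192.168.4.4", "192.168.4.5", "192.168.4.6"}


def expected_api_access(pod_ip, node):
    """Rule of thumb observed in this cluster, not a general Kubernetes rule:
    access works if the pod shares a host IP or is scheduled on the master."""
    return pod_ip in HOST_IPS or node == "master"


# (pod name, pod IP, node, observed access) copied from the +/- table.
observations = [
    ("deploy-heketi-559446b649-t92b4", "192.168.3.11", "node3", False),
    ("glusterfs-79z8k", "192.168.4.4", "node1", True),
    ("stilton-7bdb8cfb8f-2lndt", "192.168.1.25", "node1", False),
    ("calico-node-bh9ff", "192.168.4.5", "node2", True),
    ("coredns-78fcdf6894-clcd5", "192.168.0.24", "master", True),
]

# Pods whose observed result contradicts the rule.
mismatches = [
    name
    for name, ip, node, observed in observations
    if expected_api_access(ip, node) != observed
]
print(mismatches)  # → []
```

An empty list means the sampled rows are consistent with the rule; it does not explain why the rule holds.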
It looks like your pod CIDR and host CIDR may overlap: both are in the 192.168.0.0/16 range. While this is possible, it leads to problems in most cases, so I would suggest making sure they do not overlap, either by changing your pod CIDR (and the Calico IP pool) or by changing the CIDR used for your VMs.
The description of your problem doesn't specifically point to the CIDR overlap as the cause, but I think it would be good to address that first and see whether the problem still exists.
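A quick way to check for this kind of overlap is Python's standard ipaddress module. This is a minimal sketch: the pod CIDR is the 192.168.0.0/16 range mentioned above, the host CIDR is taken from the calicoctl output in the report, and the service CIDR is an assumption (the kubeadm default, 10.96.0.0/12). The helper name overlapping_pairs is only illustrative.

```python
import ipaddress


def overlapping_pairs(networks):
    """Return (name_a, name_b) for every pair of named CIDRs that overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    names = sorted(parsed)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if parsed[a].overlaps(parsed[b])
    ]


networks = {
    "pod_cidr": "192.168.0.0/16",    # Calico IP pool / pod range from this thread
    "host_cidr": "192.168.56.0/24",  # host-only network of the VMs
    "service_cidr": "10.96.0.0/12",  # assumption: kubeadm default service range
}

print(overlapping_pairs(networks))  # → [('host_cidr', 'pod_cidr')]
```

Any pair in the output is a candidate for re-addressing; an empty list means the ranges are disjoint.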
You may also want to consider configuring Calico's IP/interface autodetection; it is sometimes necessary on multi-interface hosts. https://docs.projectcalico.org/v3.2/reference/node/configuration#ip-autodetection-methods
After those, I'd suggest testing pod-to-pod traffic between the nodes and the master if possible, specifically using the pod IPs instead of a service. That removes the indirection that K8s services introduce, which can hide the underlying problem you need to fix.
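If the container image has Python but no curl or wget, a plain TCP connect test is enough to compare pod IPs against service IPs. This is a minimal sketch using only the standard socket module; the function names can_connect and check_all are illustrative, and the targets you pass in (pod IP and port, service IP and port) are whatever applies in your cluster.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


def check_all(targets, timeout=3.0):
    """Map each label in targets ({label: (host, port)}) to its connect result."""
    return {
        label: can_connect(host, port, timeout)
        for label, (host, port) in targets.items()
    }
```

For example, calling check_all({"stilton pod": ("192.168.1.21", 80), "kubernetes service": ("10.96.0.1", 443)}) from inside a pod on another node would show whether direct pod traffic works while the service path does not. The addresses in that call are taken from the output in the report.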
Thanks!
Actually, I already adjusted the calico IP autodetection but I forgot to mention it (sorry about that).
See my yaml file for reference:
# Calico Version v3.2.0
# https://docs.projectcalico.org/v3.2/releases#v3.2.0
# This manifest includes the following component versions:
#   calico/node:v3.2.0
#   calico/cni:v3.2.0
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # To enable Typha, set this to "calico-typha" *and* set a non-zero value for Typha replicas
  # below. We recommend using Typha if you have more than 50 nodes. Above 100 nodes it is
  # essential.
  typha_service_name: "none"
  # Configure the Calico backend to use.
  calico_backend: "bird"
  # Configure the MTU to use
  veth_mtu: "1440"
  # The CNI network configuration to install on each node. The special
  # values in this config will be automatically populated.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.0",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
            "type": "host-local",
            "subnet": "usePodCidr"
          },
          "policy": {
            "type": "k8s"
          },
          "kubernetes": {
            "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }
---
# This manifest creates a Service, which will be backed by Calico's Typha daemon.
# Typha sits in between Felix and the API server, reducing Calico's load on the API server.
apiVersion: v1
kind: Service
metadata:
  name: calico-typha
  namespace: kube-system
  labels:
    k8s-app: calico-typha
spec:
  ports:
    - port: 5473
      protocol: TCP
      targetPort: calico-typha
      name: calico-typha
  selector:
    k8s-app: calico-typha
---
# This manifest creates a Deployment of Typha to back the above service.
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: calico-typha
  namespace: kube-system
  labels:
    k8s-app: calico-typha
spec:
  # Number of Typha replicas. To enable Typha, set this to a non-zero value *and* set the
  # typha_service_name variable in the calico-config ConfigMap above.
  #
  # We recommend using Typha if you have more than 50 nodes. Above 100 nodes it is essential
  # (when using the Kubernetes datastore). Use one replica for every 100-200 nodes. In
  # production, we recommend running at least 3 replicas to reduce the impact of rolling upgrade.
  replicas: 0
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        k8s-app: calico-typha
      annotations:
        # This, along with the CriticalAddonsOnly toleration below, marks the pod as a critical
        # add-on, ensuring it gets priority scheduling and that its resources are reserved
        # if it ever gets evicted.
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      hostNetwork: true
      tolerations:
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
      # Since Calico can't network a pod until Typha is up, we need to run Typha itself
      # as a host-networked pod.
      serviceAccountName: calico-node
      containers:
      - image: quay.io/calico/typha:v3.2.0
        name: calico-typha
        ports:
        - containerPort: 5473
          name: calico-typha
          protocol: TCP
        env:
          # Enable "info" logging by default. Can be set to "debug" to increase verbosity.
          - name: TYPHA_LOGSEVERITYSCREEN
            value: "info"
          # Disable logging to file and syslog since those don't make sense in Kubernetes.
          - name: TYPHA_LOGFILEPATH
            value: "none"
          - name: TYPHA_LOGSEVERITYSYS
            value: "none"
          # Monitor the Kubernetes API to find the number of running instances and rebalance
          # connections.
          - name: TYPHA_CONNECTIONREBALANCINGMODE
            value: "kubernetes"
          - name: TYPHA_DATASTORETYPE
            value: "kubernetes"
          - name: TYPHA_HEALTHENABLED
            value: "true"
          # Uncomment these lines to enable prometheus metrics. Since Typha is host-networked,
          # this opens a port on the host, which may need to be secured.
          #- name: TYPHA_PROMETHEUSMETRICSENABLED
          #  value: "true"
          #- name: TYPHA_PROMETHEUSMETRICSPORT
          #  value: "9093"
        livenessProbe:
          exec:
            command:
              - calico-typha
              - check
              - liveness
          periodSeconds: 30
          initialDelaySeconds: 30
        readinessProbe:
          exec:
            command:
              - calico-typha
              - check
              - readiness
          periodSeconds: 10
---
# This manifest installs the calico/node container, as well
# as the Calico CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: calico-node
      annotations:
        # This, along with the CriticalAddonsOnly toleration below,
        # marks the pod as a critical add-on, ensuring it gets
        # priority scheduling and that its resources are reserved
        # if it ever gets evicted.
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      hostNetwork: true
      tolerations:
        # Make sure calico-node gets scheduled on all nodes.
        - effect: NoSchedule
          operator: Exists
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      serviceAccountName: calico-node
      # Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
      # deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
      terminationGracePeriodSeconds: 0
      containers:
        # Runs calico/node container on each Kubernetes node. This
        # container programs network policy and routes on each
        # host.
        - name: calico-node
          image: quay.io/calico/node:v3.2.0
          env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"
            # Typha support: controlled by the ConfigMap.
            - name: FELIX_TYPHAK8SSERVICENAME
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: typha_service_name
            # Wait for the datastore.
            - name: WAIT_FOR_DATASTORE
              value: "true"
            # Set based on the k8s node name.
            - name: NODENAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            # https://github.com/projectcalico/calico/issues/2042#issuecomment-408488357
            - name: IP_AUTODETECTION_METHOD
              value: "interface=enp0s3"
            # Enable IPIP
            - name: CALICO_IPV4POOL_IPIP
              value: "Always"
            # Enable IP-in-IP within Felix.
            - name: FELIX_IPINIPENABLED
              value: "true"
            # Set MTU for tunnel device used if ipip is enabled
            - name: FELIX_IPINIPMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # The default IPv4 pool to create on startup if none exists. Pod IPs will be
            # chosen from this range. Changing this value after installation will have
            # no effect. This should fall within `--cluster-cidr`.
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            - name: FELIX_HEALTHENABLED
              value: "true"
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          livenessProbe:
            httpGet:
              path: /liveness
              port: 9099
              host: localhost
            periodSeconds: 10
            initialDelaySeconds: 10
            failureThreshold: 6
          readinessProbe:
            exec:
              command:
              - /bin/calico-node
              - -bird-ready
              - -felix-ready
            periodSeconds: 10
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /var/lib/calico
              name: var-lib-calico
              readOnly: false
        # This container installs the Calico CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: quay.io/calico/cni:v3.2.0
          command: ["/install-cni.sh"]
          env:
            # Name of the CNI config file to create.
            - name: CNI_CONF_NAME
              value: "10-calico.conflist"
            # Set the hostname based on the k8s node name.
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
            # CNI MTU Config variable
            - name: CNI_MTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
      volumes:
        # Used by calico/node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        - name: var-lib-calico
          hostPath:
            path: /var/lib/calico
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-node
  namespace: kube-system
---
# Create all the CustomResourceDefinitions needed for
# Calico policy and networking mode.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: felixconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: FelixConfiguration
    plural: felixconfigurations
    singular: felixconfiguration
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgppeers.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPPeer
    plural: bgppeers
    singular: bgppeer
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgpconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPConfiguration
    plural: bgpconfigurations
    singular: bgpconfiguration
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ippools.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPPool
    plural: ippools
    singular: ippool
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: hostendpoints.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: HostEndpoint
    plural: hostendpoints
    singular: hostendpoint
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: clusterinformations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: ClusterInformation
    plural: clusterinformations
    singular: clusterinformation
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: globalnetworkpolicies.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: GlobalNetworkPolicy
    plural: globalnetworkpolicies
    singular: globalnetworkpolicy
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: globalnetworksets.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: GlobalNetworkSet
    plural: globalnetworksets
    singular: globalnetworkset
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: networkpolicies.crd.projectcalico.org
spec:
  scope: Namespaced
  group: crd.projectcalico.org
  version: v1
  names:
    kind: NetworkPolicy
    plural: networkpolicies
    singular: networkpolicy
I will try to change the network of my VMs and try out the pod-to-pod connection.
I read that kube-proxy also has a setting to bind to an interface. I'm not quite sure how to configure this in my setup (with kubeadm), as the kube-proxies are deployed as a daemonset. I tried to rule out this binding issue by switching to a bridged adapter in my VMs, so that only one network interface is available.
Thank you for all the hints!
Thank you @tmjd! Switching the host network to 11.0/ helped! I have tried pod-pod, pod-service, and pod-kubeapi requests, and all of them work seamlessly :)
I'm trying to establish a glusterfs-cluster on the three worker-nodes I have (https://github.com/gluster/gluster-kubernetes/issues/509).
It no longer fails at the stage where heketi needs to access the kube-api (my workaround for this was to schedule the pod on the master, but now it's even better). However, it fails at a later stage, and I again see what look like network issues (?) in the syslog on the hosts:
Aug 31 21:38:02 node1 kubelet[351]: E0831 21:38:02.994663 351 upgradeaware.go:310] Error proxying data from client to backend: readfrom tcp 127.0.0.1:41290->127.0.0.1:40263: write tcp 127.0.0.1:41290->127.0.0.1:40263: write: broken pipe
Aug 31 21:40:02 node1 kubelet[351]: E0831 21:40:02.991630 351 upgradeaware.go:310] Error proxying data from client to backend: readfrom tcp 127.0.0.1:41442->127.0.0.1:40263: write tcp 127.0.0.1:41442->127.0.0.1:40263: write: broken pipe
Aug 31 21:42:02 node1 kubelet[351]: E0831 21:42:02.981246 351 upgradeaware.go:310] Error proxying data from client to backend: readfrom tcp 127.0.0.1:41594->127.0.0.1:40263: write tcp 127.0.0.1:41594->127.0.0.1:40263: write: broken pipe
NAME                             READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE
deploy-heketi-559446b649-542dd   1/1     Running   0          4h    192.168.1.33   node1   <none>
glusterfs-jvx46                  1/1     Running   0          4h    11.0.1.101     node1   <none>
glusterfs-rzzdd                  1/1     Running   0          4h    11.0.1.103     node3   <none>
glusterfs-wr5jv                  1/1     Running   0          4h    11.0.1.102     node2   <none>
I'm going to close this issue, as the original problem has been fixed and I do not think your new problem is due to Calico. It looks like the traffic is using localhost, which Calico is not involved in setting up.
If you do believe your current issue is due to Calico, please open another issue and provide details on the problem and what indicates it is a Calico issue.
Most helpful comment
It looks like your pod CIDR and host CIDR may overlap: both are in the 192.168.0.0/16 range. While this setup is possible, it leads to problems in most cases, so I would suggest making sure they do not overlap, either by changing your pod CIDR (and the Calico IP pool with it) or by changing the CIDR used for your VMs.
The description of your problem doesn't specifically point to the CIDR overlap as the cause, but I think it would be a good thing to address first and then see if the problem still exists.
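As a quick sanity check of the overlap, here is a minimal Python sketch using the standard `ipaddress` module. The pod CIDR 192.168.0.0/16 is the range mentioned above; the host network 192.168.56.0/24 is an assumption inferred from the node IPs (192.168.56.10 to 192.168.56.13), and 11.0.1.0/24 is likewise only an assumed example of a non-overlapping host network. The helper name `cidrs_overlap` is made up for illustration.

```python
import ipaddress


def cidrs_overlap(a: str, b: str) -> bool:
    """Return True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


pod_cidr = "192.168.0.0/16"    # pod range mentioned in this thread
host_cidr = "192.168.56.0/24"  # assumption: /24 around the node IPs

# The VM network sits entirely inside the pod range.
print(cidrs_overlap(pod_cidr, host_cidr))      # True

# A node IP is therefore indistinguishable from a pod IP by range alone.
print(ipaddress.ip_address("192.168.56.11") in ipaddress.ip_network(pod_cidr))  # True

# An assumed host network outside the pod range does not overlap.
print(cidrs_overlap(pod_cidr, "11.0.1.0/24"))  # False
```

If the first check prints True for your actual ranges, one of the two CIDRs should be moved before debugging anything else.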
You may also want to consider configuring Calico's IP/interface autodetection; it is sometimes necessary when running on multi-interface hosts. https://docs.projectcalico.org/v3.2/reference/node/configuration#ip-autodetection-methods
After those, I'd suggest testing pod-to-pod traffic between the nodes and the master if possible, specifically using the pod IPs instead of a service. That removes the indirection that K8s services introduce, which can hide the underlying problems you need to find when troubleshooting.
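One way to run such a direct pod-IP test is a plain TCP connect from inside a pod, which bypasses the service IP and kube-proxy entirely. The sketch below assumes Python is available in the client pod; the function name `tcp_reachable` is made up for illustration, and the target IP and port in the usage note are placeholders to be replaced with a real pod IP and the port that pod actually listens on.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a TCP connect to host:port; return True on success, False otherwise."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, from a pod on node1 you could call `tcp_reachable("192.168.3.7", 80)` to probe a pod on node3 (port 80 is an assumption here). If the pod IP answers but the service IP does not, the problem is more likely in the service layer than in pod networking.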