Minikube: CoreDNS fails on minions on multi-node clusters. Can't resolve external DNS from non-master pods.

Created on 9 May 2020 · 13 comments · Source: kubernetes/minikube


So, I already fixed this and lost some of the logs. But it's pretty straightforward.

  1. Make a cluster
minikube start --vm-driver=kvm2 --cpus=2 --nodes 3 --network-plugin=cni \
--addons registry --enable-default-cni=false \
--insecure-registry "10.0.0.0/24" --insecure-registry "192.168.39.0/24" \
--extra-config=kubeadm.pod-network-cidr=10.244.0.0/16 \
--extra-config=kubelet.network-plugin=cni
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

N.B. I built from HEAD a couple of days ago:

minikube version: v1.10.0-beta.2
commit: 80c3324b6f526911d46033721df844174fe7f597
  2. Make a pod on the master and a pod on a node (see the sketch after this list)
  3. From the node pod: curl google.com
  4. From the master pod: curl google.com
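
A minimal sketch of steps 2-4 (the pod names and curl image are my own illustrations; the node names are minikube's defaults for this cluster):

kubectl run curl-master --image=curlimages/curl --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube"}}' --command -- sleep 3600
kubectl run curl-node --image=curlimages/curl --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube-m02"}}' --command -- sleep 3600
kubectl exec curl-node -- curl -sS --max-time 5 google.com    # fails: external DNS doesn't resolve
kubectl exec curl-master -- curl -sS --max-time 5 google.com  # succeeds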

CoreDNS was crashing per https://github.com/kubernetes/kubernetes/issues/75414

Fixed with

kubectl patch deployment coredns -n kube-system --patch '{"spec":{"template":{"spec":{"volumes":[{"name":"emptydir-tmp","emptyDir":{}}],"containers":[{"name":"coredns","volumeMounts":[{"name":"emptydir-tmp","mountPath":"/tmp"}]}]}}}}' 
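
For readability, the same strategic-merge patch expressed as YAML (equivalent to the JSON above); it mounts a writable emptyDir over /tmp in the CoreDNS container, working around the read-only root filesystem problem described in the linked issue:

spec:
  template:
    spec:
      volumes:
      - name: emptydir-tmp
        emptyDir: {}
      containers:
      - name: coredns
        volumeMounts:
        - name: emptydir-tmp
          mountPath: /tmp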

Edit: had wrong flannel yaml listed.

Labels: area/cni, area/networking, co/coredns, co/multinode, kind/bug, kind/support, priority/important-soon

All 13 comments

@aasmall thank you for bringing this to our attention, interesting!

I have a few questions:
1- Does this happen only when you have flannel as the CNI, or does it happen for all CNIs?
2- Does this happen only on multi-node clusters?

I assume it doesn't happen in normal Docker runtime, no-CNI scenarios?

Multi-node is experimental at the moment, but we have WIP PRs that would remove the need for flannel.

@sharifelgamal

HEAD should no longer need flannel at all; we should automatically apply a CNI for multinode.

@medyagh
1) It applies to all CNIs, since the problem is in CoreDNS itself.
2) The bug inherently only applies to multi-node clusters.

@sharifelgamal - Thank you. I'll validate in a spell. I'm busy working on the actual app right now, though I AM having a lot of fun playing with minikube.

Not sure if this is related or not, but I experienced DNS failing on the minions.
Started like:
minikube start --cpus 2 --memory=2096 --disk-size=20000mb -n 3
on minikube version: v1.10.1
CoreDNS seems stable, but is not accessible outside of the master.
Linux, KVM
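
A quick way to reproduce the symptom from a worker (the pod name is illustrative; the node name is minikube's default for the second node):

kubectl run dnscheck --image=busybox --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube-m02"}}' --command -- sleep 3600
kubectl exec dnscheck -- nslookup kubernetes.default   # times out when DNS is broken on the minion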

Tried disabling kindnet so I could add my own driver:

minikube start --cpus 2 --memory=2048 --disk-size=20000mb -n 3 --enable-default-cni=false --network-plugin=cni --extra-config=kubeadm.pod-network-cidr=10.244.0.0/16
$ kubectl get pods -n kube-system | grep kindnet
kindnet-b4qqn                      1/1     Running   0          45s
kindnet-tvlt5                      1/1     Running   0          32s
kindnet-xxmk2                      1/1     Running   0          14s

Not sure how to disable kindnet.
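
For reference, kindnet runs as a DaemonSet in kube-system, so you can at least inspect it (a sketch):

kubectl get daemonset kindnet -n kube-system -o wide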

I checked connectivity by launching a pod on each node and trying to connect them to each other with nc.
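
A sketch of that check (pod names and the port are illustrative; node names are minikube defaults):

kubectl run nc-master --image=busybox --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube"}}' --command -- nc -l -p 8080
kubectl run nc-worker --image=busybox --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube-m02"}}' --command -- nc -l -p 8080
kubectl get pods -o wide    # note each pod's IP
# probe across nodes, e.g. from the worker pod to the master pod's IP
kubectl exec nc-worker -- sh -c 'echo hi | nc -w 2 <master-pod-ip> 8080'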

Workers work; master connectivity does not.

I deleted the CoreDNS pods and they restarted on the non-master nodes, and DNS started working.
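
For reference, CoreDNS pods carry the standard k8s-app=kube-dns label in kubeadm-based clusters, so the delete is a one-liner:

kubectl delete pods -n kube-system -l k8s-app=kube-dns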

So something is not working with kindnet on the master.

There seems to be a difference between the master and the workers. Not sure it's relevant, though:

$ diff -u /tmp/ip-worker3 /tmp/ip-master 
--- /tmp/ip-worker3 2020-05-16 10:59:37.064470264 -0700
+++ /tmp/ip-master  2020-05-16 10:59:16.232281763 -0700
@@ -1,9 +1,11 @@
-# Generated by iptables-save v1.8.3 on Sat May 16 17:58:30 2020
+# Generated by iptables-save v1.8.3 on Sat May 16 17:59:03 2020
 *nat
-:PREROUTING ACCEPT [75:5268]
-:INPUT ACCEPT [2:120]
-:OUTPUT ACCEPT [79:4740]
-:POSTROUTING ACCEPT [150:9776]
+:PREROUTING ACCEPT [31:1860]
+:INPUT ACCEPT [29:1740]
+:OUTPUT ACCEPT [215:12962]
+:POSTROUTING ACCEPT [191:11460]
+:CNI-4b264cc7114301b74b8d967a - [0:0]
+:CNI-de8faca36f95f967aca64e60 - [0:0]
 :DOCKER - [0:0]
 :KIND-MASQ-AGENT - [0:0]
 :KUBE-KUBELET-CANARY - [0:0]
@@ -30,7 +32,13 @@
 -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
 -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
 -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
+-A POSTROUTING -s 10.88.0.2/32 -m comment --comment "name: \"podman\" id: \"9af909fd3a3d3822201c1c4a504ec7baa704b71309872154020ea915b8571d88\"" -j CNI-de8faca36f95f967aca64e60
+-A POSTROUTING -s 10.88.0.3/32 -m comment --comment "name: \"podman\" id: \"84903f24fb88f7a7d040819e0865388cd9112ce365881e1f1284e4a5add42438\"" -j CNI-4b264cc7114301b74b8d967a
 -A POSTROUTING -m addrtype ! --dst-type LOCAL -m comment --comment "kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain" -j KIND-MASQ-AGENT
+-A CNI-4b264cc7114301b74b8d967a -d 10.88.0.0/16 -m comment --comment "name: \"podman\" id: \"84903f24fb88f7a7d040819e0865388cd9112ce365881e1f1284e4a5add42438\"" -j ACCEPT
+-A CNI-4b264cc7114301b74b8d967a ! -d 224.0.0.0/4 -m comment --comment "name: \"podman\" id: \"84903f24fb88f7a7d040819e0865388cd9112ce365881e1f1284e4a5add42438\"" -j MASQUERADE
+-A CNI-de8faca36f95f967aca64e60 -d 10.88.0.0/16 -m comment --comment "name: \"podman\" id: \"9af909fd3a3d3822201c1c4a504ec7baa704b71309872154020ea915b8571d88\"" -j ACCEPT
+-A CNI-de8faca36f95f967aca64e60 ! -d 224.0.0.0/4 -m comment --comment "name: \"podman\" id: \"9af909fd3a3d3822201c1c4a504ec7baa704b71309872154020ea915b8571d88\"" -j MASQUERADE
 -A DOCKER -i docker0 -j RETURN
 -A KIND-MASQ-AGENT -d 10.244.0.0/16 -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
 -A KIND-MASQ-AGENT -m comment --comment "kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
@@ -68,23 +76,25 @@
 -A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-EJJ3L23ZA35VLW6X
 -A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-RJHMR3QLYGJVBWVL
 COMMIT
-# Completed on Sat May 16 17:58:30 2020
-# Generated by iptables-save v1.8.3 on Sat May 16 17:58:30 2020
+# Completed on Sat May 16 17:59:03 2020
+# Generated by iptables-save v1.8.3 on Sat May 16 17:59:03 2020
 *mangle
-:PREROUTING ACCEPT [35417:67062632]
-:INPUT ACCEPT [34008:66687141]
-:FORWARD ACCEPT [1389:374215]
-:OUTPUT ACCEPT [20262:1829311]
-:POSTROUTING ACCEPT [21666:2204396]
+:PREROUTING ACCEPT [405427:144956612]
+:INPUT ACCEPT [405404:144954818]
+:FORWARD ACCEPT [14:1090]
+:OUTPUT ACCEPT [400927:112195409]
+:POSTROUTING ACCEPT [400950:112196985]
 :KUBE-KUBELET-CANARY - [0:0]
 :KUBE-PROXY-CANARY - [0:0]
 COMMIT
-# Completed on Sat May 16 17:58:30 2020
-# Generated by iptables-save v1.8.3 on Sat May 16 17:58:30 2020
+# Completed on Sat May 16 17:59:03 2020
+# Generated by iptables-save v1.8.3 on Sat May 16 17:59:03 2020
 *filter
-:INPUT ACCEPT [1837:1495255]
-:FORWARD ACCEPT [143:10102]
-:OUTPUT ACCEPT [1864:289671]
+:INPUT ACCEPT [85938:56432847]
+:FORWARD ACCEPT [10:600]
+:OUTPUT ACCEPT [82148:22332958]
+:CNI-ADMIN - [0:0]
+:CNI-FORWARD - [0:0]
 :DOCKER - [0:0]
 :DOCKER-ISOLATION-STAGE-1 - [0:0]
 :DOCKER-ISOLATION-STAGE-2 - [0:0]
@@ -98,6 +108,8 @@
 -A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
 -A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
 -A INPUT -j KUBE-FIREWALL
+-A FORWARD -m comment --comment "CNI firewall plugin rules" -j CNI-FORWARD
+-A FORWARD -m comment --comment "CNI firewall plugin rules" -j CNI-FORWARD
 -A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
 -A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
 -A FORWARD -j DOCKER-USER
@@ -108,6 +120,11 @@
 -A FORWARD -i docker0 -o docker0 -j ACCEPT
 -A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
 -A OUTPUT -j KUBE-FIREWALL
+-A CNI-FORWARD -m comment --comment "CNI firewall plugin rules" -j CNI-ADMIN
+-A CNI-FORWARD -d 10.88.0.3/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
+-A CNI-FORWARD -d 10.88.0.2/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
+-A CNI-FORWARD -s 10.88.0.3/32 -j ACCEPT
+-A CNI-FORWARD -s 10.88.0.2/32 -j ACCEPT
 -A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
 -A DOCKER-ISOLATION-STAGE-1 -j RETURN
 -A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
@@ -119,5 +136,5 @@
 -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
 -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
 COMMIT
-# Completed on Sat May 16 17:58:30 2020
+# Completed on Sat May 16 17:59:03 2020

Hey @kfox1111, thanks for providing the additional info. Looks like this is a bug and will be tracked in #7538.

I'm going to look into this today.

I think this is a bug, but at the same time it should be a fairly rare bug to run into, at least with the current state of minikube: minikube will only deploy CoreDNS to the master node by default, even if you scale the deployment to 30 replicas.

I do see now that it does not appear to be possible to select CNIs in multi-node (kindnet is applied by default). That will be fixed by #8222, probably by adding a flag like --cni=flannel.

I scaled minikube up to 150 DNS replicas in order to get them spread across the 3 nodes, and had no issue with pods crashing or failing to resolve records. I wonder if we accidentally fixed this by applying a default CNI.

$ ./out/minikube start -n=3 --enable-default-cni=false --network-plugin=cni
$ kubectl scale deployment --replicas=150 coredns --namespace=kube-system
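
To see how the replicas actually land per node, something like this works (a sketch; NODE is the seventh column of kubectl get pods -o wide):

kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide --no-headers | awk '{print $7}' | sort | uniq -c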

I will revisit this once I'm able to disable kindnet as part of #8222

My tests were based on https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution.
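
That page creates the dnsutils pod used in the nslookup checks below:

kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml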

My env:

  • Ubuntu 19.10
  • Minikube v1.11.0
  • Multi-node
  • KVM2


Scenario 1
minikube start -p dns --cpus=2 --memory=2g --nodes=2 --driver=kvm2 --extra-config=kubelet.resolv-conf=/run/systemd/resolve/resolv.conf

  • CoreDNS started normally, with just one event:

    • Event: Warning FailedScheduling 36m default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

  • DNS did not work internally in a pod:

    • kubectl exec -ti dnsutils -- nslookup kubernetes.default

  • After deleting the DNS pods, k8s recreated them, and then DNS worked normally.


Scenario 2
minikube start -p dns --cpus=2 --memory=2g --nodes=2 --driver=kvm2 --enable-default-cni=false --network-plugin=cni

  • CoreDNS started normally, with just one event:

    • Event: Warning FailedScheduling 36m default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

  • DNS did not work internally in a pod:

    • kubectl exec -ti dnsutils -- nslookup kubernetes.default

  • After deleting the DNS pods, k8s recreated them, and then DNS worked normally in pods.

Conclusion:

  • Initially the DNS pods were hosted on the master node, and DNS was not working in pods.
  • In both scenarios I had to delete the DNS pods; they were then spread across the nodes, and DNS worked in pods.
  • When I forced the DNS pods to run only on the master node, DNS worked normally (see the sketch below).
  • It seems to be somehow related to the cluster's startup.
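
For reference, one way to pin the DNS pods to the master is a nodeSelector patch like this (a sketch; node-role.kubernetes.io/master is the label kubeadm puts on the control-plane node in this era of Kubernetes):

kubectl -n kube-system patch deployment coredns --patch '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/master":""}}}}}'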

After testing, I can confirm that resolution of Kubernetes hosts from non-master pods is broken. I was not able to replicate issues with DNS resolution, however.

In a nutshell, I believe that the issue of CoreDNS access from non-master nodes is a sign of a broken CNI configuration. I'll continue to investigate.
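
One quick way to compare the CNI config across nodes (a sketch; -n selects the node, and m02 is minikube's default name for the second node):

minikube ssh -- ls -l /etc/cni/net.d
minikube ssh -n m02 -- ls -l /etc/cni/net.d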
