What happened:
There is no DNS within the cluster: pods cannot resolve services within the cluster or reach the internet by hostname.
What you expected to happen:
DNS working to the internet and to services within the cluster
How to reproduce it (as minimally and precisely as possible):
I am running kind in WSL (v1), set up to run containers via Docker for Desktop installed on Windows 10. Docker for Desktop exposes the daemon over TCP, and the DOCKER_HOST env var within WSL is set to tcp://127.0.0.1:2375.
I used the following configuration:
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
  extraPortMappings:
  # Home Assistant
  - containerPort: 31123
    hostPort: 31123
    listenAddress: 127.0.0.1
  # Mosquitto
  - containerPort: 31883
    hostPort: 31883
    listenAddress: 127.0.0.1
  # Traefik
  - containerPort: 30080
    hostPort: 30080
    listenAddress: 127.0.0.1
  # Traefik
  - containerPort: 30443
    hostPort: 30443
    listenAddress: 127.0.0.1
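For completeness, roughly how the cluster was created in this setup (a sketch; the config file name kind-config.yaml is assumed):
# point the Docker CLI and kind at Docker for Desktop's TCP endpoint
export DOCKER_HOST=tcp://127.0.0.1:2375
# create the cluster from the config above
kind create cluster --config kind-config.yaml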
Anything else we need to know?:
I am experiencing a very similar situation on my home arm64 cluster, so this might not be related to kind but rather to Kubernetes. I have not gotten to the bottom of that issue yet, but the symptoms (no DNS) are the same.
I updated the CoreDNS configmap as follows to see the logs, and I changed the forward to 9.9.9.9 to ensure it isn't a loopback issue in /etc/resolv.conf (link):
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . 9.9.9.9
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-07-16T13:00:43Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "1234"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 862c0677-19d5-48ee-8647-1b962737e2c9
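For reference, roughly how a Corefile change like this gets applied (a sketch; the reload plugin in the Corefile also picks up edits on its own after a short delay):
# edit the Corefile in place
kubectl -n kube-system edit configmap coredns
# or recreate the CoreDNS pods so they re-read the config immediately
kubectl -n kube-system delete pods -l k8s-app=kube-dns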
When I start a simple shell pod, drop to the shell and execute ping google.com:
---
# Drop to shell like kubectl exec -it shell -- /bin/sh
apiVersion: v1
kind: Pod
metadata:
  name: shell
spec:
  containers:
  - image: alpine:3.9
    args:
    - sleep
    - "1000000"
    name: shell
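For reference, roughly how this pod was used (a sketch; the manifest file name shell.yaml is assumed):
kubectl apply -f shell.yaml
kubectl exec -it shell -- /bin/sh
/ # ping google.com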
I see the logs in CoreDNS:
2019-07-16T13:27:19.993Z [INFO] 10.244.1.2:50752 - 43950 "AAAA IN google.com.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000251s
2019-07-16T13:27:19.993Z [INFO] 10.244.1.2:50752 - 43450 "A IN google.com.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.0003521s
2019-07-16T13:27:19.994Z [INFO] 172.17.0.1:38702 - 3964 "AAAA IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.0002506s
2019-07-16T13:27:19.994Z [INFO] 172.17.0.1:38702 - 3364 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000387s
2019-07-16T13:27:22.494Z [INFO] 172.17.0.1:38702 - 3964 "AAAA IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,rd 139 0.0001031s
2019-07-16T13:27:22.494Z [INFO] 172.17.0.1:38702 - 3364 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,rd 139 0.0002011s
But ping returns bad address. Same issue for an internal service.
More debugging output:
/ # cat /etc/resolv.conf
search automating.svc.cluster.local svc.cluster.local cluster.local intermax.local
nameserver 10.96.0.10
options ndots:5
/ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=36 time=10.310 ms
64 bytes from 8.8.8.8: seq=1 ttl=36 time=10.009 ms
kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 28m
kubectl -n kube-system get endpoints
NAME ENDPOINTS AGE
kube-controller-manager <none> 29m
kube-dns 10.244.0.4:53,10.244.1.3:53,10.244.0.4:53 + 3 more... 28m
kube-scheduler <none> 29m
Environment:
/etc/os-release): Windows WSL (v1)
Can you check if you have SERVFAIL answers in the CoreDNS logs?
@aojea Thanks for the fast response <3
No SERVFAIL answers in the log, this is the current full log:
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-07-16T13:10:50.710Z [INFO] plugin/reload: Running configuration MD5 = 4535707ba6147e45b0d2cb9e689e1760
2019-07-16T13:11:07.754Z [INFO] 172.17.0.1:52318 - 6580 "AAAA IN dl-cdn.alpinelinux.org.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,rd 151 0.0002369s
2019-07-16T13:10:50.735Z [INFO] 127.0.0.1:52319 - 53664 "HINFO IN 5478662953990634785.3045020278082460155. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.0249001s
2019-07-16T13:11:07.755Z [INFO] 172.17.0.1:52318 - 6080 "A IN dl-cdn.alpinelinux.org.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,rd 151 0.0006047s
2019-07-16T13:11:00.244Z [INFO] 10.244.1.2:37159 - 52400 "A IN dl-cdn.alpinelinux.org.automating.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.000206s
2019-07-16T13:27:19.994Z [INFO] 172.17.0.1:38702 - 3964 "AAAA IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.0002506s
2019-07-16T13:27:19.994Z [INFO] 172.17.0.1:38702 - 3364 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000387s
2019-07-16T13:11:00.244Z [INFO] 10.244.1.2:37159 - 53000 "AAAA IN dl-cdn.alpinelinux.org.automating.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.0004419s
2019-07-16T13:27:22.494Z [INFO] 172.17.0.1:38702 - 3964 "AAAA IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,rd 139 0.0001031s
2019-07-16T13:11:05.250Z [INFO] 10.244.1.2:53339 - 25600 "AAAA IN dl-cdn.alpinelinux.org.automating.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,rd 162 0.0002075s
2019-07-16T13:27:22.494Z [INFO] 172.17.0.1:38702 - 3364 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,rd 139 0.0002011s
2019-07-16T13:30:38.074Z [INFO] 172.17.0.1:50198 - 15930 "AAAA IN kubernetes.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.0001651s
2019-07-16T13:11:05.251Z [INFO] 10.244.1.2:53339 - 25100 "A IN dl-cdn.alpinelinux.org.automating.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,rd 162 0.0003713s
2019-07-16T13:30:38.074Z [INFO] 172.17.0.1:50198 - 15030 "A IN kubernetes.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.0002506s
2019-07-16T13:27:19.993Z [INFO] 10.244.1.2:50752 - 43950 "AAAA IN google.com.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000251s
2019-07-16T13:27:19.993Z [INFO] 10.244.1.2:50752 - 43450 "A IN google.com.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.0003521s
2019-07-16T13:30:40.576Z [INFO] 172.17.0.1:50198 - 15930 "AAAA IN kubernetes.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,rd 150 0.000052s
2019-07-16T13:30:40.576Z [INFO] 172.17.0.1:50198 - 15030 "A IN kubernetes.automating.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,rd 150 0.0000963s
shouldn't it be trying to append all the domains?
search automating.svc.cluster.local svc.cluster.local cluster.local intermax.local
Hmm, I don't know, but it also does not resolve external names, where that shouldn't be an issue.
Edit: I will run the same test on a prod cluster to see CoreDNS output :)
Ran an apk update and a ping kubernetes in the same shell pod on a prod cluster:
2019-07-16T14:50:12.113Z [INFO] 10.233.70.174:56125 - 17900 "AAAA IN dl-cdn.alpinelinux.org.default.svc.autotest.local. udp 67 false 512" NXDOMAIN qr,aa,rd,ra 163 0.000251416s
2019-07-16T14:50:12.113Z [INFO] 10.233.70.174:56125 - 17616 "A IN dl-cdn.alpinelinux.org.default.svc.autotest.local. udp 67 false 512" NXDOMAIN qr,aa,rd,ra 163 0.000341572s
2019-07-16T14:50:12.115Z [INFO] 10.233.70.174:50949 - 25837 "AAAA IN dl-cdn.alpinelinux.org.autotest.local. udp 55 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000083053s
2019-07-16T14:50:12.115Z [INFO] 10.233.70.174:50949 - 25537 "A IN dl-cdn.alpinelinux.org.autotest.local. udp 55 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000070021s
2019-07-16T14:50:12.114Z [INFO] 10.233.70.174:46324 - 46638 "AAAA IN dl-cdn.alpinelinux.org.svc.autotest.local. udp 59 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000190704s
2019-07-16T14:50:12.115Z [INFO] 10.233.70.174:46324 - 46348 "A IN dl-cdn.alpinelinux.org.svc.autotest.local. udp 59 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000390738s
2019-07-16T14:50:12.136Z [INFO] 10.233.70.174:53617 - 45145 "AAAA IN dl-cdn.alpinelinux.org. udp 40 false 512" NOERROR qr,rd,ra 440 0.020942952s
2019-07-16T14:50:57.3Z [INFO] 10.233.70.174:47037 - 26066 "AAAA IN kubernetes.default.svc.autotest.local. udp 55 false 512" NOERROR qr,aa,rd,ra 151 0.000140426s
2019-07-16T14:50:57.3Z [INFO] 10.233.70.174:47037 - 25827 "A IN kubernetes.default.svc.autotest.local. udp 55 false 512" NOERROR qr,aa,rd,ra 108 0.00007576s
So the output above seems normal; the issue might be network related? But I have no clue how to debug this.
Hmm, when I set the replicas to one in the CoreDNS deployment the issue is partially gone. I can now ping the internet:
/ # ping google.nl
PING google.nl (216.58.208.99): 56 data bytes
64 bytes from 216.58.208.99: seq=0 ttl=36 time=20.553 ms
64 bytes from 216.58.208.99: seq=1 ttl=36 time=13.854 ms
64 bytes from 216.58.208.99: seq=2 ttl=36 time=35.334 ms
^C
--- google.nl ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 13.854/23.247/35.334 ms
/ # ping kubernetes
PING kubernetes (10.96.0.1): 56 data bytes
^C
--- kubernetes ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss
2019-07-16T15:14:32.882Z [INFO] 10.244.1.5:34450 - 50160 "AAAA IN google.nl.default.svc.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.0002705s
2019-07-16T15:14:32.883Z [INFO] 10.244.1.5:34450 - 49660 "A IN google.nl.default.svc.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.0003856s
2019-07-16T15:14:32.883Z [INFO] 10.244.1.5:55131 - 1104 "A IN google.nl.svc.cluster.local. udp 45 false 512" NXDOMAIN qr,aa,rd 138 0.000258s
2019-07-16T15:14:32.883Z [INFO] 10.244.1.5:55131 - 1604 "AAAA IN google.nl.svc.cluster.local. udp 45 false 512" NXDOMAIN qr,aa,rd 138 0.0004112s
2019-07-16T15:14:32.884Z [INFO] 10.244.1.5:54365 - 21620 "AAAA IN google.nl.cluster.local. udp 41 false 512" NXDOMAIN qr,aa,rd 134 0.0001243s
2019-07-16T15:14:32.884Z [INFO] 10.244.1.5:54365 - 21020 "A IN google.nl.cluster.local. udp 41 false 512" NXDOMAIN qr,aa,rd 134 0.000166s
2019-07-16T15:14:32.914Z [INFO] 10.244.1.5:48644 - 18440 "A IN google.nl.intermax.local. udp 42 false 512" NXDOMAIN qr,rd,ra 117 0.0298592s
2019-07-16T15:14:32.914Z [INFO] 10.244.1.5:48644 - 19040 "AAAA IN google.nl.intermax.local. udp 42 false 512" NXDOMAIN qr,rd,ra 117 0.0299696s
2019-07-16T15:14:32.950Z [INFO] 10.244.1.5:46038 - 55740 "A IN google.nl. udp 27 false 512" NOERROR qr,rd,ra 52 0.0354438s
2019-07-16T15:14:32.990Z [INFO] 10.244.1.5:46038 - 56140 "AAAA IN google.nl. udp 27 false 512" NOERROR qr,rd,ra 64 0.0759243s
2019-07-16T15:14:46.290Z [INFO] 10.244.1.5:42557 - 12225 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,rd 147 0.0002117s
2019-07-16T15:14:46.290Z [INFO] 10.244.1.5:42557 - 11625 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,rd 106 0.0003265s
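For context, the walk through the search domains above is driven by the options ndots:5 line in the pod's resolv.conf: any name with fewer than five dots is first tried against each search domain, which is why google.nl shows up as google.nl.default.svc.cluster.local, google.nl.svc.cluster.local, and so on before the bare name. A small sketch of how to bypass the search list while debugging (the trailing dot forces an absolute query):
/ # nslookup google.nl.                                # trailing dot skips the search list
/ # nslookup kubernetes.default.svc.cluster.local.     # fully qualified service name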
Is that /etc/resolv.conf debugging from within a kind node, or somewhere else?
Kind nodes will pick up the nameserver from the host, and then the pods will get that + what Kubernetes sets.
This is a strange failure, I'm not sure if this is WSL2 related or not.
Oh, I see, this is WSL v1; that is completely untested (until you, now, as far as I know). WSL2 has been tested, and so has kind on Windows with Docker for Windows.
Is that /etc/resolv.conf debugging from within a kind node, or somewhere else?
No, it is the /etc/resolv.conf within the Alpine pod. The /etc/resolv.conf of the kind nodes looks like this:
nameserver 192.168.65.1
search intermax.local
domain intermax.local
And I can resolve the internet fine from within the kind nodes.
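(For anyone reproducing this: the node-level /etc/resolv.conf can be checked from the host with a plain docker exec, roughly as below; the node names are the kind defaults.)
docker exec kind-control-plane cat /etc/resolv.conf
docker exec kind-worker cat /etc/resolv.conf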
I upped the CoreDNS replicas to two again and DNS stopped working. When downscaling it to one I have fully functional DNS to the internet and within the cluster:
/ # nslookup google.nl
Server: 10.96.0.10
Address: 10.96.0.10#53
Non-authoritative answer:
Name: google.nl
Address: 172.217.17.67
Name: google.nl
Address: 2a00:1450:400e:808::2003
/ # nslookup kubernetes
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
/ # nslookup mosquitto.automating
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: mosquitto.automating.svc.cluster.local
Address: 10.98.54.254
Scaling it up to two again and it stops working.
/ # nslookup mosquitto.automating
^C
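For reference, the replica changes described above were done with something like this (a sketch; the Deployment name coredns is the kubeadm default):
# one replica: DNS works; two replicas: lookups start timing out
kubectl -n kube-system scale deployment coredns --replicas=1
kubectl -n kube-system scale deployment coredns --replicas=2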
Will open a bug over at Kubernetes to triage this further since it doesn't seem to be kind specific. On the other hand it could be network related but we will see what the Kubernetes guys have to say :)
Thanks for the fast response @BenTheElder 👍
I will leave this open until a conclusion is reached, will keep you guys updated!
🤔 I can't reproduce the issue with your config
possibly interesting comment for anyone else following along: https://github.com/kubernetes/kubernetes/issues/80243#issuecomment-512262392
Followed up over at kubernetes/kubernetes#80243 and the PR with the fix #739
@wilmardo can you post your iptables rules in the nodes (iptables-save) and your pods (kubectl get pods --all-namespaces -o wide) when it is failing?
@aojea Of course, just did a fresh deployment.
# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default shell 1/1 Running 0 19s 10.244.1.2 kind-worker <none> <none>
kube-system coredns-5c98db65d4-6jbkm 1/1 Running 0 7m37s 10.244.0.3 kind-control-plane <none> <none>
kube-system coredns-5c98db65d4-cx427 1/1 Running 0 7m37s 10.244.0.2 kind-control-plane <none> <none>
kube-system etcd-kind-control-plane 1/1 Running 0 6m43s 172.17.0.2 kind-control-plane <none> <none>
kube-system kindnet-lcr6d 1/1 Running 0 7m22s 172.17.0.3 kind-worker <none> <none>
kube-system kindnet-nr6jt 1/1 Running 0 7m37s 172.17.0.2 kind-control-plane <none> <none>
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 6m35s 172.17.0.2 kind-control-plane <none> <none>
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 6m33s 172.17.0.2 kind-control-plane <none> <none>
kube-system kube-proxy-kc6n8 1/1 Running 0 7m37s 172.17.0.2 kind-control-plane <none> <none>
kube-system kube-proxy-xwv5r 1/1 Running 0 7m22s 172.17.0.3 kind-worker <none> <none>
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 6m40s 172.17.0.2 kind-control-plane <none> <none>
# iptables-save
# Generated by iptables-save v1.6.1 on Tue Jul 30 08:12:28 2019
*filter
:INPUT ACCEPT [899:143525]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [898:164852]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Tue Jul 30 08:12:28 2019
# Generated by iptables-save v1.6.1 on Tue Jul 30 08:12:28 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [7:420]
:POSTROUTING ACCEPT [7:420]
:KIND-MASQ-AGENT - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-5ZUVGKEDQRTZFI3V - [0:0]
:KUBE-SEP-6E7XQMQ4RAYOWTTM - [0:0]
:KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0]
:KUBE-SEP-N4G2XR5TDX7PQE7P - [0:0]
:KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0]
:KUBE-SEP-ZP3FB6NMPNCO4VBJ - [0:0]
:KUBE-SEP-ZXMNUKOKXUTL2MK2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain" -m addrtype ! --dst-type LOCAL -j KIND-MASQ-AGENT
-A KIND-MASQ-AGENT -d 10.244.0.0/16 -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
-A KIND-MASQ-AGENT -m comment --comment "ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -p tcp -m tcp -j DNAT --to-destination 172.17.0.2:6443
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -s 10.244.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -p udp -m udp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m tcp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-N4G2XR5TDX7PQE7P -s 10.244.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-N4G2XR5TDX7PQE7P -p tcp -m tcp -j DNAT --to-destination 10.244.0.2:9153
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m udp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -s 10.244.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -p tcp -m tcp -j DNAT --to-destination 10.244.0.3:9153
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -s 10.244.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -p tcp -m tcp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-IT2ZTR26TO4XFPTO
-A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-ZXMNUKOKXUTL2MK2
-A KUBE-SVC-JD5MR3NA4I4DYORP -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-N4G2XR5TDX7PQE7P
-A KUBE-SVC-JD5MR3NA4I4DYORP -j KUBE-SEP-ZP3FB6NMPNCO4VBJ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-5ZUVGKEDQRTZFI3V
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YIL6JZP7A3QYXJU2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-6E7XQMQ4RAYOWTTM
COMMIT
# Completed on Tue Jul 30 08:12:28 2019
# iptables-save
# Generated by iptables-save v1.6.1 on Tue Jul 30 08:16:00 2019
*filter
:INPUT ACCEPT [27:15269]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [24:1991]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Tue Jul 30 08:16:00 2019
# Generated by iptables-save v1.6.1 on Tue Jul 30 08:16:00 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KIND-MASQ-AGENT - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-5ZUVGKEDQRTZFI3V - [0:0]
:KUBE-SEP-6E7XQMQ4RAYOWTTM - [0:0]
:KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0]
:KUBE-SEP-N4G2XR5TDX7PQE7P - [0:0]
:KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0]
:KUBE-SEP-ZP3FB6NMPNCO4VBJ - [0:0]
:KUBE-SEP-ZXMNUKOKXUTL2MK2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain" -m addrtype ! --dst-type LOCAL -j KIND-MASQ-AGENT
-A KIND-MASQ-AGENT -d 10.244.0.0/16 -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
-A KIND-MASQ-AGENT -m comment --comment "ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -p tcp -m tcp -j DNAT --to-destination 172.17.0.2:6443
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -s 10.244.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -p udp -m udp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m tcp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-N4G2XR5TDX7PQE7P -s 10.244.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-N4G2XR5TDX7PQE7P -p tcp -m tcp -j DNAT --to-destination 10.244.0.2:9153
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m udp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -s 10.244.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -p tcp -m tcp -j DNAT --to-destination 10.244.0.3:9153
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -s 10.244.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -p tcp -m tcp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-IT2ZTR26TO4XFPTO
-A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-ZXMNUKOKXUTL2MK2
-A KUBE-SVC-JD5MR3NA4I4DYORP -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-N4G2XR5TDX7PQE7P
-A KUBE-SVC-JD5MR3NA4I4DYORP -j KUBE-SEP-ZP3FB6NMPNCO4VBJ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-5ZUVGKEDQRTZFI3V
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YIL6JZP7A3QYXJU2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-6E7XQMQ4RAYOWTTM
COMMIT
# Completed on Tue Jul 30 08:16:00 2019
@wilmardo is it still not working?
I can see that both coredns pods are on the same node
kube-system coredns-5c98db65d4-6jbkm 1/1 Running 0 7m37s 10.244.0.3 kind-control-plane
kube-system coredns-5c98db65d4-cx427 1/1 Running 0 7m37s 10.244.0.2 kind-control-plane
and the iptables rules DNAT the traffic to their internal ip addresses with probability 0.5
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
...
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YIL6JZP7A3QYXJU2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-6E7XQMQ4RAYOWTTM
...
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m udp -j DNAT --to-destination 10.244.0.2:53
...
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -s 10.244.0.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -p udp -m udp -j DNAT --to-destination 10.244.0.3:53
maybe I'm missing something, but it should work :man_shrugging:
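One way to check which backend a given DNS flow was actually DNAT'ed to is to look at conntrack on the node (a sketch; the conntrack tool may not be present in the node image):
# from the host, inside the worker node
docker exec kind-worker sh -c "conntrack -L -p udp | grep 'dport=53'"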
I can see that both coredns pods are on the same node
Yes, that happens most of the time with a kubeadm deployment, since CoreDNS is started with its replicas before any other node has joined. They start on the same node because that is the only option, and since they are already scheduled when the next node joins, they stay on one node until something triggers a reschedule.
I did that now by running kubectl delete -n kube-system pods --selector k8s-app=kube-dns
Now that that is fixed, I can test the traffic between nodes.
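To keep the two replicas from landing on the same node in the first place, a preferred pod anti-affinity can be patched into the Deployment; a minimal sketch (the coredns Deployment name and the k8s-app: kube-dns label are the kubeadm defaults):
kubectl -n kube-system patch deployment coredns -p '
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  k8s-app: kube-dns
'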
all pods:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default shell 1/1 Running 0 86s 10.244.1.4 kind-worker <none> <none>
kube-system coredns-5c98db65d4-mg6dt 1/1 Running 0 2m2s 10.244.0.5 kind-control-plane <none> <none>
kube-system coredns-5c98db65d4-nd5rp 1/1 Running 0 2m2s 10.244.1.3 kind-worker <none> <none>
kube-system etcd-kind-control-plane 1/1 Running 0 110s 172.17.0.2 kind-control-plane <none> <none>
kube-system kindnet-4nwxc 1/1 Running 0 2m50s 172.17.0.2 kind-control-plane <none> <none>
kube-system kindnet-gz2qv 1/1 Running 0 2m34s 172.17.0.3 kind-worker <none> <none>
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 2m7s 172.17.0.2 kind-control-plane <none> <none>
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 2m1s 172.17.0.2 kind-control-plane <none> <none>
kube-system kube-proxy-r7clv 1/1 Running 0 2m50s 172.17.0.2 kind-control-plane <none> <none>
kube-system kube-proxy-z2jhb 1/1 Running 0 2m34s 172.17.0.3 kind-worker <none> <none>
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 112s 172.17.0.2 kind-control-plane <none> <none>
Run within the shell pod:
/ # nslookup google.nl
;; connection timed out; no servers could be reached
/ # nslookup google.nl 10.244.0.5
;; connection timed out; no servers could be reached
/ # nslookup google.nl 10.244.1.3
Server: 10.244.1.3
Address: 10.244.1.3#53
Non-authoritative answer:
Name: google.nl
Address: 172.217.17.131
Name: google.nl
Address: 2a00:1450:400e:807::2003
So as soon as I try a DNS lookup against the CoreDNS pod that is not on the same node as the shell pod, it fails. The requests are received by CoreDNS (forgot to save the logs), but nslookup returns a timeout. It seems the response is never received.
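To separate a DNS problem from a general cross-node pod networking problem, the pod IPs can be targeted directly (a sketch; the IPs are taken from the pod listing above):
# CoreDNS pod on the other node (kind-control-plane)
kubectl exec -it shell -- ping -c 3 10.244.0.5
# CoreDNS pod on the same node (kind-worker), for comparison
kubectl exec -it shell -- ping -c 3 10.244.1.3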
Just to be sure, my iptables output once more (I needed to recreate the cluster).
# iptables-save
# Generated by iptables-save v1.6.1 on Tue Jul 30 11:56:45 2019
*filter
:INPUT ACCEPT [3228:536047]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [3229:598814]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Tue Jul 30 11:56:45 2019
# Generated by iptables-save v1.6.1 on Tue Jul 30 11:56:45 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [17:1020]
:POSTROUTING ACCEPT [17:1020]
:KIND-MASQ-AGENT - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-5ZUVGKEDQRTZFI3V - [0:0]
:KUBE-SEP-EJJ3L23ZA35VLW6X - [0:0]
:KUBE-SEP-FVQSBIWR5JTECIVC - [0:0]
:KUBE-SEP-LASJGFFJP3UOS6RQ - [0:0]
:KUBE-SEP-LPGSDLJ3FDW46N4W - [0:0]
:KUBE-SEP-P6ZV3VC6PB5OMAHT - [0:0]
:KUBE-SEP-R75T7LXI5PWKQPQA - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain" -m addrtype ! --dst-type LOCAL -j KIND-MASQ-AGENT
-A KIND-MASQ-AGENT -d 10.244.0.0/16 -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
-A KIND-MASQ-AGENT -m comment --comment "ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -p tcp -m tcp -j DNAT --to-destination 172.17.0.2:6443
-A KUBE-SEP-EJJ3L23ZA35VLW6X -s 10.244.1.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-EJJ3L23ZA35VLW6X -p udp -m udp -j DNAT --to-destination 10.244.1.3:53
-A KUBE-SEP-FVQSBIWR5JTECIVC -s 10.244.0.5/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-FVQSBIWR5JTECIVC -p tcp -m tcp -j DNAT --to-destination 10.244.0.5:9153
-A KUBE-SEP-LASJGFFJP3UOS6RQ -s 10.244.0.5/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-LASJGFFJP3UOS6RQ -p tcp -m tcp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-LPGSDLJ3FDW46N4W -s 10.244.0.5/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-LPGSDLJ3FDW46N4W -p udp -m udp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-P6ZV3VC6PB5OMAHT -s 10.244.1.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-P6ZV3VC6PB5OMAHT -p tcp -m tcp -j DNAT --to-destination 10.244.1.3:9153
-A KUBE-SEP-R75T7LXI5PWKQPQA -s 10.244.1.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-R75T7LXI5PWKQPQA -p tcp -m tcp -j DNAT --to-destination 10.244.1.3:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-LASJGFFJP3UOS6RQ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-R75T7LXI5PWKQPQA
-A KUBE-SVC-JD5MR3NA4I4DYORP -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-FVQSBIWR5JTECIVC
-A KUBE-SVC-JD5MR3NA4I4DYORP -j KUBE-SEP-P6ZV3VC6PB5OMAHT
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-5ZUVGKEDQRTZFI3V
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-LPGSDLJ3FDW46N4W
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-EJJ3L23ZA35VLW6X
COMMIT
# Completed on Tue Jul 30 11:56:45 2019
# iptables-save
# Generated by iptables-save v1.6.1 on Tue Jul 30 11:59:05 2019
*filter
:INPUT ACCEPT [68:25196]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [66:5294]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Tue Jul 30 11:59:05 2019
# Generated by iptables-save v1.6.1 on Tue Jul 30 11:59:05 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [4:240]
:POSTROUTING ACCEPT [4:240]
:KIND-MASQ-AGENT - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-5ZUVGKEDQRTZFI3V - [0:0]
:KUBE-SEP-EJJ3L23ZA35VLW6X - [0:0]
:KUBE-SEP-FVQSBIWR5JTECIVC - [0:0]
:KUBE-SEP-LASJGFFJP3UOS6RQ - [0:0]
:KUBE-SEP-LPGSDLJ3FDW46N4W - [0:0]
:KUBE-SEP-P6ZV3VC6PB5OMAHT - [0:0]
:KUBE-SEP-R75T7LXI5PWKQPQA - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain" -m addrtype ! --dst-type LOCAL -j KIND-MASQ-AGENT
-A KIND-MASQ-AGENT -d 10.244.0.0/16 -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
-A KIND-MASQ-AGENT -m comment --comment "ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5ZUVGKEDQRTZFI3V -p tcp -m tcp -j DNAT --to-destination 172.17.0.2:6443
-A KUBE-SEP-EJJ3L23ZA35VLW6X -s 10.244.1.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-EJJ3L23ZA35VLW6X -p udp -m udp -j DNAT --to-destination 10.244.1.3:53
-A KUBE-SEP-FVQSBIWR5JTECIVC -s 10.244.0.5/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-FVQSBIWR5JTECIVC -p tcp -m tcp -j DNAT --to-destination 10.244.0.5:9153
-A KUBE-SEP-LASJGFFJP3UOS6RQ -s 10.244.0.5/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-LASJGFFJP3UOS6RQ -p tcp -m tcp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-LPGSDLJ3FDW46N4W -s 10.244.0.5/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-LPGSDLJ3FDW46N4W -p udp -m udp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-P6ZV3VC6PB5OMAHT -s 10.244.1.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-P6ZV3VC6PB5OMAHT -p tcp -m tcp -j DNAT --to-destination 10.244.1.3:9153
-A KUBE-SEP-R75T7LXI5PWKQPQA -s 10.244.1.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-R75T7LXI5PWKQPQA -p tcp -m tcp -j DNAT --to-destination 10.244.1.3:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-LASJGFFJP3UOS6RQ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-R75T7LXI5PWKQPQA
-A KUBE-SVC-JD5MR3NA4I4DYORP -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-FVQSBIWR5JTECIVC
-A KUBE-SVC-JD5MR3NA4I4DYORP -j KUBE-SEP-P6ZV3VC6PB5OMAHT
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-5ZUVGKEDQRTZFI3V
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-LPGSDLJ3FDW46N4W
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-EJJ3L23ZA35VLW6X
COMMIT
# Completed on Tue Jul 30 11:59:05 2019
I still can't reproduce it; I have followed exactly the same steps. Could it be because I'm using Linux?
kubectl get -n kube-system pods --selector k8s-app=kube-dns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-5c98db65d4-qdmhs 1/1 Running 0 4m47s 10.244.1.3 kind-worker <none> <none>
coredns-5c98db65d4-x52mq 1/1 Running 0 4m46s 10.244.0.4 kind-control-plane <none> <none>
/ # nslookup google.nl
nslookup: can't resolve '(null)': Name does not resolve
Name: google.nl
Address 1: 172.217.17.3 mad07s09-in-f3.1e100.net
Address 2: 2a00:1450:4003:809::2003 mad08s05-in-x03.1e100.net
/ # nslookup google.nl 10.244.1.3
Server: 10.244.1.3
Address 1: 10.244.1.3 10-244-1-3.kube-dns.kube-system.svc.cluster.local
Name: google.nl
Address 1: 172.217.17.3 mad07s09-in-f3.1e100.net
Address 2: 2a00:1450:4003:809::2003 mad08s05-in-x03.1e100.net
/ # nslookup google.nl 10.244.0.2
Server: 10.244.0.2
Address 1: 10.244.0.2
Name: google.nl
Address 1: 172.217.17.3 mad07s09-in-f3.1e100.net
Address 2: 2a00:1450:4003:809::2003 mad08s05-in-x03.1e100.net
I still can't reproduce it; I have followed exactly the same steps. Could it be because I'm using Linux?
Yes, could be; I suspect some incompatibility between WSL v1 and Docker for Windows. Will try to get a colleague to try and reproduce this :)
I'm on Linux and currently having this problem. No DNS resolution, but connecting via IP address works fine.
I'm getting really sporadic results from nslookup
/ # nslookup mysql-server
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: mysql-server.default.svc.cluster.local
Address: 10.98.244.179
*** Can't find mysql-server.svc.cluster.local: No answer
*** Can't find mysql-server.cluster.local: No answer
*** Can't find mysql-server.default.svc.cluster.local: No answer
*** Can't find mysql-server.svc.cluster.local: No answer
*** Can't find mysql-server.cluster.local: No answer
/ # nslookup mysql-server
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: mysql-server.default.svc.cluster.local
Address: 10.98.244.179
*** Can't find mysql-server.svc.cluster.local: No answer
*** Can't find mysql-server.cluster.local: No answer
*** Can't find mysql-server.default.svc.cluster.local: No answer
*** Can't find mysql-server.svc.cluster.local: No answer
*** Can't find mysql-server.cluster.local: No answer
/ # nslookup mysql-server
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: mysql-server.default.svc.cluster.local
Address: 10.98.244.179
*** Can't find mysql-server.svc.cluster.local: No answer
*** Can't find mysql-server.cluster.local: No answer
*** Can't find mysql-server.default.svc.cluster.local: No answer
*** Can't find mysql-server.svc.cluster.local: No answer
*** Can't find mysql-server.cluster.local: No answer
Meanwhile, here are the logs from CoreDNS...
(cmd)-> k logs -n kube-system -l k8s-app=kube-dns
.:53
2019-08-08T20:29:45.442Z [INFO] CoreDNS-1.2.6
2019-08-08T20:29:45.442Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
.:53
2019-08-08T20:29:45.421Z [INFO] CoreDNS-1.2.6
2019-08-08T20:29:45.421Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
Which confuses me, because that seems an awful lot like they're not accepting requests.
@wreed4 Do you have the log plugin enabled in the CoreDNS configmap?
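(For reference, enabling query logging just means adding the log plugin to the Corefile in that configmap; roughly:)
kubectl -n kube-system edit configmap coredns
# then add a line containing just "log" inside the ".:53 { ... }" server block,
# as in the Corefile shown earlier in this thread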
/close
This has no activity, we couldn't reproduce it, and e2e DNS tests are running periodically without any issue.
Feel free to reopen
@aojea: Closing this issue.
In response to this:
/close
This has no activity, we couldn't reproduce it, and e2e DNS tests are running periodically without any issue.
Feel free to reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I encountered exactly the same issue in my kind env on my MacBook.
As I've had a similar experience before (from here), simply killing the CoreDNS pods helped, again!
I believe it's a bug somewhere
It may be, but the problem is that there are many moving parts and differences between the environments reported.
kind runs in the Kubernetes CI a considerable number of times per day with a very low failure rate, which makes me think the failure has to be environmental and not in the kind default setup.
I guess I have the same problem on Ubuntu Server arm64: https://github.com/Trackhe/Raspberry64bitKubernetesServerDualstack
Sometimes it works, and the next second it fails in a pod.
master node /etc/resolv.conf
nameserver 127.0.0.53
options edns0
search fritz.box
For debugging I use only one DNS pod, on the master node: kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[INFO] 200.200.208.17:33318 - 408 "AAAA IN google.com.fritz.box. udp 38 false 512" NXDOMAIN qr,aa,rd,ra 107 0.000339958s
[INFO] 200.200.208.17:33318 - 65259 "A IN google.com.fritz.box. udp 38 false 512" NOERROR qr,aa,rd,ra 38 0.000587436s
[INFO] 200.200.208.17:60191 - 7669 "AAAA IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000403772s
[INFO] 200.200.208.17:60191 - 6521 "A IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000721008s
[INFO] 200.200.208.17:56756 - 57719 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000352365s
[INFO] 200.200.208.17:56756 - 58756 "AAAA IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000859247s
[INFO] 200.200.208.17:59421 - 26342 "AAAA IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000350032s
[INFO] 200.200.208.17:59421 - 25527 "A IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000672139s
[INFO] 200.200.208.17:36074 - 10831 "AAAA IN google.com.fritz.box. udp 38 false 512" NXDOMAIN qr,aa,rd,ra 107 0.000352976s
[INFO] 200.200.208.17:36074 - 9942 "A IN google.com.fritz.box. udp 38 false 512" NOERROR qr,aa,rd,ra 38 0.000646046s
To be clear: the link above does NOT appear to involve using KIND.
There are lots of ways to wind up with broken cluster networking unrelated to KIND 😅
It's the way to reproduce the problem in my case. I can remove it if you want; I thought it could help.
I really don't understand why it works for a brief moment and then fails the next second.
[INFO] 200.200.208.17:38092 - 5890 "AAAA IN google.de.default.svc.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.000803937s
[INFO] 200.200.208.17:38092 - 4722 "A IN google.de.default.svc.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.001110953s
[INFO] 200.200.208.17:45374 - 4306 "A IN google.de.svc.cluster.local. udp 45 false 512" NXDOMAIN qr,aa,rd 138 0.000632586s
[INFO] 200.200.208.17:45374 - 5121 "AAAA IN google.de.svc.cluster.local. udp 45 false 512" NXDOMAIN qr,aa,rd 138 0.000953695s
[INFO] 200.200.208.17:60195 - 55455 "AAAA IN google.de.cluster.local. udp 41 false 512" NXDOMAIN qr,aa,rd 134 0.000715363s
[INFO] 200.200.208.17:60195 - 54715 "A IN google.de.cluster.local. udp 41 false 512" NXDOMAIN qr,aa,rd 134 0.001026694s
[INFO] 200.200.208.17:44190 - 20295 "A IN google.de.fritz.box. udp 37 false 512" NXDOMAIN qr,aa,rd,ra 106 0.004858937s
[INFO] 200.200.208.17:44190 - 20832 "AAAA IN google.de.fritz.box. udp 37 false 512" NOERROR qr,aa,rd,ra 37 0.006597845s
[INFO] 200.200.208.17:50558 - 64294 "AAAA IN google.de. udp 27 false 512" NOERROR qr,rd,ra 64 0.016610308s
[INFO] 200.200.208.17:50558 - 63572 "A IN google.de. udp 27 false 512" NOERROR qr,rd,ra 52 0.017251375s
[INFO] 200.200.208.17:33200 - 50524 "PTR IN 195.16.217.172.in-addr.arpa. udp 45 false 512" NOERROR qr,rd,ra 177 0.00622646s
[INFO] 200.200.208.17:40313 - 20979 "AAAA IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000727567s
[INFO] 200.200.208.17:40313 - 20182 "A IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.001072305s
[INFO] 200.200.208.17:49206 - 40776 "AAAA IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000631531s
[INFO] 200.200.208.17:49206 - 40313 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000975269s
[INFO] 200.200.208.17:39480 - 21779 "AAAA IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000554291s
[INFO] 200.200.208.17:39480 - 21353 "A IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000822177s
[INFO] 200.200.208.17:40032 - 46613 "AAAA IN google.com.fritz.box. udp 38 false 512" NXDOMAIN qr,aa,rd,ra 107 0.004968694s
[INFO] 200.200.208.17:40032 - 46058 "A IN google.com.fritz.box. udp 38 false 512" NOERROR qr,aa,rd,ra 38 0.006061092s