Hi,
We see the following behaviour in our Kubernetes cluster:
The company DNS servers/resolvers use our CoreDNS to resolve the subdomain dev.xxx.yyy.de.
If a client asks for the A record of artifacts.dev.xxx.yyy.de (an ingress), the company DNS server/resolver asks our CoreDNS and gets the answer that this is a CNAME to traefik-lan.dev.xxx.yyy.de, together with the A record of traefik-lan.dev.xxx.yyy.de. All is fine.
But the company DNS servers/resolvers see that artifacts.dev.xxx.yyy.de has no A record but a CNAME, and ask for the CNAME shortly before the TTL runs out. CoreDNS answers with NXDOMAIN, and now for 30 seconds nobody in the company can resolve artifacts.dev.xxx.yyy.de.
The Question is:
Version: 1.3.1
Config:
.:53 {
    cache 30 {
        prefetch 20 5m
    }
    errors
    etcd xxx.yyy.de {
        path /skydns
        endpoint http://localhost:2379
        upstream
    }
    health
    loadbalance round_robin
    log
    prometheus 0.0.0.0:9153
}
Lines from etcd:
/ # ETCDCTL_API=3 etcdctl get /skydns/de/yyy/xxx/dev/artifacts/16ba5713
/skydns/de/yyy/xxx/dev/artifacts/16ba5713
{"host":"traefik-lan.dev.xxx.yyy.de","text":"\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/tools/artifactory\"","targetstrip":1}
/ # ETCDCTL_API=3 etcdctl get /skydns/de/yyy/xxx/dev/traefik-lan/62319966
/skydns/de/yyy/xxx/dev/traefik-lan/62319966
{"host":"10.245.0.0","text":"\"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/kube-system/traefik-lan\"","targetstrip":1}
Dns queries:
dig @10.254.0.62 artifacts.dev.xxx.yyy.de a
; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> @10.254.0.62 artifacts.dev.xxx.yyy.de a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63013
;; flags: qr rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;artifacts.dev.xxx.yyy.de. IN A
;; ANSWER SECTION:
artifacts.dev.xxx.yyy.de. 27 IN CNAME traefik-lan.dev.xxx.yyy.de.
traefik-lan.dev.xxx.yyy.de. 27 IN A 10.245.0.0
;; Query time: 0 msec
;; SERVER: 10.254.0.62#53(10.254.0.62)
;; WHEN: Tue May 28 15:48:32 CEST 2019
;; MSG SIZE rcvd: 159
dig @10.254.0.62 artifacts.dev.xxx.yyy.de cname
; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> @10.254.0.62 artifacts.dev.xxx.yyy.de cname
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33915
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;artifacts.dev.xxx.yyy.de. IN CNAME
;; AUTHORITY SECTION:
xxx.yyy.de. 30 IN SOA ns.dns.xxx.yyy.de. hostmaster.xxx.yyy.de. 1559051352 7200 1800 86400 30
;; Query time: 1 msec
;; SERVER: 10.254.0.62#53(10.254.0.62)
;; WHEN: Tue May 28 15:49:12 CEST 2019
;; MSG SIZE rcvd: 137
This definitely does not look correct. I'll try to distill this down to a (failing)
test case and see what the fix looks like.
We had exactly the same issue and fixed the etcd plugin internally.
We investigated the etcd plugin and found that this line causes the issue:
https://github.com/coredns/coredns/blob/53f3f0b666821588e721ceeea4766b76333b668b/plugin/etcd/handler.go#L59
This line does not consider that a record of another type may exist for the name. So we changed it to return dns.RcodeSuccess.
Should we create a PR?
But that line is guarded above by a ... e.IsNameError(err) check, which should
say the record name does not exist; turning that into a Success is not
correct.
So this points to something wrong in IsNameError, or in the CNAME handling, which
is done explicitly in the switch above that.
/Miek
--
Miek Gieben
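The distinction discussed above, between a name that does not exist and a name that exists but has no data for the queried type, can be sketched as follows. This is an illustration under stated assumptions, not the etcd plugin's actual code: `zone` is a hypothetical in-memory stand-in for the etcd store, `rcodeFor` is a made-up helper, and empty non-terminals are ignored for brevity. The two constants use the same numeric values as `dns.RcodeSuccess` and `dns.RcodeNameError`.

```go
package main

import "fmt"

const (
	rcodeSuccess   = 0 // NOERROR
	rcodeNameError = 3 // NXDOMAIN
)

// zone maps an owner name to the set of record types stored at that name.
type zone map[string]map[string]bool

// rcodeFor returns the response code for a query and whether the answer
// section would contain data. NXDOMAIN is returned only when the name has no
// records of any type; an existing name without the queried type yields
// NOERROR with an empty answer (NODATA).
func rcodeFor(z zone, name, qtype string) (rcode int, hasAnswer bool) {
	types, exists := z[name]
	if !exists || len(types) == 0 {
		return rcodeNameError, false
	}
	// A CNAME at the name answers a query for any type.
	if types[qtype] || types["CNAME"] {
		return rcodeSuccess, true
	}
	return rcodeSuccess, false // NODATA
}

func main() {
	z := zone{
		"artifacts.dev.xxx.yyy.de.":   {"CNAME": true},
		"traefik-lan.dev.xxx.yyy.de.": {"A": true},
	}
	for _, q := range [][2]string{
		{"artifacts.dev.xxx.yyy.de.", "CNAME"},  // answer
		{"traefik-lan.dev.xxx.yyy.de.", "AAAA"}, // NODATA
		{"missing.dev.xxx.yyy.de.", "A"},        // NXDOMAIN
	} {
		rcode, hasAnswer := rcodeFor(z, q[0], q[1])
		fmt.Println(q[0], q[1], rcode, hasAnswer)
	}
}
```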
Hello,
FWIW I am using CoreDNS-1.7.0 with etcd and external DNS and cannot get CNAMEs to resolve at all.
etcdctl get /skydns/red/primary/test/129f4043
/skydns/red/primary/test/129f4043
{"host":"primary.net","text":"\"heritage=external-dns,external-dns/owner=default,external-dns/resource=crd/rook-ceph/general-endpoints\"","ttl":180,"targetstrip":1}
I have an entry, injected by external DNS, that points test.primary.red to primary.net. I have even tried injecting it manually via etcdctl put.
When I query it, I still get NXDOMAIN.
[INFO] my.ip.add.res:49240 - 60686 "CNAME IN test.primary.red. udp 65 false 4096" NXDOMAIN qr,aa,rd 93 0.00247605s
dig cname @ns1.primary.red test.primary.red
; <<>> DiG 9.14.8 <<>> cname @ns1.primary.red test.primary.red
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 60686
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: ada24821582191f8 (echoed)
;; QUESTION SECTION:
;test.primary.red. IN CNAME
;; AUTHORITY SECTION:
. 30 IN SOA ns.dns. hostmaster. 1595753926 7200 1800 86400 30
;; Query time: 28 msec
;; SERVER: 51.222.70.240#53(51.222.70.240)
;; WHEN: Sun Jul 26 04:58:45 Eastern Daylight Time 2020
;; MSG SIZE rcvd: 116
Config:
. {
    log
    errors
    health {
        lameduck 5s
    }
    ready
    cache 30
    loop
    reload
    dnssec primary.red {
        key file /etc/coredns/keys/Kprimary-red
    }
    etcd {
        path /skydns
        endpoint http://etcd-coredns-client:2379
    }
}
Other records (such as A) resolve OK. My full configuration for CoreDNS and the deployments can be found here: https://github.com/mcserverhosting-net/cluster-red-flux/tree/fd309ad3d16cfe5163f2907fc52c950d67aeb9f7/infra/dns
A quick test using etcdctl put and dig, where A records succeed but CNAME records fail, can be found here: https://github.com/mcserverhosting-net/cluster-red-flux/issues/2
I've run into this issue with CoreDNS 1.6.7 on openSUSE, with NS and AAAA queries blocking access to a name that has just an A record.
This is considered a security issue by CERT, as it allows someone at a site to deny access to a service by querying the wanted name with a record type that the server has no data for. As I understand it (and this matches what I see happen), NXDOMAIN is allowed to be cached and invalidates already known information, as it's saying "this domain does not exist here, there is nothing to see, move along".
https://www.kb.cert.org/vuls/id/714121
I have found a workaround, which is to add fallthrough to the etcd plugin config. That switches it from returning the incorrect NXDOMAIN to the correct NODATA response, at the cost of "plugin/etcd: no next plugin found" complaints.
Hope this helps people!
All the best,
Chris
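For reference, the fallthrough workaround described above might look like this in a Corefile. This is only a sketch: the zone, path and endpoint are copied from the config in the first post, other options are omitted, and whether the workaround applies may depend on the CoreDNS version.

```
etcd xxx.yyy.de {
    path /skydns
    endpoint http://localhost:2379
    fallthrough
}
```

With no plugin listed after etcd that can handle the query, the "no next plugin found" message mentioned above is to be expected.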