CoreDNS: CoreDNS answers with NXDOMAIN to CNAME queries

Created on 28 May 2019 · 5 Comments · Source: coredns/coredns

Hi,

We see the following behaviour in our kubernetes cluster:

Our company DNS servers/resolvers use our CoreDNS to resolve the subdomain dev.xxx.yyy.de.

If a client asks for the A record of artifacts.dev.xxx.yyy.de (an ingress), the company DNS server/resolver asks our CoreDNS and gets the answer that this is a CNAME to traefik-lan.dev.xxx.yyy.de, together with the A record of traefik-lan.dev.xxx.yyy.de. All is fine.

However, the company DNS servers/resolvers notice that artifacts.dev.xxx.yyy.de has a CNAME rather than an A record, and query for the CNAME shortly before its TTL runs out. CoreDNS answers with NXDOMAIN, and for the next 30 seconds nobody in the company can resolve artifacts.dev.xxx.yyy.de.

The questions are:

  • Is this intended?
  • Did we configure something wrong?

Version: 1.3.1

Config:

.:53 {
  cache 30 {
    prefetch 20 5m
  }
  errors
  etcd xxx.yyy.de {
    path /skydns
    endpoint http://localhost:2379
    upstream
  }
  health
  loadbalance round_robin
  log
  prometheus 0.0.0.0:9153
}

Lines from etcd:

/ # ETCDCTL_API=3 etcdctl get /skydns/de/yyy/xxx/dev/artifacts/16ba5713
/skydns/de/yyy/xxx/dev/artifacts/16ba5713
{"host":"traefik-lan.dev.xxx.yyy.de","text":"\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/tools/artifactory\"","targetstrip":1}

/ # ETCDCTL_API=3 etcdctl get /skydns/de/yyy/xxx/dev/traefik-lan/62319966
/skydns/de/yyy/xxx/dev/traefik-lan/62319966
{"host":"10.245.0.0","text":"\"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/kube-system/traefik-lan\"","targetstrip":1}
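To make the etcd data above easier to read: in the SkyDNS-style layout used with path /skydns, the labels of a name are stored in reverse order, and a "host" value that is not an IP address is served as a CNAME to that host. The Go sketch below illustrates that mapping. It is not the etcd plugin's code; the skydnsMsg type, the function names and the assumption that the last key segment (16ba5713, 62319966) is an arbitrary per-record suffix added by external-dns are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strings"
)

// skydnsMsg holds only the fields visible in the etcd values above
// (hypothetical type, not the plugin's own).
type skydnsMsg struct {
	Host        string `json:"host"`
	Text        string `json:"text"`
	TTL         uint32 `json:"ttl,omitempty"`
	TargetStrip int    `json:"targetstrip"`
}

// keyToName turns a key such as /skydns/de/yyy/xxx/dev/artifacts/16ba5713
// into the owner name it serves. Assumption: the last path segment is a
// per-record suffix, not a DNS label.
func keyToName(prefix, key string) string {
	rest := strings.Trim(strings.TrimPrefix(key, prefix), "/")
	labels := strings.Split(rest, "/")
	labels = labels[:len(labels)-1] // drop the per-record suffix
	// Reverse the labels: de/yyy/xxx/dev/artifacts -> artifacts.dev.xxx.yyy.de
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(labels, ".") + "."
}

// recordType maps the "host" field to the RR type it would be served as:
// an IPv4 address gives A, an IPv6 address gives AAAA, anything else CNAME.
func recordType(m skydnsMsg) string {
	ip := net.ParseIP(m.Host)
	switch {
	case ip == nil:
		return "CNAME"
	case ip.To4() != nil:
		return "A"
	default:
		return "AAAA"
	}
}

func main() {
	var m skydnsMsg
	raw := `{"host":"traefik-lan.dev.xxx.yyy.de","targetstrip":1}`
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		panic(err)
	}
	fmt.Println(keyToName("/skydns", "/skydns/de/yyy/xxx/dev/artifacts/16ba5713"), recordType(m))
	// prints: artifacts.dev.xxx.yyy.de. CNAME
}
```

Read this way, the first value makes artifacts.dev.xxx.yyy.de a CNAME and the second makes traefik-lan.dev.xxx.yyy.de an A record, which matches the answer to the A query below.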

DNS queries:

dig @10.254.0.62 artifacts.dev.xxx.yyy.de a

; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> @10.254.0.62 artifacts.dev.xxx.yyy.de a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63013
;; flags: qr rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;artifacts.dev.xxx.yyy.de.      IN      A

;; ANSWER SECTION:
artifacts.dev.xxx.yyy.de. 27    IN      CNAME   traefik-lan.dev.xxx.yyy.de.
traefik-lan.dev.xxx.yyy.de. 27  IN      A       10.245.0.0

;; Query time: 0 msec
;; SERVER: 10.254.0.62#53(10.254.0.62)
;; WHEN: Tue May 28 15:48:32 CEST 2019
;; MSG SIZE  rcvd: 159

dig @10.254.0.62 artifacts.dev.xxx.yyy.de cname

; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> @10.254.0.62 artifacts.dev.xxx.yyy.de cname
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33915
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;artifacts.dev.xxx.yyy.de.      IN      CNAME

;; AUTHORITY SECTION:
xxx.yyy.de.             30      IN      SOA     ns.dns.xxx.yyy.de. hostmaster.xxx.yyy.de. 1559051352 7200 1800 86400 30

;; Query time: 1 msec
;; SERVER: 10.254.0.62#53(10.254.0.62)
;; WHEN: Tue May 28 15:49:12 CEST 2019
;; MSG SIZE  rcvd: 137

Labels: bug, plugin/etcd

All 5 comments

This definitely does not look correct. I'll try to distill this down to a (failing) test case and see what the fix looks like.

We had exactly the same issue and fixed the etcd plugin internally.

We investigated the etcd plugin and found that this line causes the issue:
https://github.com/coredns/coredns/blob/53f3f0b666821588e721ceeea4766b76333b668b/plugin/etcd/handler.go#L59

This line does not consider that the name may have records of another type, so we changed it to return dns.RcodeSuccess.

Should we create a PR?
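The distinction at stake here is NXDOMAIN (the name does not exist at all) versus NODATA (the name exists but has no records of the queried type, signalled as NOERROR with an empty answer section). As an illustration only, not the etcd plugin's code, the Go sketch below shows that decision over a toy in-memory zone. The rr type and the function names are hypothetical, names are compared case-sensitively for brevity, and CNAME handling is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// rr is a hypothetical, minimal resource record used only in this sketch.
type rr struct {
	Name, Type, Data string
}

// nameExists reports whether qname owns any record or is an empty
// non-terminal, i.e. some other owner name lies below it.
// Names are compared case-sensitively for brevity.
func nameExists(zone []rr, qname string) bool {
	for _, r := range zone {
		if r.Name == qname || strings.HasSuffix(r.Name, "."+qname) {
			return true
		}
	}
	return false
}

// respond returns NXDOMAIN only when the name does not exist at all.
// An existing name without records of qtype gets NOERROR with an empty
// answer section, i.e. NODATA. CNAME handling is deliberately left out.
func respond(zone []rr, qname, qtype string) (string, []rr) {
	if !nameExists(zone, qname) {
		return "NXDOMAIN", nil
	}
	var answer []rr
	for _, r := range zone {
		if r.Name == qname && r.Type == qtype {
			answer = append(answer, r)
		}
	}
	return "NOERROR", answer
}

func main() {
	zone := []rr{{"traefik-lan.dev.xxx.yyy.de.", "A", "10.245.0.0"}}
	for _, q := range []struct{ name, qtype string }{
		{"traefik-lan.dev.xxx.yyy.de.", "A"},    // positive answer
		{"traefik-lan.dev.xxx.yyy.de.", "AAAA"}, // NODATA
		{"dev.xxx.yyy.de.", "A"},                // empty non-terminal: NODATA
		{"missing.dev.xxx.yyy.de.", "A"},        // NXDOMAIN
	} {
		rcode, ans := respond(zone, q.name, q.qtype)
		fmt.Println(q.name, q.qtype, rcode, len(ans))
	}
}
```

The point of the sketch is that "no records of this type" and "no such name" are different outcomes; collapsing the first into NXDOMAIN is what lets a single AAAA, NS or CNAME query poison a resolver's view of a name that does exist.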

But that line is checked above with a ... e.IsNameError(err), which should mean that the record name does not exist, so turning that into a success is not correct.

So this points to something wrong in IsNameError, or in the CNAME handling that is done explicitly in the switch above it.

/Miek

--
Miek Gieben
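For the CNAME handling mentioned above, the expected behaviour is: a query of type CNAME for a name that owns a CNAME should return that CNAME record itself, while a query of another type (such as A) should return the CNAME followed by the target's records, as the A query in the original report did. The Go sketch below models only that answer-selection logic over a toy zone. The rr type, the answer function and the chain limit of 8 hops are assumptions for illustration, not CoreDNS code.

```go
package main

import "fmt"

// rr is a hypothetical, minimal resource record used only in this sketch.
type rr struct {
	Name, Type, Data string
}

// answer builds the answer section for qname/qtype. Records matching qtype
// are returned directly, so a CNAME query returns the CNAME itself. For any
// other type, a CNAME at qname is added and its target is looked up in turn.
func answer(zone []rr, qname, qtype string) []rr {
	var out []rr
	for hops := 0; hops < 8; hops++ { // bound the chain to avoid CNAME loops
		var matched []rr
		var cname *rr
		for i := range zone {
			if zone[i].Name != qname {
				continue
			}
			if zone[i].Type == qtype {
				matched = append(matched, zone[i])
			} else if zone[i].Type == "CNAME" {
				cname = &zone[i]
			}
		}
		if len(matched) > 0 || cname == nil {
			return append(out, matched...)
		}
		out = append(out, *cname)
		qname = cname.Data
	}
	return out
}

func main() {
	zone := []rr{
		{"artifacts.dev.xxx.yyy.de.", "CNAME", "traefik-lan.dev.xxx.yyy.de."},
		{"traefik-lan.dev.xxx.yyy.de.", "A", "10.245.0.0"},
	}
	for _, qtype := range []string{"A", "CNAME"} {
		for _, r := range answer(zone, "artifacts.dev.xxx.yyy.de.", qtype) {
			fmt.Println(qtype, "query:", r.Name, r.Type, r.Data)
		}
	}
}
```

In this model the CNAME query for artifacts.dev.xxx.yyy.de. yields one CNAME record, never an empty result, so an NXDOMAIN for it would have to come from elsewhere (for example the name-existence check).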

Hello,

FWIW I am using CoreDNS-1.7.0 with etcd and external-dns and cannot get CNAMEs to resolve at all.

etcdctl get /skydns/red/primary/test/129f4043
/skydns/red/primary/test/129f4043
{"host":"primary.net","text":"\"heritage=external-dns,external-dns/owner=default,external-dns/resource=crd/rook-ceph/general-endpoints\"","ttl":180,"targetstrip":1}

I have an entry, injected by external-dns, that points test.primary.red to primary.net. I have even tried injecting it manually via etcdctl put.

When I query it, I get NXDOMAIN as well.

[INFO] my.ip.add.res:49240 - 60686 "CNAME IN test.primary.red. udp 65 false 4096" NXDOMAIN qr,aa,rd 93 0.00247605s

dig cname @ns1.primary.red test.primary.red

; <<>> DiG 9.14.8 <<>> cname @ns1.primary.red test.primary.red
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 60686
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: ada24821582191f8 (echoed)
;; QUESTION SECTION:
;test.primary.red.      IN      CNAME

;; AUTHORITY SECTION:
.                       30      IN      SOA     ns.dns. hostmaster. 1595753926 7200 1800 86400 30

;; Query time: 28 msec
;; SERVER: 51.222.70.240#53(51.222.70.240)
;; WHEN: Sun Jul 26 04:58:45 Eastern Daylight Time 2020
;; MSG SIZE  rcvd: 116

. {
    log
    errors
    health {
        lameduck 5s
    }
    ready
    cache 30
    loop
    reload
    dnssec primary.red {
        key file /etc/coredns/keys/Kprimary-red
    }
    etcd {
        path /skydns
        endpoint http://etcd-coredns-client:2379
    }
}

Other record types (such as A) resolve fine. My full CoreDNS configuration and deployments can be found here: https://github.com/mcserverhosting-net/cluster-red-flux/tree/fd309ad3d16cfe5163f2907fc52c950d67aeb9f7/infra/dns

A quick test using etcdctl put and dig can be found here: https://github.com/mcserverhosting-net/cluster-red-flux/issues/2, where A records succeed but CNAME records fail.

I've run into this issue with CoreDNS 1.6.7 on openSUSE: NS and AAAA queries block access to a name that has only an A record.

This is considered a security issue by CERT, because it allows someone at a site to deny access to a service by querying the wanted name with a record type the server has no data for. As I understand it (and this matches what I see happen), an NXDOMAIN answer may be cached, and it invalidates already known information because it says "this domain does not exist here, there is nothing to see, move along".

https://www.kb.cert.org/vuls/id/714121
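The 30-second outage in the original report is consistent with negative caching as specified in RFC 2308, section 5: a resolver may cache an NXDOMAIN or NODATA answer for the smaller of the TTL of the SOA record in the authority section and that SOA's MINIMUM field, and both are 30 in the NXDOMAIN answer shown in the issue. The Go sketch below computes that value from a SOA line as dig prints it. The function name and the assumption that the line uses dig's default owner/TTL/class/type layout are illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// negativeTTL parses a SOA line as printed by dig
// (owner TTL class SOA mname rname serial refresh retry expire minimum)
// and returns min(SOA TTL, MINIMUM): how long a resolver may cache the
// NXDOMAIN/NODATA answer that carried it (RFC 2308, section 5).
func negativeTTL(soaLine string) (uint32, error) {
	f := strings.Fields(soaLine)
	if len(f) != 11 || f[3] != "SOA" {
		return 0, fmt.Errorf("not a dig-style SOA line: %q", soaLine)
	}
	ttl, err := strconv.ParseUint(f[1], 10, 32)
	if err != nil {
		return 0, err
	}
	minimum, err := strconv.ParseUint(f[10], 10, 32)
	if err != nil {
		return 0, err
	}
	if minimum < ttl {
		ttl = minimum
	}
	return uint32(ttl), nil
}

func main() {
	// Authority section of the NXDOMAIN answer in the original report.
	soa := "xxx.yyy.de.             30      IN      SOA     ns.dns.xxx.yyy.de. hostmaster.xxx.yyy.de. 1559051352 7200 1800 86400 30"
	ttl, err := negativeTTL(soa)
	if err != nil {
		panic(err)
	}
	fmt.Printf("negative answer may be cached for %ds\n", ttl)
	// prints: negative answer may be cached for 30s
}
```

Because the outage window is bounded by these two SOA values, lowering them would shorten the damage but not remove it; the actual fix is answering NODATA instead of NXDOMAIN.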

I have found a workaround: add fallthrough to the etcd plugin config. That switches it from returning the incorrect NXDOMAIN to the correct NODATA response, at the cost of "plugin/etcd: no next plugin found" complaints.

Hope this helps people!

All the best,
Chris
