Cert-manager: Does cert-manager support delegated zones?

Created on 12 Jul 2018 · 12 Comments · Source: jetstack/cert-manager

I have cert-manager up and running successfully in my GKE cluster. I recently added a delegated zone for a domain name which has its main DNS zone (MYDOMAIN.com) hosted at dnsmadeeasy, but which has delegated the store.MYDOMAIN.com zone to Google Cloud DNS.

Here's a screenshot of the Cloud DNS console (with some text redacted):
[Image: my-domain zone details]

Here's what dig shows for that zone's records:

don@box: $ dig +short frontend.store.MYDOMAIN.com
153.12.10.144

don@box: $ dig +short store.MYDOMAIN.com ns
ns-cloud-b1.googledomains.com.
ns-cloud-b4.googledomains.com.
ns-cloud-b2.googledomains.com.
ns-cloud-b3.googledomains.com.

don@box: $ dig +short store.MYDOMAIN.com soa
ns-cloud-b1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300

So, clearly the delegation is working OK.

But this is the error I get in the cert-manager pod's logs:

I0712 17:44:06.431890       1 controller.go:177] certificates controller: syncing item 'my-store-master/my-store-frontend'
I0712 17:44:06.432072       1 sync.go:253] Preparing certificate my-store-master/my-store-frontend with issuer
I0712 17:44:06.432096       1 prepare.go:47] Getting ACME client
I0712 17:44:06.432117       1 acme.go:159] getting private key (letsncrypt-prod->tls.key) for acme issuer kube-system/letsencrypt-prod
I0712 17:44:06.432768       1 logger.go:27] Calling GetOrder
I0712 17:44:06.586662       1 leaderelection.go:199] successfully renewed lease kube-system/cert-manager-controller
I0712 17:44:06.747791       1 logger.go:52] Calling GetAuthorization
I0712 17:44:06.869397       1 logger.go:77] Calling DNS01ChallengeRecord
I0712 17:44:06.869538       1 prepare.go:263] Cleaning up old/expired challenges for Certificate my-store-master/my-store-frontend
I0712 17:44:06.869613       1 logger.go:47] Calling GetChallenge
I0712 17:44:06.988555       1 dns.go:78] Checking DNS propagation for "frontend.store.MYDOMAIN.com" using name servers: [10.55.240.10:53]
I0712 17:44:07.005962       1 dns.go:85] DNS record for "frontend.store.MYDOMAIN.com" not yet propagated
I0712 17:44:07.006427       1 dns.go:72] Presenting DNS01 challenge for domain "frontend.store.MYDOMAIN.com"
I0712 17:44:07.093817       1 helpers.go:188] Found status change for Certificate "my-store-frontend" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-07-12 17:44:07.09379683 +0000 UTC m=+523503.933832606
I0712 17:44:07.093886       1 sync.go:255] Error preparing issuer for certificate my-store-master/my-store-frontend: No matching GoogleCloud domain found for domain MYDOMAIN.com.
E0712 17:44:07.093921       1 sync.go:182] [my-store-master/my-store-frontend] Error getting certificate 'my-store-frontend-cert': secret "my-store-frontend-cert" not found
E0712 17:44:07.098405       1 controller.go:186] certificates controller: Re-queuing item "my-store-master/my-store-frontend" due to error processing: No matching GoogleCloud domain found for domain MYDOMAIN.com.

I suspect the issue is that cert-manager is only searching the console for the MYDOMAIN.com zone, when it should instead be attempting to find the more specific store.MYDOMAIN.com zone first. I'm not a gopher and so am probably wildly off-base, but is it possible that the following code block should implement such a recursively-descending search?

https://github.com/jetstack/cert-manager/blob/c1b34376fd584ae0ac4ec2662ac7765cb32d1b4e/pkg/issuer/acme/dns/clouddns/clouddns.go#L180-L200
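For illustration, here is a minimal Go sketch of the "most specific zone first" selection suggested above. It is not the cert-manager implementation: `mostSpecificZone` is a hypothetical helper, the zone list is hard-coded rather than fetched from Cloud DNS, and the names simply mirror the redacted placeholders in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// mostSpecificZone returns the entry in zones that is the longest
// label-aligned suffix of fqdn, or "" if no zone matches.
// All names are expected to be fully qualified (trailing dot).
// Hypothetical helper for illustration; not part of cert-manager.
func mostSpecificZone(fqdn string, zones []string) string {
	fqdn = strings.ToLower(fqdn)
	best := ""
	for _, z := range zones {
		z = strings.ToLower(z)
		// Match the zone apex itself, or any name below it on a label boundary.
		if fqdn == z || strings.HasSuffix(fqdn, "."+z) {
			if len(z) > len(best) {
				best = z
			}
		}
	}
	return best
}

func main() {
	// Assumed set of zones visible to the DNS provider account.
	zones := []string{"mydomain.com.", "store.mydomain.com."}
	fmt.Println(mostSpecificZone("_acme-challenge.frontend.store.mydomain.com.", zones))
	// prints "store.mydomain.com."
}
```

Comparing on a label boundary (the leading dot in the suffix check) keeps a name such as `notmydomain.com.` from matching the `mydomain.com.` zone.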

Thanks for any help you can give.

kind/bug

donspaulding

👍2

All 12 comments

This seems to be the case. I have a similar issue using a subdomain staging.domain.app. Upon adding domain.app to Cloud DNS the challenge record was created for domain.app, but then my verification failed as it tried to check the TXT record at staging.domain.app. It appears to be creating the TXT record under the wrong domain, not respecting subdomains.

Tybot204 on 14 Jul 2018

So it is definitely supposed to support this, and if it does not it should be considered a bug.

As per this function: https://github.com/jetstack/cert-manager/blob/c1b34376fd584ae0ac4ec2662ac7765cb32d1b4e/pkg/issuer/acme/dns/util/wait.go#L168-L170

We are supposed to recurse up the DNS hierarchy until we encounter an SOA record for the domain. I can see an SOA record in your DNS zone above, so I am not sure why that isn't happening here.
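As a rough sketch of that behaviour (an assumption about the intent, not a copy of `FindZoneByFqdn`), the walk up the hierarchy could look like the Go below. The `hasSOA` callback is a hypothetical stand-in for a real DNS query, so the example runs without network access.

```go
package main

import (
	"fmt"
	"strings"
)

// findZoneApex walks up from fqdn one label at a time and returns the
// first name for which hasSOA reports true. hasSOA is a hypothetical
// stand-in for an SOA lookup against a resolver.
func findZoneApex(fqdn string, hasSOA func(name string) bool) (string, bool) {
	name := fqdn
	for name != "" && name != "." {
		if hasSOA(name) {
			return name, true
		}
		// Drop the leftmost label: "a.b.c." becomes "b.c.".
		i := strings.Index(name, ".")
		if i < 0 {
			break
		}
		name = name[i+1:]
	}
	return "", false
}

func main() {
	// Assumed SOA owners, mirroring the delegation described in this issue.
	soa := map[string]bool{"mydomain.com.": true, "store.mydomain.com.": true}
	lookup := func(name string) bool { return soa[name] }

	zone, ok := findZoneApex("_acme-challenge.frontend.store.mydomain.com.", lookup)
	fmt.Println(zone, ok)
}
```

Because the walk starts at the full name and moves toward the root, the delegated `store.mydomain.com.` zone is found before the parent `mydomain.com.` zone.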

Are you in a position to build your own copy of cert-manager and debug this further (e.g. by logging the parameters passed to the FindZoneByFqdn function)? Otherwise I'll try to take a look in a few days 😄

/assign
/kind bug

munnerz on 16 Jul 2018

I'm not very familiar with Go or Kubernetes, so given the learning overhead I probably won't have time to debug this soon. I did try a few other cases, though, which will hopefully be useful as test cases.

I added an api.staging.domain.app DNS zone alongside the existing staging.domain.app zone. I then recreated my certificate for api.staging.domain.app only (no wildcards), and the challenge record was created incorrectly in the staging.domain.app zone. Maybe it's searching the DNS hierarchy in reverse, going down rather than up?

Tybot204 on 16 Jul 2018

Linking these issues together as they seem related: https://github.com/jetstack/cert-manager/issues/721

Tybot204 on 17 Jul 2018

I think #750 should solve this

kragniz on 20 Jul 2018

👍1

@donspaulding can you try and see if this issue is fixed with the :canary release tag of the cert-manager docker image? 😄

munnerz on 24 Jul 2018

@munnerz I've tested the canary release to see if it solves my issue #721 but no luck :( Still getting the same error.

subesokun on 25 Jul 2018

@munnerz That fixed the issue for us with no other changes on our side. Gracias!

To be a further pest: what's the plan for getting that into a released version? We had been running off the latest Helm chart version, and would like to lock in a specific version in the chart so we aren't running on canary forever.

donspaulding on 30 Jul 2018

👍1

v0.4.1 has just been released today which contains this patch 😄

I'm going to close this issue now, as the original problem has been resolved :)

munnerz on 10 Aug 2018

🎉3

I just encountered the same problem with v0.5.2, however running v0.4.1 fixed it. Did it creep in again?

leo-baltus on 26 Nov 2018

👍7

Can someone confirm whether this is a regression?