I have cert-manager up and running successfully in my GKE cluster. I recently added a delegated zone for a domain name which has its main DNS zone (MYDOMAIN.com) hosted at dnsmadeeasy, but which has delegated the store.MYDOMAIN.com zone to Google Cloud DNS.
Here's a screenshot of the Cloud DNS console (with some text redacted):

Here's what dig shows for that zone's records:
don@box: $ dig +short frontend.store.MYDOMAIN.com
153.12.10.144
don@box: $ dig +short store.MYDOMAIN.com ns
ns-cloud-b1.googledomains.com.
ns-cloud-b4.googledomains.com.
ns-cloud-b2.googledomains.com.
ns-cloud-b3.googledomains.com.
don@box: $ dig +short store.MYDOMAIN.com soa
ns-cloud-b1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
So, clearly the delegation is working OK.
But this is the error I get in the cert-manager pod's logs:
I0712 17:44:06.431890 1 controller.go:177] certificates controller: syncing item 'my-store-master/my-store-frontend'
I0712 17:44:06.432072 1 sync.go:253] Preparing certificate my-store-master/my-store-frontend with issuer
I0712 17:44:06.432096 1 prepare.go:47] Getting ACME client
I0712 17:44:06.432117 1 acme.go:159] getting private key (letsncrypt-prod->tls.key) for acme issuer kube-system/letsencrypt-prod
I0712 17:44:06.432768 1 logger.go:27] Calling GetOrder
I0712 17:44:06.586662 1 leaderelection.go:199] successfully renewed lease kube-system/cert-manager-controller
I0712 17:44:06.747791 1 logger.go:52] Calling GetAuthorization
I0712 17:44:06.869397 1 logger.go:77] Calling DNS01ChallengeRecord
I0712 17:44:06.869538 1 prepare.go:263] Cleaning up old/expired challenges for Certificate my-store-master/my-store-frontend
I0712 17:44:06.869613 1 logger.go:47] Calling GetChallenge
I0712 17:44:06.988555 1 dns.go:78] Checking DNS propagation for "frontend.store.MYDOMAIN.com" using name servers: [10.55.240.10:53]
I0712 17:44:07.005962 1 dns.go:85] DNS record for "frontend.store.MYDOMAIN.com" not yet propagated
I0712 17:44:07.006427 1 dns.go:72] Presenting DNS01 challenge for domain "frontend.store.MYDOMAIN.com"
I0712 17:44:07.093817 1 helpers.go:188] Found status change for Certificate "my-store-frontend" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-07-12 17:44:07.09379683 +0000 UTC m=+523503.933832606
I0712 17:44:07.093886 1 sync.go:255] Error preparing issuer for certificate my-store-master/my-store-frontend: No matching GoogleCloud domain found for domain MYDOMAIN.com.
E0712 17:44:07.093921 1 sync.go:182] [my-store-master/my-store-frontend] Error getting certificate 'my-store-frontend-cert': secret "my-store-frontend-cert" not found
E0712 17:44:07.098405 1 controller.go:186] certificates controller: Re-queuing item "my-store-master/my-store-frontend" due to error processing: No matching GoogleCloud domain found for domain MYDOMAIN.com.
I suspect the issue is that cert-manager is only searching the console for the MYDOMAIN.com zone, when it should instead be attempting to find the more specific store.MYDOMAIN.com zone first. I'm not a gopher and so am probably wildly off-base, but is it possible that the following code block should implement such a recursively-descending search?
Thanks for any help you can give.
This seems to be the case. I have a similar issue using a subdomain staging.domain.app. Upon adding domain.app to Cloud DNS the challenge record was created for domain.app, but then my verification failed as it tried to check the TXT record at staging.domain.app. It appears to be creating the TXT record under the wrong domain, not respecting subdomains.
So it is definitely supposed to support this, and if it does not it should be considered a bug.
As per this function: https://github.com/jetstack/cert-manager/blob/c1b34376fd584ae0ac4ec2662ac7765cb32d1b4e/pkg/issuer/acme/dns/util/wait.go#L168-L170
We are supposed to recurse up the DNS hierarchy until we encounter a SOA record for the domain. I can see an SOA record in your DNS zone above, so I am not too sure why this is not the case.
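To make the intended behaviour concrete, here is a minimal sketch of "walk up the hierarchy until an SOA is found". It is not cert-manager's actual `FindZoneByFqdn` code: the names `findZoneApex` and `soaLookup` are hypothetical, and the SOA check is a plain callback standing in for a real DNS query so the example runs without a network.

```go
package main

import (
	"fmt"
	"strings"
)

// soaLookup reports whether the given fully qualified name has its own SOA
// record, i.e. is a zone apex. Hypothetical stand-in for a real DNS query.
type soaLookup func(name string) bool

// findZoneApex starts at fqdn and strips one leading label at a time,
// returning the first (most specific) name for which hasSOA is true.
func findZoneApex(fqdn string, hasSOA soaLookup) (string, error) {
	// Normalise to lower case with exactly one trailing dot.
	name := strings.TrimSuffix(strings.ToLower(fqdn), ".") + "."
	for name != "" && name != "." {
		if hasSOA(name) {
			return name, nil
		}
		// Drop the leftmost label and try the parent name.
		name = name[strings.Index(name, ".")+1:]
	}
	return "", fmt.Errorf("no SOA record found for %q", fqdn)
}

func main() {
	// Assumed zone layout from the report: parent zone plus a delegated child.
	zones := map[string]bool{"mydomain.com.": true, "store.mydomain.com.": true}
	lookup := func(name string) bool { return zones[name] }

	apex, err := findZoneApex("_acme-challenge.frontend.store.MYDOMAIN.com", lookup)
	fmt.Println(apex, err)
}
```

With the layout from the report, the walk should stop at the delegated `store.mydomain.com.` zone rather than continuing up to `mydomain.com.`, which is the opposite of what the error message suggests is happening.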
Are you in a position to build your own copy of cert-manager and debug this further (i.e. by logging the parameters passed to the FindZoneByFqdn function)? Otherwise I'll try and take a look in a few days 😄
/assign
/kind bug
I'm not super familiar with Go or Kubernetes in general, so with the learning overhead I probably won't be able to find time to debug this soon. I did try a few other configurations, though, which will hopefully be useful as test cases.
I added an api.staging.domain.app DNS zone in addition to the existing staging.domain.app. I then recreated my certificate for api.staging.domain.app only (no wildcards either), and the challenge record was created incorrectly in the staging.domain.app zone. Maybe it's searching the DNS hierarchy in reverse? Going down rather than up?
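The expectation in that test case can be written down as a longest-suffix match over the hosted zones the provider knows about. The following is only a sketch of that expectation, assuming the zone names are already available as a list; `mostSpecificZone` and `normalize` are hypothetical helpers, not functions from cert-manager or the Cloud DNS client.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize lower-cases a DNS name and ensures exactly one trailing dot.
func normalize(name string) string {
	return strings.TrimSuffix(strings.ToLower(name), ".") + "."
}

// mostSpecificZone returns the hosted zone that is the longest suffix of fqdn,
// matching only on label boundaries. The bool is false if no zone matches.
func mostSpecificZone(fqdn string, zones []string) (string, bool) {
	name := normalize(fqdn)
	best := ""
	for _, z := range zones {
		zone := normalize(z)
		// The name must equal the zone or end in ".<zone>" so that
		// "notstaging.domain.app." does not match "staging.domain.app.".
		if name != zone && !strings.HasSuffix(name, "."+zone) {
			continue
		}
		if len(zone) > len(best) {
			best = zone
		}
	}
	return best, best != ""
}

func main() {
	// Assumed hosted zones, taken from the test case described above.
	zones := []string{"domain.app", "staging.domain.app", "api.staging.domain.app"}

	zone, ok := mostSpecificZone("_acme-challenge.api.staging.domain.app", zones)
	fmt.Println(zone, ok)
}
```

Under this rule the challenge record for api.staging.domain.app belongs in the api.staging.domain.app zone, not in staging.domain.app, where it was actually created.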
Linking these issues together as they seem related: https://github.com/jetstack/cert-manager/issues/721
I think #750 should solve this
@donspaulding can you try and see if this issue is fixed with the :canary release tag of the cert-manager docker image? 😄
@munnerz I've tested the canary release to see if it solves my issue #721 but no luck :( Still getting the same error.
@munnerz That fixed the issue for us with no other changes on our side. Gracias!
To be a further pest: what's the plan for getting that into a released version? We had been running off the latest helm chart version, and would like to lock in a specific version in the chart so we aren't running on canary forever.
v0.4.1 has just been released today which contains this patch 😄
I'm going to close this issue now, as the original problem has been resolved :)
I just encountered the same problem with v0.5.2, however running v0.4.1 fixed it. Did it creep in again?
Can someone confirm that this is a regression bug or similar?
If anyone else hits this issue, I "solved" it by deleting the cert-manager pod. All worked fine when it came up again...