Cert-manager: DNS-01 self check fails for domain

Created on 9 Jul 2018 · 21 Comments · Source: jetstack/cert-manager

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug
/kind feature

**What happened**:
Not sure if this is really a bug or just a misconfiguration, but I've already tried several setups and none of them worked :/ I'm using Traefik as the ingress controller and I'm trying to get a certificate for one of my services via the DNS-01 challenge for the A record demo.test.company.com in my Azure DNS zone. But I always get this error message:

```
dns-01 self check failed for domain "demo.test.company.com"
```

I do see that the ACME challenge record was added to my Azure DNS zone, so the DNS provider config in my certificate issuer should be OK.

[YAMLs and logs](https://gist.github.com/subesokun/d53c0f9f51668a80dd703386cdc9e3d3)

Any help would be appreciated!

**What you expected to happen**:

The self check for the domain succeeds and a secret holding the LE certificate gets created.

**Environment**:
Cluster: AKS (Azure Kubernetes Service)
DNS Type: Azure DNS Zone
Traefik: 1.34.0 (via Helm)
Cert-Manager: 0.3.4 (via Helm)

```
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T22:29:25Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
```

kind/bug

subesokun

All 21 comments

No idea why but the DNS propagation check always fails. I've even tried to explicitly add the nameservers of the Azure DNS but no luck.

podDnsConfig:
  nameservers:
  - 40.90.4.3
  - 64.4.48.3

I've checked the DNS record via `dig @40.90.4.3 demo.test.company.com` and it finds the A record on the nameservers listed above. Also, `kubectl exec -ti busybox -- nslookup demo.test.company.com` is able to look up my host via the default 10.0.0.10 nameserver.

Not sure if it helps you but I've enabled verbose logging and uploaded the latest logs here https://gist.github.com/subesokun/97ca43f228b3bdc416fb4c34ed3283cf

...
dns.go:78] Checking DNS propagation for "demo.test.company.com" using name servers: [10.0.0.10:53 40.90.4.3:53 64.4.48.3:53]
dns.go:85] DNS record for "demo.test.company.com" not yet propagated
...

subesokun on 10 Jul 2018

Just noticed that the DNS pre-check looks for a TXT record, but querying for it with dig seems to be OK too.

dig -t txt @40.90.4.3 demo.test.company.com

; <<>> DiG 9.10.6 <<>> -t txt @40.90.4.3 demo.test.company.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;demo.test.company.com. IN TXT

;; AUTHORITY SECTION:
test.company.com.     300     IN      SOA     ns1-03.azure-dns.com. azuredns-hostmaster.microsoft.com. 1 3600 300 2419200 300

;; Query time: 41 msec
;; SERVER: 40.90.4.3#53(40.90.4.3)
;; WHEN: Tue Jul 10 11:25:11 CEST 2018
;; MSG SIZE  rcvd: 161

subesokun on 10 Jul 2018
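
A note for anyone comparing dig output with the self check: the query above asks for a TXT record at demo.test.company.com itself and comes back with ANSWER: 0. Per the ACME specification, the DNS-01 validation record is a TXT record at `_acme-challenge.<domain>`, and a wildcard name is validated at its base domain. The Python sketch below builds that record name and a matching dig invocation. The function names are hypothetical helpers for illustration, not part of cert-manager, and the nameserver IP is simply the one quoted above.

```python
def challenge_record_name(domain: str) -> str:
    """Return the fully qualified TXT record name used for a DNS-01 challenge.

    Hypothetical helper: strips a leading "*." (wildcards are validated at
    the base domain) and prefixes the "_acme-challenge" label.
    """
    name = domain.strip().rstrip(".").lower()
    if name.startswith("*."):
        name = name[2:]
    return f"_acme-challenge.{name}."


def dig_txt_command(domain: str, nameserver: str) -> list[str]:
    """Build a dig command line that queries the challenge TXT record."""
    return ["dig", "+short", "-t", "txt", f"@{nameserver}", challenge_record_name(domain)]


if __name__ == "__main__":
    # Prints: dig +short -t txt @40.90.4.3 _acme-challenge.demo.test.company.com.
    print(" ".join(dig_txt_command("demo.test.company.com", "40.90.4.3")))
```

If that query returns nothing while the record is visible in the DNS provider's console, the record was probably created under a different name than the one the self check expects.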

I've got the same problem with the newest version of cert-manager.

dabeck on 12 Jul 2018

Having the same issue with 0.4.0 on a fresh cluster.
No problems on the cluster where I re-issued an existing wildcard certificate after upgrading from 0.3.2.
UPD: after 4 hours it obtained the certificate.

str1k3r on 16 Jul 2018

I also have a problem with 0.4.0. I'm using the `--dns01-self-check-nameservers=8.8.8.8:53,8.8.4.4:53` option because I have split DNS. I have three domains on my Cloudflare account.

raymccarthy on 16 Jul 2018

Linking these issues together as they seem related: https://github.com/jetstack/cert-manager/issues/728

Tybot204 on 17 Jul 2018

Can also confirm that my issue still exists in the 0.4.0 release. Setting --dns01-self-check-nameservers had no effect.

subesokun on 25 Jul 2018

I'm trying to set up a wildcard certificate, which requires a dns01 challenge. I don't have any of those providers though, and I'm stuck. Do you need to have them? I thought the challenge would just look the DNS up as usual without having to integrate with any of those providers. I'm not sure if I'm suffering from this same bug or if it is something else. I'd love it if somebody could post an example of a ClusterIssuer that does not use any of them (if that is possible, which I'm not sure of given the lack of clarity).

simpers on 31 Jul 2018

I have the same issue with cert-manager 0.4.0 and Route 53. I also tried --dns01-self-check-nameservers without success. Maybe it's worth mentioning that I'm also using a delegated zone in Route 53, so maybe https://github.com/jetstack/cert-manager/issues/728 is related.

kjedamzik on 1 Aug 2018

I'm running into the same issue; however, I didn't find any documentation on how to use the flag (--dns01-self-check-nameservers "8.8.8.8:53,1.1.1.1:53"). My process on a new cluster:

1. Install cert-manager (v0.4) via kubectl
2. Add a service account via gcloud and save the secret via kubectl
3. Add a ClusterIssuer with clouddns as provider
4. Add a certificate using the ClusterIssuer --> fails due to the self check. So far, what I saw was a wrong NS IP.

Thanks in advance!

adirery on 2 Aug 2018
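
On the value format of the flag: the example earlier in this thread passes it as one controller argument of the form `--dns01-self-check-nameservers=<host:port>,<host:port>`. A small Python sketch that builds such an argument from a list of servers is shown below. The helper names are hypothetical, appending port 53 when none is given is an assumption, and IPv6 addresses are not handled.

```python
def with_port(nameserver: str, default_port: int = 53) -> str:
    """Return "host:port", adding the default port if none was given.

    Assumption: IPv4 addresses or hostnames only (no IPv6 literals).
    """
    ns = nameserver.strip()
    if ":" in ns:
        host, port = ns.rsplit(":", 1)
        if not host or not port.isdigit():
            raise ValueError(f"invalid nameserver entry: {nameserver!r}")
        return f"{host}:{int(port)}"
    return f"{ns}:{default_port}"


def self_check_nameservers_flag(nameservers: list[str]) -> str:
    """Build the single comma-separated argument in the form used in this thread."""
    entries = [with_port(ns) for ns in nameservers if ns.strip()]
    if not entries:
        raise ValueError("at least one nameserver is required")
    return "--dns01-self-check-nameservers=" + ",".join(entries)


if __name__ == "__main__":
    # Prints: --dns01-self-check-nameservers=8.8.8.8:53,1.1.1.1:53
    print(self_check_nameservers_flag(["8.8.8.8", "1.1.1.1:53"]))
```

The resulting string would go into the argument list of the cert-manager controller container; a successful change shows up in the "Checking DNS propagation ... using name servers" log line.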

@munnerz Sorry for the direct ping, but any idea whether this is a bug or a misconfiguration on my side? Any hint would be appreciated.

subesokun on 10 Aug 2018

@subesokun if you could DM me on Slack with the un-redacted copies of logs and YAML resources (including the status fields) I should be able to help you dig into this further!

munnerz on 10 Aug 2018

@munnerz @subesokun it would be nice to know how you set it up too, if it turns out to be misconfigured. I don't understand from the docs where to put said flag, and I get the impression that you NEED a provider for the DNS check, but I'm not sure if that is how it is?

simpers on 10 Aug 2018

does #825 fix your issue?

stuart-warren on 13 Aug 2018

@stuart-warren I've now tested the latest canary image but still get those self check issues.

@munnerz Thanks a lot for your offer! I'll get back to you once I have the un-redacted logs and YAML resources ready.

subesokun on 20 Aug 2018

Hello there, I've posted this in the cert-manager Slack room already, but I am having the same issue using DNS01 challenges in AWS. cert-manager just spins trying to complete the self check and never succeeds. I've provided my k8s objects and some logs in Slack, but I'll post a snippet of my logs here as well:

I0820 15:58:03.855717       1 helpers.go:188] Found status change for Certificate "grafana-cert" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-20 15:58:03.855701685 +0000 UTC m=+1394.361879337
I0820 15:58:03.855754       1 sync.go:244] Error preparing issuer for certificate tst-monitoring/grafana-cert: dns-01 self check failed for domain "grafana.cqms.rmntest.com"
E0820 15:58:03.855776       1 sync.go:165] [tst-monitoring/grafana-cert] Error getting certificate 'grafana-cert': secret "grafana-cert" not found
I0820 15:58:03.862353       1 round_trippers.go:405] PUT https://172.20.0.1:443/apis/certmanager.k8s.io/v1alpha1/namespaces/tst-monitoring/certificates/grafana-cert 200 OK in 6 milliseconds
E0820 15:58:03.862541       1 controller.go:190] certificates controller: Re-queuing item "tst-monitoring/grafana-cert" due to error processing: dns-01 self check failed for domain "grafana.cqms.rmntest.com"
I0820 15:58:03.862638       1 controller.go:152] ingress-shim controller: syncing item 'tst-monitoring/tst-kube-prometheus-grafana'
I0820 15:58:03.862661       1 sync.go:124] Certificate "grafana-cert" for ingress "tst-kube-prometheus-grafana" already exists
I0820 15:58:03.862697       1 sync.go:127] Certificate "grafana-cert" for ingress "tst-kube-prometheus-grafana" is up to date
I0820 15:58:03.862708       1 controller.go:166] ingress-shim controller: Finished processing work item "tst-monitoring/tst-kube-prometheus-grafana"
I0820 15:58:05.243001       1 round_trippers.go:405] GET https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller 200 OK in 4 milliseconds
I0820 15:58:05.247717       1 round_trippers.go:405] PUT https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller 200 OK in 4 milliseconds
I0820 15:58:05.247845       1 leaderelection.go:199] successfully renewed lease kube-system/cert-manager-controller
I0820 15:58:07.252500       1 round_trippers.go:405] GET https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller 200 OK in 4 milliseconds
I0820 15:58:07.257492       1 round_trippers.go:405] PUT https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller 200 OK in 4 milliseconds
I0820 15:58:07.257591       1 leaderelection.go:199] successfully renewed lease kube-system/cert-manager-controller
I0820 15:58:07.465780       1 helpers.go:188] Found status change for Certificate "prometheus-cert" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-20 15:58:07.465763241 +0000 UTC m=+1397.971940794
I0820 15:58:07.465817       1 sync.go:244] Error preparing issuer for certificate tst-monitoring/prometheus-cert: dns-01 self check failed for domain "prometheus.cqms.rmntest.com"
E0820 15:58:07.465838       1 sync.go:165] [tst-monitoring/prometheus-cert] Error getting certificate 'prometheus-cert': secret "prometheus-cert" not found
I0820 15:58:07.471508       1 round_trippers.go:405] PUT https://172.20.0.1:443/apis/certmanager.k8s.io/v1alpha1/namespaces/tst-monitoring/certificates/prometheus-cert 200 OK in 5 milliseconds
I0820 15:58:07.471887       1 controller.go:152] ingress-shim controller: syncing item 'tst-monitoring/tst-kube-prometheus'
I0820 15:58:07.471921       1 sync.go:124] Certificate "prometheus-cert" for ingress "tst-kube-prometheus" already exists
I0820 15:58:07.471964       1 sync.go:127] Certificate "prometheus-cert" for ingress "tst-kube-prometheus" is up to date
I0820 15:58:07.471980       1 controller.go:166] ingress-shim controller: Finished processing work item "tst-monitoring/tst-kube-prometheus"
E0820 15:58:07.472092       1 controller.go:190] certificates controller: Re-queuing item "tst-monitoring/prometheus-cert" due to error processing: dns-01 self check failed for domain "prometheus.cqms.rmntest.com"
I0820 15:58:09.262182       1 round_trippers.go:405] GET https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller 200 OK in 4 milliseconds
I0820 15:58:09.267774       1 round_trippers.go:405] PUT https://172.20.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller 200 OK in 5 milliseconds
I0820 15:58:09.267909       1 leaderelection.go:199] successfully renewed lease kube-system/cert-manager-controller

Would really like to see this solved. I will be monitoring this issue and am happy to provide any other information y'all need.

norwoodj on 20 Aug 2018

Update: I figured out my issue:
The problem is that I am also running external-dns and it was deleting my TXT records as soon as cert-manager created them:

time="2018-08-20T17:59:15Z" level=info msg="config: {Master: KubeConfig: Sources:[service ingress] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false Compatibility: PublishInternal:false ConnectorSourceServer:localhost:8080 Provider:aws GoogleProject: DomainFilter:[cqms.rmntest.com cqms.rmnstage.com] ZoneIDFilter:[] AWSZoneType: AWSAssumeRole: AWSMaxChangeCount:4000 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:noop TXTOwnerID:eks-external-dns TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:info TXTCacheInterval:0s}"
time="2018-08-20T17:59:15Z" level=info msg="Connected to cluster at https://172.20.0.1:443"
time="2018-08-20T17:59:16Z" level=info msg="All records are already up to date"
time="2018-08-20T18:00:16Z" level=info msg="Desired change: DELETE _acme-challenge.grafana.cqms.rmntest.com TXT"
time="2018-08-20T18:00:16Z" level=info msg="Desired change: DELETE _acme-challenge.prometheus.cqms.rmntest.com TXT"
time="2018-08-20T18:00:16Z" level=info msg="Record in zone cqms.rmntest.com. were successfully updated"

I will have to investigate further how to prevent this, but this was actually my issue all along. I set the number of replicas for the external-dns deployment to 0 and let cert-manager do its thing and voila, it successfully retrieved my certificates.

Apologies for blaming cert-manager.

norwoodj on 20 Aug 2018
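
A quick way to confirm this kind of conflict is to scan the external-dns log for deletions of `_acme-challenge` TXT records. The Python sketch below does that against the log format shown above. The regular expression and function name are assumptions made for illustration; they are not an external-dns API.

```python
import re

# Matches external-dns "Desired change" lines that delete a challenge TXT record,
# based on the log format quoted in this thread.
DELETE_CHALLENGE_RE = re.compile(r"Desired change: DELETE (_acme-challenge\.\S+) TXT")


def deleted_challenge_records(log_lines):
    """Return the names of _acme-challenge TXT records that external-dns planned to delete."""
    names = []
    for line in log_lines:
        match = DELETE_CHALLENGE_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = [
        'time="2018-08-20T17:59:16Z" level=info msg="All records are already up to date"',
        'time="2018-08-20T18:00:16Z" level=info msg="Desired change: DELETE _acme-challenge.grafana.cqms.rmntest.com TXT"',
    ]
    # Prints: ['_acme-challenge.grafana.cqms.rmntest.com']
    print(deleted_challenge_records(sample))
```

Any hit here means another controller is removing the record before the self check can see it.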

@munnerz Finally I've also figured out my issue 🎉 In the end it was a misconfiguration of the Azure DNS01 provider: the hostedZoneName property, which must be set, was missing in my cert issuer. If it isn't set, cert-manager will silently create a challenge record that looks like _acme-challenge.myapp.domain.com.domain.com, which causes the DNS self check to fail. To avoid this, hostedZoneName must be set to domain.com. After fixing that, there was also no longer any need for me to explicitly set --dns01-self-check-nameservers.

Unfortunately the Azure DNS01 provider isn't well documented (http://docs.cert-manager.io/en/latest/reference/issuers/acme/dns01.html?highlight=azure), and it'd be great if cert-manager made the hostedZoneName attribute mandatory and showed an error/warning if it isn't set. It'd also be a good idea to check whether the generated ACME DNS challenge record matches the expected record name, to avoid misconfigurations.

subesokun on 21 Aug 2018
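
The check suggested above (comparing the generated challenge record with the expected one) could look roughly like the following Python sketch. It is not cert-manager's implementation. The helper names are hypothetical, and it assumes the DNS provider stores record names relative to the hosted zone.

```python
def expected_challenge_fqdn(domain: str) -> str:
    """Expected challenge name for a domain (wildcards use the base domain)."""
    name = domain.strip().rstrip(".").lower()
    if name.startswith("*."):
        name = name[2:]
    return f"_acme-challenge.{name}"


def relative_record_name(fqdn: str, zone: str) -> str:
    """Return the record name relative to the zone, e.g. for a provider API call.

    Raises ValueError if the name does not belong to the zone.
    """
    fqdn = fqdn.strip().rstrip(".").lower()
    zone = zone.strip().rstrip(".").lower()
    if fqdn == zone:
        return "@"
    suffix = "." + zone
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not inside zone {zone!r}")
    return fqdn[: -len(suffix)]


def is_expected_challenge(created_fqdn: str, domain: str) -> bool:
    """True if the record that was created is the one the self check will query."""
    return created_fqdn.strip().rstrip(".").lower() == expected_challenge_fqdn(domain)


if __name__ == "__main__":
    # With the zone known, the relative name is "_acme-challenge.myapp".
    print(relative_record_name("_acme-challenge.myapp.domain.com", "domain.com"))
    # The doubled name from this thread does not match the expected record: False
    print(is_expected_challenge("_acme-challenge.myapp.domain.com.domain.com", "myapp.domain.com"))
```

A mismatch like the second case is what a warning at issuance time could have surfaced.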

@subesokun Have you done anything else apart from setting the hostedZoneName parameter?
I'm still running into these issues with this config:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: cert-sub-example-com
  namespace: kube-system
spec:
  secretName: cert-sub-example-com
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  commonName: '*.sub.example.com'
  dnsNames:
  - sub.example.com
  acme:
    config:
    - dns01:
        provider: azuredns
        hostedZoneName: example.com
      domains:
      - '*.sub.example.com'
      - sub.example.com

This still leads to log entries like this:

I0913 13:20:52.767668       1 controller.go:171] certificates controller: syncing item 'kube-system/cert-sub-example-com'
I0913 13:20:52.768054       1 sync.go:312] Preparing certificate kube-system/cert-sub-example-com with issuer
I0913 13:20:52.768226       1 logger.go:43] Calling GetOrder
I0913 13:20:53.016526       1 logger.go:73] Calling GetAuthorization
I0913 13:20:53.194813       1 logger.go:98] Calling DNS01ChallengeRecord
I0913 13:20:53.194878       1 prepare.go:279] Cleaning up old/expired challenges for Certificate kube-system/cert-sub-example-com
I0913 13:20:53.194898       1 logger.go:68] Calling GetChallenge
I0913 13:20:53.740756       1 dns.go:99] Checking DNS propagation for "sub.example.com" using name servers: [8.8.8.8:53 1.1.1.1:53]
I0913 13:20:53.770885       1 dns.go:106] DNS record for "sub.example.com" not yet propagated
I0913 13:20:53.770967       1 dns.go:88] Presenting DNS01 challenge for domain "sub.example.com"
I0913 13:20:53.843114       1 logger.go:73] Calling GetAuthorization
I0913 13:20:54.441211       1 helpers.go:201] Found status change for Certificate "cert-sub-example-com" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-09-13 13:20:54.441200199 +0000 UTC m=+1210.869653237
I0913 13:20:54.441302       1 sync.go:314] Error preparing issuer for certificate kube-system/cert-sub-example-com: dns-01 self check failed for domain "sub.example.com"
I0913 13:20:54.441562       1 sync.go:206] Certificate kube-system/cert-sub-example-com scheduled for renewal in -652 hours
E0913 13:20:54.475915       1 controller.go:180] certificates controller: Re-queuing item "kube-system/cert-sub-example-com" due to error processing: dns-01 self check failed for domain "kubernetes.hlw-t3k-dev.azops.cloud"

DNS seems to be ready, at least a short dig TXT gives me the correct answer.

Is there a way to skip the self check once?