Cert-manager: Route53 DNS Challenge Refused

Created on 1 Jun 2018 · 8 Comments · Source: jetstack/cert-manager

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I get the following log output when I specify the dns01 challenge:

I0601 07:52:13.113343       1 controller.go:177] certificates controller: syncing item 'default/alertmanager-public-tls'
I0601 07:52:13.113516       1 sync.go:239] Preparing certificate default/alertmanager-public-tls with issuer
I0601 07:52:13.113551       1 acme.go:159] getting private key (letsencrypt-staging-standard->tls.key) for acme issuer security-system/letsencrypt-staging-standard
I0601 07:52:13.114028       1 prepare.go:247] Cleaning up previous order for certificate default/alertmanager-public-tls
I0601 07:52:13.114044       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/alertmanager-public-tls
I0601 07:52:13.114056       1 logger.go:22] Calling CreateOrder
I0601 07:52:13.683746       1 acme.go:193] Created order for domains: [{dns alertmanager.default.public.kubernetes.example.com}]
I0601 07:52:13.683798       1 logger.go:52] Calling GetAuthorization
I0601 07:52:13.813640       1 logger.go:77] Calling DNS01ChallengeRecord
I0601 07:52:13.813679       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/alertmanager-public-tls
I0601 07:52:13.813686       1 logger.go:47] Calling GetChallenge
I0601 07:52:13.944988       1 dns.go:78] Checking DNS propagation for "alertmanager.default.public.kubernetes.example.com" using name servers: [10.54.0.2:53]
I0601 07:52:13.961796       1 helpers.go:155] Setting lastTransitionTime for Certificate "alertmanager-public-tls" condition "Ready" to 2018-06-01 07:52:13.961787957 +0000 UTC m=+64772.030123976
I0601 07:52:13.961824       1 sync.go:241] Error preparing issuer for certificate default/alertmanager-public-tls: NS ns-1024.awsdns-00.org. returned REFUSED for _acme-challenge.alertmanager.default.public.kubernetes.example.com.
E0601 07:52:13.970368       1 sync.go:168] [default/alertmanager-public-tls] Error getting certificate 'alertmanager-public-tls': secret "alertmanager-public-tls" not found
E0601 07:52:13.970397       1 controller.go:186] certificates controller: Re-queuing item "default/alertmanager-public-tls" due to error processing: NS ns-1024.awsdns-00.org. returned REFUSED for _acme-challenge.alertmanager.default.public.kubernetes.example.com.

What you expected to happen:

I expected cert-manager to create the TXT record for _acme-challenge.alertmanager.default.public.kubernetes.example.com, but that does not seem to be happening.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.10.3
  • Cloud provider or hardware configuration: local (vagrant)
  • Install tools: custom
  • Others:
area/acme kind/bug

All 8 comments

Hitting the same issue. Does not matter if I provide an AWS key or rely on ambient credentials.

Environment:

Kubernetes version (use kubectl version): v1.8.9
Cloud provider or hardware configuration: Tectonic on AWS
Install tools: helm

+1

I am also running into this. It appears that the initial DNS query to check for the TXT record (before it is added) returns REFUSED, which trips an error at https://github.com/jetstack/cert-manager/blob/df4b493b38e89350015fa05c4987395f7fc6cff7/pkg/issuer/acme/dns/util/wait.go#L89
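To make the failure mode concrete, here is a minimal sketch of a propagation check that treats any unsuccessful response code as a hard error. It is modeled only on the error message in the logs above, not copied from cert-manager; the function name and its parameters are hypothetical, and the response is passed in as plain values so that no real DNS query is made.

```python
# DNS response codes as defined in RFC 1035.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def check_authoritative_answer(ns, fqdn, rcode, txt_values, expected):
    """Hypothetical check of one authoritative name server's answer.

    Returns True if the expected TXT value is present and False if it is
    not there yet. Raises if the server did not answer successfully, which
    is the case that produces the "returned REFUSED" message.
    """
    if rcode != 0:
        name = RCODE_NAMES.get(rcode, str(rcode))
        raise RuntimeError(f"NS {ns} returned {name} for {fqdn}")
    return expected in txt_values
```

With a check shaped like this, a REFUSED answer aborts the sync before the TXT record is ever compared, which would match the re-queue loop seen in the logs.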

I have encountered the same problem. I had two hosted zones for the same domain name:

  • Public Hosted Zone
  • Private Hosted Zone for Amazon VPC

Removing the private zone resolved the issue. If you cannot delete the Private Hosted Zone for Amazon VPC, you can instead change some values in https://github.com/kubernetes/charts/blob/master/stable/cert-manager/values.yaml#L49

Unfortunately, the pod DNS settings only work on k8s 1.10+, which we do not have yet. There is a new startup parameter, '--dns01-self-check-nameservers', that I thought could help, but it does not seem to matter. Even if I specify --dns01-self-check-nameservers=8.8.8.8:53, the DNS query ends up at the private zone name server:

I0831 09:42:45.012506 1 dns.go:79] Checking DNS propagation for "gitlab1.payconiq.io" using name servers: [8.8.8.8:53]
I0831 09:42:45.022086 1 helpers.go:188] Found status change for Certificate "gitlab1-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-31 09:42:45.02207678 +0000 UTC m=+1010.118726091
I0831 09:42:45.022119 1 sync.go:244] Error preparing issuer for certificate gitlab-apps/gitlab1-tls: NS ns-512.awsdns-00.net. returned REFUSED for _acme-challenge.gitlab1.payconiq.io.

ns-512.awsdns-00.net is a private zone name server.

I also bumped into this. In our case we have an AWS public zone "example.com" and a private zone for one of its subdomains, "internal.example.com". It seems that cert-manager always finds the name server associated with the private zone before it finds the public one, which results in this "REFUSED" error. The solution posted by @tomislater seems to work:

podDnsPolicy: "None"
podDnsConfig:
  nameservers:
    - "1.1.1.1"
    - "8.8.8.8"

This forces cert-manager to use public DNS servers to find the SOA record for the certificate FQDN, which means it never finds the private zone at all (which only ever gets resolved by the AWS nameserver).
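A rough way to picture why the choice of resolver matters is the sketch below. It assumes the enclosing zone is found by walking up the labels of the FQDN until a known zone matches, and it models each resolver simply as the set of zones it can see. The function name, the "app" host label, and the two sets are hypothetical illustrations, not cert-manager's actual lookup code.

```python
def find_zone(fqdn, zones_visible):
    """Return the most specific zone in zones_visible that encloses fqdn.

    Walks from the full name towards the root, dropping one leading label
    at a time, and stops at the first name that is a known zone.
    """
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones_visible:
            return candidate
    return None


# Assumed views: the VPC resolver sees both zones, public DNS only the public one.
vpc_view = {"example.com", "internal.example.com"}
public_view = {"example.com"}

name = "_acme-challenge.app.internal.example.com"
print(find_zone(name, vpc_view))     # internal.example.com
print(find_zone(name, public_view))  # example.com
```

Under these assumptions, the same name resolves to the private zone through the VPC resolver and to the public zone through 1.1.1.1 or 8.8.8.8, which is consistent with the workaround above.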

And just as an addendum for non-helm usage, in the actual manifest that works out to:

dnsPolicy: "None"
dnsConfig:
  nameservers:
    - "1.1.1.1"
    - "8.8.8.8"

Please can you retry using the latest version of cert-manager (v0.6.1), and if you run into issues either ask on the #cert-manager slack on slack.k8s.io, or open a new issue?

Thanks 😄
