Cert-manager: DNS propagation never succeeds (Route53)

Created on 1 Apr 2019 · 8 Comments · Source: jetstack/cert-manager

Describe the bug:

No certificate is ever issued when the DNS01 challenge is used. The cert-manager Pod never gets past the propagation check:

$ kubectl logs -f -n cert-manager cert-manager-6f68b58796-z29bq --since 1s
I0401 17:57:46.366191       1 controller.go:206] challenges controller: syncing item 'namespace/example-com-4104471611-0'
I0401 17:57:46.366468       1 logger.go:103] Calling Discover
I0401 17:57:46.370909       1 dns.go:101] Checking DNS propagation for "example.com" using name servers: [8.8.8.8:53]
I0401 17:57:46.502108       1 sync.go:173] propagation check failed: DNS record for "example.com" not yet propagated
I0401 17:57:46.502588       1 controller.go:212] challenges controller: Finished processing work item "namespace/example-com-4104471611-0"

Expected behaviour:

cert-manager should at least detect the record and move on.

The record is always successfully created, as shown below:

$ aws route53 list-resource-record-sets --hosted-zone-id XXXXXXXXXXXXX
{
    "ResourceRecordSets": [
        // [...]
        {
            "Name": "example.com.",
            "Type": "A",
            "TTL": 300,
            "ResourceRecords": [
                {
                    "Value": "XXX.XXX.XXX.XXX"
                }
            ]
        },
        {
            "Name": "_acme-challenge.example.com.",
            "Type": "TXT",
            "TTL": 10,
            "ResourceRecords": [
                {
                    "Value": "\"ZWPpajuQACXP4m7giks2S8fe9KSz1TrzLUn_T5HfSwg\""
                }
            ]
        }
    ]
}

Nevertheless, cert-manager has been looping for 2 hours, apparently still trying to find this record.

Steps to reproduce the bug:

Follow the tutorial:

1. Host a domain in Route53 and a cluster in GKE
2. Install cert-manager-v0.7.0
3. Create a ClusterIssuer
4. Create a simple certificate

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-stage
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-stage
    http01: {}
    dns01:
      providers:
        - name: route53-dns
          cnameStrategy: "Follow"
          route53:
            accessKeyID: XXXXXXXXXXXXXXXXXXXX
            region: sa-east-1
            hostedZoneID: XXXXXXXXXXXXX
            secretAccessKeySecretRef:
              name: route53-credentials-secret
              key: secret-access-key

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: example-com
  namespace: namespace
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-stage
    kind: ClusterIssuer
  commonName: example.com
  dnsNames:
    - example.com
  acme:
    config:
      - dns01:
          provider: route53-dns
        domains:
          - example.com

Anything else we need to know?:

The ACME HTTP01 challenge works fine, but I need a wildcard certificate. The example above fails in the same way as when I add '*.example.com' to the Certificate object, so I have shown the simplified version.

This has been happening since at least v0.5.0; it took me a while to report.

Environment details:

Kubernetes version: 1.12.6-gke.7
Cloud-provider/provisioner: GKE
cert-manager version: v0.7.0
Install method: helm

/kind bug

davi5e

Most helpful comment

Looks like --dns01-recursive-nameservers-only solved the issue here.

I did the installation as follows:

kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml

kubectl create namespace cert-manager

kubectl label namespace cert-manager certmanager.k8s.io/disable-validation="true"

helm repo add jetstack https://charts.jetstack.io

helm repo update

helm install \
    --name cert-manager \
    --namespace cert-manager \
    --version v0.7.2 \
    --set "podDnsPolicy"="None" \
    --set "podDnsConfig.nameservers[0]"="8.8.8.8" \
    --set extraArgs='{--dns01-recursive-nameservers-only}' \
    jetstack/cert-manager \
    --tls

Nevertheless, I had to manually correct the TXT record that cert-manager creates in Route53. It lacked the _acme-challenge prefix, so I had to edit it for the Pod's validation to succeed.

davi5e on 2 May 2019

👍5 👀1
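
For anyone checking the record name by hand: a minimal sketch, assuming a POSIX shell, of the TXT name a DNS01 challenge is validated against. The function name acme_challenge_name is made up for this illustration and is not part of cert-manager. A wildcard such as *.example.com uses the same record name as its base domain, and the trailing dot matches how Route53 lists names.

```shell
#!/bin/sh
# Print the TXT record name checked for a DNS01 challenge on DOMAIN.
# acme_challenge_name is a hypothetical helper, not a cert-manager command.
acme_challenge_name() {
    domain="${1#\*.}"      # drop a leading "*." (wildcard request)
    domain="${domain%.}"   # drop a trailing dot if one was given
    printf '_acme-challenge.%s.\n' "$domain"
}

# Both calls print the same name, because the wildcard and the apex
# share one challenge record.
acme_challenge_name 'example.com'
acme_challenge_name '*.example.com'
```

If the name shown in Route53 differs from this output, the validation lookup will not find the record.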

All 8 comments

Could you try hopping onto the machine that runs the cert-manager pod and see if it can reach the _acme-challenge.example.com record? Propagation checks in cert-manager are done via DNS, and split-horizon setups can cause this check to fail.

DanielMorsing on 2 Apr 2019
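
The check suggested above can be sketched roughly as follows. This assumes a shell where dig is available (for example a throwaway debug pod, since the cert-manager image may not ship a full DNS client). The helper has_txt_value is a made-up name for this illustration; it only inspects the output of `dig +short TXT`, which prints each TXT value wrapped in double quotes.

```shell
#!/bin/sh
# has_txt_value EXPECTED
# Reads `dig +short TXT <name>` output on stdin and succeeds if one of the
# returned values equals EXPECTED. Hypothetical helper, not a cert-manager tool.
has_txt_value() {
    expected="\"$1\""
    while IFS= read -r line; do
        [ "$line" = "$expected" ] && return 0
    done
    return 1
}

# Intended use against the resolver named in the cert-manager log:
#   dig +short TXT _acme-challenge.example.com @8.8.8.8 \
#     | has_txt_value "ZWPpajuQACXP4m7giks2S8fe9KSz1TrzLUn_T5HfSwg"

# Offline demonstration with a canned answer in place of real dig output:
printf '"%s"\n' "ZWPpajuQACXP4m7giks2S8fe9KSz1TrzLUn_T5HfSwg" \
    | has_txt_value "ZWPpajuQACXP4m7giks2S8fe9KSz1TrzLUn_T5HfSwg" \
    && echo "record visible"
```

Running the same query from a workstation and from inside the cluster is one way to spot a split-horizon difference.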

Apparently, the Pod uses Busybox's nslookup, which doesn't accept arguments to query TXT records...

Anyhow:

$ nslookup example.com
nslookup: can't resolve '(null)': Name does not resolve

Name:      example.com
Address 1: AAA.BBB.CCC.DDD DDD.CCC.BBB.AAA.bc.googleusercontent.com

I have no idea how to interpret these results... Not sure where DDD.CCC.BBB.AAA.bc.googleusercontent.com came from, or why it did not resolve...

By the way, this IP is the exact reverse of the A record for example.com. in Route53.

davi5e on 2 Apr 2019
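
One possible reading of that output, offered as an interpretation rather than something confirmed in this thread: the lookup of example.com succeeded, and the name printed after the address looks like the result of a reverse (PTR) lookup of that address. Reverse names of the form shown are built from the octets in reverse order, which would explain the "exact reverse" observation. The "can't resolve '(null)'" line appears to concern the DNS server's own name, not example.com. A small sketch of how such a name is put together, with gce_reverse_name as a made-up helper and the suffix copied from the output above:

```shell
#!/bin/sh
# Build the reverse-style host name for an IPv4 address by reversing its octets.
# gce_reverse_name is a hypothetical helper; the suffix is taken from the
# nslookup output quoted in this thread.
gce_reverse_name() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into $1..$4 on dots
    IFS=$old_ifs
    printf '%s.%s.%s.%s.bc.googleusercontent.com\n' "$4" "$3" "$2" "$1"
}

# 203.0.113.7 is a documentation-range address used as a stand-in.
gce_reverse_name 203.0.113.7
```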

I have the same setup (Route53 and GKE) and have exactly the same issue:

# On the node and the cert-manager pod
$ nslookup _acme-challenge.example.com
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve '_acme-challenge.example.com': Name does not resolve

I have other wildcard certs already set up with a version prior to 0.5, and they still work.

peterfication on 3 Apr 2019

I killed the cert-manager pod and then it worked again.

I had another issue and in the end I reinstalled cert-manager version 0.6.2 completely. But I think it worked already after killing the pod.

peterfication on 3 Apr 2019

@peterfication did you use stable or jetstack when you installed 0.6.2?

Also, did you use Helm?

davi5e on 12 Apr 2019

I used helm with the jetstack repo version 0.6.0.

peterfication on 13 Apr 2019

I have a similar problem. The TXT record for the DNS challenge is created in Route53, but it has a short TTL of 10 seconds.
I am running cert-manager with --dns01-recursive-nameservers-only=true so that the DNS propagation check does not go directly to the authoritative nameservers, as our firewall restricts access to port 53.
A similar setup worked with Google Cloud DNS, which creates the DNS challenge TXT record with a 60-second TTL.
cert-manager v0.7.0
external-dns v0.5

gaddamas on 30 Apr 2019

