Cert-manager: Delegated subdomain failing with Could not determine the zone: Unexpected response code 'SERVFAIL'

Created on 15 Sep 2018 · 11 Comments · Source: jetstack/cert-manager

Describe the bug:
When deploying a certificate with DNS01 validation and an issuer based on a delegated subdomain (a subdomain with its own NS records), cert-manager throws the following error (domain redacted):

I0914 20:26:30.766263 1 logger.go:68] Calling GetChallenge
I0914 20:26:30.927797 1 dns.go:99] Checking DNS propagation for "swissarmy.x.x.com" using name servers: [10.43.0.10:53]
I0914 20:26:31.072501 1 helpers.go:201] Found status change for Certificate "swissarmy-x-x-com" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-09-14 20:26:31.072488997 +0000 UTC m=+409.159194581
I0914 20:26:31.072541 1 sync.go:276] Error preparing issuer for certificate default/swissarmy-x-x-com: Could not determine the zone: Unexpected response code 'SERVFAIL' for swissarmy.x.x.com.
E0914 20:26:31.072560 1 sync.go:197] [default/swissarmy-x-x-com] Error getting certificate 'swissarmy-x-x-com-tls': secret "swissarmy-x-x-com-tls" not found
E0914 20:26:31.095902 1 controller.go:180] certificates controller: Re-queuing item "default/swissarmy-x-x-com" due to error processing: Could not determine the zone: Unexpected response code 'SERVFAIL' for swissarmy.x.x.com.

It seems like maybe it's checking at the parent domain and not at the subdomain's NS, but I could be wrong.
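
The error text suggests the failure happens while locating the zone, before any provider API call is made. As far as can be told, cert-manager's DNS01 code (derived from the lego ACME library) finds the zone by asking its configured resolver for an SOA record on the full name and then on each parent suffix, and it gives up on the first response that is neither NOERROR nor NXDOMAIN. The Python sketch below illustrates that walk under those assumptions. It is not cert-manager's code, and `find_zone`, `candidate_zones`, and the `lookup_soa` callback are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple

# DNS response codes relevant here (numeric values from RFC 1035).
NOERROR, SERVFAIL, NXDOMAIN = 0, 2, 3
RCODE_NAMES = {NOERROR: "NOERROR", SERVFAIL: "SERVFAIL", NXDOMAIN: "NXDOMAIN"}

# Hypothetical resolver callback: takes a name, returns (rcode, has_soa_answer).
SoaLookup = Callable[[str], Tuple[int, bool]]


def candidate_zones(fqdn: str) -> List[str]:
    """Return the name itself and each parent suffix, most specific first."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels))]


def find_zone(fqdn: str, lookup_soa: SoaLookup) -> str:
    """Walk up the name until some suffix answers with an SOA record."""
    for name in candidate_zones(fqdn):
        rcode, has_soa = lookup_soa(name)
        # Any code other than NOERROR or NXDOMAIN aborts the whole walk.
        if rcode not in (NOERROR, NXDOMAIN):
            code = RCODE_NAMES.get(rcode, str(rcode))
            raise RuntimeError(f"Unexpected response code '{code}' for {name}")
        if rcode == NOERROR and has_soa:
            return name
    raise RuntimeError("Could not find the start of authority")
```

With a resolver that answers SERVFAIL for the first name, as 10.43.0.10 apparently did in the log above, `find_zone` raises on the very first query and never reaches the delegated subdomain, which would match the `Could not determine the zone` message.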

Expected behaviour:
With the subdomain defined in the ClusterIssuer, I would expect the records to be checked against that subdomain's NS entries and not the parent domain, so that DNS validates and the certificate is issued.

Steps to reproduce the bug:
Delegate a subdomain's NS records to another provider (I'm using Azure DNS). Deploy a ClusterIssuer for that subdomain. Create a certificate for a sub-subdomain.

Anything else we need to know?:
We're delegating our DNS to Azure DNS for a subdomain and using Let's Encrypt. I was able to do this successfully with a top-level domain name in the same environment and the same subscription/resource group in Azure.

Environment details:

  • Kubernetes version 1.11.2 w/ Rancher
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): Azure
  • cert-manager version (e.g. v0.4.0): v0.5.0
  • Install method (e.g. helm or static manifests): helm

/kind bug

All 11 comments

I'm encountering the same issue.

same issue for me

Same issue using a subdomain on Google Cloud DNS:

  • Kubernetes version: v1.11.5-gke.5
  • Cloud-provider/provisioner: GKE
  • Cert-manager version: v0.5.2
  • Install method: helm
  • Helm values

Same issue using provider acmedns

Try upgrading to v0.6.0 and adding --dns01-recursive-nameservers-only=true; that worked for me using acmedns.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: cert-manager
  namespace: "cert-manager"
  labels:
    app: cert-manager
    chart: cert-manager-v0.6.0-beta.1
    release: cert-manager
    heritage: Tiller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cert-manager
      release: cert-manager
  template:
    metadata:
      labels:
        app: cert-manager
        release: cert-manager
      annotations:
    spec:
      serviceAccountName: cert-manager
      containers:
        - name: cert-manager
          image: "quay.io/jetstack/cert-manager-controller:v0.6.0-beta.0"
          imagePullPolicy: IfNotPresent
          args:
          - --cluster-resource-namespace=$(POD_NAMESPACE)
          - --leader-election-namespace=$(POD_NAMESPACE)
          - --dns01-recursive-nameservers="1.1.1.1:53"
          - --dns01-recursive-nameservers-only=true
          env:
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          resources:
            requests:
              cpu: 10m
              memory: 32Mi

I added

extraArgs:
  - --dns01-self-check-nameservers="8.8.8.8:53"

and it worked!

extraArgs:
  - --dns01-recursive-nameservers="1.1.1.1:53"
  - --dns01-recursive-nameservers-only=true

worked for me

I'm going to close this - v0.6.1 introduced a patch that fixed how we resolved the DNS zone to update in all DNS providers, which should help even without the dns01-recursive-nameservers flag.

Glad people found a resolution in the meantime 😄

Hello,
I had the same issue with:

  • cert-manager 0.8.1
  • route53
  • GKE private cluster (with Cloud NAT)

Failed to determine Route 53 hosted zone ID: error finding zone from fqdn: Unexpected response code 'SERVFAIL' for _acme-challenge

Adding

extraArgs:
  - --dns01-recursive-nameservers="8.8.8.8:53"
  - --dns01-recursive-nameservers-only=true

solved the issue

Adding these lines to cert-manager fixed all issues we had with "DNS not propagated yet" when it actually had.
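
Several of the fixes in this thread amount to pointing cert-manager at a different resolver, so it can help to compare what each resolver returns for the challenge name. Here is a minimal stdlib-only Python sketch, assuming plain UDP DNS on port 53 is reachable from wherever you run it. `build_query`, `rcode_of`, and `query_rcode` are illustrative helpers, not part of cert-manager.

```python
import socket
import struct

SOA = 6  # DNS record type number for SOA (RFC 1035)


def build_query(name: str, qtype: int = SOA, query_id: int = 0x1234) -> bytes:
    """Encode a single-question DNS query with the recursion-desired bit set."""
    # Header: id, flags (RD=1), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def rcode_of(response: bytes) -> int:
    """Extract the 4-bit RCODE from a DNS response header."""
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F


def query_rcode(name: str, server: str, port: int = 53, timeout: float = 3.0) -> int:
    """Send an SOA query over UDP; return the RCODE (0=NOERROR, 2=SERVFAIL, 3=NXDOMAIN)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        response, _ = sock.recvfrom(4096)
    return rcode_of(response)
```

For example, compare `query_rcode("_acme-challenge.sub.example.com", "10.43.0.10")` with the same call against `"8.8.8.8"` (the name and the cluster DNS IP are placeholders; substitute your own). An RCODE of 2 (SERVFAIL) from the in-cluster resolver but not from the public one would point at the cluster DNS rather than at the DNS provider.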

@GeckoSplinter that fixed it for me as well
