Cert-manager: On new AKS cluster: "Failed to clean up previous order: No existing record found"

Created on 14 Aug 2018 · 3 Comments · Source: jetstack/cert-manager

Describe the bug:
A strange issue occurred when we applied our Helm-based cert-manager configuration to a new AKS cluster (details below). The same approach works fine on our previous k8s cluster.

What's really interesting is that the authz URL taken from the Certificate on the new cluster shows that Let's Encrypt already considers the authorization valid.

Authz Url:

{
  "identifier": {
    "type": "dns",
    "value": "domain2.com"
  },
  "status": "valid",
  "expires": "2018-09-12T19:37:04Z",
  "challenges": [
    {
      "type": "http-01",
      "status": "pending",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/<redacted>",
      "token": "<redacted>"
    },
    {
      "type": "tls-alpn-01",
      "status": "pending",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/<redacted>",
      "token": "<redacted>"
    },
    {
      "type": "dns-01",
      "status": "valid",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/<redacted>",
      "token": "<redacted>",
      "validationRecord": [
        {
          "hostname": "domain2.com"
        }
      ]
    }
  ]
}
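For readers less familiar with ACME: an authorization becomes valid as soon as any one of its challenges is validated, and the remaining challenges simply stay pending. The sketch below reads a trimmed copy of the JSON above (URLs and tokens omitted) and reports which challenge type validated it. The helper names `summarize_authz` and `validated_by` are made up for this illustration and are not part of cert-manager.

```python
import json

# Trimmed copy of the authorization above; redacted URLs and tokens are omitted.
AUTHZ_JSON = """
{
  "identifier": {"type": "dns", "value": "domain2.com"},
  "status": "valid",
  "expires": "2018-09-12T19:37:04Z",
  "challenges": [
    {"type": "http-01", "status": "pending"},
    {"type": "tls-alpn-01", "status": "pending"},
    {"type": "dns-01", "status": "valid"}
  ]
}
"""


def summarize_authz(authz):
    """Return the identifier, overall status, and per-challenge status."""
    return {
        "identifier": authz["identifier"]["value"],
        "status": authz["status"],
        "challenges": {c["type"]: c["status"] for c in authz.get("challenges", [])},
    }


def validated_by(authz):
    """List the challenge types that are in the 'valid' state."""
    return [c["type"] for c in authz.get("challenges", []) if c["status"] == "valid"]


authz = json.loads(AUTHZ_JSON)
print(summarize_authz(authz))
print("validated by:", validated_by(authz))
```

So the authorization is already satisfied on the Let's Encrypt side through dns-01, while cert-manager keeps failing before it gets that far.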

Cert manager logs (repeating):


I0813 21:58:13.958211       1 controller.go:181] certificates controller: syncing item 'kube-system/domain2'
I0813 21:58:13.958312       1 sync.go:242] Preparing certificate kube-system/domain2 with issuer
I0813 21:58:13.958334       1 acme.go:169] getting private key (le-cloudflare-private->tls.key) for acme issuer kube-system/le-cloudflare-issuer
I0813 21:58:13.958756       1 prepare.go:247] Cleaning up previous order for certificate kube-system/domain2
I0813 21:58:13.958773       1 prepare.go:263] Cleaning up old/expired challenges for Certificate kube-system/domain2
I0813 21:58:13.958778       1 prepare.go:287] Cleaning up challenge for domain "domain2" as part of Certificate kube-system/domain2
I0813 21:58:14.221673       1 controller.go:181] certificates controller: syncing item 'kube-system/domain1'
I0813 21:58:14.221804       1 sync.go:242] Preparing certificate kube-system/domain1 with issuer
I0813 21:58:14.221812       1 acme.go:169] getting private key (le-cloudflare-private->tls.key) for acme issuer kube-system/le-cloudflare-issuer
I0813 21:58:14.222268       1 prepare.go:247] Cleaning up previous order for certificate kube-system/domain1
I0813 21:58:14.222289       1 prepare.go:263] Cleaning up old/expired challenges for Certificate kube-system/domain1
I0813 21:58:14.222295       1 prepare.go:287] Cleaning up challenge for domain "domain1" as part of Certificate kube-system/domain1
I0813 21:58:14.299267       1 sync.go:244] Error preparing issuer for certificate kube-system/domain2: No existing record found
E0813 21:58:14.299303       1 sync.go:165] [kube-system/domain2] Error getting certificate 'domain2-tls': secret "domain2-tls" not found
E0813 21:58:14.299334       1 controller.go:190] certificates controller: Re-queuing item "kube-system/domain2" due to error processing: No existing record found
I0813 21:58:14.507396       1 sync.go:244] Error preparing issuer for certificate kube-system/domain1: No existing record found
E0813 21:58:14.507436       1 sync.go:165] [kube-system/domain1] Error getting certificate 'domain1-tls': secret "domain1-tls" not found
E0813 21:58:14.507516       1 controller.go:190] certificates controller: Re-queuing item "kube-system/domain1" due to error processing: No existing record found
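If it helps anyone triaging similar output, here is a small sketch that pulls the re-queued items and their reasons out of log lines in the format shown above. The regular expressions are my own reading of these particular lines, not an official parser, and `requeued_items` is a hypothetical helper name.

```python
import re

# Matches the line prefix used above: level letter, MMDD, time, thread id, file:line.
LINE_RE = re.compile(
    r'^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+(?P<tid>\d+) '
    r'(?P<file>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)$'
)
# Matches the re-queue message emitted by the certificates controller above.
REQUEUE_RE = re.compile(r'Re-queuing item "(?P<item>[^"]+)" due to error processing: (?P<reason>.*)$')


def requeued_items(lines):
    """Map each re-queued item to the last reason reported for it."""
    result = {}
    for raw in lines:
        parsed = LINE_RE.match(raw.strip())
        if not parsed or parsed.group("level") != "E":
            continue
        requeue = REQUEUE_RE.search(parsed.group("msg"))
        if requeue:
            result[requeue.group("item")] = requeue.group("reason")
    return result


SAMPLE = '''
I0813 21:58:13.958756       1 prepare.go:247] Cleaning up previous order for certificate kube-system/domain2
E0813 21:58:14.299334       1 controller.go:190] certificates controller: Re-queuing item "kube-system/domain2" due to error processing: No existing record found
E0813 21:58:14.507516       1 controller.go:190] certificates controller: Re-queuing item "kube-system/domain1" due to error processing: No existing record found
'''.strip().splitlines()

for item, reason in requeued_items(SAMPLE).items():
    print(item, "->", reason)
```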

Kubectl describe cert:


Name:         domain2
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-08-13T19:35:13Z
  Generation:          1
  Resource Version:    4520
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/kube-system/certificates/domain2
  UID:                 ff6b11e3-9f2f-11e8-ae73-0a58ac1f0558
Spec:
  Acme:
    Config:
      Dns 01:
        Provider:  cf-dns
      Domains:
        *.domain2.com
        domain2.com
  Common Name:  *.domain2.com
  Dns Names:
    domain2.com
  Issuer Ref:
    Kind:       ClusterIssuer
    Name:       le-cloudflare-issuer
  Secret Name:  domain2-tls
Status:
  Acme:
    Order:
      Challenges:
        Authz URL:  <redacted>
        Dns 01:
          Provider:  cf-dns
        Domain:      domain2.com
        Key:         <redacted>
        Token:       <redacted>
        Type:        dns-01
        URL:         <redacted>
        Wildcard:    false
        Authz URL:   <redacted>
        Dns 01:
          Provider:  cf-dns
        Domain:      domain2.com
        Key:         <redacted>
        Token:       <redacted>
        Type:        dns-01
        URL:         <redacted>
        Wildcard:    true
      URL:
  Conditions:
    Last Transition Time:  2018-08-13T19:37:36Z
    Message:               Failed to clean up previous order: No existing record found
    Reason:                ValidateError
    Status:                False
    Type:                  Ready
Events:                    <none>
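One detail worth noting in the Status above: both dns-01 challenges list Domain: domain2.com, one with Wildcard: true and one with Wildcard: false. In ACME the wildcard prefix is dropped when building the validation record name, so both challenges use the same TXT record name. A minimal sketch of that mapping follows; `challenge_record_name` is a name made up for this illustration, and whether the shared name is what triggers the error here is only a guess on my part.

```python
def challenge_record_name(domain):
    """Return the fully qualified TXT record name for a dns-01 challenge.

    A leading "*." is stripped, because wildcard names are validated
    against the base domain.
    """
    base = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base.rstrip(".") + "."


for name in ("*.domain2.com", "domain2.com"):
    print(name, "->", challenge_record_name(name))
```

Each challenge still has its own key, so the two challenges need two different TXT values under that one name.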

Expected behaviour:
Expected a wildcard cert to be generated via the dns-01 method in our Cloudflare-hosted DNS, as it is in the other clusters.

Steps to reproduce the bug:

Config:

  • Install v0.3.3 via helm:
    helm install stable/cert-manager --version v0.3.3 --set rbac.create=true
  • Apply the following config:



---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: le-cloudflare-issuer
  namespace: kube-system
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <redacted>

    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: le-cloudflare-private

    # ACME DNS-01 provider configurations
    dns01:

      # Here we define a list of DNS-01 providers that can solve DNS challenges
      providers:

        - name: cf-dns
          cloudflare:
            # login username
            email: <redacted>
            # A secretKeyRef to a cloudflare api key
            apiKeySecretRef:
              name: cloudflare
              key: apikey



---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: domain1
  namespace: kube-system
spec:
  secretName: domain1-tls
  issuerRef:
    name: le-cloudflare-issuer
    # name: le-cloudflare-issuer-staging
    kind: ClusterIssuer
  commonName: '*.domain1.com'
  dnsNames:
  - domain1.com
  acme:
    config:
    - dns01:
        provider: cf-dns
      domains:
      - '*.domain1.com'
      - domain1.com

Anything else we need to know?:

Environment details:

  • Kubernetes version (e.g. v1.10.2): v1.10.6
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): Azure AKS
  • cert-manager version (e.g. v0.4.0): v0.3.3
  • Install method (e.g. helm or static manifests): helm

/kind bug

area/acme kind/bug

All 3 comments

Quick note: I attempted upgrading to the v0.4.1 version of cert-manager, still had the same problem. May be related to #604?

Hmm, after a few days of disabling cert-manager in our cluster, and then redeploying it and enabling it, it appears that this has disappeared. I'm able to create certs in the new cluster (aside from a rate limit issue).

I'll let a contributor close if desired. If you want more info, please don't hesitate to ask.

It seems like there was a bug in the Cloudflare DNS provider that caused it to trip up when a DNS record already exists for the challenge being solved - it is fixed in #849, which should be included in v0.5 (scheduled to be released today!).

There are some other larger changes going on in the background for v0.6 that will prevent these sorts of problems from causing a hard failure in future 😄

Closing this for now, as I believe the issue is fixed. Feel free to re-open if not!
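To illustrate the general idea of a DNS solver that tolerates leftover or missing records, here is a minimal in-memory sketch. It is not the change made in #849 and does not call the Cloudflare API; the class `FakeDNSProvider` and its methods are hypothetical, and the only point is that presenting an existing record and cleaning up a missing one are both treated as non-errors.

```python
class FakeDNSProvider:
    """In-memory stand-in for a DNS provider (hypothetical, for illustration)."""

    def __init__(self):
        # Maps a record name to the set of TXT values stored under it.
        self.records = {}

    def present(self, fqdn, value):
        """Create the TXT value; a value that already exists is left as-is."""
        self.records.setdefault(fqdn, set()).add(value)

    def clean_up(self, fqdn, value):
        """Delete the TXT value.

        Returns True if something was deleted and False if there was nothing
        to delete. A missing record is not raised as an error, since the
        desired end state (record absent) already holds.
        """
        values = self.records.get(fqdn)
        if not values or value not in values:
            return False
        values.discard(value)
        if not values:
            del self.records[fqdn]
        return True


provider = FakeDNSProvider()
name = "_acme-challenge.domain2.com."
provider.present(name, "key-a")
provider.present(name, "key-a")   # repeat call, no duplicate and no error
provider.present(name, "key-b")   # second challenge under the same name
print(provider.clean_up(name, "key-a"))  # record existed
print(provider.clean_up(name, "key-a"))  # already gone, still not an error
```

With this shape, a retry after a partial failure does not get stuck on state left behind by the previous attempt.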
