Describe the bug:
A strange issue occurred when we applied our Helm-based cert-manager configuration to a new AKS cluster (details below). The same approach worked fine in our previous Kubernetes cluster.
What's really interesting is that the authz URL output for the certificate on the new cluster shows that Let's Encrypt considers the authorization valid.
Authz URL:
{
  "identifier": {
    "type": "dns",
    "value": "domain2.com"
  },
  "status": "valid",
  "expires": "2018-09-12T19:37:04Z",
  "challenges": [
    {
      "type": "http-01",
      "status": "pending",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/<redacted>",
      "token": "<redacted>"
    },
    {
      "type": "tls-alpn-01",
      "status": "pending",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/<redacted>",
      "token": "<redacted>"
    },
    {
      "type": "dns-01",
      "status": "valid",
      "url": "https://acme-v02.api.letsencrypt.org/acme/challenge/<redacted>",
      "token": "<redacted>",
      "validationRecord": [
        {
          "hostname": "domain2.com"
        }
      ]
    }
  ]
}
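In ACME (RFC 8555, section 7.1.4), an authorization becomes valid as soon as any one of its challenges is validated, which is why the http-01 and tls-alpn-01 entries above can stay "pending" while the authorization is "valid". The minimal sketch below reads the authorization object (with the redacted URLs and tokens dropped, since only the status fields matter) and reports which challenge type satisfied it; the `summarize` helper is hypothetical, not part of cert-manager.

```python
import json

# The authorization object from the report, trimmed to the status fields.
AUTHZ = """
{
  "identifier": {"type": "dns", "value": "domain2.com"},
  "status": "valid",
  "expires": "2018-09-12T19:37:04Z",
  "challenges": [
    {"type": "http-01", "status": "pending"},
    {"type": "tls-alpn-01", "status": "pending"},
    {"type": "dns-01", "status": "valid"}
  ]
}
"""


def summarize(authz: dict) -> tuple[str, list[str]]:
    """Return the authorization status and the challenge types that succeeded.

    A single validated challenge is enough for the whole authorization to be
    valid, so the remaining challenges may legitimately stay "pending".
    """
    valid = [c["type"] for c in authz["challenges"] if c["status"] == "valid"]
    return authz["status"], valid


status, valid = summarize(json.loads(AUTHZ))
print(status, valid)  # valid ['dns-01']
```

Read this way, the output suggests the DNS-01 validation itself succeeded on Let's Encrypt's side, and the failure lies in what cert-manager does afterwards.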
cert-manager logs (repeating):
I0813 21:58:13.958211 1 controller.go:181] certificates controller: syncing item 'kube-system/domain2'
I0813 21:58:13.958312 1 sync.go:242] Preparing certificate kube-system/domain2 with issuer
I0813 21:58:13.958334 1 acme.go:169] getting private key (le-cloudflare-private->tls.key) for acme issuer kube-system/le-cloudflare-issuer
I0813 21:58:13.958756 1 prepare.go:247] Cleaning up previous order for certificate kube-system/domain2
I0813 21:58:13.958773 1 prepare.go:263] Cleaning up old/expired challenges for Certificate kube-system/domain2
I0813 21:58:13.958778 1 prepare.go:287] Cleaning up challenge for domain "domain2" as part of Certificate kube-system/domain2
I0813 21:58:14.221673 1 controller.go:181] certificates controller: syncing item 'kube-system/domain1'
I0813 21:58:14.221804 1 sync.go:242] Preparing certificate kube-system/domain1 with issuer
I0813 21:58:14.221812 1 acme.go:169] getting private key (le-cloudflare-private->tls.key) for acme issuer kube-system/le-cloudflare-issuer
I0813 21:58:14.222268 1 prepare.go:247] Cleaning up previous order for certificate kube-system/domain1
I0813 21:58:14.222289 1 prepare.go:263] Cleaning up old/expired challenges for Certificate kube-system/domain1
I0813 21:58:14.222295 1 prepare.go:287] Cleaning up challenge for domain "domain1" as part of Certificate kube-system/domain1
I0813 21:58:14.299267 1 sync.go:244] Error preparing issuer for certificate kube-system/domain2: No existing record found
E0813 21:58:14.299303 1 sync.go:165] [kube-system/domain2] Error getting certificate 'domain2-tls': secret "domain2-tls" not found
E0813 21:58:14.299334 1 controller.go:190] certificates controller: Re-queuing item "kube-system/domain2" due to error processing: No existing record found
I0813 21:58:14.507396 1 sync.go:244] Error preparing issuer for certificate kube-system/domain1: No existing record found
E0813 21:58:14.507436 1 sync.go:165] [kube-system/domain1] Error getting certificate 'domain1-tls': secret "domain1-tls" not found
E0813 21:58:14.507516 1 controller.go:190] certificates controller: Re-queuing item "kube-system/domain1" due to error processing: No existing record found
kubectl describe cert:
Name: domain2
Namespace: kube-system
Labels: <none>
Annotations: <none>
API Version: certmanager.k8s.io/v1alpha1
Kind: Certificate
Metadata:
  Cluster Name:
  Creation Timestamp: 2018-08-13T19:35:13Z
  Generation: 1
  Resource Version: 4520
  Self Link: /apis/certmanager.k8s.io/v1alpha1/namespaces/kube-system/certificates/domain2
  UID: ff6b11e3-9f2f-11e8-ae73-0a58ac1f0558
Spec:
  Acme:
    Config:
      Dns 01:
        Provider: cf-dns
      Domains:
        *.domain2.com
        domain2.com
  Common Name: *.domain2.com
  Dns Names:
    domain2.com
  Issuer Ref:
    Kind: ClusterIssuer
    Name: le-cloudflare-issuer
  Secret Name: domain2-tls
Status:
  Acme:
    Order:
      Challenges:
        Authz URL: <redacted>
        Dns 01:
          Provider: cf-dns
        Domain: domain2.com
        Key: <redacted>
        Token: <redacted>
        Type: dns-01
        URL: <redacted>
        Wildcard: false
        Authz URL: <redacted>
        Dns 01:
          Provider: cf-dns
        Domain: domain2.com
        Key: <redacted>
        Token: <redacted>
        Type: dns-01
        URL: <redacted>
        Wildcard: true
      URL:
  Conditions:
    Last Transition Time: 2018-08-13T19:37:36Z
    Message: Failed to clean up previous order: No existing record found
    Reason: ValidateError
    Status: False
    Type: Ready
Events: <none>
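The Status above lists two dns-01 challenges, both with Domain domain2.com: one with Wildcard: false and one with Wildcard: true. Per RFC 8555 (sections 7.1.3 and 8.4), a wildcard identifier such as *.domain2.com is authorized against the base name with a wildcard flag, and each dns-01 challenge is answered with a TXT record at "_acme-challenge." plus that name. The record value is the unpadded base64url SHA-256 digest of the key authorization (token + "." + base64url account-key thumbprint). Both challenges therefore need TXT records at the same name, _acme-challenge.domain2.com, which is plausibly where a provider that mishandles existing records could trip. The sketch below assumes hypothetical token and thumbprint strings, since the real ones are redacted, and the `dns01_record` helper is illustrative only.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    # base64url without "=" padding, as used throughout ACME (RFC 8555).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, thumbprint_b64url: str) -> tuple[str, str]:
    """Return the (name, value) of the TXT record for a dns-01 challenge.

    `domain` is the identifier reported by the ACME server; for a wildcard
    certificate that is the base domain (e.g. domain2.com), not *.domain2.com.
    """
    key_authorization = f"{token}.{thumbprint_b64url}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)


# Hypothetical values: the real tokens and account thumbprint are redacted.
apex = dns01_record("domain2.com", "token-apex", "thumbprint")
wildcard = dns01_record("domain2.com", "token-wildcard", "thumbprint")
print(apex[0] == wildcard[0])  # True: same record name
print(apex[1] == wildcard[1])  # False: different TXT values
```

Because the two values differ, a solver has to keep two TXT records side by side under one name and remove each one individually during clean-up.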
Expected behaviour:
Expected a wildcard certificate to be issued via the DNS-01 challenge against our Cloudflare-hosted DNS, as it is in our other clusters.
Steps to reproduce the bug:
Install cert-manager with Helm:
helm install stable/cert-manager --version v0.3.3 --set rbac.create=true
Then apply the following config:
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: le-cloudflare-issuer
  namespace: kube-system
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <redacted>
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: le-cloudflare-private
    # ACME DNS-01 provider configurations
    dns01:
      # Here we define a list of DNS-01 providers that can solve DNS challenges
      providers:
        - name: cf-dns
          cloudflare:
            # login username
            email: <redacted>
            # A secretKeyRef to a cloudflare api key
            apiKeySecretRef:
              name: cloudflare
              key: apikey
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: domain1
  namespace: kube-system
spec:
  secretName: domain1-tls
  issuerRef:
    name: le-cloudflare-issuer
    # name: le-cloudflare-issuer-staging
    kind: ClusterIssuer
  commonName: '*.domain1.com'
  dnsNames:
    - domain1.com
  acme:
    config:
      - dns01:
          provider: cf-dns
        domains:
          - '*.domain1.com'
          - domain1.com
Anything else we need to know?:
Environment details:
Kubernetes version: v1.10.6
cert-manager version: v0.3.3
/kind bug
Quick note: I also tried upgrading cert-manager to v0.4.1 and still had the same problem. May be related to #604?
Hmm, after disabling cert-manager in our cluster for a few days and then redeploying and re-enabling it, the problem appears to have gone away. I'm now able to create certificates in the new cluster (apart from a rate-limit issue).
I'll let a contributor close this if desired. If you want more info, please don't hesitate to ask.
It seems like there was a bug in the Cloudflare DNS provider that caused it to trip up when a DNS record already exists for the challenge being solved. It is fixed in #849, which should be included in v0.5 (scheduled to be released today!).
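One general way to keep this class of problem from becoming a hard failure is to make a DNS-01 solver's "present" and "clean up" steps idempotent: an identical record that already exists counts as presented, and a record that is already gone ("No existing record found") counts as cleaned up. The sketch below illustrates that pattern against a hypothetical in-memory zone; `InMemoryTXTZone`, `present` and `clean_up` are illustrative names only, not cert-manager's code, not Cloudflare's API, and not necessarily what #849 changed.

```python
class RecordExists(Exception):
    """Raised by the hypothetical zone when an identical record already exists."""


class RecordNotFound(Exception):
    """Raised by the hypothetical zone when deleting a record that is absent."""


class InMemoryTXTZone:
    """Hypothetical stand-in for a DNS provider API (not Cloudflare's real API)."""

    def __init__(self) -> None:
        # Several TXT values may share one name (e.g. wildcard + apex challenges).
        self.records: set[tuple[str, str]] = set()

    def create(self, name: str, value: str) -> None:
        if (name, value) in self.records:
            raise RecordExists(name)
        self.records.add((name, value))

    def delete(self, name: str, value: str) -> None:
        if (name, value) not in self.records:
            raise RecordNotFound("No existing record found")
        self.records.remove((name, value))


def present(zone: InMemoryTXTZone, name: str, value: str) -> None:
    """Publish the challenge record; an identical existing record is success."""
    try:
        zone.create(name, value)
    except RecordExists:
        pass


def clean_up(zone: InMemoryTXTZone, name: str, value: str) -> None:
    """Remove the challenge record; a record that is already gone is success."""
    try:
        zone.delete(name, value)
    except RecordNotFound:
        pass


zone = InMemoryTXTZone()
name = "_acme-challenge.domain2.com"
present(zone, name, "value-1")
present(zone, name, "value-1")  # a retried present does not fail
clean_up(zone, name, "value-1")
clean_up(zone, name, "value-1")  # neither does a repeated clean-up
print(zone.records)  # set()
```

With this shape, the re-queue loop seen in the logs would retry cleanly instead of stopping at "Failed to clean up previous order".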
There are some other larger changes going on in the background for v0.6 that will prevent these sorts of problems from causing a hard failure in future 😄
Closing this for now, as I believe the issue is fixed. Feel free to re-open if not!