Describe the bug:
Certificate creation is stuck for a specific host URL.
The Order has no state and no challenges are created.
No Let's Encrypt API rate-limit-exceeded errors appear in the cert-manager logs.
Expected behaviour:
Certificate creation either succeeds for the previously mentioned URL or fails with a status on the Order.
Anything else we need to know?:
If we switch to a new host URL, the certificate is created successfully and the site is secure.
Environment details:
/kind bug
cert-manager logs upon creation of the ingress rule for that specific URL:
I0325 11:27:47.506645 1 controller.go:138] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="pp/hybris-ingress"
I0325 11:27:47.518624 1 controller.go:144] cert-manager/controller/ingress-shim "msg"="finished processing work item" "key"="pp/hybris-ingress"
I0325 11:27:47.518907 1 controller.go:138] cert-manager/controller/certificates "msg"="syncing item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.519001 1 controller.go:138] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="pp/hybris-ingress"
I0325 11:27:47.519079 1 sync.go:170] cert-manager/controller/ingress-shim "msg"="certificate already exists for ingress resource, ensuring it is up to date" "related_resource_kind"="Certificate" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Ingress" "resource_name"="hybris-ingress" "resource_namespace"="pp"
I0325 11:27:47.519108 1 sync.go:183] cert-manager/controller/ingress-shim "msg"="certificate resource is already up to date for ingress" "related_resource_kind"="Certificate" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Ingress" "resource_name"="hybris-ingress" "resource_namespace"="pp"
I0325 11:27:47.519151 1 controller.go:144] cert-manager/controller/ingress-shim "msg"="finished processing work item" "key"="pp/hybris-ingress"
I0325 11:27:47.892231 1 conditions.go:155] Setting lastTransitionTime for Certificate "cert-manager-bug-tls" condition "Ready" to 2020-03-25 11:27:47.89221986 +0000 UTC m=+76155.571811337
I0325 11:27:47.900712 1 controller.go:144] cert-manager/controller/certificates "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.900749 1 controller.go:138] cert-manager/controller/certificates "msg"="syncing item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.901039 1 sync.go:368] cert-manager/controller/certificates "msg"="no existing CertificateRequest resource exists, creating new request..." "related_resource_kind"="Secret" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp"
I0325 11:27:47.901079 1 controller.go:138] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="pp/hybris-ingress"
I0325 11:27:47.901169 1 sync.go:170] cert-manager/controller/ingress-shim "msg"="certificate already exists for ingress resource, ensuring it is up to date" "related_resource_kind"="Certificate" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Ingress" "resource_name"="hybris-ingress" "resource_namespace"="pp"
I0325 11:27:47.901197 1 sync.go:183] cert-manager/controller/ingress-shim "msg"="certificate resource is already up to date for ingress" "related_resource_kind"="Certificate" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Ingress" "resource_name"="hybris-ingress" "resource_namespace"="pp"
I0325 11:27:47.901216 1 controller.go:144] cert-manager/controller/ingress-shim "msg"="finished processing work item" "key"="pp/hybris-ingress"
I0325 11:27:47.913015 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913062 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913073 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-vault "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913103 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-ca "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913135 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-vault "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913146 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-ca "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913073 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-acme "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913111 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-venafi "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913255 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-venafi "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.913651 1 sync.go:380] cert-manager/controller/certificates "msg"="created certificate request" "related_resource_kind"="Secret" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp" "request_name"="cert-manager-bug-tls-1268926097"
I0325 11:27:47.913845 1 conditions.go:155] Setting lastTransitionTime for Certificate "cert-manager-bug-tls" condition "Ready" to 2020-03-25 11:27:47.913834394 +0000 UTC m=+76155.593425877
E0325 11:27:47.921984 1 controller.go:140] cert-manager/controller/certificates "msg"="re-queuing item due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"cert-manager-bug-tls\": the object has been modified; please apply your changes to the latest version and try again" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.922018 1 controller.go:138] cert-manager/controller/certificates "msg"="syncing item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.922241 1 sync.go:386] cert-manager/controller/certificates "msg"="validating existing CSR data" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp"
I0325 11:27:47.922338 1 sync.go:511] cert-manager/controller/certificates "msg"="CertificateRequest is not in a final state, waiting until CertificateRequest is complete" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp" "state"=""
I0325 11:27:47.923574 1 conditions.go:200] Setting lastTransitionTime for CertificateRequest "cert-manager-bug-tls-1268926097" condition "Ready" to 2020-03-25 11:27:47.923570016 +0000 UTC m=+76155.603161475
I0325 11:27:47.932128 1 controller.go:138] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="pp/hybris-ingress"
I0325 11:27:47.932203 1 sync.go:170] cert-manager/controller/ingress-shim "msg"="certificate already exists for ingress resource, ensuring it is up to date" "related_resource_kind"="Certificate" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Ingress" "resource_name"="hybris-ingress" "resource_namespace"="pp"
I0325 11:27:47.932223 1 sync.go:183] cert-manager/controller/ingress-shim "msg"="certificate resource is already up to date for ingress" "related_resource_kind"="Certificate" "related_resource_name"="cert-manager-bug-tls" "related_resource_namespace"="pp" "resource_kind"="Ingress" "resource_name"="hybris-ingress" "resource_namespace"="pp"
I0325 11:27:47.932243 1 controller.go:144] cert-manager/controller/ingress-shim "msg"="finished processing work item" "key"="pp/hybris-ingress"
I0325 11:27:47.932887 1 controller.go:144] cert-manager/controller/certificates "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.932918 1 controller.go:138] cert-manager/controller/certificates "msg"="syncing item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.933139 1 sync.go:386] cert-manager/controller/certificates "msg"="validating existing CSR data" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp"
I0325 11:27:47.933227 1 sync.go:511] cert-manager/controller/certificates "msg"="CertificateRequest is not in a final state, waiting until CertificateRequest is complete" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp" "state"=""
I0325 11:27:47.933401 1 controller.go:144] cert-manager/controller/certificates "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.933605 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-acme "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.933633 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-acme "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.933899 1 conditions.go:200] Setting lastTransitionTime for CertificateRequest "cert-manager-bug-tls-1268926097" condition "Ready" to 2020-03-25 11:27:47.933895024 +0000 UTC m=+76155.613486485
I0325 11:27:47.933933 1 acme.go:201] cert-manager/controller/certificaterequests-issuer-acme/sign "msg"="acme Order resource is not in a ready state, waiting..." "related_resource_kind"="Order" "related_resource_name"="cert-manager-bug-tls-1268926097-245966590" "related_resource_namespace"="pp" "resource_kind"="CertificateRequest" "resource_name"="cert-manager-bug-tls-1268926097" "resource_namespace"="pp"
I0325 11:27:47.934463 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.934535 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-vault "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.934565 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-venafi "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.934609 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-ca "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.934741 1 controller.go:138] cert-manager/controller/certificates "msg"="syncing item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.934821 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-ca "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.934651 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.934666 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-venafi "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.934741 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-vault "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.935016 1 sync.go:386] cert-manager/controller/certificates "msg"="validating existing CSR data" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp"
I0325 11:27:47.935175 1 sync.go:511] cert-manager/controller/certificates "msg"="CertificateRequest is not in a final state, waiting until CertificateRequest is complete" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp" "state"="Pending"
I0325 11:27:47.935308 1 controller.go:144] cert-manager/controller/certificates "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls"
E0325 11:27:47.943164 1 controller.go:140] cert-manager/controller/certificaterequests-issuer-acme "msg"="re-queuing item due to error processing" "error"="Operation cannot be fulfilled on certificaterequests.cert-manager.io \"cert-manager-bug-tls-1268926097\": the object has been modified; please apply your changes to the latest version and try again" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.943196 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-acme "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.943514 1 acme.go:201] cert-manager/controller/certificaterequests-issuer-acme/sign "msg"="acme Order resource is not in a ready state, waiting..." "related_resource_kind"="Order" "related_resource_name"="cert-manager-bug-tls-1268926097-245966590" "related_resource_namespace"="pp" "resource_kind"="CertificateRequest" "resource_name"="cert-manager-bug-tls-1268926097" "resource_namespace"="pp"
I0325 11:27:47.955958 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956105 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956130 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-ca "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956185 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-vault "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956329 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-vault "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956381 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-ca "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956384 1 controller.go:138] cert-manager/controller/certificates "msg"="syncing item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.956717 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-venafi "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956918 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-venafi "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.956933 1 sync.go:386] cert-manager/controller/certificates "msg"="validating existing CSR data" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp"
I0325 11:27:47.958240 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-acme "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.958274 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-acme "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:47.958941 1 sync.go:511] cert-manager/controller/certificates "msg"="CertificateRequest is not in a final state, waiting until CertificateRequest is complete" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp" "state"="Pending"
I0325 11:27:47.959273 1 controller.go:144] cert-manager/controller/certificates "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:47.960713 1 acme.go:201] cert-manager/controller/certificaterequests-issuer-acme/sign "msg"="acme Order resource is not in a ready state, waiting..." "related_resource_kind"="Order" "related_resource_name"="cert-manager-bug-tls-1268926097-245966590" "related_resource_namespace"="pp" "resource_kind"="CertificateRequest" "resource_name"="cert-manager-bug-tls-1268926097" "resource_namespace"="pp"
I0325 11:27:47.960780 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-acme "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:52.922082 1 controller.go:138] cert-manager/controller/certificates "msg"="syncing item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:52.922345 1 sync.go:386] cert-manager/controller/certificates "msg"="validating existing CSR data" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp"
I0325 11:27:52.922457 1 sync.go:511] cert-manager/controller/certificates "msg"="CertificateRequest is not in a final state, waiting until CertificateRequest is complete" "related_resource_kind"="CertificateRequest" "related_resource_name"="cert-manager-bug-tls-1268926097" "related_resource_namespace"="pp" "resource_kind"="Certificate" "resource_name"="cert-manager-bug-tls" "resource_namespace"="pp" "state"="Pending"
I0325 11:27:52.922737 1 controller.go:144] cert-manager/controller/certificates "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls"
I0325 11:27:52.943345 1 controller.go:138] cert-manager/controller/certificaterequests-issuer-acme "msg"="syncing item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:27:52.943768 1 acme.go:201] cert-manager/controller/certificaterequests-issuer-acme/sign "msg"="acme Order resource is not in a ready state, waiting..." "related_resource_kind"="Order" "related_resource_name"="cert-manager-bug-tls-1268926097-245966590" "related_resource_namespace"="pp" "resource_kind"="CertificateRequest" "resource_name"="cert-manager-bug-tls-1268926097" "resource_namespace"="pp"
I0325 11:27:52.943828 1 controller.go:144] cert-manager/controller/certificaterequests-issuer-acme "msg"="finished processing work item" "key"="pp/cert-manager-bug-tls-1268926097"
I0325 11:28:11.512141 1 controller.go:138] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-tls"
I0325 11:28:11.512170 1 controller.go:138] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-ca"
I0325 11:28:11.512537 1 controller.go:194] cert-manager/controller/webhook-bootstrap/webhook-bootstrap/ca-secret "msg"="ca certificate already up to date" "resource_kind"="Secret" "resource_name"="cert-manager-webhook-ca" "resource_namespace"="cert-manager"
I0325 11:28:11.512558 1 controller.go:144] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-ca"
I0325 11:28:11.512791 1 controller.go:246] cert-manager/controller/webhook-bootstrap/webhook-bootstrap/ca-secret "msg"="serving certificate already up to date" "resource_kind"="Secret" "resource_name"="cert-manager-webhook-tls" "resource_namespace"="cert-manager"
I0325 11:28:11.512811 1 controller.go:144] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-tls"
Facing the same issue: most certificates are created, but a few never get a challenge at all.
Also EKS 1.15, cert-manager v0.14.0 and nginx-ingress.
The same thing is happening to me with v0.13.1.
My Order just sits there with Events: <none>. The weird thing in the logs is that it says "certificate resource is already up to date for ingress", which is not correct. The certificate is neither up to date nor ready.
But if I change the ingress to use a slightly different hostname, it works. It's almost as if cert-manager thinks the cert is already registered.
The original hostname was used in the past, but we deleted it (along with the entire namespace it was used in). Could it be that cert-manager is not garbage-collecting some resources and believes the name is still in use or locked somehow?
Seeing the same thing in our cluster on v0.14.0. Most certs work, some don't.
Environment details:
Cloud-provider/provisioner: AWS (Kops)
Kubernetes version: v1.15.7
cert-manager version: v0.14.0
http01 with Traefik v2.1.3
letsencrypt prod URL is used
Install method: static manifests
@richstokes We are seeing the same thing. Did you ever get a resolution?
Afraid not.
@coreycollins We changed the URL because it was a development environment, so it's just a workaround and probably not viable for production.
We are only using this for review environments, not production, so changing the URL worked for us as well. I should note that I created and tore down this certificate about 10 times while testing, so maybe it is some sort of garbage collection problem?
I have a feeling it is because the generated CSR for the CertificateRequest is always the same, but I didn't find a way to verify it.
@coreycollins Feels like it could be a GC problem to me too - as I first noticed this happened when I deleted a namespace containing a certificate, and then recreated it.
It feels like cert-manager doesn't "notice" the cert was deleted and thinks it's still hanging around somewhere. I've dug around as best I can, but can't find any trace of the original cert.
Ran into this same bug, for the same reason @richstokes described - I deleted a namespace containing a cert, then created it in another namespace.
I'm guessing some cached info is stored in etcd somewhere - does anyone have a workaround to get individual domains unstuck by e.g. deleting certain data out of etcd?
@ericflo I spent some time digging around in etcd on my cluster and couldn't find anything leftover after removing the request/secret/etc. Clearly something is left hanging somewhere though. I'm hitting the same issue.
I tried deleting all cert-manager resources (previously installed using the Helm chart) and reinstalling it. That didn't solve the problem. I tried both v0.13.1 and v0.14.2.
Could you confirm that you only see this when using an ingress-shim?
If you hit this, do you see anything listed by kubectl get certificates?
Could this also be related to the issue described in https://github.com/jetstack/cert-manager/issues/2494 ?
@meyskens - When I was having the issue, the certificate was not getting issued (the output from kubectl get certificates showed False). The behavior I saw was that the challenge was not getting created. Unfortunately I don't have the previous logs, but they mirrored the behavior already reported by @AlaaMansourTW.
Interestingly enough, after ~72 hours or so (left over the weekend), the certificate was issued for the FQDN that was not working.
@munnerz @meyskens I believe I have figured this out, now that I have seen it happen. The common thread here is using the Let's Encrypt production issuer and deleting namespaces, which means deleting secrets/TLS certificates as well.
The fundamental issue is that this causes us to exceed Let's Encrypt's Duplicate Certificate Limit of 5 per week. Every time the namespace or secret is deleted, the certificate is renewed. (Side question I've been trying to figure out for a long time: how do I get cert-manager to "adopt" an existing TLS secret containing a Let's Encrypt certificate? Everything I have tried to date has failed, with cert-manager requesting a new certificate even if the existing one is valid.)
Despite #1098, it appears that cert-manager does not create a special RetryBackoff function or set a Deadline in the context.Context object (see https://github.com/jetstack/cert-manager/blob/1f3b883cfd1a7a4fb5f00fcdb9e608fc0158d3bc/pkg/controller/acmeorders/sync.go#L199).
This leaves the default behavior of AuthorizeOrder (via post), which is to keep retrying the request until either success or failure, with Too Many Requests (a.k.a. rate limiting) explicitly considered retryable.
So cert-manager just gets the new order stuck in this retry loop without any visibility to the cluster. This is why @jkoyle's certificate just showed up after 72 hours. Once the 7-day rate-limit window slid open, that retry loop was there to get the certificate right away.
The behavior I would like to see is
@munnerz I hope I have given you enough evidence that you can remove the priority/awaiting-more-evidence label. If not, please let me know what more you need.
Also got this with cert-manager v1.0.0-alpha.0.
The main problem is that the Order is stuck without any state.
There are no errors in the logs, and even orders for other 'clean' domains get stuck. cert-manager stops working entirely.
But from time to time we find something like this in the logs:
"error"="429 urn:ietf:params:acme:error:rateLimited: Error creating new order :: too many certificates already issued for: _DOMAIN_._NAME_: see https://letsencrypt.org/docs/rate-limits/"
And "error"="ACME client for issuer not initialised/available".
But not for every certificate, just for ~1% of requests.
As a workaround, we switched to https://api.buypass.com/acme/directory.
I have dug into this: the Go ACME library treats 429 "too many certificates already issued" as a retryable error, which in some ways is true... However, this causes the controller to hang while waiting for the ACME functions to return.
I fully agree with @Nuru, but the library doesn't expose these errors to us yet :disappointed:
For now, I made a PR that puts a time limit on this: https://github.com/jetstack/cert-manager/pull/3212
But it's not a full fix for the issue.
I had similar issue.
TL;DR: I restarted the deployment:
kubectl rollout restart deployment -n cert-manager cert-manager
Here is what I was doing:
Here is what I experienced
<none> in events. I did steps 1 to 7 twice to make sure I was not missing anything, but got the same result. Finally, after reading this issue, I restarted cert-manager, and it picked up the order, created the challenge, and the cert was issued.
We are running into this issue with v0.16 and v1.0.0, where no new ACME challenges are being created. Reverting cert-manager-controller to 0.15.2 fixes the issue.
I had a similar issue with v0.16.2; restarting (i.e. rolling out) cert-manager did nothing. I upgraded to v1.0.4 and the order finally got a proper Error status. I found nothing in the previous logs, but after the upgrade it showed rate-limiting issues.
I just ran into this issue. I upgraded from v0.16.2 to v1.0.0 and the order updated to acme:error:rateLimited: Error creating new order :: too many certificates already issued for exact set of domains. Thx @loganmzz for pointing this out.