Cert-manager: Order gets stuck on failed after updating a Certificate from an invalid to a valid Dns/CommonName

Created on 4 Sep 2020 · 13 Comments · Source: jetstack/cert-manager

Bugs should be filed for issues encountered whilst operating cert-manager.
You should first attempt to resolve your issues through the community support
channels, e.g. Slack, in order to rule out individual configuration errors.
Please provide as much detail as possible.

Describe the bug:
After creating a Certificate with an invalid CommonName/DNS name (e.g. hello.k8), an Order is created. The Order cannot be completed because the ACME server (e.g. Let's Encrypt) rejects it.
After updating the Certificate to have a valid CommonName/DNS name (e.g. hello.k8.fischler.eu), the Order is not updated.
You need to delete and recreate the Certificate (maybe even delete the Order) for the Order to be recreated and completed successfully.

Expected behaviour:
The Order should be updated/recreated to reflect the updated DNS/CommonName of the Certificate

Steps to reproduce the bug:

  • Create a Certificate with an invalid DNS name/CommonName (e.g. hello.k8)
  • Wait for the Order to be declined by the ACME server (Let's Encrypt)
  • Update the Certificate resource through kubectl apply or helm with a valid DNS name/CommonName (e.g. hello.k8.fischler.eu)
  • The Order does not get updated

Anything else we need to know?:
If you need any more information let me know

Environment details:

  • Kubernetes version (e.g. v1.10.2): 1.17
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): AWS
  • cert-manager version (e.g. v0.4.0): v0.16.1
  • Install method (e.g. helm or static manifests): helm

/kind bug

Labels: area/acme, good first issue, kind/bug, priority/important-soon

All 13 comments

I have seen this behavior in the controller that creates the orders, it doesn't properly check those. We should improve this logic.

/milestone v1.1
/area acme
/priority important-soon

@meyskens: The label(s) priority/ cannot be applied, because the repository doesn't have them

/assign

@sharmaansh21 how is it going? If we can help with anything let me know

@sharmaansh21 If that's OK for you I'll start investigating 😊

Sure, please do.

This is the method that needs looking into: https://github.com/jetstack/cert-manager/blob/8127f0ad4228c58f37f0c0e4f5a2ef90c04f5082/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L255

If a CertificateRequest contains a CSR that does not match the current Certificate spec, the CertificateRequest should be deleted and recreated (thus causing a new Order to be created). It is intentional that the Order is not _updated_, instead a new Order altogether will be created.

The function that actually performs this check is here, and it _does_ seem that it'd cover the case you've described: https://github.com/jetstack/cert-manager/blob/8127f0ad4228c58f37f0c0e4f5a2ef90c04f5082/pkg/controller/certificates/util.go#L97
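To make the idea concrete, here is a minimal sketch of such a "does the request still match the spec" check. It is not the code behind the link above: the `certSpec` and `requestCSR` types and the `violations` helper are hypothetical, simplified stand-ins, and only the common name and DNS names are compared.

```go
package main

import (
	"fmt"
	"sort"
)

// certSpec and requestCSR are simplified, hypothetical stand-ins for the
// Certificate spec and for the decoded CSR held by a CertificateRequest.
type certSpec struct {
	CommonName string
	DNSNames   []string
}

type requestCSR struct {
	CommonName string
	DNSNames   []string
}

// sameSet reports whether a and b contain the same strings, ignoring order.
func sameSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	// Sort copies so the callers' slices are left untouched.
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// violations lists the spec fields that the CSR no longer matches.
// An empty result means the existing request can be kept.
func violations(spec certSpec, csr requestCSR) []string {
	var out []string
	if spec.CommonName != csr.CommonName {
		out = append(out, "spec.commonName")
	}
	if !sameSet(spec.DNSNames, csr.DNSNames) {
		out = append(out, "spec.dnsNames")
	}
	return out
}

func main() {
	// The Certificate was edited from google.com to example.com, but the
	// old CertificateRequest still carries a CSR for google.com.
	spec := certSpec{DNSNames: []string{"example.com"}}
	csr := requestCSR{DNSNames: []string{"google.com"}}
	fmt.Println(violations(spec, csr)) // prints "[spec.dnsNames]"
}
```

A non-empty result is the signal that the stale CertificateRequest should be deleted so that a fresh one (and therefore a fresh Order) gets created.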

We also have a test case that verifies that it _is_ possible to add an additional dnsName to a Certificate resource: https://github.com/jetstack/cert-manager/blob/8127f0ad4228c58f37f0c0e4f5a2ef90c04f5082/test/e2e/suite/issuers/acme/certificate/http01.go#L223

As a first step, I'd recommend putting together an end-to-end test case similar to above that precisely exercises the codepath/issue described here 😄

I also suspect that if this issue _does_ exist, it will _not_ just affect ACME (given that the ACME issuer does not have any special handling for this case, it solely resides in the Certificates controller)

/assign @maelvls

/unassign @sharmaansh21

I successfully reproduced the bug with a manual test. In the test, I create a Certificate with an invalid DNS name, wait for a bit, and then replace the DNS name with a valid one:

Manual test: instructions and results

You will need the kubectl cert-manager plugin as well as a local Kind cluster, which I created with:

./devel/cluster/create.sh && ./devel/addon/certmanager/install.sh

Let's create a simple certificate relying on ACME. The certificate is intentionally invalid:

kubectl apply -f- <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-issuer
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: example-issuer-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-cert
spec:
  secretName: ca-key-pair
  dnsNames:
  - google.com
  issuerRef:
    name: example-issuer
    kind: Issuer
    group: cert-manager.io
EOF

Let's see the certificate & certificaterequest:

% kubectl cert-manager status certificate example-cert                
Name: example-cert
Namespace: default
Created at: 2020-11-05T15:59:26+01:00
Conditions:
  Issuing: False, Reason: Failed, Message: The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
  Ready: False, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
DNS Names:
- google.com
Events:
  Type     Reason     Age   From          Message
  ----     ------     ----  ----          -------
  Normal   Issuing    23s   cert-manager  Issuing certificate as Secret does not exist
  Normal   Generated  22s   cert-manager  Stored new private key in temporary Secret resource "example-cert-fp76g"
  Normal   Requested  22s   cert-manager  Created new CertificateRequest resource "example-cert-6n2r5"
  Warning  Failed     20s   cert-manager  The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Issuer:
  Name: example-issuer
  Kind: Issuer
  Conditions:
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
  Events:  <none>
error when finding Secret "ca-key-pair": secrets "ca-key-pair" not found
Not Before: <none>
Not After: <none>
Renewal Time: <none>
CertificateRequest:
  Name: example-cert-6n2r5
  Namespace: default
  Conditions:
    Ready: False, Reason: Failed, Message: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
  Events:
    Type     Reason          Age   From          Message
    ----     ------          ----  ----          -------
    Normal   IssuerNotReady  22s   cert-manager  Referenced issuer does not have a Ready status condition
    Warning  OrderFailed     20s   cert-manager  Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Order:
  Name: example-cert-6n2r5-2320721609
  State: errored, Reason: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
  No Authorizations for this Order
  FailureTime: 2020-11-05T15:59:29+01:00
No Challenges found for this Certificate

Now, let's use a valid DNS name:

kubectl patch certificate example-cert -p '{"spec": {"dnsNames": ["example.com"]}}' --type=merge

Now, we can see that the CertificateRequest has not been updated:

Name: example-cert
Namespace: default
Created at: 2020-11-05T15:59:26+01:00
Conditions:
  Issuing: False, Reason: Failed, Message: The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
  Ready: False, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
DNS Names:
- example.com
Events:
  Type     Reason     Age   From          Message
  ----     ------     ----  ----          -------
  Normal   Issuing    114s  cert-manager  Issuing certificate as Secret does not exist
  Normal   Generated  113s  cert-manager  Stored new private key in temporary Secret resource "example-cert-fp76g"
  Normal   Requested  113s  cert-manager  Created new CertificateRequest resource "example-cert-6n2r5"
  Warning  Failed     111s  cert-manager  The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Issuer:
  Name: example-issuer
  Kind: Issuer
  Conditions:
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
  Events:  <none>
error when finding Secret "ca-key-pair": secrets "ca-key-pair" not found
Not Before: <none>
Not After: <none>
Renewal Time: <none>
CertificateRequest:
  Name: example-cert-6n2r5
  Namespace: default
  Conditions:
    Ready: False, Reason: Failed, Message: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
  Events:
    Type     Reason          Age   From          Message
    ----     ------          ----  ----          -------
    Normal   IssuerNotReady  113s  cert-manager  Referenced issuer does not have a Ready status condition
    Warning  OrderFailed     111s  cert-manager  Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Order:
  Name: example-cert-6n2r5-2320721609
  State: errored, Reason: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
  No Authorizations for this Order
  FailureTime: 2020-11-05T15:59:29+01:00
No Challenges found for this Certificate

The Order stays stuck on the error:

Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy

After updating the failing Certificate with a valid DNS name, the CertificateRequest, Order and Challenge don't get updated or recreated (that's the bug).


Then I tried to write an end-to-end test that does the same as above. I opened https://github.com/jetstack/cert-manager/pull/3438 to give us a place to discuss it.


E2e test: instructions and results

git fetch origin refs/pull/3438/head:pr3438 && git checkout pr3438
./devel/cluster/create.sh && ./devel/setup-e2e-deps.sh && ./devel/addon/certmanager/install.sh

# Finally, run the test.
./devel/run-e2e.sh -ginkgo.focus "ACME Certificate.*should allow updating the dns name of a failing certificate that had a wrong dns name"

Before the edit to the certificate is made, the state is:

% kubectl cert-manager status certificate test-acme-certificate
Name: test-acme-certificate                                                                                                                                
Namespace: e2e-tests-create-acme-certificate-http01-hdb4q                                                                                                  
Created at: 2020-11-05T13:40:50+01:00                                                                                                                      
Conditions:                                                                                                                                                
  Issuing: True, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist                                                               
  Ready: False, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist                                                                
DNS Names:                                                                                                                                                 
- google.com                                                                                                                                               
Events:                                                                                                                                                    
  Type    Reason     Age   From          Message                                                                                                           
  ----    ------     ----  ----          -------                                                                                                           
  Normal  Issuing    59s   cert-manager  Issuing certificate as Secret does not exist                                                                      
  Normal  Generated  59s   cert-manager  Stored new private key in temporary Secret resource "test-acme-certificate-zhjgt"                                 
  Normal  Requested  58s   cert-manager  Created new CertificateRequest resource "test-acme-certificate-x4rfj"                                             
Issuer:                                                                                                                                                    
  Name: test-acme-issuer                                                                                                                                   
  Kind: Issuer                                                                                                                                             
  Conditions:                                                                                                                                              
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server                                              
  Events:  <none>                                                                                                                                          
error when finding Secret "test-acme-certificate": secrets "test-acme-certificate" not found                                                               
Not Before: <none>                                                                                                                                         
Not After: <none>                                                                                                                                          
Renewal Time: <none>                                                                                                                                       
CertificateRequest:                                                                                                                                        
  Name: test-acme-certificate-x4rfj                                                                                                                        
  Namespace: e2e-tests-create-acme-certificate-http01-hdb4q                                                                                                
  Conditions:                                                                                                                                              
    Ready: False, Reason: Pending, Message: Waiting on certificate issuance from order e2e-tests-create-acme-certificate-http01-hdb4q/test-acme-certificate-x4rfj-463636460: "pending"
  Events:                                                                                                                                                  
    Type    Reason        Age   From          Message                                                                                                      
    ----    ------        ----  ----          -------                                                                                                      
    Normal  OrderCreated  58s   cert-manager  Created Order resource e2e-tests-create-acme-certificate-http01-hdb4q/test-acme-certificate-x4rfj-463636460  
    Normal  OrderPending  58s   cert-manager  Waiting on certificate issuance from order e2e-tests-create-acme-certificate-http01-hdb4q/test-acme-certificate-x4rfj-463636460: ""
Order:                                                                                                                                                     
  Name: test-acme-certificate-x4rfj-463636460                                                                                                              
  State: pending, Reason:                                                                                                                                  
  Authorizations:                                                                                                                                          
    URL: https://pebble.pebble.svc.cluster.local/authZ/sbPj5gxbvQRbL4Mk7ZRHIPXSqUpr0eNRTJYBXun9Uxs, Identifier: google.com, Initial State: pending, Wildcard: false
Challenges:                                                                                                                                                
- Name: test-acme-certificate-x4rfj-463636460-500193737, Type: HTTP-01, Token: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA, Key: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA.MmxIwKeh1yZc4np-i4Yh7oUgrHTdkR2weKLRGszl_NE, State: pending, Reason: Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200', Processing: true, Presented: true

The state of the Certificate at the end of the test is:

% kubectl cert-manager status certificate test-acme-certificate
Name: test-acme-certificate                                                                                                                                
Namespace: e2e-tests-create-acme-certificate-http01-hdb4q                                                                                                  
Created at: 2020-11-05T13:40:50+01:00                                                                                                                      
Conditions:                                                                                                                                                
  Ready: True, Reason: Ready, Message: Certificate is up to date and has not expired                                                                       
DNS Names:                                                                                                                                                 
- aeuhg.ycqov.ingress-nginx.http01.example.com                                                                                                             
Events:                                                                                                                                                    
  Type    Reason     Age   From          Message                                                                                                           
  ----    ------     ----  ----          -------                                                                                                           
  Normal  Issuing    118s  cert-manager  Issuing certificate as Secret does not exist                                                                      
  Normal  Generated  118s  cert-manager  Stored new private key in temporary Secret resource "test-acme-certificate-zhjgt"                                 
  Normal  Requested  117s  cert-manager  Created new CertificateRequest resource "test-acme-certificate-x4rfj"                                             
  Normal  Requested  57s   cert-manager  Created new CertificateRequest resource "test-acme-certificate-bqkzl"                                             
  Normal  Issuing    31s   cert-manager  The certificate has been successfully issued                                                                      
Issuer:                                                                                                                                                    
  Name: test-acme-issuer                                                                                                                                   
  Kind: Issuer                                                                                                                                             
  Conditions:                                                                                                                                              
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server                                              
  Events:  <none>                                                                                                                                          
Secret:                                                                                                                                                    
  Name: test-acme-certificate                                                                                                                              
  Issuer Country:                                                                                                                                          
  Issuer Organisation:                                                                                                                                     
  Issuer Common Name: Pebble Intermediate CA 75e474                                                                                                        
  Key Usage: Digital Signature, Key Encipherment                                                                                                           
  Extended Key Usages: Server Authentication, Client Authentication                                                                                        
  Public Key Algorithm: RSA                                                                                                                                
  Signature Algorithm: SHA256-RSA                                                                                                                          
  Subject Key ID: 48a0765d9bf9f7ab8d91ea6fe54545ad8ca4b3f3                                                                                                 
  Authority Key ID: b624ab4bee93602c8eff49e8d8e0042ed27a71aa                                                                                               
  Serial Number: 4f987afe3c2a013f                                                                                                                          
  Events:  <none>                                                                                                                                          
Not Before: 2020-11-05T13:42:17+01:00                                                                                                                      
Not After: 2025-11-05T13:42:17+01:00                                                                                                                       
Renewal Time: 2025-10-06T14:42:17+02:00                                                                                                                    
No CertificateRequest found for this Certificate 

Unfortunately, I wasn't able to reproduce what happens in the manual test. I ran the e2e test and ran kubectl cert-manager status certificate every two seconds to see how the certificate evolves:

[asciicast recording of the e2e run]

The main difference between the manual and e2e test is the state of the Challenge right before the update. The manual test shows this:

# manual test (right before the update)
Order:
  Name: example-cert-6n2r5-2320721609
  State: errored, Reason: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
  No Authorizations for this Order
  FailureTime: 2020-11-05T15:59:29+01:00
No Challenges found for this Certificate

(no new CertificateRequest is created after the Certificate edit).

On the other hand, the e2e test shows this:

# e2e test (right before the update)
Order:                                                                                                                                                     
  Name: test-acme-certificate-x4rfj-463636460                                                                                                              
  State: pending, Reason:                                                                                                                                  
  Authorizations:                                                                                                                                          
    URL: https://pebble.pebble.svc.cluster.local/authZ/sbPj5gxbvQRbL4Mk7ZRHIPXSqUpr0eNRTJYBXun9Uxs, Identifier: google.com, Initial State: pending, Wildcard: false                                                                                                                                                   
Challenges:                                                                                           
- Name: test-acme-certificate-x4rfj-463636460-500193737, Type: HTTP-01, Token: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA, Key: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA.MmxIwKeh1yZc4np-i4Yh7oUgrHTdkR2weKLRGszl_NE, State: pending, Reason: Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200', Processing: true, Presented: true

(and as soon as the Certificate gets edited, a new CertificateRequest is created).

I am stuck 😞
I will continue tomorrow

Digging briefly into Pebble, the Let's Encrypt test server used during e2e tests, it seems that it does not support configuring a list of domains for which it should return this 400 error.

I actually believe that this may be (almost) working as intended. This was a failed order because the request submitted to the ACME server was deemed invalid, which triggers our regular 'back off' behaviour (which is to wait 1h before retrying).

I think the bug here may therefore be in the logic that determines whether to back off or retry immediately. The logic for that is here: https://github.com/jetstack/cert-manager/blob/87989dbfe35bed99a9e031c71ad3a7d49030a8bf/pkg/controller/certificates/trigger/trigger_controller.go#L156-L166

I think we need to adapt this area of code to trigger an issuance immediately iff the spec of the Certificate does not match the previously failed CertificateRequest (i.e. the user has edited their Certificate resource).

This is definitely a regression, and is something we used to handle properly. There is a function you can use to determine whether a 'request matches the certificate spec' here: https://github.com/jetstack/cert-manager/blob/87989dbfe35bed99a9e031c71ad3a7d49030a8bf/pkg/controller/certificates/util.go#L97

Thanks, you are right! After reducing the 1 hour re-issuance delay to 1 second, the issue (seemingly) disappears:

https://github.com/jetstack/cert-manager/blob/db9b6448b692eb08e2bb8b8b93d86621029a6feb/pkg/controller/certificates/trigger/trigger_controller.go#L52-L56

After a conversation with James on #cert-manager-dev, the plan is now:

  1. Create a failing test case in trigger_controller_test.go; we would have a certificate with LastFailureTime set. I'm not sure how to check that a new item was added to the schedule queue, though.
  2. Make sure that we don't wait for 1 hour when the cert's spec doesn't match the request, and tweak things a bit so that input is available and input.Request isn't nil; that would look like:

    -if crt.Status.LastFailureTime != nil {
    +if crt.Status.LastFailureTime != nil && certificates.RequestMatchesSpec(crt, input.Request) {
         now := c.clock.Now()
         retryAfter := crt.Status.LastFailureTime.Add(retryAfterLastFailure)
         if now.Before(retryAfter) {
             log.V(logf.InfoLevel).Info("Not re-issuing certificate as an attempt has been made in the last hour", "retry_after", retryAfter)
             c.scheduleRecheckOfCertificateIfRequired(log, key, retryAfter.Sub(now))
             return nil
         }
     }

I'll open a PR with these changes
