Describe the bug:
After creating a Certificate with an invalid CommonName/DnsName (e.g. hello.k8), an Order is created. The Order cannot be completed because the ACME server (e.g. Let's Encrypt) rejects it.
After updating the Certificate to have a valid CommonName/DnsName (e.g. hello.k8.fischler.eu), the Order is not updated.
You need to delete and recreate the Certificate (maybe even delete the Order) for the Order to be recreated and completed successfully.
Expected behaviour:
The Order should be updated or recreated to reflect the updated DnsName/CommonName of the Certificate.
Steps to reproduce the bug:
Anything else we need to know?:
If you need any more information let me know
Environment details:
/kind bug
I have seen this behavior in the controller that creates the Orders; it doesn't properly check those. We should improve this logic.
/milestone v1.1
/area acme
/priority important-soon
@meyskens: The label(s) priority/ cannot be applied, because the repository doesn't have them
/assign
@sharmaansh21 how is it going? If we can help with anything let me know
@sharmaansh21 If that's OK for you I'll start investigating 😊
Sure, please do.
This is the method that needs looking into: https://github.com/jetstack/cert-manager/blob/8127f0ad4228c58f37f0c0e4f5a2ef90c04f5082/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L255
If a CertificateRequest contains a CSR that does not match the current Certificate spec, the CertificateRequest should be deleted and recreated (thus causing a new Order to be created). It is intentional that the Order is not _updated_, instead a new Order altogether will be created.
The function that actually performs this check is here, and it _does_ seem that it'd cover the case you've described: https://github.com/jetstack/cert-manager/blob/8127f0ad4228c58f37f0c0e4f5a2ef90c04f5082/pkg/controller/certificates/util.go#L97
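To make the idea concrete, here is a minimal Go sketch of such a check, using only the standard library. It is not the cert-manager implementation linked above: `dnsNamesMatch` and `newCSR` are hypothetical helpers, and the sketch compares DNS names only, whereas a real check would also need to cover the other fields of the Certificate spec.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"fmt"
	"sort"
)

// dnsNamesMatch (hypothetical helper) reports whether the DNS names encoded
// in a DER-encoded CSR are the same set as the names the spec currently wants.
func dnsNamesMatch(csrDER []byte, wanted []string) (bool, error) {
	csr, err := x509.ParseCertificateRequest(csrDER)
	if err != nil {
		return false, fmt.Errorf("parsing CSR: %w", err)
	}
	// Copy before sorting so the caller's slices are left untouched.
	got := append([]string(nil), csr.DNSNames...)
	want := append([]string(nil), wanted...)
	sort.Strings(got)
	sort.Strings(want)
	if len(got) != len(want) {
		return false, nil
	}
	for i := range got {
		if got[i] != want[i] {
			return false, nil
		}
	}
	return true, nil
}

// newCSR (hypothetical helper) builds a throwaway CSR for the given DNS names.
func newCSR(dnsNames []string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.CertificateRequest{DNSNames: dnsNames}
	return x509.CreateCertificateRequest(rand.Reader, tmpl, key)
}

func main() {
	// The CSR was generated while the spec still asked for google.com.
	csr, err := newCSR([]string{"google.com"})
	if err != nil {
		panic(err)
	}
	// The spec has since been edited to example.com.
	ok, err := dnsNamesMatch(csr, []string{"example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println("request matches spec:", ok) // prints "request matches spec: false"
}
```

A `false` result is the signal that the existing CertificateRequest is stale and should be replaced rather than updated in place.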
We also have a test case that verifies that it _is_ possible to add an additional dnsName to a Certificate resource: https://github.com/jetstack/cert-manager/blob/8127f0ad4228c58f37f0c0e4f5a2ef90c04f5082/test/e2e/suite/issuers/acme/certificate/http01.go#L223
As a first step, I'd recommend putting together an end-to-end test case similar to above that precisely exercises the codepath/issue described here 😄
I also suspect that if this issue _does_ exist, it will _not_ just affect ACME (given that the ACME issuer does not have any special handling for this case, it solely resides in the Certificates controller)
/assign @maelvls
/unassign @sharmaansh21
I successfully reproduced the bug with a manual test. In the test, I create a Certificate with an invalid DNS name, wait for a bit, and then replace the DNS name with a valid one.
You will need the kubectl cert-manager plugin as well as a local Kind cluster, which I created with:
./devel/cluster/create.sh && ./devel/addon/certmanager/install.sh
Let's create a simple Certificate relying on ACME. The Certificate is intentionally invalid:
kubectl apply -f- <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-issuer
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: example-issuer-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-cert
spec:
  secretName: ca-key-pair
  dnsNames:
  - google.com
  issuerRef:
    name: example-issuer
    kind: Issuer
    group: cert-manager.io
EOF
Let's see the certificate & certificaterequest:
% kubectl cert-manager status certificate example-cert
Name: example-cert
Namespace: default
Created at: 2020-11-05T15:59:26+01:00
Conditions:
Issuing: False, Reason: Failed, Message: The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Ready: False, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
DNS Names:
- google.com
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Issuing 23s cert-manager Issuing certificate as Secret does not exist
Normal Generated 22s cert-manager Stored new private key in temporary Secret resource "example-cert-fp76g"
Normal Requested 22s cert-manager Created new CertificateRequest resource "example-cert-6n2r5"
Warning Failed 20s cert-manager The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Issuer:
Name: example-issuer
Kind: Issuer
Conditions:
Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
Events: <none>
error when finding Secret "ca-key-pair": secrets "ca-key-pair" not found
Not Before: <none>
Not After: <none>
Renewal Time: <none>
CertificateRequest:
Name: example-cert-6n2r5
Namespace: default
Conditions:
Ready: False, Reason: Failed, Message: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IssuerNotReady 22s cert-manager Referenced issuer does not have a Ready status condition
Warning OrderFailed 20s cert-manager Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Order:
Name: example-cert-6n2r5-2320721609
State: errored, Reason: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
No Authorizations for this Order
FailureTime: 2020-11-05T15:59:29+01:00
No Challenges found for this Certificate
Now, let's use a valid DNS name:
kubectl patch certificate example-cert -p '{"spec": {"dnsNames": ["example.com"]}}' --type=merge
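As an aside, with `--type=merge` the patch body is a JSON merge patch, so the `dnsNames` list is replaced wholesale rather than appended to. A small Go sketch of building the same body programmatically (the helper name `dnsNamesMergePatch` is made up for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dnsNamesMergePatch (hypothetical helper) builds a JSON merge patch that
// replaces spec.dnsNames with the given list.
func dnsNamesMergePatch(names []string) ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"dnsNames": names,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := dnsNamesMergePatch([]string{"example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints {"spec":{"dnsNames":["example.com"]}}
}
```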
Now, we can see that the CertificateRequest has not been updated:
Name: example-cert
Namespace: default
Created at: 2020-11-05T15:59:26+01:00
Conditions:
Issuing: False, Reason: Failed, Message: The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Ready: False, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
DNS Names:
- example.com
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Issuing 114s cert-manager Issuing certificate as Secret does not exist
Normal Generated 113s cert-manager Stored new private key in temporary Secret resource "example-cert-fp76g"
Normal Requested 113s cert-manager Created new CertificateRequest resource "example-cert-6n2r5"
Warning Failed 111s cert-manager The certificate request has failed to complete and will be retried: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Issuer:
Name: example-issuer
Kind: Issuer
Conditions:
Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
Events: <none>
error when finding Secret "ca-key-pair": secrets "ca-key-pair" not found
Not Before: <none>
Not After: <none>
Renewal Time: <none>
CertificateRequest:
Name: example-cert-6n2r5
Namespace: default
Conditions:
Ready: False, Reason: Failed, Message: Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IssuerNotReady 113s cert-manager Referenced issuer does not have a Ready status condition
Warning OrderFailed 111s cert-manager Failed to wait for order resource "example-cert-6n2r5-2320721609" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
Order:
Name: example-cert-6n2r5-2320721609
State: errored, Reason: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
No Authorizations for this Order
FailureTime: 2020-11-05T15:59:29+01:00
No Challenges found for this Certificate
The Order gets stuck on the error:
Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
After updating the failing Certificate with a valid DNS name, the CertificateRequest, Order and Challenge don't get updated or recreated (that's the bug).
Then I tried to write an end-to-end test that does the same as above. I opened https://github.com/jetstack/cert-manager/pull/3438 to give us a place to discuss it.
E2e test: instructions and results
git fetch origin refs/pull/3438/head:pr3438 && git checkout pr3438
./devel/cluster/create.sh && ./devel/setup-e2e-deps.sh && ./devel/addon/certmanager/install.sh
# Finally, run the test.
./devel/run-e2e.sh -ginkgo.focus "ACME Certificate.*should allow updating the dns name of a failing certificate that had a wrong dns name"
Before the edit to the certificate is made, the state is:
% kubectl cert-manager status certificate test-acme-certificate
Name: test-acme-certificate
Namespace: e2e-tests-create-acme-certificate-http01-hdb4q
Created at: 2020-11-05T13:40:50+01:00
Conditions:
Issuing: True, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
Ready: False, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
DNS Names:
- google.com
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Issuing 59s cert-manager Issuing certificate as Secret does not exist
Normal Generated 59s cert-manager Stored new private key in temporary Secret resource "test-acme-certificate-zhjgt"
Normal Requested 58s cert-manager Created new CertificateRequest resource "test-acme-certificate-x4rfj"
Issuer:
Name: test-acme-issuer
Kind: Issuer
Conditions:
Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
Events: <none>
error when finding Secret "test-acme-certificate": secrets "test-acme-certificate" not found
Not Before: <none>
Not After: <none>
Renewal Time: <none>
CertificateRequest:
Name: test-acme-certificate-x4rfj
Namespace: e2e-tests-create-acme-certificate-http01-hdb4q
Conditions:
Ready: False, Reason: Pending, Message: Waiting on certificate issuance from order e2e-tests-create-acme-certificate-http01-hdb4q/test-acme-certificate-x4rfj-463636460: "pending"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal OrderCreated 58s cert-manager Created Order resource e2e-tests-create-acme-certificate-http01-hdb4q/test-acme-certificate-x4rfj-463636460
Normal OrderPending 58s cert-manager Waiting on certificate issuance from order e2e-tests-create-acme-certificate-http01-hdb4q/test-acme-certificate-x4rfj-463636460: ""
Order:
Name: test-acme-certificate-x4rfj-463636460
State: pending, Reason:
Authorizations:
URL: https://pebble.pebble.svc.cluster.local/authZ/sbPj5gxbvQRbL4Mk7ZRHIPXSqUpr0eNRTJYBXun9Uxs, Identifier: google.com, Initial State: pending, Wildcard: false
Challenges:
- Name: test-acme-certificate-x4rfj-463636460-500193737, Type: HTTP-01, Token: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA, Key: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA.MmxIwKeh1yZc4np-i4Yh7oUgrHTdkR2weKLRGszl_NE, State: pending, Reason: Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200', Processing: true, Presented: true
The state of the Certificate at the end of the test is:
% kubectl cert-manager status certificate test-acme-certificate
Name: test-acme-certificate
Namespace: e2e-tests-create-acme-certificate-http01-hdb4q
Created at: 2020-11-05T13:40:50+01:00
Conditions:
Ready: True, Reason: Ready, Message: Certificate is up to date and has not expired
DNS Names:
- aeuhg.ycqov.ingress-nginx.http01.example.com
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Issuing 118s cert-manager Issuing certificate as Secret does not exist
Normal Generated 118s cert-manager Stored new private key in temporary Secret resource "test-acme-certificate-zhjgt"
Normal Requested 117s cert-manager Created new CertificateRequest resource "test-acme-certificate-x4rfj"
Normal Requested 57s cert-manager Created new CertificateRequest resource "test-acme-certificate-bqkzl"
Normal Issuing 31s cert-manager The certificate has been successfully issued
Issuer:
Name: test-acme-issuer
Kind: Issuer
Conditions:
Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
Events: <none>
Secret:
Name: test-acme-certificate
Issuer Country:
Issuer Organisation:
Issuer Common Name: Pebble Intermediate CA 75e474
Key Usage: Digital Signature, Key Encipherment
Extended Key Usages: Server Authentication, Client Authentication
Public Key Algorithm: RSA
Signature Algorithm: SHA256-RSA
Subject Key ID: 48a0765d9bf9f7ab8d91ea6fe54545ad8ca4b3f3
Authority Key ID: b624ab4bee93602c8eff49e8d8e0042ed27a71aa
Serial Number: 4f987afe3c2a013f
Events: <none>
Not Before: 2020-11-05T13:42:17+01:00
Not After: 2025-11-05T13:42:17+01:00
Renewal Time: 2025-10-06T14:42:17+02:00
No CertificateRequest found for this Certificate
Unfortunately, the e2e test did not reproduce what happens in the manual test. While it was running, I ran kubectl cert-manager status certificate every two seconds to see how the Certificate evolved.
The main difference between the manual and e2e test is the state of the Challenge right before the update. The manual test shows this:
# manual test (right before the update)
Order:
Name: example-cert-6n2r5-2320721609
State: errored, Reason: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "google.com": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy
No Authorizations for this Order
FailureTime: 2020-11-05T15:59:29+01:00
No Challenges found for this Certificate
(no new CertificateRequest is created after the Certificate edit).
On the other hand, the e2e test shows this:
# e2e test (right before the update)
Order:
Name: test-acme-certificate-x4rfj-463636460
State: pending, Reason:
Authorizations:
URL: https://pebble.pebble.svc.cluster.local/authZ/sbPj5gxbvQRbL4Mk7ZRHIPXSqUpr0eNRTJYBXun9Uxs, Identifier: google.com, Initial State: pending, Wildcard: false
Challenges:
- Name: test-acme-certificate-x4rfj-463636460-500193737, Type: HTTP-01, Token: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA, Key: BqnSy9H-5-qDZqqV3ZR-REMjRuacqNXE0OVyaAG0JvA.MmxIwKeh1yZc4np-i4Yh7oUgrHTdkR2weKLRGszl_NE, State: pending, Reason: Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200', Processing: true, Presented: true
(and as soon as the Certificate gets edited, a new CertificateRequest is created).
I am stuck 😞
I will continue tomorrow
Digging briefly into Pebble, the Let's Encrypt test server used during e2e tests, it seems that it does not support configuring a list of domains for which it should return this 400 error.
I actually believe that this may be (almost) working as intended. This was a failed order because the request submitted to the ACME server was deemed invalid, which triggers our regular 'back off' behaviour (waiting 1h before retrying).
I think the bug here may therefore be in the logic that determines whether to back off or retry immediately. The logic for that is here: https://github.com/jetstack/cert-manager/blob/87989dbfe35bed99a9e031c71ad3a7d49030a8bf/pkg/controller/certificates/trigger/trigger_controller.go#L156-L166
I think we need to adapt this area of code to trigger an issuance immediately iff the spec of the Certificate does not match the previously failed CertificateRequest (i.e. the user has edited their Certificate resource).
This is definitely a regression, and is something we used to handle properly. There is a function you can use to determine whether a 'request matches the certificate spec' here: https://github.com/jetstack/cert-manager/blob/87989dbfe35bed99a9e031c71ad3a7d49030a8bf/pkg/controller/certificates/util.go#L97
Thanks, you are right! After reducing the 1 hour re-issuance delay to 1 second, the issue (seemingly) disappears.
After a conversation with James on #cert-manager-dev, the plan is now:
… LastFailureTime = true. I'm not sure how to check that a new item was added to the schedule queue though.
… input is available and make sure input.Request isn't nil; that would look like:
-if crt.Status.LastFailureTime != nil {
+if crt.Status.LastFailureTime != nil && certificates.RequestMatchesSpec(crt, input.Request) {
 	now := c.clock.Now()
 	retryAfter := crt.Status.LastFailureTime.Add(retryAfterLastFailure)
 	if now.Before(retryAfter) {
 		log.V(logf.InfoLevel).Info("Not re-issuing certificate as an attempt has been made in the last hour", "retry_after", retryAfter)
 		c.scheduleRecheckOfCertificateIfRequired(log, key, retryAfter.Sub(now))
 		return nil
 	}
 }
I'll open a PR with these changes