Describe the bug
When using a ClusterIssuer configured as follows:
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  annotations:
  generation: 3
  name: letsencrypt-nginx-fake
  resourceVersion: "48740658"
  selfLink: /apis/cert-manager.io/v1alpha2/clusterissuers/letsencrypt-nginx-fake
  uid: 3d1f9ba7-0aca-11ea-b4dd-42010a940fd5
spec:
  acme:
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-fake
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
    - http01:
        ingress:
          class: nginx
      selector: {}
And attempting to issue a certificate for multiple domains:
Name:         cert-bnkto
Namespace:    test-0fb55da0
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1alpha2
Kind:         Certificate
Metadata:
  Creation Timestamp:  2019-12-07T21:38:15Z
  Generation:          1
  Owner References:
    API Version:           extensions/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Ingress
    Name:                  bnkto
    UID:                   e08c7396-1939-11ea-b4dd-42010a940fd5
  Resource Version:  49835716
  Self Link:         /apis/cert-manager.io/v1alpha2/namespaces/test-0fb55da0/certificates/cert-bnkto
  UID:               e08eb1fb-1939-11ea-b4dd-42010a940fd5
Spec:
  Dns Names:
    test-0fb55da0.bnk.to
    api.test-0fb55da0.bnk.to
    dev.test-0fb55da0.bnk.to
    doc.test-0fb55da0.bnk.to
    auth.test-0fb55da0.bnk.to
    idp.test-0fb55da0.bnk.to
    disbursement.test-0fb55da0.bnk.to
    fastcheckout.test-0fb55da0.bnk.to
    fcoweb.test-0fb55da0.bnk.to
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   ClusterIssuer
    Name:   letsencrypt-nginx-fake
  Secret Name:  cert-bnkto
Status:
  Conditions:
    Last Transition Time:  2019-12-07T21:38:16Z
    Message:               Waiting for CertificateRequest "cert-bnkto-953187930" to complete
    Reason:                InProgress
    Status:                False
    Type:                  Ready
Events:
  Type    Reason        Age   From          Message
  ----    ------        ----  ----          -------
  Normal  GeneratedKey  13m   cert-manager  Generated a new private key
  Normal  Requested     13m   cert-manager  Created new CertificateRequest resource "cert-bnkto-953187930"
One domain's challenge, doc.test-0fb55da0.bnk.to in this example, ends up stuck.
We suspect this is because the ingress created:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/whitelist-source-range: 0.0.0.0/0,::/0
  creationTimestamp: "2019-12-07T21:38:24Z"
  generateName: cm-acme-http-solver-
  generation: 1
  labels:
    acme.cert-manager.io/http-domain: "1729431587"
    acme.cert-manager.io/http-token: "1130827427"
    acme.cert-manager.io/http01-solver: "true"
  name: cm-acme-http-solver-c8qbh
  namespace: test-0fb55da0
  ownerReferences:
  - apiVersion: cert-manager.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Challenge
    name: cert-bnkto-953187930-83349459-3245750807
    uid: e3036eab-1939-11ea-b4dd-42010a940fd5
  resourceVersion: "49836677"
  selfLink: /apis/extensions/v1beta1/namespaces/test-0fb55da0/ingresses/cm-acme-http-solver-c8qbh
  uid: e58759cb-1939-11ea-b4dd-42010a940fd5
spec:
  rules:
  - host: doc.test-0fb55da0.bnk.to
    http:
      paths:
      - backend:
          serviceName: cm-acme-http-solver-45jxx
          servicePort: 8089
        path: /.well-known/acme-challenge/tAME5vpSxIWdHSWe0YkQWvGBg0fmtmP7pe6A_QPHoaI
status:
  loadBalancer:
    ingress:
    - ip: 35.247.188.88
points to the wrong solver service:
apiVersion: v1
kind: Service
metadata:
  annotations:
    auth.istio.io/8089: NONE
  creationTimestamp: "2019-12-07T21:38:28Z"
  generateName: cm-acme-http-solver-
  labels:
    acme.cert-manager.io/http-domain: "1729431587"
    acme.cert-manager.io/http-token: "1130827427"
    acme.cert-manager.io/http01-solver: "true"
  name: cm-acme-http-solver-wt5vj
  namespace: test-0fb55da0
  ownerReferences:
  - apiVersion: cert-manager.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Challenge
    name: cert-bnkto-953187930-83349459-3245750807
    uid: e3036eab-1939-11ea-b4dd-42010a940fd5
  resourceVersion: "49836226"
  selfLink: /api/v1/namespaces/test-0fb55da0/services/cm-acme-http-solver-wt5vj
  uid: e7c4bbb5-1939-11ea-b4dd-42010a940fd5
spec:
  clusterIP: 10.0.12.245
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 32532
    port: 8089
    protocol: TCP
    targetPort: 8089
  selector:
    acme.cert-manager.io/http-domain: "1729431587"
    acme.cert-manager.io/http-token: "1130827427"
    acme.cert-manager.io/http01-solver: "true"
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
Notice that the ingress points to cm-acme-http-solver-45jxx, but the only active service is cm-acme-http-solver-wt5vj.
This results in the certificate never being issued.
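For anyone hitting the same symptom, the mismatch is visible by listing the solver services and ingresses via their labels and comparing the ingress backend against the services that actually exist. Below is a minimal client-go sketch, assuming a client-go release contemporary with this report (List calls without a context argument and the extensions/v1beta1 Ingress API) and using the namespace from this report:

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ns := "test-0fb55da0" // namespace from this report
	opts := metav1.ListOptions{LabelSelector: "acme.cert-manager.io/http01-solver=true"}

	// Services currently backing HTTP-01 challenges in the namespace.
	svcs, err := cs.CoreV1().Services(ns).List(opts)
	if err != nil {
		panic(err)
	}
	for _, s := range svcs.Items {
		fmt.Printf("solver service: %s (token %s)\n", s.Name, s.Labels["acme.cert-manager.io/http-token"])
	}

	// Solver ingresses and the backend service each challenge path points at.
	ings, err := cs.ExtensionsV1beta1().Ingresses(ns).List(opts)
	if err != nil {
		panic(err)
	}
	for _, ing := range ings.Items {
		for _, rule := range ing.Spec.Rules {
			if rule.HTTP == nil {
				continue
			}
			for _, p := range rule.HTTP.Paths {
				// A backend name absent from the service list above means the
				// ingress still references a solver service that was deleted.
				fmt.Printf("ingress %s -> backend service %s\n", ing.Name, p.Backend.ServiceName)
			}
		}
	}
}

In the dump above this prints cm-acme-http-solver-wt5vj as the only service, while the ingress still references cm-acme-http-solver-45jxx.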
The relevant logs show the following:
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:23.227231 1 pod.go:70] cert-manager/controller/challenges/http01/ensurePod "level"=0 "msg"="creating HTTP01 challenge solver pod" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:23.673381 1 service.go:55] cert-manager/controller/challenges/http01/ensureService "level"=0 "msg"="creating HTTP01 challenge solver service" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:24.279974 1 ingress.go:103] cert-manager/controller/challenges/http01/ensureIngress "level"=0 "msg"="creating HTTP01 challenge solver ingress" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:24.389903 1 service.go:55] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "level"=0 "msg"="creating HTTP01 challenge solver service" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:25.626365 1 service.go:47] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "level"=0 "msg"="multiple challenge solver services found for challenge. cleaning up all existing services." "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"
cert-manager-59478b75c9-kqnmf cert-manager E1207 21:38:26.870009 1 sync.go:184] cert-manager/controller/challenges "msg"="propagation check failed" "error"="multiple existing challenge solver services found and cleaned up. retrying challenge sync" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:27.625216 1 service.go:55] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "level"=0 "msg"="creating HTTP01 challenge solver service" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"
Notice that a solver pod is created, followed by a service, an ingress, and then a second service. All of the existing services are then deleted and cleaned up, and yet another service is created. Beyond this point nothing happens, and the certificate is never issued.
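For context, that log sequence is consistent with a reconcile step along the following lines. This is a hedged paraphrase of the service handling suggested by the messages from service.go, not the verbatim cert-manager source; the type and helper names are illustrative stubs:

package http01sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Challenge stands in for the cert-manager Challenge resource; only what the
// sketch needs is shown.
type Challenge struct {
	Namespace string
	DNSName   string
	Token     string
}

// Illustrative stubs for the real helpers, assumed to select, create and
// delete Services carrying the acme.cert-manager.io/http01-solver labels.
func getServicesForChallenge(ch *Challenge) ([]*corev1.Service, error) { return nil, nil }
func createService(ch *Challenge) (*corev1.Service, error)             { return nil, nil }
func cleanupServices(svcs []*corev1.Service) error                     { return nil }

// ensureService paraphrases the behaviour visible in the logs above.
func ensureService(ch *Challenge) (*corev1.Service, error) {
	existing, err := getServicesForChallenge(ch)
	if err != nil {
		return nil, err
	}
	switch len(existing) {
	case 1:
		return existing[0], nil
	case 0:
		// Creation goes through generateName "cm-acme-http-solver-", so every
		// recreation yields a service with a fresh random suffix
		// (first -45jxx, then -wt5vj in this report).
		return createService(ch)
	default:
		// Two services raced into existence: delete them all and retry,
		// matching the "cleaning up all existing services" log line above.
		if err := cleanupServices(existing); err != nil {
			return nil, err
		}
		return nil, fmt.Errorf("multiple existing challenge solver services found and cleaned up. retrying challenge sync")
	}
}

If that reading is right, the replacement service always comes back under a new name, while the solver ingress created a moment earlier still references the original one.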
Expected behaviour
Our current hypothesis is that the ingress is not updated when the solver service name changes. We would expect that if all of the solver services are deleted and a new one is created, the ingress is updated to point at the new service.
Looking at https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/http/ingress.go#L84, it seems the ingress is only updated when its httpDomainCfg Name is empty; otherwise, if the service name changes, no update is made.
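To make the hypothesis concrete, here is a paraphrased sketch of the check we believe is involved, in the same illustrative package as the sketch above (reusing the Challenge stub). This is not the verbatim cert-manager code; the helper names are made up, and svcName stands for the freshly ensured solver service:

package http01sketch

import extv1beta1 "k8s.io/api/extensions/v1beta1"

// Illustrative stubs for the real helpers.
func getIngressesForChallenge(ch *Challenge) ([]*extv1beta1.Ingress, error) { return nil, nil }
func addChallengePathToIngress(ch *Challenge, svcName string) (*extv1beta1.Ingress, error) {
	return nil, nil
}
func createIngress(ch *Challenge, svcName string) (*extv1beta1.Ingress, error) { return nil, nil }
func updateIngressBackend(ing *extv1beta1.Ingress, svcName string) (*extv1beta1.Ingress, error) {
	return nil, nil
}

// ensureIngress takes ingressName, mirroring the httpDomainCfg Name field
// referenced above: when it is set, the challenge path is added to that
// user-provided ingress instead of a solver-managed one.
func ensureIngress(ch *Challenge, ingressName, svcName string) (*extv1beta1.Ingress, error) {
	if ingressName != "" {
		return addChallengePathToIngress(ch, svcName)
	}
	existing, err := getIngressesForChallenge(ch)
	if err != nil {
		return nil, err
	}
	if len(existing) == 1 {
		ing := existing[0]
		// Suspected bug: the existing solver ingress is returned as-is, so a
		// backend pointing at a deleted service is never corrected. A fix
		// could compare the backend against the current solver service:
		if backendServiceName(ing) != svcName {
			return updateIngressBackend(ing, svcName)
		}
		return ing, nil
	}
	return createIngress(ch, svcName)
}

// backendServiceName returns the service referenced by the first challenge path.
func backendServiceName(ing *extv1beta1.Ingress) string {
	for _, rule := range ing.Spec.Rules {
		if rule.HTTP == nil {
			continue
		}
		for _, p := range rule.HTTP.Paths {
			return p.Backend.ServiceName
		}
	}
	return ""
}

The comparison marked as a possible fix is the behaviour we would expect: whenever the solver service is recreated under a new name, the existing solver ingress should have its backend serviceName patched to match.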
Steps to reproduce
This bug is racy and we haven't been able to reproduce it deliberately. It only happens when another service is created after the ingress, causing the existing services to be cleaned up and yet another service to be created. We see this bug occasionally, since we run tests that issue certificates like this every hour or so. Sometimes we won't see it for a day or two, only for it to appear again.
/kind bug
I think you should raise this issue on Slack: https://kubernetes.slack.com/archives/C4NV3DWUC
I also experienced this with the latest v0.12.0.
The only way to fix the race is to run kubectl delete certificate XXXXX, then wait another ~9 minutes and see if you "won the race" 😉
Good luck!