Cert-manager: ACME HTTP01 solver: Ingress is not updated when a service name changes and certificate is never issued

Created on 9 Dec 2019 · 2 Comments · Source: jetstack/cert-manager

Describe the bug

When using a ClusterIssuer configured as follows:

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  annotations:
  generation: 3
  name: letsencrypt-nginx-fake
  resourceVersion: "48740658"
  selfLink: /apis/cert-manager.io/v1alpha2/clusterissuers/letsencrypt-nginx-fake
  uid: 3d1f9ba7-0aca-11ea-b4dd-42010a940fd5
spec:
  acme:
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-fake
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
    - http01:
        ingress:
          class: nginx
      selector: {}

And attempting to issue a certificate for multiple domains:

Name:         cert-bnkto
Namespace:    test-0fb55da0
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1alpha2
Kind:         Certificate
Metadata:
  Creation Timestamp:  2019-12-07T21:38:15Z
  Generation:          1
  Owner References:
    API Version:           extensions/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Ingress
    Name:                  bnkto
    UID:                   e08c7396-1939-11ea-b4dd-42010a940fd5
  Resource Version:        49835716
  Self Link:               /apis/cert-manager.io/v1alpha2/namespaces/test-0fb55da0/certificates/cert-bnkto
  UID:                     e08eb1fb-1939-11ea-b4dd-42010a940fd5
Spec:
  Dns Names:
    test-0fb55da0.bnk.to
    api.test-0fb55da0.bnk.to
    dev.test-0fb55da0.bnk.to
    doc.test-0fb55da0.bnk.to
    auth.test-0fb55da0.bnk.to
    idp.test-0fb55da0.bnk.to
    disbursement.test-0fb55da0.bnk.to
    fastcheckout.test-0fb55da0.bnk.to
    fcoweb.test-0fb55da0.bnk.to
  Issuer Ref:
    Group:      cert-manager.io
    Kind:       ClusterIssuer
    Name:       letsencrypt-nginx-fake
  Secret Name:  cert-bnkto
Status:
  Conditions:
    Last Transition Time:  2019-12-07T21:38:16Z
    Message:               Waiting for CertificateRequest "cert-bnkto-953187930" to complete
    Reason:                InProgress
    Status:                False
    Type:                  Ready
Events:
  Type    Reason        Age   From          Message
  ----    ------        ----  ----          -------
  Normal  GeneratedKey  13m   cert-manager  Generated a new private key
  Normal  Requested     13m   cert-manager  Created new CertificateRequest resource "cert-bnkto-953187930"

We end up with the challenge for one domain, in this example doc.test-0fb55da0.bnk.to, getting stuck.

We suspect this is because the ingress created:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/whitelist-source-range: 0.0.0.0/0,::/0
  creationTimestamp: "2019-12-07T21:38:24Z"
  generateName: cm-acme-http-solver-
  generation: 1
  labels:
    acme.cert-manager.io/http-domain: "1729431587"
    acme.cert-manager.io/http-token: "1130827427"
    acme.cert-manager.io/http01-solver: "true"
  name: cm-acme-http-solver-c8qbh
  namespace: test-0fb55da0
  ownerReferences:
  - apiVersion: cert-manager.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Challenge
    name: cert-bnkto-953187930-83349459-3245750807
    uid: e3036eab-1939-11ea-b4dd-42010a940fd5
  resourceVersion: "49836677"
  selfLink: /apis/extensions/v1beta1/namespaces/test-0fb55da0/ingresses/cm-acme-http-solver-c8qbh
  uid: e58759cb-1939-11ea-b4dd-42010a940fd5
spec:
  rules:
  - host: doc.test-0fb55da0.bnk.to
    http:
      paths:
      - backend:
          serviceName: cm-acme-http-solver-45jxx
          servicePort: 8089
        path: /.well-known/acme-challenge/tAME5vpSxIWdHSWe0YkQWvGBg0fmtmP7pe6A_QPHoaI
status:
  loadBalancer:
    ingress:
    - ip: 35.247.188.88

points to the wrong solver service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    auth.istio.io/8089: NONE
  creationTimestamp: "2019-12-07T21:38:28Z"
  generateName: cm-acme-http-solver-
  labels:
    acme.cert-manager.io/http-domain: "1729431587"
    acme.cert-manager.io/http-token: "1130827427"
    acme.cert-manager.io/http01-solver: "true"
  name: cm-acme-http-solver-wt5vj
  namespace: test-0fb55da0
  ownerReferences:
  - apiVersion: cert-manager.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Challenge
    name: cert-bnkto-953187930-83349459-3245750807
    uid: e3036eab-1939-11ea-b4dd-42010a940fd5
  resourceVersion: "49836226"
  selfLink: /api/v1/namespaces/test-0fb55da0/services/cm-acme-http-solver-wt5vj
  uid: e7c4bbb5-1939-11ea-b4dd-42010a940fd5
spec:
  clusterIP: 10.0.12.245
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 32532
    port: 8089
    protocol: TCP
    targetPort: 8089
  selector:
    acme.cert-manager.io/http-domain: "1729431587"
    acme.cert-manager.io/http-token: "1130827427"
    acme.cert-manager.io/http01-solver: "true"
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

Notice that the ingress points to cm-acme-http-solver-45jxx, but the only active service is cm-acme-http-solver-wt5vj.
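A mismatch like this can be checked for mechanically. The following is a minimal sketch only: `IngressBackend` and `danglingBackends` are hypothetical names made up for illustration, not Kubernetes or cert-manager types, and the set of services is assumed to have been listed from the namespace beforehand.

```go
package main

import "fmt"

// IngressBackend is a hypothetical, simplified stand-in for the backend of an
// Ingress path; it is not the real Kubernetes type.
type IngressBackend struct {
	Host        string
	ServiceName string
}

// danglingBackends returns the backends whose ServiceName is not in the set
// of services that currently exist in the namespace.
func danglingBackends(backends []IngressBackend, services map[string]bool) []IngressBackend {
	var dangling []IngressBackend
	for _, b := range backends {
		if !services[b.ServiceName] {
			dangling = append(dangling, b)
		}
	}
	return dangling
}

func main() {
	// Values taken from the ingress and service shown above.
	backends := []IngressBackend{
		{Host: "doc.test-0fb55da0.bnk.to", ServiceName: "cm-acme-http-solver-45jxx"},
	}
	services := map[string]bool{"cm-acme-http-solver-wt5vj": true}

	for _, b := range danglingBackends(backends, services) {
		fmt.Printf("%s -> %s (service not found)\n", b.Host, b.ServiceName)
	}
}
```

With the values from this report, the single backend for doc.test-0fb55da0.bnk.to is flagged because cm-acme-http-solver-45jxx is not among the existing services.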

This results in the certificate never getting issued.

The relevant logs show the following:

cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:23.227231       1 pod.go:70] cert-manager/controller/challenges/http01/ensurePod "level"=0 "msg"="creating HTTP01 challenge solver pod" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01" 
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:23.673381       1 service.go:55] cert-manager/controller/challenges/http01/ensureService "level"=0 "msg"="creating HTTP01 challenge solver service" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01" 
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:24.279974       1 ingress.go:103] cert-manager/controller/challenges/http01/ensureIngress "level"=0 "msg"="creating HTTP01 challenge solver ingress" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01" 
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:24.389903       1 service.go:55] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "level"=0 "msg"="creating HTTP01 challenge solver service" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01" 
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:25.626365       1 service.go:47] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "level"=0 "msg"="multiple challenge solver services found for challenge. cleaning up all existing services." "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01" 
cert-manager-59478b75c9-kqnmf cert-manager E1207 21:38:26.870009       1 sync.go:184] cert-manager/controller/challenges "msg"="propagation check failed" "error"="multiple existing challenge solver services found and cleaned up. retrying challenge sync" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01" 
cert-manager-59478b75c9-kqnmf cert-manager I1207 21:38:27.625216       1 service.go:55] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "level"=0 "msg"="creating HTTP01 challenge solver service" "dnsName"="doc.test-0fb55da0.bnk.to" "resource_kind"="Challenge" "resource_name"="cert-bnkto-953187930-83349459-3245750807" "resource_namespace"="test-0fb55da0" "type"="http-01"

Notice that a solver pod is created, followed by a service, an ingress, and then a second service. The controller then finds multiple services and cleans them all up, after which yet another service is created. Beyond this point nothing happens, and the certificate is never issued.
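To make the sequence easier to follow, here is a rough in-memory replay of the logged events. It is a sketch under assumptions: the `cluster` type and the `solver-svc-N` names are hypothetical, and it assumes the ingress was created pointing at the first service, which the logs do not state explicitly.

```go
package main

import (
	"fmt"
	"sort"
)

// cluster is a hypothetical in-memory stand-in for the namespace; none of
// these names come from cert-manager.
type cluster struct {
	services       map[string]bool
	ingressBackend string
	created        int
}

// createService adds a service with a fresh generated name and returns it.
func (c *cluster) createService() string {
	c.created++
	name := fmt.Sprintf("solver-svc-%d", c.created)
	c.services[name] = true
	return name
}

// deleteAllServices mirrors the "cleaning up all existing services" log line.
func (c *cluster) deleteAllServices() {
	for name := range c.services {
		delete(c.services, name)
	}
}

// replay walks through the logged events in order and returns the service
// the ingress references plus the services that still exist at the end.
func replay() (string, []string) {
	c := &cluster{services: map[string]bool{}}
	first := c.createService() // 21:38:23 solver service created
	c.ingressBackend = first   // 21:38:24 ingress created (assumed to reference it)
	c.createService()          // 21:38:24 second service created
	c.deleteAllServices()      // 21:38:25 multiple services found, all cleaned up
	c.createService()          // 21:38:27 yet another service created

	live := make([]string, 0, len(c.services))
	for name := range c.services {
		live = append(live, name)
	}
	sort.Strings(live)
	return c.ingressBackend, live
}

func main() {
	backend, live := replay()
	fmt.Println("ingress backend:", backend)
	fmt.Println("live services:", live)
}
```

In this model the ingress ends up referencing a service that no longer exists, while a different service is the only one left, which matches what we observe in the cluster.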

Expected behaviour

Our current hypothesis is that the ingress is not updated when the service name changes. We would expect that when all the services are deleted and a new one is created, the ingress is updated to reference the new service.

Looking at https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/http/ingress.go#L84 , it seems that the ingress is only updated if its httpDomainCfg Name is empty; otherwise, if the service name changes, no update is made.
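As an illustration of the behaviour we would expect, here is a minimal sketch that compares the backend service name on every sync instead of only filling in an empty name. This is not cert-manager's code: the `Ingress` type and `reconcileBackend` function are hypothetical and heavily simplified, and a real fix would have to go through the Kubernetes API.

```go
package main

import "fmt"

// Ingress is a hypothetical, simplified model of a solver ingress with a
// single backend; it is not the cert-manager or Kubernetes type.
type Ingress struct {
	Name        string
	ServiceName string
}

// reconcileBackend points ing at wantService and reports whether an update
// was needed. It compares names on every call rather than only handling the
// case where the ingress has no backend name yet.
func reconcileBackend(ing *Ingress, wantService string) bool {
	if ing.ServiceName == wantService {
		return false
	}
	ing.ServiceName = wantService
	return true
}

func main() {
	// Names taken from the ingress and service in this report.
	ing := &Ingress{
		Name:        "cm-acme-http-solver-c8qbh",
		ServiceName: "cm-acme-http-solver-45jxx",
	}
	changed := reconcileBackend(ing, "cm-acme-http-solver-wt5vj")
	fmt.Println(changed, ing.ServiceName)
}
```

The point of the sketch is only that the comparison happens unconditionally, so a recreated service with a new generated name would cause the ingress to be rewritten.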

Steps to reproduce

This bug is racy and we haven't been able to reproduce it reliably. It only happens if another service is created after the ingress, causing the existing services to be cleaned up and another service to be created. We see this bug occasionally because we run tests that issue certs like this every hour or so. Sometimes we won't see it happen for a day or two, only to see it again.

/kind bug

All 2 comments

I think you should promote your issue on Slack: https://kubernetes.slack.com/archives/C4NV3DWUC

I also experienced that with the latest v0.12.0.
The only way to fix the race is to run kubectl delete certificate XXXXX and then wait for another ~9 minutes and see if you "won the race" 😉
Good luck!
