Cert-manager: Challenge fails: Waiting for http-01 challenge propagation: wrong status code '404', expected '200'

Created on 18 Mar 2020 · 12 Comments · Source: jetstack/cert-manager

I am setting up a new Kubernetes production cluster that uses an nginx-ingress-controller to handle incoming requests. A few months ago I successfully set up a staging cluster, which is still working today. However, the same setup and steps that worked a couple of months ago do not work today.

The issue is that every time I try to set up the certificate to enable secure communication, I get a 404. I deleted the cluster and started from scratch a couple of times, with the same result.

Expected behaviour:
Ability to acquire a valid certificate to be able to have HTTPS communications

Steps to reproduce the bug:
I am using Google Cloud to host the clusters. In a new cluster, I first set up the ingress controller by executing the following helm command:

helm install mozart --set controller.image.tag=1.6.0 --set controller.healthStatus=true --set controller.name="prod-ingress-controller" --set controller.service.name="prod-ingress-controller" --set controller.service.loadBalancerIP="<A RESERVED PUBLIC IP>" --set controller.prometheus.create=true nginx-stable/nginx-ingress

NOTE: I am using image 1.6.0, since this is the image I have in the staging cluster (where everything is working).

Then I point my domain at my public IP, so I can check the status of the cluster by opening one of the following in the browser:
http://<RESERVED_IP>/nginx-health
or
http://<DOMAIN>/nginx-health

Next I deploy my service with helm; the chart has a deployment, an ingress file, a service and secrets. The ingress looks something like this:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ .Values.serviceName }}-ingress
  annotations:
    # All of our ingress files need to have class 'nginx' defined, in order to enable the 
    # ingress controller collect all the rules we define
    kubernetes.io/ingress.class: nginx
    # We make a reference to the name of our global ip reserved in GCP
    # To see the names execute in your terminal: gcloud compute addresses list
    kubernetes.io/ingress.global-static-ip-name: <PUBLIC IP NAME IN GCP>
    # With this directive, we ignore the path and redirect to / of our service
    nginx.org/rewrites: "serviceName={{ .Values.serviceName }} rewrite=/"
    ############################# SSL SECTION ##########################################
    # # Forces all communication to be encrypted 
    # nginx.ingress.kubernetes.io/ssl-redirect: "true"
    ############################# SSL SECTION ##########################################
spec:
############################# SSL SECTION ##########################################
  # tls:
  # - hosts:
  #  - {{ .Values.clusterHost }}
  #  secretName: {{ .Values.certificate.secretName }}
############################# SSL SECTION ##########################################
  rules:
  - host: {{ .Values.clusterHost }} 
    http:
      paths:
      - path: /{{ .Values.serviceName }}/
        backend:
          serviceName: {{ .Values.serviceName }}
          servicePort: 80

After this point I am able to successfully access my service over HTTP; I get a success response from the health endpoint.

Next, I set up cert-manager:

helm install \
  cert-manager \
  --namespace cert-manager \
  --version v0.12.0 \
  jetstack/cert-manager

Again, I use version 0.12.0, since this is the working version I have in my staging cluster.

Next, I create the issuer; the yaml file looks like this:

apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: <MY EMAIL>
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx

And finally I add a certificate:

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: <NAME>
  namespace: default
spec:
  secretName: <TLS SECRET NAME>-tls
  issuerRef:
    name: letsencrypt-prod
    kind: Issuer
  commonName: <MY DOMAIN>
  dnsNames:
    - <MY DOMAIN>
  acme:
    config:
      - http01:
          ingress: <MY SERVICE NAME>-ingress
        domains:
          - <MY DOMAIN>
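
For comparison: as far as I know, from cert-manager v0.11 onward (the v1alpha2 API) the HTTP-01 solver configuration lives on the Issuer under spec.acme.solvers, and the Certificate no longer carries its own acme block. A minimal sketch of what the same Certificate might look like without that block, reusing the placeholders above (this is an assumption about the intended config, not a confirmed fix for the 404):

```yaml
# Sketch only: same placeholders as the Certificate above,
# with the legacy spec.acme.config block left out.
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: <NAME>
  namespace: default
spec:
  secretName: <TLS SECRET NAME>-tls
  issuerRef:
    name: letsencrypt-prod
    kind: Issuer
  commonName: <MY DOMAIN>
  dnsNames:
    - <MY DOMAIN>
```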

When I run helm upgrade with the certificate, the solver service and pod are created correctly in my cluster; however, the pod always responds with 404. I inspected the nginx-ingress-controller to verify that the ingress rules were generated correctly, and everything looks good.

When I run kubectl get all I see:

NAME                                          READY   STATUS    RESTARTS   AGE
pod/auth-service-5cf45574c-8kl7x              1/1     Running   0          87m
pod/cm-acme-http-solver-2z9g9                 1/1     Running   0          84m
pod/prod-ingress-controller-c4bd54c69-kp8s4   1/1     Running   0          7h39m

If I run kubectl port-forward cm-acme-http-solver-2z9g9 8089:8089 and then curl the expected route, I expect to get a 200, which would at least indicate that the problem is in my routing. However, the curl returns a 404. This is the curl:

curl http://0.0.0.0:8089/.well-known/acme-challenge/WUdxZc1w7XVGC3h1ayDYmj4OhzeuOz1qGv0rmEBRTG8

I read in some other tickets that sometimes an extra trailing slash (/) is needed in the URL; however, this still returns a 404.

Any help will be highly appreciated! I have tried many things with no success, and I am not sure whether this is a misconfiguration or a bug in the cm-acme-http-solver image.

Anything else we need to know?:
I tried cert-manager versions 0.13.0 and 0.14.0 and had the same results. I also tried the latest version of nginx-ingress-controller, with no success.

Environment details:

Kubernetes version: 1.14.10-gke.17
Cloud-provider/provisioner: GKE
cert-manager version: v0.12.0
nginx-ingress-controller version: 1.6.0
Install method: Helm

/kind bug

graned

👍1

All 12 comments

I am facing the same issue (#2712). On the first runs tls.crt was empty; now the certificate is being written twice to tls.crt:

k describe  secret My-DNS-tls
Name:         My-DNS-tls
Namespace:    default
Labels:       <none>
Annotations:  cert-manager.io/alt-names: My-DNS
              cert-manager.io/certificate-name: My-DNS-tls
              cert-manager.io/common-name: My-DNS
              cert-manager.io/ip-sans:
              cert-manager.io/issuer-kind: ClusterIssuer
              cert-manager.io/issuer-name: letsencrypt-prod
              cert-manager.io/uri-sans:

Type:  kubernetes.io/tls

Data
====
ca.crt:   0 bytes
tls.crt:  3566 bytes
tls.key:  1679 bytes

altreze on 18 Mar 2020

I also get an empty tls.crt and ca.crt the first time I create the certificate:

Name:         <TLS NAME>
Namespace:    default
Labels:       <none>
Annotations:  cert-manager.io/certificate-name: <CERTIFICATE NAME>
              cert-manager.io/issuer-kind: Issuer
              cert-manager.io/issuer-name: letsencrypt-test

Type:  kubernetes.io/tls

Data
====
ca.crt:   0 bytes
tls.crt:  0 bytes
tls.key:  1675 bytes

graned on 19 Mar 2020

👍2

Fun fact: I found a workaround for my case. If I run helm uninstall <release name> <path to my helm files> and uninstall everything EXCEPT the certificate, for whatever reason my certificate gets "unstuck" and is validated.

graned on 19 Mar 2020

It should not be this way; executing this workaround for every certificate request is not feasible.

altreze on 19 Mar 2020

👍1

Agreed, this is not the correct way to do this. I just wanted to mention it in case it helps with fixing this issue.

graned on 19 Mar 2020

Any updates on this? I am having the same problem.

HazemElAgaty on 22 Mar 2020

For what it's worth, I've worked around this problem by using the bitnami helm chart (bitnami/nginx-ingress-controller) instead of the nginx-stable/nginx-ingress:

helm install nginx bitnami/nginx-ingress-controller

ComeMaes on 22 Mar 2020

👍1

@HazemElAgaty you can try the workaround I mentioned; it is not the optimal solution and it is an ugly one, but it works. Or you can try the solution provided by @ComeMaes. I have not tried it myself; I will give it a try this week and report back on whether it worked for me.

graned on 24 Mar 2020

I found a new solution that is better than my previous workaround: I decided to use Mergeable Ingress Types. Having a master ingress file helped with the certificate generation, and this strategy also fits my current setup perfectly. So my issue was solved by adjusting my ingress setup.
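
For readers unfamiliar with Mergeable Ingress Types in the nginxinc controller, here is a rough sketch of the master/minion split, assuming the nginx.org/mergeable-ingress-type annotation. The resource names, host, service and secret below are hypothetical placeholders, not the exact files from this setup. The master carries the host and TLS settings but no paths; each minion carries paths for the same host:

```yaml
# Master: owns the host and the TLS section, defines no paths.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-ingress-master          # hypothetical name
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.org/mergeable-ingress-type: "master"
spec:
  tls:
  - hosts:
    - example.com                       # hypothetical host
    secretName: example-com-tls         # hypothetical secret
  rules:
  - host: example.com
---
# Minion: same host, contributes only its own paths, no TLS section.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-service-ingress-minion       # hypothetical name
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.org/mergeable-ingress-type: "minion"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /my-service/
        backend:
          serviceName: my-service       # hypothetical service
          servicePort: 80
```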

graned on 29 Mar 2020

@graned, with mergeable ingress only a single, unique ingress resource can be used.
The question remains where this broke in the code, since it was a working feature.

altreze on 30 Mar 2020

I'm in the same boat...

carvendy on 4 Apr 2020

Given you're using the nginxinc version of the NGINX ingress controller, I believe you're actually running into this (resolved!) issue: https://github.com/jetstack/cert-manager/issues/2517#issuecomment-618525387

In short, you need to denote ingresses as 'minions' of other ingresses, or otherwise, you can probably use the acme.cert-manager.io/http01-edit-in-place: "true" annotation on your Ingress resource to avoid a second ingress being created altogether.
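
A minimal sketch of the edit-in-place option, assuming the Certificate is requested through ingress-shim via the cert-manager.io/issuer annotation rather than a separate Certificate resource. The ingress name, host, service and secret are hypothetical placeholders; the issuer name is taken from the original post:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-service-ingress              # hypothetical name
  annotations:
    kubernetes.io/ingress.class: nginx
    # Ask ingress-shim to request a certificate from this Issuer.
    cert-manager.io/issuer: letsencrypt-prod
    # Add the challenge path to this Ingress instead of creating a second one.
    acme.cert-manager.io/http01-edit-in-place: "true"
spec:
  tls:
  - hosts:
    - example.com                       # hypothetical host
    secretName: example-com-tls         # hypothetical secret
  rules:
  - host: example.com
    http:
      paths:
      - path: /my-service/
        backend:
          serviceName: my-service       # hypothetical service
          servicePort: 80
```

With this annotation the solver path should be served from the same Ingress, which avoids two non-mergeable Ingress resources competing for the same host on the nginxinc controller.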

Hopefully that helps/the other issue linked here provides you the help you need. Please re-open and provide more details after digging in if not!

/triage support
/remove-kind bug