Describe the bug:
I'm getting this error when cert-manager tries to generate a new certificate. I already deleted all the resources (http-solver pod, service, ingress, etc.) to restart the process as cleanly as possible, but I'm still stuck on this error. Luckily it doesn't happen all the time, or with every certificate I request, but when it gets stuck it seems to stay stuck forever.
Expected behaviour:
LOGS FROM CERT-MANAGER POD:
I1001 16:34:54.331360 1 pod.go:58] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="mirrors.domain.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-gff76" "related_resource_namespace"="filestash" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
I1001 16:34:54.332017 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="mirrors.domain.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-nrzmt" "related_resource_namespace"="filestash" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
I1001 16:34:54.332961 1 ingress.go:91] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="mirrors.domain.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-sjnvh" "related_resource_namespace"="filestash" "related_resource_version"="v1beta1" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
E1001 16:34:54.347242 1 sync.go:183] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo': Get \"http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo\": EOF" "dnsName"="mirrors.domain.com" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
LOGS FROM CM-ACME-HTTP-SOLVER POD:
I1001 16:31:27.014589 1 solver.go:39] cert-manager/acmesolver "msg"="starting listener" "expected_domain"="mirrors.domain.com" "expected_key"="C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo.ssKMNkALA906yKgEEcMpgeGO9pfTcXhBqtDmN7VLpTo" "expected_token"="C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo" "listen_port"=8089
Steps to reproduce the bug:
Anything else we need to know?:
When I open http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo in Chrome, I get what I believe is the right response: C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo.ssKMNkALA906yKgEEcMpgeGO9pfTcXhBqtDmN7VLpTo
Environment details:
Cloud-provider/provisioner (e.g. GKE, kops AWS, etc):
Scaleway
cert-manager version (e.g. v0.4.0):
quay.io/jetstack/cert-manager-controller:v1.0.2
Install method (e.g. helm or static manifests):
static manifests
/kind bug
An update on this issue. I think I found the problem, but I don't know how to resolve it. If I perform a GET with curl against http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo from outside the cluster, I get an answer; but when I run the same command from a pod inside the cluster, I get curl: (52) Empty reply from server. That's why cert-manager reports an EOF error, but I don't understand how to resolve it.
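For reference, this is roughly how I tested it (the pod name and the curlimages/curl image are just examples; any image with curl works):

# From my machine outside the cluster, this returns the challenge key:
curl http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo

# From a throwaway pod inside the cluster, the same request fails:
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -v http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo
# curl: (52) Empty reply from server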
Does anyone have this error too?
What is your ingress setup? Do you by any chance have a firewall or NAT router somewhere?
/triage support
/remove-kind bug
Hi @meyskens.
There is no firewall or NAT.
As for my ingress setup, this was my implementation:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
    #nginx.ingress.kubernetes.io/auth-url: "https://$host/oauth2/auth"
    #nginx.ingress.kubernetes.io/auth-signin: "https://$host/oauth2/start?rd=$escaped_request_uri"
  name: filestash
  namespace: filestash
spec:
  rules:
  - host: mirrors.domain.com
    http:
      paths:
      - backend:
          serviceName: filestash
          servicePort: 8334
        path: /
  tls:
  - hosts:
    - mirrors.domain.com
    secretName: certificate-filestash
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: oauth2-proxy
  namespace: filestash
spec:
  rules:
  - host: mirrors.domain.com
    http:
      paths:
      - backend:
          serviceName: oauth2-proxy
          servicePort: 4180
        path: /oauth2
  tls:
  - hosts:
    - mirrors.domain.com
    secretName: certificate-filestash
Hello again @meyskens. I was researching this issue and found this link: https://www.digitalocean.com/community/questions/how-to-support-internal-traffic-with-proxy-protocol-enabled-on-a-kubernetes-loadbalancer
I think this is the problem, but I haven't gotten past it yet.
Hi, @meyskens. I finally found the problem. Since my nginx is served through a load balancer with the PROXY protocol enabled, cert-manager cannot reach the specified endpoint when it tries to self-check the HTTP challenge. For now, I resolved the problem by disabling nginx's PROXY protocol support.
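In case it helps others, a minimal sketch of what I changed, assuming an upstream ingress-nginx install (the namespace and ConfigMap name depend on how the controller was deployed; adjust for your setup):

# Assumption: upstream default namespace/ConfigMap name for ingress-nginx.
kubectl -n ingress-nginx patch configmap ingress-nginx-controller \
  --type merge -p '{"data":{"use-proxy-protocol":"false"}}'

Note that the load balancer has to stop sending the PROXY protocol header as well, otherwise nginx would reject the now-unparsed preamble.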
I found this out when I performed a curl from a pod inside the cluster using the --haproxy-protocol flag, and it succeeded.
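That is, roughly (curl's --haproxy-protocol flag needs curl 7.60 or newer):

# Same in-cluster request that previously got an empty reply, but curl now
# sends a PROXY protocol v1 preamble before the HTTP request; this succeeds:
curl --haproxy-protocol http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo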
I don't know if it is possible, but could you implement this on the cert-manager side? I don't know how exactly, but something like performing the HTTP GET request to the .well-known/... endpoint with the PROXY protocol after a few failed attempts?
I hope that this explanation helps.
This is going to be fixed upstream in https://github.com/kubernetes/enhancements/pull/1392
/close
Closing this issue for now as the issue is solved. There was a discussion on this topic in https://github.com/jetstack/cert-manager/issues/466
@meyskens: Closing this issue.
Hi @Serrvosky, I've just published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy
It accomplishes basically what you described: adding a PROXY line to requests originating from within the cluster. This allows cert-manager's self-check to pass.