Describe the bug:
I'm getting this error when cert-manager tries to generate a new certificate. I already deleted all the resources (http-solver pod, service, ingress, etc.) to restart the process as cleanly as possible, but I'm still stuck on this error. Luckily it doesn't happen all the time, or with every certificate I request, but when it gets stuck it seems to stay stuck forever.
Expected behaviour:
LOGS FROM CERT-MANAGER POD:
I1001 16:34:54.331360 1 pod.go:58] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="mirrors.domain.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-gff76" "related_resource_namespace"="filestash" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
I1001 16:34:54.332017 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="mirrors.domain.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-nrzmt" "related_resource_namespace"="filestash" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
I1001 16:34:54.332961 1 ingress.go:91] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="mirrors.domain.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-sjnvh" "related_resource_namespace"="filestash" "related_resource_version"="v1beta1" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
E1001 16:34:54.347242 1 sync.go:183] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo': Get \"http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo\": EOF" "dnsName"="mirrors.domain.com" "resource_kind"="Challenge" "resource_name"="certificate-filestash-4xtbq-1190718029-4252496785" "resource_namespace"="filestash" "resource_version"="v1" "type"="HTTP-01"
LOGS FROM CM-ACME-HTTP-SOLVER POD:
I1001 16:31:27.014589 1 solver.go:39] cert-manager/acmesolver "msg"="starting listener" "expected_domain"="mirrors.domain.com" "expected_key"="C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo.ssKMNkALA906yKgEEcMpgeGO9pfTcXhBqtDmN7VLpTo" "expected_token"="C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo" "listen_port"=8089
Steps to reproduce the bug:
Anything else we need to know?:
When I open http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo in Chrome, I get what I believe is the right response: C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo.ssKMNkALA906yKgEEcMpgeGO9pfTcXhBqtDmN7VLpTo
Environment details:
Cloud-provider/provisioner (e.g. GKE, kops AWS, etc):
Scaleway
cert-manager version (e.g. v0.4.0):
quay.io/jetstack/cert-manager-controller:v1.0.2
Install method (e.g. helm or static manifests):
static manifests
/kind bug
An update on this issue. I think I found the problem, but I don't know how to resolve it. If I perform a GET with curl against http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo from outside the cluster, I get an answer; but when I run the same command from a pod inside the cluster, I get curl: (52) Empty reply from server. That's why cert-manager reports an EOF error, but I don't understand how to resolve it.
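For reference, this is roughly how I tested it (the pod name and the curlimages/curl image are just examples; any image with curl works):

# From my machine outside the cluster, this returns the challenge key:
curl http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo

# From a throwaway pod inside the cluster, the same request fails:
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -v http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo
# curl: (52) Empty reply from server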
Does anyone have this error too?
What is your ingress setup? Do you by any chance have a firewall or NAT router somewhere?
/triage support
/remove-kind bug
Hi @meyskens.
There is no firewall or NAT.
As for my ingress setup, this was my implementation:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
    #nginx.ingress.kubernetes.io/auth-url: "https://$host/oauth2/auth"
    #nginx.ingress.kubernetes.io/auth-signin: "https://$host/oauth2/start?rd=$escaped_request_uri"
  name: filestash
  namespace: filestash
spec:
  rules:
  - host: mirrors.domain.com
    http:
      paths:
      - backend:
          serviceName: filestash
          servicePort: 8334
        path: /
  tls:
  - hosts:
    - mirrors.domain.com
    secretName: certificate-filestash
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: oauth2-proxy
  namespace: filestash
spec:
  rules:
  - host: mirrors.domain.com
    http:
      paths:
      - backend:
          serviceName: oauth2-proxy
          servicePort: 4180
        path: /oauth2
  tls:
  - hosts:
    - mirrors.domain.com
    secretName: certificate-filestash
Hello again @meyskens. I was researching this issue and found this link: https://www.digitalocean.com/community/questions/how-to-support-internal-traffic-with-proxy-protocol-enabled-on-a-kubernetes-loadbalancer
I think this is the problem, but I haven't gotten past it yet.
Hi, @meyskens. I finally found the problem. Since my nginx is served through a load balancer with the PROXY protocol enabled, cert-manager cannot reach the specified endpoint when it tries to self-check the HTTP challenge. For now, I resolved the problem by disabling nginx's PROXY protocol support.
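In case it helps others, a minimal sketch of what I changed, assuming an upstream ingress-nginx install (the namespace and ConfigMap name depend on how the controller was deployed; adjust for your setup):

# Assumption: upstream default namespace/ConfigMap name for ingress-nginx.
kubectl -n ingress-nginx patch configmap ingress-nginx-controller \
  --type merge -p '{"data":{"use-proxy-protocol":"false"}}'

Note that the load balancer has to stop sending the PROXY protocol header as well, otherwise nginx would reject the now-unparsed preamble.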
I found this out when I performed a curl from a pod inside the cluster using the --haproxy-protocol flag, and it succeeded.
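That is, roughly (curl's --haproxy-protocol flag needs curl 7.60 or newer):

# Same in-cluster request that previously got an empty reply, but curl now
# sends a PROXY protocol v1 preamble before the HTTP request; this succeeds:
curl --haproxy-protocol http://mirrors.domain.com/.well-known/acme-challenge/C1Gid2CFSA-wlCd2ipUM6D3_UiYoQejYhCVja5AFfNo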
I don't know if it is possible, but could you implement this on the cert-manager side? I don't know how exactly, but something like performing the HTTP GET request to the .well-known/... endpoint with the PROXY protocol after a few failed attempts?
I hope that this explanation helps.
This is going to be fixed upstream in https://github.com/kubernetes/enhancements/pull/1392
/close
Closing this issue for now as the issue is solved. There was a discussion on this topic in https://github.com/jetstack/cert-manager/issues/466
@meyskens: Closing this issue.
Hi @Serrvosky, I've just published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy
It accomplishes basically what you described: adding a PROXY line to requests originating from within the cluster. This allows cert-manager's self-check to pass.