Describe the bug:
Unable to pass the HTTP-01 "self check" when the Ingress Service uses NodePort and the public IP sits on an HAProxy (tcp mode) outside the Kubernetes cluster. We can simulate the check from the cert-manager container (via kubectl exec) by fetching the /.well-known/acme-challenge/... URL with curl, and it succeeds. The same request also succeeds from outside the cluster.
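For completeness, this is roughly how we run the manual check; the namespace, pod name and challenge token below are placeholders for the real values and depend on how cert-manager was installed:
# Fetch the challenge from inside the cert-manager pod (placeholder pod name and token)
kubectl -n cert-manager exec -it cert-manager-xxxxxxxxxx-yyyyy -- \
  curl -v http://www.example.com/.well-known/acme-challenge/<token>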
Logs:
helpers.go:188 Found status change for Certificate "myip-secret" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-29 14:36:25.387757463 +0000 UTC m=+2049.620517469
sync.go:244 Error preparing issuer for certificate pwe/pwe-secret: http-01 self check failed for domain "www.example.com"
controller.go:190 certificates controller: Re-queuing item "default/myip-secret" due to error processing: http-01 self check failed for domain "www.example.com"
We replaced the real domain name with www.example.com in this bug report.
cert-manager works only when the public IP is on the Kubernetes cluster and the Ingress Service uses the LoadBalancer type.
Expected behaviour:
The self check should pass when the Ingress Service uses NodePort.
Steps to reproduce the bug:
cat <<EOF > /root/nginx-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
    nodePort: 31080
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
    nodePort: 31443
  selector:
    app: nginx-ingress
EOF
cat <<EOF > /root/letsencrypt-staging.yml
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  # Adjust the name here accordingly
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: [email protected]
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging-private-key
    # Enable the HTTP-01 challenge provider
    http01: {}
EOF
cat <<EOF > /root/myip-ingress.yml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myip-ingress
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
  - hosts:
    - www.example.com
    secretName: myip-secret
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: myip-svc
          servicePort: 80
EOF
# Nginx ingress
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/ns-and-sa.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/default-server-secret.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/nginx-config.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/rbac/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/daemon-set/nginx-ingress.yaml
kubectl create -f /root/nginx-ingress.yaml
# CertManager
kubectl create -f https://raw.githubusercontent.com/jetstack/cert-manager/master/contrib/manifests/cert-manager/with-rbac.yaml
kubectl create -f /root/letsencrypt-staging.yml
# MyApp
kubectl run myip --image=cloudnativelabs/whats-my-ip --replicas=1 --port=8080
kubectl expose deployment myip --name=myip-svc --port=80 --target-port=8080
kubectl create -f /root/myip-ingress.yml
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout /root/tls.key -out /root/tls.crt -subj "/CN=www.example.com"
kubectl create secret tls myip-secret --key /root/tls.key --cert /root/tls.crt
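To watch the self check while reproducing, it can help to inspect the Certificate that cert-manager creates for the Ingress (named myip-secret here, per the logs above) and to follow the controller logs. The namespace and deployment name below assume the default with-rbac.yaml install and may differ in your setup:
# Inspect the Certificate resource and its events
kubectl describe certificate myip-secret
# Follow cert-manager's controller logs (install-dependent names)
kubectl -n cert-manager logs deploy/cert-manager -f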
Anything else we need to know?:
It is not clear to us what exactly the self check expects to find: the fetch of the /.well-known key succeeds (confirmed via Wireshark), yet the self check runs again and again and keeps failing. More detail about the reason for the failure would be great.
Wireshark captured data - request from Cluster Node to HA proxy:
GET /.well-known/acme-challenge/B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k HTTP/1.1
Host: pwe.kube.freebox.cz
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
HTTP/1.1 200 OK
Server: nginx/1.15.2
Date: Wed, 29 Aug 2018 14:42:26 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 87
Connection: keep-alive
B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k.6RElade5K0jHqS1ysziuv2Gm3_LgD-D9APNRg5k8sak
Environment details:
/kind bug
The same happens to me. I'm setting up an HA cluster and this is blocking us from moving our apps.
We installed it with the Helm chart. Is there any workaround that would let us continue deploying our infrastructure?
Fixed in my case. The problem was that the nginx configuration on the load balancer was redirecting connections from port 80 to 443.
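For anyone hitting the same thing, a minimal sketch of the kind of exception that avoids it, assuming an nginx reverse proxy in front of the cluster (the backend address and NodePort are placeholders):
# Sketch: keep /.well-known/acme-challenge/ reachable over plain HTTP
# instead of redirecting everything on port 80 to 443
server {
    listen 80;
    server_name www.example.com;

    location /.well-known/acme-challenge/ {
        proxy_pass http://10.0.0.11:31080;   # placeholder node IP and NodePort
    }

    location / {
        return 301 https://$host$request_uri;
    }
}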
The same here.
I have an HA cluster with an nginx reverse proxy (the DNS entry points at it) and I forward the HTTP/HTTPS ports to the public IPs of the Kubernetes nodes.
My Kubernetes cluster then runs the ingress-nginx controller configured like this:
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  type: NodePort
  ports:
This way, when I use cert-manager to get my cert, I always get a self check error (by the way, all ACME challenges respond correctly if I fetch them manually, both inside and outside the cluster).
If I point my DNS entry at one of the Kubernetes nodes' public IPs instead, everything works and the certificate is issued (but that is a big SPOF if the node the DNS entry points at goes down).
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale
The same happens here, but with DNAT from a public IP to an internal MetalLB load-balancer configuration.
I found out that the problem was that the cluster wasn't able to resolve the DNS. I solved that and it worked.
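If you suspect the same, a quick way to check is to resolve the public hostname from inside the cluster, for example with a throwaway pod (the image and hostname are just examples):
# Resolve the domain from inside the cluster; it should return an IP
# that is actually reachable from the pods
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup www.example.com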
Solved this myself too after a long time of messing about. The self check is kind of tricky depending on your network configuration. The cert-manager resolver tries to connect to itself to verify that Let's Encrypt can access the data at /.well-known/acme-challenge/. This is often deceptively complicated in many networks: it requires the resolver to be able to connect to itself via what usually resolves to a public IP address. Do a wget/curl to the /.well-known/acme-challenge/ URL from the resolver container to see if it succeeds. In my case, I had to set up hairpin NAT at the router.
Is it a good idea to optionally skip self-check?
I'm going to close this issue out as it seems to be more related to network configuration than anything else. Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work 😄
Port 80 isn't the issue; that's a given. The IP address is, though. Any installation behind NAT is likely going to fail without a hairpin config. If the self check can't be disabled, maybe mention this in the docs?
Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges
I guess this means "Cloudflare Always Use HTTPS" was causing this for me. Perhaps a note about requiring port 80 and plain HTTP access to the domain would be good here: https://docs.cert-manager.io/en/latest/getting-started/troubleshooting.html
Same issue here. I would like to disable the self check or be able to provide the load balancer's IP address, because of hairpinning.
The problem is Kubernetes networking when you use a LoadBalancer provided by the hosting provider. I use DigitalOcean. Kubernetes does not route traffic through the LB's public interface, so nothing adds the PROXY protocol header or SSL when you configure those outside Kubernetes. I use the PROXY protocol, and the moment I enable it and update nginx to handle it, everything works except cert-manager, which fails because it tries to connect to the public domain name and that fails. It works from my computer, since I am outside and the LB adds the needed headers, but not from within the cluster.
cert-manager is not at fault here, but switches to make the validator send the PROXY protocol, or to disable validation for that domain, would help a lot.
With curl, if I do this (from inside the cluster):
curl -I https://myhost.domain.com
it fails.
If I do (from inside the cluster):
curl -I https://myhost.domain.com --haproxy-protocol
it works.
I was informed by the DigitalOcean team that there is a fix for this behavior. They added an annotation for the nginx-ingress controller Service that makes Kubernetes use the load balancer's hostname instead of its public IP, which tricks Kubernetes into thinking the address is not "ours" and routes traffic out through the LB.
https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster
This is it (I only added the annotation):
kind: Service
apiVersion: v1
metadata:
  name: nginx-ingress-controller
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-hostname: "hello.example.com"
@MichaelOrtho Hi, do you know if a similar workaround exists for Scaleway? I am testing their managed Kubernetes and am having the same problem. Thanks
@vitobotta I have found on Scaleway you need to restart coredns and it will usually succeed.
@AlexsJones Not for me. I had to add the annotation below
"service.beta.kubernetes.io/scw-loadbalancer-use-hostname": "true"
...
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
...
After changing externalTrafficPolicy: Local to externalTrafficPolicy: Cluster, I was able to perform self check.
The reason being, the pod with the certificate issuer wound up on a different node than the load balancer, so it couldn't talk to itself through the ingress.
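For reference, the same change as a one-liner against the NodePort Service from the original report (assuming the nginx-ingress namespace and Service name used above); note that externalTrafficPolicy: Cluster no longer preserves client source IPs:
# Switch the ingress Service from Local to Cluster traffic policy
kubectl -n nginx-ingress patch service nginx-ingress \
  -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'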
Hi all, I ran into the same issue. I've recently published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy
It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager's self-check to pass.
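For the curious, the DNS half of that approach amounts to a CoreDNS rewrite rule along these lines (the hostname and the in-cluster target are placeholders; hairpin-proxy generates the real entries automatically):
# Corefile excerpt: answer queries for the public hostname with an in-cluster proxy service
rewrite name www.example.com hairpin-proxy.hairpin-proxy.svc.cluster.local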
@munnerz I think you misunderstood the problem here. You wrote:
I'm going to close this issue out as it seems to be more related to network configuration than anything else. Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work smile
The problem is not that Let's Encrypt can't reach the LoadBalancer; the problem is that cert-manager's self check can't reach it. The connection from LE to the LoadBalancer is fine thanks to destination NAT. cert-manager inside the cluster, however, resolves the domain name to the external IP, and that fails in DNAT scenarios.
@munnerz there is already a whole project just for fixing this issue. Is there really no option to just disable self-checks?
Here is another possible solution:
You can use CoreDNS to serve overriding DNS records: create host aliases for the domains and point them at the internal cluster IPs, then serve these host/IP pairs via the hosts plugin:
hosts {
    fallthrough
}
in your CoreDNS config (a concrete example follows below). This way the internal IP addresses are used inside your cluster. You just have to maintain another list (or automate it with a custom operator or script).
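As an illustration, a Corefile excerpt with example entries; the IP below is a placeholder for whatever address reaches your ingress controller from inside the cluster:
hosts {
    # placeholder: in-cluster ingress address mapped to the public hostname
    10.96.10.20 www.example.com
    fallthrough
}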
In DNAT scenarios, just set externalIPs on an ingress Service to your external IP addresses.
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-ext
  namespace: nginx-ingress
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
  selector:
    app: nginx-ingress-ext
  externalIPs:
  - 11.22.33.44
Kubernetes, in a mostly standard iptables-based setup, creates iptables rules that redirect cluster-internal requests for external IPs to the appropriate Services:
$ sudo iptables-save | grep 11.22.33.44
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m addrtype --dst-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m addrtype --dst-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK